Neural Machine Translation Techniques Used By Google
Sophia Mitchellette
Outline
● Introduction
● Neural Networks
● Residual Connections
● Attention Network
● Summary
Outline
● Introduction
○ What Is Machine Translation?
○ Human Translation vs. Machine Translation
○ How Machine Translation Systems See Words
● Neural Networks
● Residual Connections
● Attention Network
● Summary
Introduction - What Is Machine Translation?
Machine Translation - translation from one language to another performed by a machine instead of a human.

Google Translate
● Google's machine translation service
● Behind the scenes: Google's Neural Machine Translation system (GNMT)
● Over 500 million people use it every day
("Hello!" → "Bonjour!")
Introduction - Human Translation
● Humans translate by finding words, sentences and phrases that have the same meaning in both languages.
[Figure: a squirrel. "squirrel" and "écureuil" have the same meaning; "grenouille" (frog) and "chien" (dog) do not.]
Introduction - Machine Translation
● Machine translation systems look at statistical probabilities to find the best translation.
● Words and sentences are not a communication of meaning.
[Figure: candidate translations for "squirrel" - "écureuil" (96%), "grenouille" (3%), "chien" (1%)]
Introduction - How Machine Translation Systems See Words
● Words and sentences are vectors.
● Word segmentation - each word is represented as one vector.
[Figure: the sentence "The squirrels eat the acorns." split into six tokens, each represented as one vector: x1 = (x11, x12, x13, ..., x1n) through x6 = (x61, x62, x63, ..., x6n)]
Introduction - How Machine Translation Systems See Words
● The points on the coordinate plane represent two-dimensional word vectors.
[Figure: a point plotted at (0.8, -0.3) on the coordinate plane]
Outline
● Introduction
● Neural Networks
○ An Overview
○ The Structure of a Node
○ A Neural Network
○ Activation Functions
○ Training a Neural Network
○ Training Error vs. Testing Error
● Residual Connections
● Attention Network
● Summary
Neural Networks - An Overview
● Nodes - the building blocks of neural networks.
● Nodes map inputs to outputs.
● mapping = function
[Figure: a neural network. Each gray circle is a node; gray squares are inputs. The layers are labeled Input layer, Hidden layers, and Output layer.]
Neural Networks - The Structure of a Node
Steps of a node:
1. Inputs are multiplied by their weights.
2. The weighted inputs are summed.
3. The sum is run through an activation function.
4. The result of the activation function is the output.
[Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn feed the activation function f, producing the output y = f(x1*w1 + x2*w2 + ... + xn*wn)]
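The four steps can be sketched in Python (the specific inputs and weights below are illustrative, not from the slides):

```python
import math

def node(inputs, weights, activation=math.tanh):
    """One node: multiply inputs by weights, sum them, apply the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))  # steps 1-2
    return activation(weighted_sum)                             # steps 3-4

# y = f(x1*w1 + x2*w2) with f = tanh
y = node([0.2, 0.2], [1.0, 1.0])
```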
Neural Networks - Activation Functions
● Non-linear activation functions are needed to model more complex data patterns.
● Hyperbolic tangent (tanh) and sigmoid (σ) squash values into a fixed range.
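For example, tanh squashes any real input into (-1, 1) and sigmoid into (0, 1). A minimal check using Python's standard library (sigmoid is written out by hand, since the stdlib does not provide it):

```python
import math

def sigmoid(x):
    """Sigmoid squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# tanh squashes any real number into (-1, 1).
for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    assert -1.0 < math.tanh(x) < 1.0
    assert 0.0 < sigmoid(x) < 1.0
```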
Neural Networks - A Neural Network
● Our neural network will map English word vectors to their French counterparts.
Neural Networks - A Neural Network
● The network has two nodes
● It takes in an English word vector
● It outputs a French word vector
[Figure: the English word vector (x1, x2) feeds two tanh nodes with weights w1 and w2, producing the French word vector (y1, y2), where y1 = tanh(x1*w1) and y2 = tanh(x2*w2)]
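This two-node network can be sketched directly (the weights plugged in below are the final trained weights that appear later in the deck):

```python
import math

def translate_word(word_vec, w1, w2):
    """Map an English word vector (x1, x2) to a French word vector (y1, y2)."""
    x1, x2 = word_vec
    return (math.tanh(x1 * w1), math.tanh(x2 * w2))

# With the trained weights w1 = 1.683, w2 = -2.515, the vector for
# "squirrel" (0.2, 0.2) lands close to the target for "écureuils" (0.3, -0.4).
y1, y2 = translate_word((0.2, 0.2), 1.683, -2.515)
```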
Neural Networks - A Neural Network
● The network has two nodes
● It takes in an English word vector
● It outputs a French word vector
[Figure: the same diagram, but with the weights marked as unknown ("w1?", "w2?")]
Neural Networks - Training a Neural Network
● The neural network is given examples of what the output should be for given inputs.

English Word   English Word Vector   French Word Vector   French Word
squirrel       (0.2, 0.2)            (0.3, -0.4)          écureuils
acorns         (0.6, -0.4)           (0.9, 0.8)           glands
eat            (-0.2, 0.4)           (-0.3, -0.8)         mangent
the            (-0.4, -0.4)          (-0.6, 0.8)          les
(The English columns are the inputs; the French columns are the desired outputs.)
Neural Networks - Training a Neural Network
● Goal: inputs map to the desired vectors
● The network starts with initial weights
● The error is evaluated, then new weights are tried
● Iteration - one such attempt

Iteration   w1      w2
1           1.059   1.059
2           1       -1.059
3           1.252   -2.222
4           1.683   -2.515
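The slides show the weights improving over iterations without saying how the new weights are chosen; a common choice is gradient descent on the squared error. A minimal sketch for the two-weight network (the learning rate, iteration count, and update rule are my own choices, not from the slides):

```python
import math

# Training pairs from the slides: English vector -> desired French vector.
data = [((0.2, 0.2), (0.3, -0.4)),
        ((0.6, -0.4), (0.9, 0.8)),
        ((-0.2, 0.4), (-0.3, -0.8)),
        ((-0.4, -0.4), (-0.6, 0.8))]

def total_error(w1, w2):
    """Sum of squared errors of the two-node network over the training data."""
    return sum((math.tanh(x1 * w1) - t1) ** 2 + (math.tanh(x2 * w2) - t2) ** 2
               for (x1, x2), (t1, t2) in data)

w1, w2 = 1.0, 1.0                 # arbitrary starting weights
initial_error = total_error(w1, w2)
lr = 0.5                          # learning rate (illustrative)

for _ in range(200):              # each pass over the data is one iteration
    g1 = g2 = 0.0
    for (x1, x2), (t1, t2) in data:
        y1, y2 = math.tanh(x1 * w1), math.tanh(x2 * w2)
        # d/dw of (tanh(x*w) - t)^2 is 2*(y - t)*(1 - y^2)*x
        g1 += 2 * (y1 - t1) * (1 - y1 ** 2) * x1
        g2 += 2 * (y2 - t2) * (1 - y2 ** 2) * x2
    w1 -= lr * g1
    w2 -= lr * g2

final_error = total_error(w1, w2)  # should end up well below initial_error
```

As in the slides' table, w1 stays positive and w2 is driven negative as the error drops.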
Neural Networks - Testing Error vs. Training Error
● Training error - the network's error when run over the data it was trained on.

English Word   English Word Vector   French Word Vector   French Word
squirrel       (0.2, 0.2)            (0.3, -0.4)          écureuils
acorns         (0.6, -0.4)           (0.9, 0.8)           glands
eat            (-0.2, 0.4)           (-0.3, -0.8)         mangent
the            (-0.4, -0.4)          (-0.6, 0.8)          les
Neural Networks - Testing Error vs. Training Error
● Testing error - the network's error when run over new and unfamiliar data.

English Word   English Word Vector   French Word Vector   French Word
frogs          (0.6, 0.2)            (0.9, -0.3)          grenouilles
loudly         (0.6, -0.3)           (0.9, 0.6)           bruyamment
croak          (0, -0.4)             (0, -0.8)            croassent
two            (-0.2, -0.2)          (-0.3, 0.1)          deux
Outline
● Introduction
● Neural Networks
● Residual Connections
○ Definitions
○ Multi-layer networks
○ Difficulty of Training Deep Neural Networks
○ Plain vs. Residual Mapping
○ Evaluations
● Attention Network
● Summary
Residual Connections - Plain vs. Residually Connected Networks
● Plain Neural Network - does not have residual connections
● Residually Connected Neural Network - has residual connections
Residual Connections - Multi-layer networks
[Figure: the two-node tanh network from before, with inputs x1, x2, weights w1, w2, and outputs y1, y2]
Residual Connections - Multi-layer networks
[Figure: the same two-node network, now grouped into a single box labeled "Network Unit"]
![Page 28: Neural Machine Translation Techniques Used By Google · a machine instead of a human Google Translate Google’s Machine Translation Service Behind the scenes: Google’s Neural Machine](https://reader034.vdocument.in/reader034/viewer/2022051806/5ffd5805f55bba0c747069a9/html5/thumbnails/28.jpg)
Network Unit
Residual Connections - Multi-layer networks
x yInput layer Output
Output layer
Residual Connections - Multi-layer networks
● In a single-layer network:
○ the output layer's input is the network's input
○ the output layer's output is the network's output
[Figure: input layer x → Network Unit (the output layer) → output y]
Residual Connections - Multi-layer networks
● In a plain multi-layer network:
○ the 1st hidden layer's input is the network's input
○ the 1st hidden layer's output is the 2nd hidden layer's input
○ and so on...
○ the output layer's output is the network's output
[Figure: x0 → Hidden Layer 1 (Network Unit) → x1 → Hidden Layer 2 (Network Unit) → x2 → Output layer (Network Unit) → y]
Residual Connections - The Difficulty of Training Plain DNNs
● Deep Neural Networks (DNNs)
○ have more than one hidden layer
○ are more accurate
○ can account for more complex data patterns
● Plain DNNs are more difficult to train.
● Residually connected DNNs have residual connections to help with training.
[Figure: input vector x0 → Hidden Layers 1-3 (Network Units) → ... → xn-1 → Output layer → output vector y]
Residual Connections - The Difficulty of Training Plain DNNs
● He et al.'s (2015) testing revealed that plain DNNs suffered from higher training error and, as a result, higher testing error.
[Figure: training error and testing error curves (0-20% axes) from He et al. (2015)]
Residual Connections - The Difficulty of Training Deep Neural Networks
● Plain DNNs are more difficult to train because:
○ different parts of the output lose accuracy
○ identity mappings are difficult to achieve
○ identity mapping - output = input
[Figure: the same multi-layer network diagram, from input vector x0 through the hidden layers to output vector y]
Residual Connections - Plain vs. Residual Mappings
● Perfectly retyping the original paragraph = an identity mapping.

Original paragraph:
"Parks are lovely places to feed the birds. Some people have picnics in parks. Frisbee is a common activity to play in the park. If the park has a hill, then people might go sledding in the winter season."

Re-typed paragraph (Plain Neural Network) - note the typos:
"Parks are lvely places to feed the birds. Some people have picnics in parks. Frisbeee is a common activity to play in the park. If the park has a hill, then peeple might go sledding in the winter season."
Residual Connections - The Difficulty of Training Deep Neural Networks
● Example:
○ The x-axis value is perfect
○ The y-value needs improvement
○ Goal: maintain x, improve y
○ Reality: x degrades while y improves
[Figure: a point on the coordinate plane moves as training proceeds - the y-coordinate gets closer to the target while the x-coordinate drifts away]
Residual Connections - Plain vs. Residual Mappings
● There is more room for error when re-typing a paragraph than when copy-and-pasting it.

Original paragraph:
"Parks are lovely places to feed the birds. Some people have picnics in parks. Frisbee is a common activity to play in the park. If the park has a hill, then people might go sledding in the winter season."

Re-typed paragraph (Plain Neural Network) - typos appear:
"Parks are lvely places to feed the birds. Some people have picnics in parks. Frisbe is a common activity to play in the park. If the park has a hill, then peeple might go sledding in the winter season."

Copy-and-pasted paragraph (Residually Connected Network) - identical to the original:
"Parks are lovely places to feed the birds. Some people have picnics in parks. Frisbee is a common activity to play in the park. If the park has a hill, then people might go sledding in the winter season."
Residual Connections - Structure
Residual Connections
● Layer i learns the mapping mi = xi - xi-1
● xi = xi-1 when the weights are 0
[Figure: two stacked Network Units (Layer i, Layer i+1). With a residual connection, the unit's output mi is added to xi-1 to give xi; without one, the unit's output is xi directly.]
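The difference can be sketched in a few lines of Python (the elementwise-tanh unit mirrors the toy network from the earlier slides; the inputs and weights are illustrative):

```python
import math

def unit(x, weights):
    """One network unit: elementwise tanh(x_i * w_i), as in the earlier slides."""
    return [math.tanh(xi * wi) for xi, wi in zip(x, weights)]

def plain_layer(x, weights):
    # The unit must re-create ("retype") the entire input on its own.
    return unit(x, weights)

def residual_layer(x, weights):
    # The input is copied and pasted forward; the unit only supplies edits m_i.
    m = unit(x, weights)
    return [xi + mi for xi, mi in zip(x, m)]

# With all-zero weights the residual layer is exactly the identity mapping,
# while the plain layer collapses everything to zero.
x = [0.8, -0.3]
```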
Residual Connections - Structure
Residual Connections
● Layer i learns the mapping mi = xi - xi-1
● xi = xi-1 when the weights are 0
[Figure: the same two diagrams, annotated with the paragraph analogy. Without a residual connection, the Network Unit re-types the input (retyping → result). With a residual connection, the input is copied and pasted forward, and the Network Unit only supplies the edits mi.]
Residual Connections - Evaluations
● Comparisons for Google's networks are not available.
● He et al. (2015) compared four networks:
○ an 18-layer plain network
○ a 34-layer plain network
○ an 18-layer residual network
○ a 34-layer residual network
[Figure: error curves for the four networks - 18-layer plain, 18-layer residual, 34-layer plain, 34-layer residual]
Residual Connections - Evaluations

Testing Error   plain    residual
18 layers       27.94%   27.88%
34 layers       28.54%   25.03%

[Figure: testing error curves (20-60% axes) for the plain and residually connected networks]
He et al. (2015)
Outline
● Introduction
● Neural Networks
● Residual Connections
● Attention Network
○ Recurrent Neural Networks
○ Encoder-Decoder Model
○ Attention Mechanism
○ Evaluations
● Summary
Attention Networks - Recurrent Neural Networks
● Recurrent Neural Networks (RNNs) loop over the same unit multiple times.
○ Each iteration of the loop is known as a time step.
[Figure: translating "The squirrels eat the acorns." into "Les écureuils mangent les glands." The RNN is unrolled one time step per word: "The"→"Les", "squirrels"→"écureuils", "eat"→"mangent", "the"→"les", "acorns"→"glands", "."→"."]
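Looping over the same unit can be sketched as follows (a one-dimensional toy; the weights and inputs are illustrative, not from the slides):

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.5):
    """One time step: combine the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h)

# One time step per "word vector" in the sentence (here just numbers).
inputs = [0.2, 0.6, -0.2, -0.4, 0.6]
h = 0.0                      # initial hidden state
states = []
for x in inputs:
    h = rnn_step(x, h)       # the same unit is reused at every time step
    states.append(h)
```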
Attention Networks - Recurrent Neural Networks
● Bidirectional RNN - information travels forwards and backwards, giving words the context of the words before and after them.
[Figure: the same unrolled RNN, with information flowing in both directions between time steps]
Attention Networks - Encoder-Decoder
● Two RNNs
● Encoder network - produces the context vector
● Context vector - contains the sentence information
● Decoder network - produces the translated sentence
[Figure: "The squirrels eat the acorns." → Encoder Network → context vector [c1, c2, c3, ..., cn] → Decoder Network → "Les écureuils mangent les glands."]
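A toy encoder makes the fixed-size context vector concrete (one-dimensional, with illustrative weights; real encoders use large vectors):

```python
import math

def encode(word_vectors, w_x=0.4, w_h=0.6):
    """Encoder RNN: the final hidden state is used as the context vector."""
    h = 0.0
    for x in word_vectors:
        h = math.tanh(w_x * x + w_h * h)
    return h

# The context vector has the same (fixed) size regardless of sentence length:
short_context = encode([0.2, 0.6])
long_context = encode([0.2, 0.6, -0.2, -0.4, 0.6, 0.1, 0.9, -0.7])
```

A longer sentence gets no extra room in the context, which is exactly the problem the next slides illustrate.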
Attention Networks - Encoder-Decoder
● Issues arise with longer sentences
● The context vector is a fixed size
[Figure: the same encoder-decoder diagram with its fixed-size context vector]
Attention Networks - Encoder-Decoder
● Issues arise with longer sentences
● The context vector is a fixed size

Input: "An admitting privilege is the right of a doctor to admit a patient to a hospital or a medical centre to carry out a diagnosis or a procedure, based on his status as a health care worker at a hospital."

Encoder-decoder output: "Un privilège d’admission est le droit d’un médecin de reconnaître un patient à l’hôpital ou un centre médical d’un diagnostic ou de prendre un diagnostic en fonction de son état de santé."
Attention Networks - Translation Comparison

Input Sentence:
"An admitting privilege is the right of a doctor to admit a patient to a hospital or a medical centre to carry out a diagnosis or a procedure, based on his status as a health care worker at a hospital."

Encoder-Decoder Translation:
"Un privilège d’admission est le droit d’un médecin de reconnaître un patient à l’hôpital ou un centre médical d’un diagnostic ou de prendre un diagnostic en fonction de son état de santé." [ends with "based on his state of health"]

Attention Network Translation:
"Un privilège d’admission est le droit d’un médecin d’admettre un patient à un hôpital ou un centre médical pour effectuer un diagnostic ou une procédure, selon son statut de travailleur des soins de santé à l’hôpital."

[Examples from Bahdanau et al. (2015)]
Attention Networks - Addition of an Attention Mechanism
● The attention network model extends the encoder-decoder model.
● Three networks:
○ Encoder Network
○ Attention Network
○ Decoder Network
[Figure: "The squirrels eat the acorns." → Encoder Network → hidden state vectors h1-h5 → Attention Network (weights w1-w5) → context vector ct at time step t → Decoder Network → "Les écureuils mangent les glands."]
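The attention network's job at each time step can be sketched like this: turn per-word scores into the weights w1...w5 with a softmax, then build the context vector ct as a weighted sum of the hidden state vectors (the scores and vectors below are made up for illustration):

```python
import math

def attention_context(hidden_states, scores):
    """Softmax the scores into weights, then take the weighted sum
    of the encoder's hidden state vectors."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]        # w1..wn: non-negative, sum to 1
    dim = len(hidden_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states))
               for i in range(dim)]
    return context, weights

h = [[0.1, 0.9], [0.8, -0.2], [0.3, 0.3], [0.0, 0.5], [-0.4, 0.7]]
c_t, w = attention_context(h, scores=[2.0, 0.1, 0.1, 0.1, 0.1])
# The high first score means c_t attends mostly to h1.
```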
![Page 55: Neural Machine Translation Techniques Used By Google · a machine instead of a human Google Translate Google’s Machine Translation Service Behind the scenes: Google’s Neural Machine](https://reader034.vdocument.in/reader034/viewer/2022051806/5ffd5805f55bba0c747069a9/html5/thumbnails/55.jpg)
Attention Networks - Addition of an Attention Mechanism
“The squirrels eat the acorns.” → “Les écureuils mangent les glands.”
● The encoder produces hidden state vectors.
● Hidden state vectors:
○ One for each input word
○ Each includes information about the whole input sentence
○ Each focuses most strongly on the words surrounding its own word
[Diagram: the encoder turns the five input words into hidden state vectors h1–h5, which the attention network weights (w1–w5) to produce the context vector ct for the decoder.]
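The encoder’s job can be sketched in a few lines. This is a minimal illustration, not Google’s implementation: a toy bidirectional RNN whose word vectors and weight matrices are random stand-ins, shown only to make “one hidden state vector per input word, each summarizing the whole sentence” concrete.

```python
import numpy as np

# Toy bidirectional encoder: one hidden state vector per input word.
# Embeddings and weights are random stand-ins, not trained values.
rng = np.random.default_rng(0)
words = ["The", "squirrels", "eat", "the", "acorns"]
emb_dim, hid_dim = 8, 6

embeddings = {w: rng.standard_normal(emb_dim) for w in words}
W_x = rng.standard_normal((hid_dim, emb_dim)) * 0.1
W_h = rng.standard_normal((hid_dim, hid_dim)) * 0.1

def rnn_pass(sequence):
    """Run a vanilla RNN over the sequence, returning one state per word."""
    h = np.zeros(hid_dim)
    states = []
    for w in sequence:
        h = np.tanh(W_x @ embeddings[w] + W_h @ h)
        states.append(h)
    return states

forward = rnn_pass(words)               # reads left to right
backward = rnn_pass(words[::-1])[::-1]  # reads right to left, realigned

# Each h_i concatenates both directions, so it carries information about
# the whole sentence while focusing most strongly on the words near word i.
hidden_states = [np.concatenate([f, b]) for f, b in zip(forward, backward)]
print(len(hidden_states), hidden_states[0].shape)  # 5 vectors of size 12
```

Because each state is built from a left-to-right pass and a right-to-left pass, the vector for word i has seen every word in the sentence, which is what lets the attention network later pick out the relevant ones.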
Attention Networks - Addition of an Attention Mechanism
“The squirrels eat the acorns.” → “Les écureuils mangent les glands.”
● There is a context vector for each output word.
● The decoder uses the t-th context vector to translate the t-th word.
[Diagram: the attention network combines hidden state vectors h1–h5, using weights w1–w5, into the context vector ct at time step t, which the decoder consumes.]
Attention Networks - Addition of an Attention Mechanism
(The same model shown at a concrete step: at time step 3, the decoder uses context vector c3 to translate the third output word.)
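The decoder side of this picture can be sketched as a loop that consumes one context vector per output step. Everything here is a hypothetical stand-in (random weights, random context vectors, a five-word vocabulary), meant only to show the t-th context vector driving the t-th word:

```python
import numpy as np

# Toy decoder loop: one context vector c_t per output time step t.
# Shapes, weights, and contexts are illustrative stand-ins.
rng = np.random.default_rng(1)
hid_dim, ctx_dim, vocab_size = 6, 12, 5
target_vocab = ["Les", "écureuils", "mangent", "les", "glands."]

W_s = rng.standard_normal((hid_dim, hid_dim)) * 0.1
W_c = rng.standard_normal((hid_dim, ctx_dim)) * 0.1
W_o = rng.standard_normal((vocab_size, hid_dim)) * 0.1

def decode_step(state, context):
    """One decoder step: fold the t-th context vector into the state,
    then score every word in the target vocabulary."""
    state = np.tanh(W_s @ state + W_c @ context)
    scores = W_o @ state
    return state, target_vocab[int(np.argmax(scores))]

state = np.zeros(hid_dim)
contexts = [rng.standard_normal(ctx_dim) for _ in range(5)]  # stand-ins for c1..c5
output = []
for c_t in contexts:  # the t-th context vector drives the t-th output word
    state, word = decode_step(state, c_t)
    output.append(word)
print(output)
```

A trained model would of course learn these weights, and would compute each c_t with the attention network rather than drawing it at random.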
Attention Networks - Addition of an Attention Mechanism
“The squirrels eat the acorns.” → “Les écureuils mangent les glands.”
● The t-th context vector is the sum of the weighted hidden state vectors.
● The weights are recalculated for each time step.
○ They determine the strength of each input word’s effect on the current output word.
[Diagram: hidden state vectors h1–h5, scaled by weights w1–w5, are summed into the context vector ct for the decoder.]
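This weighted sum is small enough to write out directly. In the sketch below the relevance scores are hand-picked stand-ins (a real model computes them from the decoder state and each hidden state); the point is only that softmaxed weights sum to 1 and the context vector is their weighted combination of h1..h5:

```python
import numpy as np

def softmax(x):
    """Turn raw scores into positive weights that sum to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Stand-in hidden state vectors h1..h5, each of size 12.
hidden_states = np.random.default_rng(2).standard_normal((5, 12))

# Hypothetical relevance scores for time step t, one per input word.
scores_t = np.array([0.1, 3.0, 0.2, 0.1, 0.4])

weights_t = softmax(scores_t)        # w1..w5, recalculated every step
c_t = weights_t @ hidden_states      # weighted sum = context vector

print(weights_t.round(2))  # the second input word dominates this step
print(c_t.shape)           # (12,)
```

Recomputing the weights at every time step is what lets each output word draw on a different part of the input sentence.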
Attention Mechanisms - Word Influence
[Alignment matrix: the English words “The squirrels eat the acorns .” on one axis and the French words “Les écureuils mangent les glands .” on the other, showing how strongly each input word influences each output word.]
Attention Networks - Addition of an Attention Mechanism
“The squirrels eat the acorns.” → “Les écureuils mangent les glands.”
● Attention mechanism:
○ Gives the decoder more information for longer sentences and less for shorter ones.
● Attention Network:
○ Determines the weights of the hidden state vectors.
○ Takes in information from the decoder and from the hidden state vectors.
[Diagram: the attention network reads the decoder state and hidden state vectors h1–h5, produces weights w1–w5, and emits the context vector ct.]
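How the attention network turns “information from the decoder and the hidden state vectors” into weights can be sketched with the additive scoring form from Bahdanau et al. (2015). Dimensions and weights below are illustrative, not GNMT’s actual parameters:

```python
import numpy as np

# Additive (Bahdanau-style) attention scoring, with stand-in weights.
rng = np.random.default_rng(3)
dec_dim, enc_dim, att_dim = 6, 12, 8

W_dec = rng.standard_normal((att_dim, dec_dim)) * 0.1
W_enc = rng.standard_normal((att_dim, enc_dim)) * 0.1
v = rng.standard_normal(att_dim)

def attention_weights(decoder_state, hidden_states):
    """Score each hidden state against the decoder state, then softmax."""
    scores = np.array([v @ np.tanh(W_dec @ decoder_state + W_enc @ h)
                       for h in hidden_states])
    e = np.exp(scores - scores.max())
    return e / e.sum()

decoder_state = rng.standard_normal(dec_dim)
hidden_states = rng.standard_normal((5, enc_dim))  # stand-ins for h1..h5
w = attention_weights(decoder_state, hidden_states)
print(w.round(3))  # five nonnegative weights that sum to 1
```

Because the decoder state changes at every output step, the same hidden states receive different weights each time, which is exactly the per-step recalculation described above.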
Attention Networks - Evaluations
● Comparable figures from Google are not available.
● Bahdanau et al. (2015) evaluated an encoder-decoder model against an attention network model on the ACL WMT ’14 dataset.
● Higher BLEU scores mean the translation is closer to the human translation.

Model                              BLEU Score
RNNencdec-30 (Encoder-Decoder)     13.93
RNNencdec-50 (Encoder-Decoder)     17.82
RNNsearch-30 (Attention Network)   21.50
RNNsearch-50 (Attention Network)   26.75

(The -30 / -50 suffix is the maximum sentence length, in words, used in training.)
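To make the scores in the table concrete, here is a simplified BLEU sketch: single reference, unsmoothed, up to 4-grams. It is an illustration of what BLEU measures (n-gram overlap with a human translation, discounted by a brevity penalty), not the exact evaluation script used in the paper:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence BLEU: one reference, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0  # unsmoothed: any empty n-gram overlap zeroes BLEU
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return brevity * math.exp(sum(log_precisions) / max_n)

ref = "les écureuils mangent les glands"
print(bleu("les écureuils mangent les glands", ref))   # 1.0 (exact match)
print(bleu("les écureuils mangent des noix", ref))     # 0.0 (no 4-gram overlap)
```

Real evaluations compute BLEU over a whole corpus with smoothing and multiple references, but the ranking logic is the same: more shared n-grams with the human translation means a higher score.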
Summary
● Residual Connections:
[Diagram: a residual connection carries layer i’s input x(i-1) around the network unit and adds it to the unit’s output m(i), giving x(i) as the input to layer i+1.]
● Attention Network Model:
[Diagram: the encoder maps “The squirrels eat the acorns.” to hidden state vectors h1–h5; the attention network weights them (w1–w5) into the context vector ct; the decoder produces “Les écureuils mangent les glands.”]
Acknowledgements
Thank you for your time and attention!
Thank you to my advisor Elena Machkasova, and to KK Lamberty and Mitchell Finzel, for your guidance and feedback.
References
[1] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations (ICLR), 2015.
[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[3] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, L. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
[4] M. Sundermeyer, H. Ney, and R. Schlüter. From feedforward to recurrent LSTM neural networks for language modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(3), 2015.
Questions?
Attention Mechanisms - Word Influence
English sentence:
The girls played tag and then they became tired.
Incorrect French translation:
Les filles ont joué une étiquette et ensuite ils sont devenus fatigués.
Correct French translation:
Les filles ont joué une étiquette et ensuite elles sont devenues fatiguées.
(The pronoun and participles must agree with the feminine subject “les filles”: elles ... devenues fatiguées, not ils ... devenus fatigués. Choosing correctly requires influence from the distant word “girls”.)