Sequence to sequence models - Deep Learning (Andrew Ng)
TRANSCRIPT
AndrewNg
Sequence to sequence model
Jane visite l’Afrique en septembre
Jane is visiting Africa in September.
[Diagram: an encoder RNN reads the input x^<1>, ..., x^<Tx> (the French sentence); its final activation is fed to a decoder RNN, which emits the translation y^<1>, ..., y^<Ty> one word at a time.]
[Cho et al., 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation]
[Sutskever et al., 2014. Sequence to sequence learning with neural networks]
Image captioning
Example caption: "A cat sitting on a chair."
[Diagram: an AlexNet-style CNN (conv/max-pool layers from 55x55x96 down to 6x6x256, then fully connected layers 9216 -> 4096 -> 4096, ending in a 1000-way softmax) encodes the image; dropping the final softmax, the 4096-dimensional feature vector is fed to an RNN decoder that generates the caption y^<1>, y^<2>, ... word by word.]
[Mao et al., 2014. Deep captioning with multimodal recurrent neural networks]
[Vinyals et al., 2014. Show and tell: Neural image caption generator]
[Karpathy and Li, 2015. Deep visual-semantic alignments for generating image descriptions]
Machine translation as building a conditional language model
Language model: models P(y^<1>, ..., y^<Ty>).
Machine translation: models P(y^<1>, ..., y^<Ty> | x^<1>, ..., x^<Tx>).
[Diagram: the decoder of the translation model looks just like a language model, except that instead of starting from a vector of zeros it is conditioned on the encoder network's representation of the input sentence x.]
Finding the most likely translation
Jane visite l’Afrique en septembre.  ->  P(y^<1>, ..., y^<Ty> | x)
Jane is visiting Africa in September.
Jane is going to be visiting Africa in September.
In September, Jane will visit Africa.
Her African friend welcomed Jane in September.
Objective: argmax over y^<1>, ..., y^<Ty> of P(y^<1>, ..., y^<Ty> | x).
Why not a greedy search?
Jane is visiting Africa in September.
Jane is going to be visiting Africa in September.
Greedy search picks the single most likely word at each decoder step, but maximizing one word at a time does not maximize the joint probability P(y^<1>, ..., y^<Ty> | x): since "going" is a more common word than "visiting", a greedy decoder can be pulled toward the second, less fluent translation.
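A toy numeric illustration of the point above, with invented probabilities: the locally most likely first word ("going") leads to a lower joint probability than the globally best choice ("visiting").

```python
# Invented next-word distributions after the prefix "Jane is",
# purely for illustration.
step1 = {"going": 0.5, "visiting": 0.4}                      # P(w1 | x, prefix)
step2 = {"going": {"to": 0.2}, "visiting": {"Africa": 0.9}}  # P(w2 | x, prefix, w1)

greedy_w1 = max(step1, key=step1.get)        # greedy commits to "going"
greedy_joint = step1[greedy_w1] * max(step2[greedy_w1].values())

# Exhaustive search over both first words finds the better joint probability.
best_joint = max(step1[w] * max(step2[w].values()) for w in step1)
```

Here greedy_joint is 0.5 * 0.2 = 0.10, while best_joint is 0.4 * 0.9 = 0.36: the locally best first word loses globally.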
Beam search algorithm
Step 1: run the input sentence through the encoder, then use the first decoder step to evaluate P(y^<1> | x) over the whole vocabulary (e.g. 10,000 words: a, ..., in, ..., jane, ..., september, ..., zulu), and keep the B most likely choices for the first word, where B is the beam width.
Step 2: for each of the B kept first words, feed it back into the decoder to evaluate P(y^<2> | x, y^<1>) over the vocabulary. Score each of the B x 10,000 two-word candidates by P(y^<1>, y^<2> | x) = P(y^<1> | x) P(y^<2> | x, y^<1>), and again keep only the best B. This requires B copies of the network, one per kept hypothesis.
Beam search (B = 3)
After step 2 the three best partial translations might be "in september", "jane is", and "jane visits", each scored by P(y^<1>, y^<2> | x). Each kept hypothesis is extended one word at a time in the same way until it ends in <EOS>, e.g. "jane visits africa in september. <EOS>".
Length normalization
Beam search maximizes the product
  argmax_y  prod_{t=1}^{Ty} P(y^<t> | x, y^<1>, ..., y^<t-1>)
which is numerically better behaved in log space:
  argmax_y  sum_{t=1}^{Ty} log P(y^<t> | x, y^<1>, ..., y^<t-1>)
Because every factor is at most 1, both objectives unnaturally prefer short translations; normalizing by the output length corrects this:
  argmax_y  (1 / Ty^alpha) sum_{t=1}^{Ty} log P(y^<t> | x, y^<1>, ..., y^<t-1>)
with alpha a heuristic between 0 (no normalization) and 1 (full normalization), e.g. alpha = 0.7.
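The normalized objective above can be sketched as a scoring function; alpha = 0.7 is the heuristic value mentioned in the lecture.

```python
def normalized_score(log_probs, alpha=0.7):
    """Length-normalized log-likelihood of one candidate translation:
    (1 / Ty**alpha) * sum_t log P(y^<t> | x, y^<1..t-1>).
    alpha=0 gives the raw log-probability sum (favors short outputs);
    alpha=1 gives the average log-probability per word."""
    t_y = len(log_probs)
    return sum(log_probs) / (t_y ** alpha)

# With alpha=1, a longer sentence of the same per-word quality
# is no longer penalized relative to a shorter one:
short_score = normalized_score([-1.0, -1.0], alpha=1.0)
long_score = normalized_score([-1.0] * 5, alpha=1.0)
```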
Beam search discussion
Beam width B: a larger B considers more possibilities, so it tends to find better translations, but runs slower and uses more memory; a smaller B is faster but gives worse results.
Unlike exact search algorithms such as BFS (breadth-first search) or DFS (depth-first search), beam search runs faster but is not guaranteed to find the exact maximum of argmax_y P(y | x).
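A minimal, self-contained sketch of the algorithm above. The toy conditional model (toy_model, its vocabulary, and all its probabilities) is invented for illustration; a real system would get next-word log-probabilities from the decoder RNN.

```python
import math

def beam_search(next_log_probs, start, beam_width, max_len, eos="<EOS>"):
    """Keep the beam_width highest-scoring partial translations; at each
    step, extend every live hypothesis with every possible next word,
    then prune back to beam_width. next_log_probs(prefix) returns a
    dict {word: log P(word | x, prefix)}."""
    beams = [([start], 0.0)]                 # (sequence, total log-prob)
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for word, lp in next_log_probs(tuple(seq)).items():
                hyp = (seq + [word], score + lp)
                (completed if word == eos else candidates).append(hyp)
        if not candidates:
            break
        beams = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_width]
    completed.extend(beams)
    return max(completed, key=lambda h: h[1])

def toy_model(prefix):
    # Hand-made conditional distributions, standing in for the decoder RNN.
    table = {
        ("<s>",): {"jane": math.log(0.6), "in": math.log(0.4)},
        ("<s>", "jane"): {"visits": math.log(0.5), "is": math.log(0.5)},
        ("<s>", "in"): {"september": math.log(1.0)},
        ("<s>", "jane", "visits"): {"<EOS>": math.log(1.0)},
        ("<s>", "jane", "is"): {"<EOS>": math.log(1.0)},
        ("<s>", "in", "september"): {"<EOS>": math.log(1.0)},
    }
    return table[prefix]

best_seq, best_score = beam_search(toy_model, "<s>", beam_width=2, max_len=5)
```

With beam_width = 2 the search keeps the "in ..." hypothesis alive even though "jane" is the likelier first word, and returns the sequence with the highest joint probability; a greedy decoder (beam_width = 1) commits to "jane" and misses it.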
Example
Jane visite l’Afrique en septembre.
Human: Jane visits Africa in September.
Algorithm: Jane visited Africa last September.
Error analysis on beam search
Human: Jane visits Africa in September. (y*)
Algorithm: Jane visited Africa last September. (ŷ)
Case 1: Beam search chose ŷ, but y* attains a higher P(y | x), i.e. P(y* | x) > P(ŷ | x). Conclusion: beam search is at fault.
Case 2: y* is a better translation than ŷ, but the RNN predicted P(y* | x) < P(ŷ | x). Conclusion: the RNN model is at fault.
Error analysis process
Human: Jane visits Africa in September.
Algorithm: Jane visited Africa last September.
For each mistranslated dev example, record P(y* | x) and P(ŷ | x) and mark whether beam search or the RNN model is at fault. This figures out what fraction of errors are "due to" beam search vs. the RNN model, so you know which component to work on.
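The bookkeeping above can be sketched as a small counting function. The example numbers are invented; in practice p_ystar and p_yhat come from scoring y* and ŷ with the trained model.

```python
def attribute_errors(examples):
    """examples: list of (p_ystar, p_yhat) pairs for dev-set sentences
    where the algorithm's output ŷ was judged worse than the human
    reference y*. If the model scores y* higher yet beam search still
    returned ŷ, the search is at fault; otherwise the model is."""
    beam_faults = sum(1 for p_star, p_hat in examples if p_star > p_hat)
    n = len(examples)
    return beam_faults / n, (n - beam_faults) / n

beam_frac, model_frac = attribute_errors(
    [(0.9, 0.5), (0.2, 0.4), (0.7, 0.3), (0.1, 0.6)]
)
```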
Evaluating machine translation
French: Le chat est sur le tapis.
Reference 1: The cat is on the mat.
Reference 2: There is a cat on the mat.
MT output: the the the the the the the.
Precision: 7/7 (every output word appears in some reference).
Modified precision: 2/7 (each word is credited at most as many times as it appears in any single reference; "the" appears at most twice, in Reference 1).
[Papineni et al., 2002. Bleu: A method for automatic evaluation of machine translation]
Bleu score on bigrams
Example:
Reference 1: The cat is on the mat.
Reference 2: There is a cat on the mat.
MT output: The cat the cat on the mat.

Bigram   | Count | Count_clip
the cat  | 2     | 1
cat the  | 1     | 0
cat on   | 1     | 1
on the   | 1     | 1
the mat  | 1     | 1

Modified bigram precision: 4/6.
[Papineni et al., 2002. Bleu: A method for automatic evaluation of machine translation]
Bleu score on unigrams
Example:
Reference 1: The cat is on the mat.
Reference 2: There is a cat on the mat.
MT output: The cat the cat on the mat.

p_1 = ( sum over unigrams in ŷ of Count_clip(unigram) ) / ( sum over unigrams in ŷ of Count(unigram) )  = 5/7 here

p_n = ( sum over n-grams in ŷ of Count_clip(n-gram) ) / ( sum over n-grams in ŷ of Count(n-gram) )

[Papineni et al., 2002. Bleu: A method for automatic evaluation of machine translation]
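The clipped-count formulas above translate directly into code; this sketch reproduces the slide's numbers (5/7 unigram and 4/6 bigram precision for the cat example, 2/7 for the degenerate "the the the..." output).

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """p_n: each candidate n-gram is credited at most the maximum number
    of times it appears in any single reference (Count_clip)."""
    cand = Counter(ngrams(candidate, n))
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
out = "the cat the cat on the mat".split()
```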
Bleu details
p_n = modified precision on n-grams only.
Combined Bleu score: BP * exp( (1/4) * sum_{n=1}^{4} log p_n )
Brevity penalty (discourages translations that are too short):
BP = 1 if MT_output_length > reference_output_length
BP = exp(1 - reference_output_length / MT_output_length) otherwise
[Papineni et al., 2002. Bleu: A method for automatic evaluation of machine translation]
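The combination and brevity penalty above can be sketched as below; the helper takes the four precisions p_1..p_4 as already-computed inputs.

```python
import math

def brevity_penalty(mt_len, ref_len):
    """BP = 1 when the output is longer than the reference; otherwise
    exp(1 - ref_len/mt_len) < 1, penalizing outputs that are too short."""
    if mt_len > ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / mt_len)

def combined_bleu(precisions, mt_len, ref_len):
    """BP * exp((1/N) * sum_n log p_n); zero if any p_n is zero."""
    if min(precisions) == 0.0:
        return 0.0
    mean_log = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty(mt_len, ref_len) * math.exp(mean_log)
```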
The problem of long sequences
French: Jane s'est rendue en Afrique en septembre dernier, a apprécié la culture et a rencontré beaucoup de gens merveilleux; elle est revenue en parlant comment son voyage était merveilleux, et elle me tente d'y aller aussi.
English: Jane went to Africa last September, and enjoyed the culture and met many wonderful people; she came back raving about how wonderful her trip was, and is tempting me to go too.
[Plot: Bleu score vs. sentence length (10 to 50): the basic encoder-decoder's score falls off as sentences get longer, because the whole input must be squeezed into one fixed-size encoding.]
Attention model intuition
jane visite l’Afrique en septembre
[Diagram: an RNN encodes the five input words into activations; the decoder generates ŷ^<1>, ..., ŷ^<5>, and at each output step it looks at ("attends to") nearby input words rather than a single encoding of the whole sentence.]
[Bahdanau et al., 2014. Neural machine translation by jointly learning to align and translate]
Attention model
jane visite l’Afrique en septembre
[Diagram: a bidirectional RNN over the input produces activations a^<t'>; a second (decoder) RNN with state s^<t> consumes, at every output step, a context vector c^<t> formed as an attention-weighted sum of the a^<t'>.]
[Bahdanau et al., 2014. Neural machine translation by jointly learning to align and translate]
Computing attention α^<t,t'>
α^<t,t'> = amount of attention y^<t> should pay to a^<t'>
α^<t,t'> = exp(e^<t,t'>) / sum_{t'=1}^{Tx} exp(e^<t,t'>)
[Diagram: a small neural network takes the previous decoder state s^<t-1> and the encoder activation a^<t'> and outputs the energy e^<t,t'>; the softmax above turns the energies into weights α^<t,t'>, which combine the a^<t'> into the context c^<t> fed to the decoder.]
[Bahdanau et al., 2014. Neural machine translation by jointly learning to align and translate]
[Xu et al., 2015. Show, attend and tell: Neural image caption generation with visual attention]
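The softmax above, computed for one output step t: given the energies e^<t,t'> from the small network, produce weights α^<t,t'> that are positive and sum to 1, then form the context as their weighted sum. A plain-Python sketch with scalar activations for readability.

```python
import math

def attention_weights(energies):
    """alpha^<t,t'> = exp(e^<t,t'>) / sum_{t'} exp(e^<t,t'>).
    Subtracting the max energy first keeps exp() from overflowing."""
    m = max(energies)
    exps = [math.exp(e - m) for e in energies]
    total = sum(exps)
    return [x / total for x in exps]

def context(alphas, activations):
    """c^<t> = sum_{t'} alpha^<t,t'> * a^<t'> (scalar a^<t'> here)."""
    return sum(a * act for a, act in zip(alphas, activations))

alphas = attention_weights([2.0, 0.5, 0.1])   # most weight on the first input
```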
Attention examples (date normalization)
July 20th 1969  ->  1969-07-20
23 April, 1564  ->  1564-04-23
Visualization of α^<t,t'>: [heat map of attention weights between input and output positions]
Attention model for speech recognition
[Diagram: audio features x^<1>, ..., x^<Tx> feed an encoder RNN; an attention-based decoder emits the transcript character by character ("T", "h", "e", ...).]
CTC cost for speech recognition
(Connectionist temporal classification)
Example output: "the quick brown fox". With, say, 1,000 input audio frames the network emits 1,000 characters, e.g. ttt_h_eee___ ___qqq__..., where _ is a special "blank" symbol.
Basic rule: collapse repeated characters not separated by "blank", then remove the blanks.
[Graves et al., 2006. Connectionist Temporal Classification: Labeling unsegmented sequence data with recurrent neural networks]
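The basic collapsing rule can be sketched directly; the frame string below follows the lecture-style example with "_" as the blank symbol.

```python
from itertools import groupby

BLANK = "_"

def ctc_collapse(frames):
    """Apply the CTC output rule: first merge runs of identical
    characters, then delete the blank symbol. A blank between two
    identical characters keeps them distinct (e.g. 'e_e' -> 'ee')."""
    merged = [ch for ch, _ in groupby(frames)]   # collapse repeats
    return "".join(ch for ch in merged if ch != BLANK)

print(ctc_collapse("ttt_h_eee___"))   # -> "the"
```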
What is trigger word detection?
Amazon Echo (Alexa)
Baidu DuerOS (Xiao Du Ni Hao)
Apple Siri (Hey Siri)
Google Home (Okay Google)
Specialization outline
1. Neural Networks and Deep Learning
2. Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization
3. Structuring Machine Learning Projects
4. Convolutional Neural Networks
5. Sequence Models
Deep learning is a super power