Neural Turing Machines (Slides)
TRANSCRIPT
Neural Turing Machines
Can neural nets learn programs? Alex Graves, Greg Wayne, Ivo Danihelka
Contents
1. Introduction
2. Foundational Research
3. Neural Turing Machines
4. Experiments
5. Conclusions
Introduction
• First application of machine learning to logical flow and external memory
• Extends the capabilities of neural networks by coupling them to external memory
• Analogous to a Turing Machine coupling a finite state machine to an infinite tape
• RNNs have been shown to be Turing-complete (Siegelmann et al. '95)
• Unlike a TM, the NTM is completely differentiable
Foundational Research
• Neuroscience and Psychology
– Concept of "working memory": short-term memory storage and rule-based manipulation
– Also known as "rapidly created variables"
– Observational neuroscience results in the pre-frontal cortex and basal ganglia of monkeys
• Cognitive Science and Linguistics
– AI and Cognitive Science were contemporaneous in the 1950s-1970s
– The two fields parted ways when neural nets received criticism, Fodor et al. '88
• Incapable of "variable-binding", e.g. "Mary spoke to John"
• Incapable of handling variable-sized input
– Motivated recurrent-network research to handle variable binding and variable-length input
– Recursive processing is a hotly debated topic concerning its role in human evolution (Pinker vs. Chomsky)
• Recurrent Neural Networks
– Broad class of machines with distributed and dynamic state
– Long Short-Term Memory (LSTM) RNNs were designed to handle vanishing and exploding gradients
– Natively handle variable-length structures
Neural Turing Machines
1. Reading
– Mt is the N×M matrix of memory at time t
– wt is a normalized weighting over the N memory locations; the read vector rt is the weighted sum of memory rows, rt = Σi wt(i) Mt(i)
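In NumPy terms, reading is just a convex combination of memory rows (a minimal sketch; the function and variable names are mine, not from the slides):

```python
import numpy as np

def ntm_read(memory, w):
    """NTM read: r_t = sum_i w_t(i) * M_t(i), a weighted sum of memory rows."""
    return w @ memory

M = np.arange(12, dtype=float).reshape(4, 3)  # N=4 locations, row width M=3
w_focused = np.array([0.0, 1.0, 0.0, 0.0])    # weighting focused entirely on row 1
r = ntm_read(M, w_focused)                    # returns row 1 exactly
```

Because the weighting is normalized, a fully focused weighting reads one row exactly, while a spread-out weighting returns a blur of several rows.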
2. Writing involves both erasing and adding
– Each write head emits an erase vector et and an add vector at: memory is first multiplied elementwise by (1 − wt(i) et), then wt(i) at is added
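The erase-then-add write can be sketched as follows (a minimal NumPy sketch; names are mine):

```python
import numpy as np

def ntm_write(memory, w, erase, add):
    """NTM write in two steps:
    erase: M~_t(i) = M_{t-1}(i) * (1 - w(i) * e)   (elementwise)
    add:   M_t(i)  = M~_t(i) + w(i) * a
    """
    memory = memory * (1.0 - np.outer(w, erase))
    return memory + np.outer(w, add)

M = np.ones((4, 3))
w = np.array([0.0, 1.0, 0.0, 0.0])   # write head focused on row 1
e = np.ones(3)                       # fully erase the addressed row
a = np.array([5.0, 6.0, 7.0])
M2 = ntm_write(M, w, e, a)           # row 1 replaced; other rows untouched
```

Rows where the weighting is zero are left unchanged, which is what makes the whole write differentiable rather than a discrete overwrite.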
3. Addressing
– 1. Focusing by content
• Each head produces a key vector kt of length M
• A content-based weighting wtc is generated from a similarity measure between kt and each memory row, using 'key strength' βt
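Content focusing can be sketched as a cosine similarity sharpened by βt and normalized by a softmax (a minimal sketch; names are mine):

```python
import numpy as np

def content_weighting(memory, key, beta):
    """Content focus: w_c = softmax over rows of beta * cosine_sim(key, M(i))."""
    eps = 1e-8  # guard against zero-norm rows
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    z = np.exp(beta * (sims - sims.max()))  # numerically stable softmax
    return z / z.sum()

M = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
key = np.array([0.0, 1.0, 0.0])
w_c = content_weighting(M, key, beta=50.0)  # large beta -> near one-hot on row 1
```

A small βt gives a diffuse weighting; a large βt concentrates nearly all the weight on the best-matching row.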
– 2. Interpolation
• Each head emits a scalar interpolation gate gt, blending the content weighting with the weighting from the previous time step
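The gate is a simple convex blend (a one-line sketch; names are mine):

```python
import numpy as np

def interpolate(w_content, w_prev, g):
    """Gated blend: w_g = g * w_c + (1 - g) * w_{t-1}, with g in [0, 1]."""
    return g * w_content + (1.0 - g) * w_prev

w_c = np.array([0.0, 1.0, 0.0])
w_prev = np.array([1.0, 0.0, 0.0])
```

With g = 1 the head addresses purely by content; with g = 0 it ignores content and keeps last step's location.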
– 3. Convolutional shift
• Each head emits a distribution st over allowable integer shifts
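The shift is a circular convolution of the weighting with the shift distribution; a sketch assuming shifts of -1, 0, +1 (the shift range and names are my choice):

```python
import numpy as np

def shift_weighting(w, s, shifts=(-1, 0, 1)):
    """Circular convolution: rotate w by each allowed shift, weighted by s."""
    out = np.zeros_like(w)
    for prob, k in zip(s, shifts):
        out += prob * np.roll(w, k)  # np.roll(w, 1) moves weight from i to i+1 (mod N)
    return out

w = np.array([1.0, 0.0, 0.0, 0.0])
s = np.array([0.0, 0.0, 1.0])        # all mass on a +1 shift
w_shifted = shift_weighting(w, s)    # one-hot moves from index 0 to index 1
```

Because the convolution is circular, shifting off the end wraps around, so the memory behaves like a ring rather than a bounded tape.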
– 4. Sharpening
• Each head emits a scalar sharpening parameter γt ≥ 1
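Sharpening re-normalizes the weighting raised to the power γt, undoing the blur introduced by the shift convolution (a minimal sketch; names are mine):

```python
import numpy as np

def sharpen(w, gamma):
    """Sharpening: w(i) <- w(i)**gamma / sum_j w(j)**gamma."""
    wp = w ** gamma
    return wp / wp.sum()

w = np.array([0.1, 0.6, 0.2, 0.1])
w_sharp = sharpen(w, 3.0)  # dominant entry grows, others shrink
```

With γ = 1 the weighting is unchanged; larger γ pushes it toward a one-hot focus.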
– Putting it all together, the combined addressing system can operate in three complementary modes:
• A weighting can be chosen by the content system without any modification by the location system
• A weighting produced by the content-addressing system can be chosen and then shifted
• A weighting from the previous time step can be rotated without any input from the content-based addressing system
• Controller Network Architecture
– Feed-forward vs. recurrent
– The LSTM version of the RNN has its own internal memory, complementary to M
– Hidden LSTM layers are 'like' registers in a processor
– Allows information to mix across multiple time steps
– A feed-forward controller offers better transparency
Experiments
• Test the NTM's ability to learn simple algorithms like copying and sorting
• Demonstrate that solutions generalize well beyond the range of training
• Tests three architectures:
– NTM with feed-forward controller
– NTM with LSTM controller
– Standard LSTM network
• 1. Copy
– Tests whether the NTM can store and retrieve data
– Trained to copy sequences of 8-bit vectors
– Sequences vary between 1-20 vectors
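A training example for this task can be generated as below (a sketch under my assumptions: lengths drawn uniformly, bits drawn independently; the slides specify only the 8-bit width and the 1-20 length range):

```python
import numpy as np

def make_copy_example(rng, max_len=20, width=8):
    """One copy-task example: a random binary sequence, target = the input itself."""
    length = rng.integers(1, max_len + 1)                      # 1..20 inclusive
    seq = rng.integers(0, 2, size=(length, width)).astype(float)
    return seq, seq.copy()

rng = np.random.default_rng(0)
x, y = make_copy_example(rng)
```

During training the input sequence is presented first, and the network must then emit the target with no further input, forcing it to rely on memory.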
• 1. Copy results (figures): learning curves and output plots for NTM and LSTM
• 2. Repeat Copy
– Tests whether the NTM can learn a simple nested function
– Extends copy by repeatedly copying the input a specified number of times
– Training input is a random-length sequence of 8-bit binary vectors plus a scalar value for the number of copies
– The scalar value is random between 1-10
• 2. Repeat Copy results (figures)
• 3. Associative Recall
– Tests the NTM's ability to associate data references
– Training input is a list of items, followed by a query item
– Output is the subsequent item in the list
– Each item is a sequence of three 6-bit binary vectors
– Each 'episode' has between two and six items
• 3. Associative Recall results (figures)
• 4. Dynamic N-Grams
– Tests whether the NTM can rapidly adapt to new predictive distributions
– Trained on 6-gram binary patterns over 200-bit sequences
– Can the NTM learn the optimal estimator?
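The optimal estimator for this kind of source is a simple count-based Bayesian rule; reading it as the Krichevsky-Trofimov estimate (a Beta(1/2, 1/2) prior over each 5-bit context) is my interpretation, not stated on the slides:

```python
def kt_estimate(n_ones, n_zeros):
    """P(next bit = 1 | counts seen so far in the current 5-bit context)
    = (N1 + 1/2) / (N1 + N0 + 1).  With no observations this is 1/2."""
    return (n_ones + 0.5) / (n_ones + n_zeros + 1.0)
```

The NTM can match this rule by using memory slots as per-context counters, which is the behaviour the experiment probes.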
• 4. Dynamic N-Grams results (figures)
• 5. Priority Sort
– Tests whether the NTM can sort data
– Input is a sequence of 20 random binary vectors, each with a scalar priority rating drawn from [-1, 1]
– Target sequence is the 16 highest-priority vectors
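The target can be constructed as below (a sketch; the 8-bit vector width and the descending output order are my assumptions, the 20-vector input and 16-vector target are from the slides):

```python
import numpy as np

def priority_sort_target(vectors, priorities, k=16):
    """Return the k highest-priority vectors, in descending priority order."""
    order = np.argsort(-priorities)[:k]
    return vectors[order]

rng = np.random.default_rng(1)
vecs = rng.integers(0, 2, size=(20, 8)).astype(float)  # 20 random binary vectors
pri = rng.uniform(-1.0, 1.0, size=20)                  # priorities in [-1, 1]
target = priority_sort_target(vecs, pri)
```

The network sees each vector paired with its priority and must emit the sorted subset, so it has to use the priorities to decide where in memory to write each item.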
• 5. Priority Sort results (figures)
• 6. Details
– RMSProp optimization algorithm
– Momentum 0.9
– All LSTMs had three stacked hidden layers
Conclusion
• Introduced a neural network architecture with external memory that is differentiable end-to-end
• Experiments demonstrate that NTMs are capable of learning simple algorithms and of generalizing beyond the training regime