Neural Architectures with Memory
Nitish Gupta, Shreya Rajpal
25th April, 2017
University of Illinois (slazebni.cs.illinois.edu, spring17)
Story Comprehension
Joe went to the kitchen. Fred went to the kitchen. Joe picked up the milk. Joe travelled to his office. Joe left the milk. Joe went to the bathroom.
Questions from Joe's angry mother:
Q1: Where is Joe?
Q2: Where is the milk now?
Q3: Where was Joe before the office?
Dialogue System
Machine: Hello! What can I do for you today?
Human: I'd like to reserve a table for 6.
Machine: Sure! When would you like that reservation?
Human: At 7 PM, please.
Machine: Okay. What cuisine would you like?
Human: Actually, make that 7:30 PM.
Machine: Updated! What cuisine?
Human: Is there anything better than a medium rare steak?
Machine: Nothing at all! Blackdog has a 4.7 on Yelp.
Human: Sounds perfect! Also, add one more person.
Machine: Reservation done for 7, 7:30 PM at Blackdog. Enjoy!
ML models need memory!
Deeper AI tasks require explicit memory and multi-hop reasoning over it
• RNNs have short memory
• Cannot increase memory without increasing number of parameters
• Need for compartmentalized memory
• Read/Write should be asynchronous
Memory Networks (MemNN)
• Class of models with memory $m$ - array of objects $m_i$
Four components:
I - Input Feature Map: input manipulation
G - Generalization: memory manipulation
O - Output Feature Map: output representation generator
R - Response: response generator
Each memory $m_i$ here is a dense vector
Memory Networks, Weston et al., ICLR 2015
MemNN
1. Input Feature Map
• Imagine input as a sequence of sentences $x_i$
2. Update Memories
[Figure: $x_i$ → Input Feature Map → $I(x_i)$ → Generalization]
MemNN
3. Output Representation
• Say if $q$ is a question, compute output representation
4. Generate Answer Response
[Figure: $I(q)$ → Output Feature Map → $o$ → Response → $r$]
Simple MemNN for Text
1. Input Feature Map - Bag-of-Words representation
[Figure: sentence $x_i$ → Bag-of-Words → $I(x_i)$]
Simple MemNN for Text
2. Generalization: store input in new memory
$$m_i = I(x_i)$$
[Figure: memories till now ($i = 4$), new input $I(x_5)$, memories after 5 inputs]
Simple MemNN for Text
3. Output: using $k = 2$ memory hops with query $x$
• Score all memories against input; $o_1$ is the 1st max-scoring memory index:
$$o_1 = O_1(x, m) = \arg\max_{i = 1, \dots, N} s_O(I(x), m_i)$$
• Score all memories against input & $o_1$; $o_2$ is the 2nd max-scoring memory index:
$$o_2 = O_2(x, m) = \arg\max_{i = 1, \dots, N} s_O([I(x), m_{o_1}], m_i)$$
4. Response - single word answer
• Score all words against query and 2 supporting memories; $r$ is the max-scoring word:
$$r = \arg\max_{w \in W} s_R([I(x), m_{o_1}, m_{o_2}], w)$$
Scoring Function
• Scoring function is an embedding model
$$s(x, y) = x^T U^T U y$$
• What is $U y$?
$U$ (word embedding matrix) times $y$ (bag-of-words vector) = sum of word embeddings
Scoring function is just a dot-product between sums of word embeddings!
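The scoring function and the two-hop lookup can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation: the helper names (`bow`, `embed`, `score`, `answer`) are made up here, the list inputs $[I(x), m_{o_1}]$ are simplified to a sum of bag-of-words vectors, and the embedding matrices are untrained, so the returned answer only shows the control flow.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def bow(text: str, vocab: Dict[str, int]) -> Vector:
    """I(x): bag-of-words count vector over a fixed vocabulary."""
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec


def embed(U: Matrix, x: Vector) -> Vector:
    """U x: sum of the embeddings of the words counted in x."""
    return [sum(u * x_j for u, x_j in zip(row, x)) for row in U]


def score(U: Matrix, x: Vector, y: Vector) -> float:
    """s(x, y) = x^T U^T U y, a dot product of two summed embeddings."""
    return sum(a * b for a, b in zip(embed(U, x), embed(U, y)))


def argmax(values: List[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def answer(query: Vector, memories: List[Vector], words: List[str],
           U_o: Matrix, U_r: Matrix) -> Tuple[int, int, str]:
    """Two memory hops (k = 2) followed by a single-word response."""
    o1 = argmax([score(U_o, query, m) for m in memories])
    # Assumption: [I(x), m_o1] is represented by summing the two vectors.
    ctx1 = [a + b for a, b in zip(query, memories[o1])]
    o2 = argmax([score(U_o, ctx1, m) for m in memories])
    ctx2 = [a + b for a, b in zip(ctx1, memories[o2])]
    # Score every vocabulary word (as a one-hot vector) against the context.
    word_scores = []
    for i in range(len(words)):
        one_hot = [1.0 if j == i else 0.0 for j in range(len(words))]
        word_scores.append(score(U_r, ctx2, one_hot))
    return o1, o2, words[argmax(word_scores)]
```

With an identity matrix for `U_o` and `U_r`, `score` reduces to counting shared words, which is a convenient sanity check before any training.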
Input sentences (memories): Joe went to the kitchen. Fred went to the kitchen. Joe picked up the milk. Joe travelled to his office. Joe left the milk. Joe went to the bathroom.
Question: Where is the milk now?
[Figure: the question selects the 1st supporting memory, then the 2nd supporting memory]
Response: Office
Training Objective
$$\sum_{\bar{f} \neq m_{o_1}} \max(0, \gamma - s_O(x, m_{o_1}) + s_O(x, \bar{f})) \; +$$
$$\sum_{\bar{f}' \neq m_{o_2}} \max(0, \gamma - s_O([x, m_{o_1}], m_{o_2}) + s_O([x, m_{o_1}], \bar{f}')) \; +$$
$$\sum_{\bar{r} \neq r} \max(0, \gamma - s_R([x, m_{o_1}, m_{o_2}], r) + s_R([x, m_{o_1}, m_{o_2}], \bar{r}))$$
• 1st term: score for true 1st memory vs. score for a negative memory
• 2nd term: score for true 2nd memory vs. score for a negative memory
• 3rd term: score for true response vs. score for a negative response
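As a rough sketch, the three margin ranking terms can be computed from already-evaluated scores as follows. The function names and the default margin of 0.1 are illustrative assumptions; in practice the scores would come from $s_O$ and $s_R$.

```python
from typing import List


def hinge(margin: float, true_score: float, negative_score: float) -> float:
    """max(0, gamma - s(true) + s(negative)) for one negative example."""
    return max(0.0, margin - true_score + negative_score)


def memnn_loss(s1_true: float, s1_negatives: List[float],
               s2_true: float, s2_negatives: List[float],
               sr_true: float, sr_negatives: List[float],
               margin: float = 0.1) -> float:
    """Sum of the three ranking terms: 1st memory, 2nd memory, response."""
    first = sum(hinge(margin, s1_true, s) for s in s1_negatives)
    second = sum(hinge(margin, s2_true, s) for s in s2_negatives)
    response = sum(hinge(margin, sr_true, s) for s in sr_negatives)
    return first + second + response
```

A term contributes zero once the true item outscores a negative by at least the margin, so only violating negatives produce gradient.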
Experiment
• Large-Scale QA
• 14M statements (subject, relation, object), stored as memories
• Memory hops: $k = 1$; output is the highest scoring memory
• Only re-ranked candidates from other system

Method                        F1
Fader et al. 2013             0.54
Bordes et al. 2014b           0.73
Memory Networks (this work)   0.72

Why does the Memory Network perform almost exactly like the previous model?
Why do Memory Networks not perform as well?
Useful Experiment
• Simulated World QA
• 4 characters, 3 objects, 5 rooms
• 7k statements, 3k questions for training and same for testing
• Difficulty 1 (5): entity in question is mentioned in last 1 (5) sentences
• For $k = 2$, annotation has intermediate best memories as well
Limitations
• Simple BOW representation
• Simulated Question Answering dataset is too trivial
• Strong supervision, i.e. for intermediate memories, is needed
End-to-End Memory Networks (MemN2N)
• What if the annotation is:
  • Input sentences $x_1, x_2, \dots, x_n$
  • Query $q$
  • Answer $a$
• Model performs by:
  • Generating memories from inputs
  • Transforming query into suitable representation
  • Processing query and memories jointly using multiple hops to produce the answer
  • Backpropagating through the whole procedure
End-To-End Memory Networks, Sukhbaatar et al., NIPS 2015
Input: Joe went to the kitchen. Fred went to the kitchen. Joe picked up the milk. Joe travelled to his office. Joe left the milk. Joe went to the bathroom.
Query: Where is the milk now?
Answer: Office
MemN2N
1. Convert input to memories $x_i \to m_i$
$$m_i = A x_i$$
($x_i$: BOW input, $A$: word-embedding matrix, $m_i$: sum of word-embeddings)
2. Transform query $q$ into same representation space
$$u = B q$$
3. Output vectors $x_i \to c_i$
$$c_i = C x_i$$
MemN2N
3. Scoring memories against query
$$p_i = \mathrm{Softmax}(u^T m_i)$$
($m_i$: memories, $u$: transformed query, $p_i$: score for input/memory $i$)
4. Generate output
$$o = \sum_i p_i c_i$$
Weighted average of all inputs (transformed)
MemN2N
5. Generating Response
$$a = \mathrm{Softmax}(W(u + o))$$
($a$: distribution over response words, $o$: averaged output, $u$: query)
Training Objective - Maximum Likelihood / Cross Entropy
$$\hat{\Theta} = \arg\max_{\Theta} \sum_{s=1}^{N} \log P(a_s)$$
[Figure: MemN2N pipeline: generate memories, transform query, generate outputs, score memories, make averaged output, response]
Multi-hop MemN2N
$$u^{k+1} = u^k + o^k$$
[Figure: Hop 1, Hop 2, Hop 3]
Different memories and outputs for each hop
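The MemN2N forward pass, including the multi-hop update $u^{k+1} = u^k + o^k$, can be sketched in dependency-free Python as below. It is only an illustration under simplifying assumptions: matrices are nested lists, each hop gets its own $(A, C)$ pair, and the function name `memn2n_forward` is made up for this example. No training or weight tying is shown.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Matrix-vector product, e.g. m_i = A x_i for a bag-of-words x_i."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def memn2n_forward(sentences: List[Vector], query: Vector,
                   hops: List[Tuple[Matrix, Matrix]],
                   B: Matrix, W: Matrix) -> Vector:
    """Return the distribution over response words."""
    u = matvec(B, query)                                  # u = B q
    for A, C in hops:                                     # one (A, C) per hop
        memories = [matvec(A, x) for x in sentences]      # m_i = A x_i
        outputs = [matvec(C, x) for x in sentences]       # c_i = C x_i
        p = softmax([sum(a * b for a, b in zip(u, m)) for m in memories])
        o = [sum(p_i * c[d] for p_i, c in zip(p, outputs))
             for d in range(len(u))]                      # o = sum_i p_i c_i
        u = [u_d + o_d for u_d, o_d in zip(u, o)]         # u^{k+1} = u^k + o^k
    return softmax(matvec(W, u))                          # a = Softmax(W(u + o))
```

Because every step is a sum, product, or softmax, the whole procedure is differentiable, which is what lets the model train from (input, query, answer) triples alone.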
Experiments
• Simulated World QA
• 20 tasks from bAbI dataset - 1K and 10K instances per task
• Vocabulary = 177 words only!
• 60 epochs
• Learning rate annealing
• Linear start with different learning rate
• "Model diverged very often, hence trained multiple models"
               MemNN   MemN2N
Error% (1k)    6.7     12.4
Error% (10k)   3.2     7.5
Movie Trivia Time!
• Which was Stanley Kubrick's first movie? Fear and Desire
• When did 2001: A Space Odyssey release? 1968
• After The Shining, which movie did its director direct? Full Metal Jacket
Knowledge Base (Subject, Relation, Object):
(2001:a_space_odyssey, directed_by, stanley_kubrick)
(fear_and_desire, directed_by, stanley_kubrick)
…
(fear_and_desire, released_in, 1953)
(full_metal_jacket, released_in, 1987)
…
(2001:a_space_odyssey, released_in, 1968)
…
(the_shining, directed_by, stanley_kubrick)
…
(AI:artificial_intelligence, written_by, stanley_kubrick)
Knowledge Base? Incomplete!
Textual Knowledge? Too Challenging!
Combine using Memory Networks?
Key-Value MemNNs for Reading Documents
• Structured memories as key-value pairs
• Regular MemNNs have a single vector for each memory
• Keys are more related to the question and values to the answer
Key-Value Memory Networks for Directly Reading Documents, Miller et al., EMNLP 2016
$$\text{Memories} = (k_1, v_1), (k_2, v_2), \dots, (k_N, v_N)$$
Keys and values can be words, sentences, vectors etc.
($k$: Kubrick's first movie was, $v$: Fear and Desire)
KV-MemNN
1. Retrieve relevant memories using hashing techniques
• Use inverted index, locality sensitive hashing, something sensible
[Figure: $q$ selects, from all memories $(k_1, v_1), (k_2, v_2), \dots, (k_N, v_N)$, the retrieved relevant memories $(k_{h_1}, v_{h_1}), (k_{h_2}, v_{h_2}), \dots, (k_{h_N}, v_{h_N})$]
KV-MemNN
2. Score Memory-Keys
$$p_{h_i} = \mathrm{Softmax}(A\Phi_Q(q) \cdot A\Phi_K(k_{h_i}))$$
• $A\Phi_K(k_{h_i})$: key BOW, sum of embeddings
• Dot-product gives a distribution over memory-keys
3. Generate Output
$$o = \sum_i p_{h_i} A\Phi_V(v_{h_i})$$
• Weighted average of memory-values
(cf. MemN2N: $p_i = \mathrm{Softmax}(u^T m_i)$, $o = \sum_i p_i c_i$)
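A single key-value read (retrieve, score keys, average values) might look like the sketch below. Several things here are assumptions made for illustration: retrieval is a plain shared-word filter standing in for an inverted index, $\Phi$ is a bag-of-words over tokens, one matrix `A` is shared by queries, keys and values, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Memory = Tuple[List[str], List[str]]  # (key tokens, value tokens)


def bow(tokens: List[str], vocab: Dict[str, int]) -> Vector:
    """Phi: bag-of-words features over a fixed vocabulary."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def embed(A: Matrix, x: Vector) -> Vector:
    """A Phi(.): sum of token embeddings."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def softmax(z: Vector) -> Vector:
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(question: List[str], memories: List[Memory]) -> List[Memory]:
    """Keep memories whose key shares at least one token with the question."""
    q = set(question)
    return [(k, v) for k, v in memories if q & set(k)]


def kv_read(question: List[str], memories: List[Memory],
            vocab: Dict[str, int], A: Matrix):
    """Return (relevant memories, key distribution p, output vector o)."""
    relevant = retrieve(question, memories)
    if not relevant:
        return [], [], [0.0] * len(A)
    q_emb = embed(A, bow(question, vocab))
    keys = [embed(A, bow(k, vocab)) for k, _ in relevant]
    p = softmax([sum(a * b for a, b in zip(q_emb, k)) for k in keys])
    o = [0.0] * len(A)
    for p_i, (_, value) in zip(p, relevant):
        v_emb = embed(A, bow(value, vocab))
        o = [o_d + p_i * v_d for o_d, v_d in zip(o, v_emb)]
    return relevant, p, o
```

Separating keys from values is the point of the design: the key embedding can specialise in matching the question while the value embedding specialises in matching the answer.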
KV-MemNN - Multiple Hops
In the $j$-th hop:
• Query representation: $q_j = R_j(A\Phi_Q(q_{j-1}) + o)$
• Key addressing: $p_{h_i} = \mathrm{Softmax}(A\Phi_Q(q_j) \cdot A\Phi_K(k_{h_i}))$
Final hop:
• Generate response: $a = \mathrm{Softmax}(A\Phi_Q(q_{H+1}) \cdot B\Phi_Y(y_i))$
KV-MemNN - What to store in memories?
1. KB Based:
Key: (subject, relation); Value: object
K: (2001:a_space_odyssey, directed_by); V: stanley_kubrick
2. Document Based:
For each entity in document, extract 5-word window around it
Key: window; Value: entity
K: screenplay written by and; V: Hampton
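Both ways of filling the memory can be sketched in a few lines. The function names are hypothetical, the window is clipped at document boundaries, and the entity itself is kept inside the key window, which is an assumption (the slide's example key omits it).

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def kb_memories(triples: List[Triple]) -> List[Tuple[Tuple[str, str], str]]:
    """KB based: key = (subject, relation), value = object."""
    return [((subj, rel), obj) for subj, rel, obj in triples]


def window_memories(tokens: List[str], entities: Set[str],
                    width: int = 5) -> List[Tuple[List[str], str]]:
    """Document based: key = window of `width` words around each entity,
    value = the entity at the centre of the window."""
    half = width // 2
    memories = []
    for i, tok in enumerate(tokens):
        if tok in entities:
            lo = max(0, i - half)
            hi = min(len(tokens), i + half + 1)
            memories.append((tokens[lo:hi], tok))
    return memories
```

For example, `kb_memories([("2001:a_space_odyssey", "directed_by", "stanley_kubrick")])` yields one memory whose key is the (subject, relation) pair and whose value is the object.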
KV-MemNN - Experiments
• WikiMovies Benchmark
• Total 100K QA-pairs
• 10% for testing

Method                     KB     Doc
E2E Memory Network         78.5   69.9
Key-Value Memory Network   93.9   76.2
KV-MemNN
[Figure: retrieve relevant memories, score relevant memory-keys, generate output using averaged memory-values, generate response]
CNN : Computer Vision :: ________ : NLP
RNN
Dynamic Memory Networks - The Beast
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, Kumar et al., ICML 2016
Use RNNs, specifically GRUs, for every module
DMN
$$c_t = \mathrm{GRU}(w^t_i, c^t_{i-1})$$
Final GRU output for the $t$-th sentence
DMN
$$q = \mathrm{GRU}(q^w_i, q_{i-1})$$
DMN
Hop $i$ (shown for $i = 1$ and $i = 2$):
$$h^i_t = g^i_t \, \mathrm{GRU}(c_t, h^i_{t-1}) + (1 - g^i_t) \, h^i_{t-1}$$
$$e^i = h^i_{T_C}$$
DMN
$$m^i = \mathrm{GRU}(e^i, m^{i-1})$$
DMN
$$\alpha_0 = m^{T_M}$$
$$\alpha_t = \mathrm{GRU}([y_{t-1}, q], \alpha_{t-1})$$
$$y_t = \mathrm{Softmax}(W^{(a)} \alpha_t)$$
DMN
How many GRUs were used with 2 hops?
DMN - Qualitative Results
[Figure: qualitative results]
Algorithm Learning
Neural Turing Machine
Copy Task: Implement the Algorithm
Given a list of numbers at input, reproduce the list at output
Neural Turing Machine learns:
1. What to write to memory
2. When to write to memory
3. When to stop writing
4. Which memory cell to read from
5. How to convert result of read into final output
Neural Turing Machines
[Figure: Controller with External Input and External Output; 'blurry' Read Heads and Write Heads; Memory]
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machines
'Blurry' memory addressing (at time instant $t$)
• Memory $M_t$: $N$ locations, each of width $M$
• Soft attention (Lectures 2, 3, 20, 24) weights $w_t$ over the locations:
$w_t(0) = 0.1$, $w_t(1) = 0.2$, $w_t(2) = 0.5$, $w_t(3) = 0.1$, $w_t(4) = 0.1$
Neural Turing Machines
More formally,
Blurry Read Operation
Given:
• $M_t$ (memory matrix) of size $N \times M$
• $w_t$ (weight vector) of length $N$
• $t$ (time index)
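The read equation itself did not survive in this transcript. In Graves et al. the blurry read is the weighted sum $r_t = \sum_i w_t(i) M_t(i)$, which the sketch below implements with the memory stored as $N$ rows of width $M$. The function name `blurry_read` is an illustrative choice.

```python
from typing import List


def blurry_read(memory: List[List[float]], weights: List[float]) -> List[float]:
    """r_t = sum_i w_t(i) * M_t(i): a convex combination of memory rows.

    memory  -- N rows (locations), each a vector of length M
    weights -- N non-negative attention weights that sum to 1
    """
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory))
            for j in range(width)]
```

For instance, reading `[[1, 2], [3, 4]]` with weights `[0.25, 0.75]` returns `[2.5, 3.5]`, a blend that leans toward the second location.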
Neural Turing Machines: Blurry Writes
Blurry Write Operation
Decomposed into blurry erase + blurry add (erase component, add component)
Given:
• $M_t$ (memory matrix) of size $N \times M$
• $w_t$ (weight vector) of length $N$
• $t$ (time index)
• $e_t$ (erase vector) of length $M$
• $a_t$ (add vector) of length $M$
Neural Turing Machines: Erase
$M_0$ (in this example each column is a memory location and each row a vector component):
5      7      9      2      12
11     6      3      1      2
3      7      3      10     6
4      2      5      9      9
3      5      12     8      4
$w_1$ ($1 \times N$): $w_1(0) = 0.1$, $w_1(1) = 0.2$, $w_1(2) = 0.5$, $w_1(3) = 0.1$, $w_1(4) = 0.1$
$e_1$ ($M \times 1$): 1.0, 0.7, 0.2, 0.5, 0.0
After erase:
4.5    5.6    4.5    1.8    10.8
10.23  5.16   1.95   0.93   1.86
2.94   6.72   2.7    9.8    5.88
3.8    1.8    3.75   8.55   8.55
3      5      12     8      4
Neural Turing Machines: Addition
$a_1$ ($M \times 1$): 3, 4, -2, 0, 2
Neural Turing Machines: Blurry Writes
$M_1$ (after erase and add):
4.8    6.2    6      2.1    11.1
10.63  5.96   3.95   1.33   2.26
2.74   6.32   1.7    9.6    5.68
3.8    1.8    3.75   8.55   8.55
3.2    5.4    13     8.2    4.2
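The worked example above can be reproduced with a short script. It follows the erase-then-add form from Graves et al., $\tilde{M}(i) = M(i)[1 - w(i)e]$ then $M'(i) = \tilde{M}(i) + w(i)a$, and uses the slide's layout in which each column is a memory location. The function name `blurry_write` is illustrative.

```python
from typing import List

Matrix = List[List[float]]


def blurry_write(memory: Matrix, w: List[float],
                 erase: List[float], add: List[float]) -> Matrix:
    """Erase then add, with rows as vector components and columns as locations.

    Entry (r, c) becomes memory[r][c] * (1 - w[c] * erase[r]) + w[c] * add[r].
    """
    rows, cols = len(memory), len(memory[0])
    erased = [[memory[r][c] * (1.0 - w[c] * erase[r]) for c in range(cols)]
              for r in range(rows)]
    return [[erased[r][c] + w[c] * add[r] for c in range(cols)]
            for r in range(rows)]


M0 = [[5, 7, 9, 2, 12],
      [11, 6, 3, 1, 2],
      [3, 7, 3, 10, 6],
      [4, 2, 5, 9, 9],
      [3, 5, 12, 8, 4]]
w1 = [0.1, 0.2, 0.5, 0.1, 0.1]
e1 = [1.0, 0.7, 0.2, 0.5, 0.0]
a1 = [3, 4, -2, 0, 2]

M1 = blurry_write(M0, w1, e1, a1)
```

Up to floating-point rounding, `M1` matches the final matrix on the slide; for example its first row is 4.8, 6.2, 6, 2.1, 11.1.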
Neural Turing Machines: Demo
Demonstration: training on Copy Task
[Figure from Snips AI's Medium post]
Neural Turing Machines: Attention Model
Generating $w_t$
Content Based
Example: QA Task
- Score sentences by similarity with question
- Weights as softmax of similarity scores
Location Based
Example: Copy Task
- Move to address $(i+1)$ after writing to index $(i)$
- Weights ≈ transition probabilities
Neural Turing Machine: Attention Model
Steps for generating $w_t$ (from the previous state and the controller outputs):
1. Content Addressing (CA)
2. Peaking
3. Interpolation (I)
4. Convolutional Shift (CS) (location addressing)
5. Sharpening (S)
[Figure: Prev. State and Controller Outputs feed CA → I → CS → S; the slides build this diagram up one step at a time]
• Controller outputs include a vector (length $M$) produced by the controller
Step 1: Content Addressing (CA)
Step 2: Peaking
Step 3: Interpolation (I)
Step 4: Convolutional Shift (CS)
• Controller outputs a normalized distribution over all $N$ possible shifts
• Rotation-shifted weights are computed from it [equation not captured in this transcript]
Step 5: Sharpening (S)
• Uses a controller output to sharpen the weights [equation not captured in this transcript]
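Since the per-step equations were images, here is a sketch of the addressing pipeline as formulated in Graves et al.: cosine-similarity content weights peaked by a key strength $\beta$, interpolation with the previous weights through a gate $g$, a circular convolution with a shift distribution $s$, and sharpening with an exponent $\gamma \ge 1$. The function and parameter names are choices made for this example, and the memory is stored as $N$ rows of width $M$.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)  # small epsilon guards against zero vectors


def content_weights(memory: List[Vector], key: Vector, beta: float) -> Vector:
    """Steps 1-2: softmax over beta-scaled cosine similarity to the key."""
    scores = [beta * cosine(key, row) for row in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate(w_content: Vector, w_prev: Vector, g: float) -> Vector:
    """Step 3: blend content weights with the previous time step's weights."""
    return [g * c + (1.0 - g) * p for c, p in zip(w_content, w_prev)]


def shift(w: Vector, s: Vector) -> Vector:
    """Step 4: circular convolution; s[k] is the weight of a shift by k."""
    n = len(w)
    return [sum(w[j] * s[(i - j) % n] for j in range(n)) for i in range(n)]


def sharpen(w: Vector, gamma: float) -> Vector:
    """Step 5: raise to the power gamma and renormalise."""
    powered = [x ** gamma for x in w]
    total = sum(powered)
    return [x / total for x in powered]


def address(memory: List[Vector], w_prev: Vector, key: Vector, beta: float,
            g: float, s: Vector, gamma: float) -> Vector:
    w_c = content_weights(memory, key, beta)
    return sharpen(shift(interpolate(w_c, w_prev, g), s), gamma)
```

Setting `g = 0` ignores content and keeps the previous weights, so a one-hot shift then moves the focus from index `i` to `i + 1`, which is the location-based behaviour described for the copy task.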
Neural Turing Machine: Controller Design
• Feed-forward: faster, more transparency & interpretability about function learnt
• LSTM: more expressive power, doesn't limit the number of computations per time step
Both are end-to-end differentiable!
1. Reading/Writing -> convex sums
2. $w_t$ generation -> smooth
3. Controller networks
Neural Turing Machine: Network Overview
[Figure: unrolled feed-forward controller, from Snips AI's Medium post]
Neural Turing Machines vs. MemNNs
MemNNs
• Memory is static, with focus on retrieving (reading) information from memory
NTMs
• Memory is continuously written to and read from, with network learning when to perform memory read and write
Neural Turing Machines: Experiments
                 Network Size             Number of Parameters
Task             NTM w/ LSTM*   LSTM      NTM w/ LSTM   LSTM
Copy             3x100          3x256     67K           1.3M
Repeat Copy      3x100          3x512     66K           5.3M
Associative      3x100          3x256     70K           1.3M
N-grams          3x100          3x128     61K           330K
Priority Sort    2x100          3x128     269K          385K
Neural Turing Machines: 'Copy' Learning Curve
Trained on 8-bit sequences, 1 <= sequence length <= 20
[Figure: learning curve]
Neural Turing Machines: 'Copy' Performance
[Figure: LSTM vs. NTM]
Neural Turing Machines triggered an outbreak of Memory Architectures!
Stack Augmented Recurrent Networks
Learn algorithms based on stack implementations (e.g. learning fixed sequence generators)
Uses a stack data structure to store memory (as opposed to a memory matrix)
Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets, Joulin et al., arXiv:1503.01007
Dynamic Neural Turing Machines
Experimented with addressing schemes
• Dynamic Addresses: addresses of memory locations learnt in training; allows non-linear location-based addressing
• Least recently used weighting: prefer least recently used memory locations + interpolate with content-based addressing
• Discrete Addressing: sample the memory location from the content-based distribution to obtain a one-hot address
• Multi-step Addressing: allows multiple hops over memory
Results: bAbI QA Task
          Location NTM   Content NTM   Soft DNTM   Discrete DNTM
1-step    31.4%          33.6%         29.5%       27.9%
3-step    32.8%          32.7%         24.2%       21.7%
Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes, Gulcehre et al., arXiv:1607.00036
Stack Augmented Recurrent Networks
• Blurry 'push' and 'pop' on stack. E.g.: [figure]
• Some results: [figure]
Differentiable Neural Computers
Advanced addressing mechanisms:
• Content Based Addressing
• Temporal Addressing
  • Maintains notion of sequence in addressing
  • Temporal Link Matrix $L$ (size $N \times N$), $L[i, j]$ = degree to which location $i$ was written to after location $j$
• Usage Based Addressing
Hybrid computing using a neural network with dynamic external memory, Graves et al., Nature vol. 538
DNC: Usage Based Addressing
• Writing increases usage of cell, reading decreases usage of cell
• Least used location has highest usage-based weighting
• Interpolate between usage and content based weights for final write weights
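A sketch of the last two bullets, following the allocation weighting and write-weight interpolation as given in Graves et al.: locations are sorted by usage, the least used one receives the largest allocation weight, and two gates blend allocation with content weights. The function names and the gate names `g_alloc` and `g_write` are chosen here for illustration, and the usage update itself is not shown.

```python
from typing import List


def allocation_weights(usage: List[float]) -> List[float]:
    """Usage-based weighting: least used locations get the most weight.

    For locations sorted by increasing usage, each weight is (1 - usage)
    times the product of the usages of all less-used locations.
    """
    order = sorted(range(len(usage)), key=lambda i: usage[i])
    alloc = [0.0] * len(usage)
    product = 1.0
    for i in order:
        alloc[i] = (1.0 - usage[i]) * product
        product *= usage[i]
    return alloc


def write_weights(alloc: List[float], content: List[float],
                  g_alloc: float, g_write: float) -> List[float]:
    """Interpolate allocation and content weights, scaled by a write gate."""
    return [g_write * (g_alloc * a + (1.0 - g_alloc) * c)
            for a, c in zip(alloc, content)]
```

With usages `[0.9, 0.1, 0.5]` the allocation weights are approximately `[0.005, 0.9, 0.05]`, so a write driven purely by allocation lands almost entirely on the least used cell.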
DNC: Example
[Figure]
DNC: Improvements over NTMs
NTM
• Large contiguous blocks of memory needed
• No way to free up memory cells after writing
DNC
• Memory locations non-contiguous, usage-based
• Regular de-allocation based on usage-tracking
DNC: Experiments
Graph Tasks
Graph representation: (source, edge, destination) tuples
Types of tasks:
- Traversal: perform walk on graph given source, list of edges
- Shortest Path: given source, destination
- Inference: given source, relation over edges; find destination
Training over 3 phases:
• Graph description phase: (source, edge, destination) tuples fed into the network
• Query phase: Shortest path (source, ____, destination), Inference (source, hybrid relation, ___), Traversal (source, relation, relation …, ___)
• Answer phase: target responses provided at output
Trained on random graphs of maximum size 1000
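To make the task definitions concrete, the sketch below computes the target answers for traversal and shortest-path queries directly from (source, edge, destination) tuples. This is ordinary graph code, not the DNC: it only shows what the network is trained to output. The function names and the assumption that each (source, edge) pair has one destination are illustrative.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (source, edge, destination)


def traverse(triples: List[Triple], source: str, edges: List[str]) -> List[str]:
    """Follow a list of edge labels from `source`; return the visited nodes."""
    step: Dict[Tuple[str, str], str] = {(s, e): d for s, e, d in triples}
    node, visited = source, []
    for edge in edges:
        node = step[(node, edge)]
        visited.append(node)
    return visited


def shortest_path(triples: List[Triple], source: str,
                  destination: str) -> Optional[List[Triple]]:
    """Breadth-first search; returns the tuples along one shortest path."""
    adjacency: Dict[str, List[Tuple[str, str]]] = {}
    for s, e, d in triples:
        adjacency.setdefault(s, []).append((e, d))
    parent: Dict[str, Optional[Tuple[str, str]]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            break
        for edge, nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, edge)
                queue.append(nxt)
    if destination not in parent:
        return None
    path: List[Triple] = []
    node = destination
    while parent[node] is not None:
        prev, edge = parent[node]
        path.append((prev, edge, node))
        node = prev
    return path[::-1]
```

On a toy graph such as `[("a", "red", "b"), ("b", "red", "c"), ("a", "blue", "c"), ("c", "blue", "d")]`, the traversal query `("a", ["red", "red", "blue"])` visits `b`, `c`, `d`, while the shortest path from `a` to `d` uses the two blue edges.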
DNC: Experiments
Graph Tasks: London Underground
[Figures: input phase; traversal task (query phase, answer phase); shortest path task (query phase, answer phase)]
DNC: Experiments
Graph Tasks: Freya's Family Tree
[Figure]
Conclusion
• Machine Learning models require memory and multi-hop reasoning to perform AI tasks better
• Memory Networks for Text are an interesting direction but very simple
• Generic architectures with memory, such as the Neural Turing Machine, have shown limited applications
• Future directions should focus on applying generic neural models with memory to more AI tasks
Reading List
• Karol Kurach, Marcin Andrychowicz & Ilya Sutskever, Neural Random-Access Machines, ICLR 2016
• Emilio Parisotto & Ruslan Salakhutdinov, Neural Map: Structured Memory for Deep Reinforcement Learning, arXiv 2017
• Pritzel et al., Neural Episodic Control, arXiv 2017
• Oriol Vinyals, Meire Fortunato, Navdeep Jaitly, Pointer Networks, arXiv 2015
• Jack W. Rae et al., Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes, arXiv 2016
• Antoine Bordes, Y-Lan Boureau, Jason Weston, Learning End-to-End Goal-Oriented Dialog, ICLR 2017
• Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, Honglak Lee, Control of Memory, Active Perception, and Action in Minecraft, ICML 2016
• Wojciech Zaremba, Ilya Sutskever, Reinforcement Learning Neural Turing Machines, arXiv 2016