
Variational Autoencoders Write Poetry

(Generating Sentences from a Continuous Space)

Elsbeth Turcan and Fei-Tzin Lee

Paper by Sam Bowman, Luke Vilnis et al.

2016

Motivation

– Generative models for natural language sentences

– Machine translation

– Image captioning

– Dataset summarization

– Chatbots

– Etc.

– Want to capture high-level features of text, such as topic and style, and keep them consistent when generating text

Related work - RNNLM

– In the words of Bowman et al., "A standard RNN language model predicts each word of a sentence conditioned on the previous word and an evolving hidden state."

– In other words, it only looks at the relationships between consecutive words, and so does not contain or observe any global features

– But what if we want global information?

Other related work

– Skip-thought

– Generate sentence codes in the style of word embeddings to predict context sentences

– Paragraph vector

– A vector representing the paragraph is incorporated into single-word embeddings

Autoencoders

– Typically composed of two RNNs

– The first RNN encodes a sentence into an intermediate vector

– The second RNN decodes the intermediate representation back into a sentence, ideally the same as the input
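A rough sketch of that two-RNN setup (PyTorch, with illustrative sizes; the paper's exact architecture is given later in the slides):

    import torch.nn as nn

    class Seq2SeqAutoencoder(nn.Module):
        # An LSTM encoder compresses a token sequence into its final hidden
        # state; an LSTM decoder reconstructs the sequence from that state.
        def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):
            emb = self.embed(tokens)               # (batch, seq, embed)
            _, state = self.encoder(emb)           # state holds the sentence code
            dec_out, _ = self.decoder(emb, state)  # teacher-forced reconstruction
            return self.out(dec_out)               # per-step vocabulary logits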

Variational Autoencoders (VAEs)

– Regular autoencoders learn only discrete mappings from point to point

– However, if we want to learn holistic information about the structure of sentences, we need to be able to fill sentence space better

– In a VAE, we replace the hidden vector z with a posterior probability distribution q(z|x) conditioned on the input, and sample our latent z from that distribution at each step

– We ensure that this distribution has a tractable form by enforcing its similarity to a defined prior distribution, typically some form of Gaussian
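Concretely, in the standard formulation the encoder produces the parameters of a diagonal Gaussian posterior, and the prior is a standard Gaussian (notation assumed, not copied from the slides):

$$q(z \mid x) = \mathcal{N}\big(z;\ \mu(x),\ \mathrm{diag}(\sigma^2(x))\big), \qquad p(z) = \mathcal{N}(0, I)$$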

Modified loss function

– The regular autoencoder's loss function would encourage the VAE to learn posteriors as close to discrete as possible – in other words, Gaussians that are clustered extremely tightly around their means

– In order to enforce our posterior's similarity to a well-formed Gaussian, we introduce a KL divergence term into our loss, as below:
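The resulting objective is the variational lower bound on the log-likelihood, maximized during training (the KL term keeps the posterior close to the prior; the expectation term is the reconstruction objective):

$$\mathcal{L}(\theta; x) = -\mathrm{KL}\big(q_\theta(z \mid x)\,\|\,p(z)\big) + \mathbb{E}_{q_\theta(z \mid x)}\big[\log p_\theta(x \mid z)\big] \le \log p(x)$$

For a Gaussian posterior and a standard Gaussian prior, the KL term has a closed form; a minimal PyTorch sketch, assuming the encoder outputs a mean and log-variance:

    import torch

    def gaussian_kl(mu, logvar):
        # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims:
        # 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)
        return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar, dim=-1)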

Reparameterization trick

– In the original formulation, the encoder net encodes the sentence into a probability distribution (usually Gaussian); practically speaking, it encodes the sentence into the parameters of the distribution (i.e. μ and σ)

– However, this poses challenges for us while backpropagating: we can't backpropagate over the jump from μ and σ to z, since it's random

– Solution: extract the randomness from the Gaussian by reformulating it as a function of μ, σ, and another separate random variable
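A minimal sketch of the trick (variable names illustrative): all the randomness moves into a fixed noise variable ε ~ N(0, I), so z becomes a deterministic, differentiable function of μ and σ:

    import torch

    def reparameterize(mu, logvar):
        # z = mu + sigma * eps, with eps ~ N(0, I) carrying the randomness.
        # Gradients flow to mu and logvar; eps needs no gradient.
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps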


Specific architecture

– Single-layer LSTM for encoder and decoder

Issues and fixes

– The decoder is too strong: without any limitations, it simply doesn't use z at all

– Fix: KL annealing

– Fix: word dropout (both sketched below)
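Minimal sketches of both fixes (the schedule length and keep rate here are illustrative, and the paper uses a sigmoid annealing schedule rather than the linear ramp shown):

    import torch

    def kl_weight(step, anneal_steps=10000):
        # KL annealing: ramp the weight on the KL term from 0 up to 1,
        # so the decoder learns to reconstruct before z is squeezed
        # toward the prior.
        return min(1.0, step / anneal_steps)

    def word_dropout(tokens, unk_id, keep_rate=0.7):
        # Word dropout: randomly replace decoder-input tokens with <unk>,
        # weakening the decoder so it must rely on z.
        mask = torch.rand(tokens.shape) < keep_rate
        return torch.where(mask, tokens, torch.full_like(tokens, unk_id))

The training loss would then be reconstruction_loss + kl_weight(step) * kl_term.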

Experiments – Language modeling

– Used the VAE to create language models on the Penn Treebank dataset, with an RNNLM as the baseline

– Task: train an LM on the training set and have it assign high probability to the test set

– The RNNLM outperformed the VAE in the traditional setting

– However, when handicaps were imposed on both models (an inputless decoder), the VAE was significantly better able to overcome them

Experiments – Imputing missing words

– Task: infer missing words in a sentence given some known words (imputation)

– Place the unknown words at the end of the sentence for the RNNLM

– The RNNLM and VAE performed beam search (with VAE decoding broken into three steps) to produce the most likely words to complete a sentence; a generic sketch follows below

– Precise evaluation of these results is computationally difficult
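The paper's three-step VAE decoding is specific to its setup, but generic beam search looks roughly like this (step_fn is a hypothetical callable returning next-token log-probabilities for a partial sequence):

    import torch

    def beam_search(step_fn, bos_id, eos_id, beam_width=5, max_len=20):
        # Keep the beam_width highest-scoring partial sentences; expand each
        # with its top next tokens until all hypotheses emit <eos>.
        beams = [([bos_id], 0.0)]
        for _ in range(max_len):
            if all(toks[-1] == eos_id for toks, _ in beams):
                break
            candidates = []
            for toks, score in beams:
                if toks[-1] == eos_id:        # finished hypotheses carry over
                    candidates.append((toks, score))
                    continue
                log_probs = step_fn(toks)     # (vocab,) log-probabilities
                top = torch.topk(log_probs, beam_width)
                for lp, idx in zip(top.values, top.indices):
                    candidates.append((toks + [idx.item()], score + lp.item()))
            beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
        return beams[0][0]                    # best-scoring token sequence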

Adversarial evaluation

– Instead, create an adversarial classifier, trained to distinguish real sentences from generated sentences, and score the model on how well it fools the adversary

– Adversarial error is defined as the gap between chance accuracy (50%) and the real accuracy of the adversary – ideally this error will be minimized
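Read literally, the slide's definition is (the paper may scale this differently):

$$\mathrm{adversarial\ error} = \big|\,\mathrm{acc}_{\mathrm{adversary}} - 0.5\,\big|$$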

Experiments - Other

– Several other experiments in the appendix showed the VAE to be applicable to a variety of tasks

– Text classification

– Paraphrase detection

– Question classification

Analysis

– Word dropout

– Keep rate too low: sentence structure suffers

– Keep rate too high: no creativity; stifles the variation

– Effects on cost function components (shown as plots on the original slides)

Extras: sampling from the posterior and homotopies

– Sampling from the posterior: examples of sentences adjacent in sentence space

– Homotopies: linear interpolations in sentence space between the codes for two sentences
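A minimal sketch of a homotopy (decode is a hypothetical function mapping a latent code to a sentence):

    import torch

    def homotopy(z1, z2, decode, steps=5):
        # Decode evenly spaced points on the line between two latent codes:
        # z(t) = (1 - t) * z1 + t * z2, for t in [0, 1].
        return [decode((1.0 - t) * z1 + t * z2)
                for t in torch.linspace(0.0, 1.0, steps)]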

Even more homotopies

Thanks for listening!

– Any questions?
