
Variational Autoencoders Write Poetry

(Generating Sentences from a Continuous Space)

Elsbeth Turcan and Fei-Tzin Lee

Paper by Sam Bowman, Luke Vilnis et al.

2016

Motivation

– Generative models for natural language sentences

– Machine translation

– Image captioning

– Dataset summarization

– Chatbots

– Etc.

– Want to capture high-level features of text, such as topic and style, and keep them consistent when generating text

Related work - RNNLM

– In the words of Bowman et al., "A standard RNN language model predicts each word of a sentence conditioned on the previous word and an evolving hidden state."

– In other words, it only looks at the relationships between consecutive words, and so does not contain or observe any global features

– But what if we want global information?

Other related work

– Skip-thought

– Generate sentence codes in the style of word embeddings to predict context sentences

– Paragraph vector

– A vector representing the paragraph is incorporated into single-word embeddings

Autoencoders

– Typically composed of two RNNs

– The first RNN encodes a sentence into an intermediate vector

– The second RNN decodes the intermediate representation back into a sentence, ideally the same as the input
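A rough sketch of that two-RNN setup (PyTorch, with illustrative sizes; the paper's exact architecture is given later in the slides):

    import torch.nn as nn

    class Seq2SeqAutoencoder(nn.Module):
        # An LSTM encoder compresses a token sequence into its final hidden
        # state; an LSTM decoder reconstructs the sequence from that state.
        def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):
            emb = self.embed(tokens)               # (batch, seq, embed)
            _, state = self.encoder(emb)           # state holds the sentence code
            dec_out, _ = self.decoder(emb, state)  # teacher-forced reconstruction
            return self.out(dec_out)               # per-step vocabulary logits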

Variational Autoencoders (VAEs)

– Regular autoencoders learn only discrete mappings from point to point

– However, if we want to learn holistic information about the structure of sentences, we need to be able to fill sentence space better

– In a VAE, we replace the hidden vector z with a posterior probability distribution q(z|x) conditioned on the input, and sample our latent z from that distribution at each step

– We ensure that this distribution has a tractable form by enforcing its similarity to a defined prior distribution, typically some form of Gaussian
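Concretely, in the standard formulation the encoder produces the parameters of a diagonal Gaussian posterior, and the prior is a standard Gaussian (notation assumed, not copied from the slides):

$$q(z \mid x) = \mathcal{N}\big(z;\ \mu(x),\ \mathrm{diag}(\sigma^2(x))\big), \qquad p(z) = \mathcal{N}(0, I)$$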

Modified loss function

– The regular autoencoder's loss function would encourage the VAE to learn posteriors as close to discrete as possible – in other words, Gaussians that are clustered extremely tightly around their means

– In order to enforce our posterior's similarity to a well-formed Gaussian, we introduce a KL divergence term into our loss, as below:
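The resulting objective is the variational lower bound on the log-likelihood, maximized during training (the KL term keeps the posterior close to the prior; the expectation term is the reconstruction objective):

$$\mathcal{L}(\theta; x) = -\mathrm{KL}\big(q_\theta(z \mid x)\,\|\,p(z)\big) + \mathbb{E}_{q_\theta(z \mid x)}\big[\log p_\theta(x \mid z)\big] \le \log p(x)$$

For a Gaussian posterior and a standard Gaussian prior, the KL term has a closed form; a minimal PyTorch sketch, assuming the encoder outputs a mean and log-variance:

    import torch

    def gaussian_kl(mu, logvar):
        # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims:
        # 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)
        return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar, dim=-1)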

Reparameterization trick

– In the original formulation, the encoder net encodes the sentence into a probability distribution (usually Gaussian); practically speaking, it encodes the sentence into the parameters of the distribution (i.e. μ and σ)

– However, this poses challenges for us while backpropagating: we can't backpropagate over the jump from μ and σ to z, since it's random

– Solution: extract the randomness from the Gaussian by reformulating it as a function of μ, σ, and another separate random variable
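A minimal sketch of the trick (variable names illustrative): all the randomness moves into a fixed noise variable ε ~ N(0, I), so z becomes a deterministic, differentiable function of μ and σ:

    import torch

    def reparameterize(mu, logvar):
        # z = mu + sigma * eps, with eps ~ N(0, I) carrying the randomness.
        # Gradients flow to mu and logvar; eps needs no gradient.
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps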


Specific architecture

– Single-layer LSTM for encoder and decoder

Issues and fixes

– The decoder is too strong: without any limitations, it simply doesn't use z at all

– Fix: KL annealing

– Fix: word dropout (both sketched below)
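Minimal sketches of both fixes (the schedule length and keep rate here are illustrative, and the paper uses a sigmoid annealing schedule rather than the linear ramp shown):

    import torch

    def kl_weight(step, anneal_steps=10000):
        # KL annealing: ramp the weight on the KL term from 0 up to 1,
        # so the decoder learns to reconstruct before z is squeezed
        # toward the prior.
        return min(1.0, step / anneal_steps)

    def word_dropout(tokens, unk_id, keep_rate=0.7):
        # Word dropout: randomly replace decoder-input tokens with <unk>,
        # weakening the decoder so it must rely on z.
        mask = torch.rand(tokens.shape) < keep_rate
        return torch.where(mask, tokens, torch.full_like(tokens, unk_id))

The training loss would then be reconstruction_loss + kl_weight(step) * kl_term.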

Experiments – Language modeling

– Used the VAE to create language models on the Penn Treebank dataset, with an RNNLM as the baseline

– Task: train an LM on the training set and have it assign high probability to the test set

– The RNNLM outperformed the VAE in the traditional setting

– However, when handicaps were imposed on both models (an inputless decoder), the VAE was significantly better able to overcome them

Experiments – Imputing missing words

– Task: infer missing words in a sentence given some known words (imputation)

– Place the unknown words at the end of the sentence for the RNNLM

– The RNNLM and VAE performed beam search (with VAE decoding broken into three steps) to produce the most likely words to complete a sentence; a generic sketch follows below

– Precise evaluation of these results is computationally difficult
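The paper's three-step VAE decoding is specific to its setup, but generic beam search looks roughly like this (step_fn is a hypothetical callable returning next-token log-probabilities for a partial sequence):

    import torch

    def beam_search(step_fn, bos_id, eos_id, beam_width=5, max_len=20):
        # Keep the beam_width highest-scoring partial sentences; expand each
        # with its top next tokens until all hypotheses emit <eos>.
        beams = [([bos_id], 0.0)]
        for _ in range(max_len):
            if all(toks[-1] == eos_id for toks, _ in beams):
                break
            candidates = []
            for toks, score in beams:
                if toks[-1] == eos_id:        # finished hypotheses carry over
                    candidates.append((toks, score))
                    continue
                log_probs = step_fn(toks)     # (vocab,) log-probabilities
                top = torch.topk(log_probs, beam_width)
                for lp, idx in zip(top.values, top.indices):
                    candidates.append((toks + [idx.item()], score + lp.item()))
            beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
        return beams[0][0]                    # best-scoring token sequence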

Adversarial evaluation

– Instead, create an adversarial classifier, trained to distinguish real sentences from generated sentences, and score the model on how well it fools the adversary

– Adversarial error is defined as the gap between chance accuracy (50%) and the real accuracy of the adversary – ideally this error will be minimized
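Read literally, the slide's definition is (the paper may scale this differently):

$$\mathrm{adversarial\ error} = \big|\,\mathrm{acc}_{\mathrm{adversary}} - 0.5\,\big|$$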

Experiments - Other

– Several other experiments in the appendix showed the VAE to be applicable to a variety of tasks

– Text classification

– Paraphrase detection

– Question classification

Analysis

– Word dropout

– Keep rate too low: sentence structure suffers

– Keep rate too high: no creativity; stifles the variation

– Effects on cost function components (shown as plots on the original slides)

Extras: sampling from the posterior and homotopies

– Sampling from the posterior: examples of sentences adjacent in sentence space

– Homotopies: linear interpolations in sentence space between the codes for two sentences
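A minimal sketch of a homotopy (decode is a hypothetical function mapping a latent code to a sentence):

    import torch

    def homotopy(z1, z2, decode, steps=5):
        # Decode evenly spaced points on the line between two latent codes:
        # z(t) = (1 - t) * z1 + t * z2, for t in [0, 1].
        return [decode((1.0 - t) * z1 + t * z2)
                for t in torch.linspace(0.0, 1.0, steps)]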

Even more homotopies

Thanks for listening!

– Any questions?
