Generative Adversarial Networks (GANs) for Discrete Data
Lantao Yu, Shanghai Jiao Tong University
http://lantaoyu.com
July 26, 2017
Self Introduction – Lantao Yu
• Position
  • Research Assistant at the CS Dept. of SJTU, 2015–now
  • Apex Data and Knowledge Management Lab
  • Research on deep learning, reinforcement learning and multi-agent systems
• Education
  • Third-year undergraduate student, Computer Science Dept., Shanghai Jiao Tong University, China, 2014–now
• Contact
  • lantaoyu@hotmail.com
  • yulantao@apex.sjtu.edu.cn
Apex Data & Knowledge Management Lab
• Students: 8 PhDs, 8 Masters, 24 Undergrads
• www.apexlab.org — Professor Yong Yu, A.P. Weinan Zhang
• Machine learning and data science, with applications to recommender systems, computational ads, social networks, crowdsourcing, urban computing, etc.
Content
• Fundamentals – Generative Adversarial Networks
  • Connection and difference between generating discrete data and continuous data with GANs
• Advances – GANs for Discrete Data
  • SeqGAN: Sequence Generation via GANs with Policy Gradient
  • IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models
Generative Adversarial Networks (GANs)
[Goodfellow, I., et al. Generative adversarial nets. NIPS 2014.]
Problem Definition
• Given a dataset D = {x} drawn from the true distribution p(x), build a model q(x) of the data distribution that fits the true one
• Traditional objective: maximum likelihood estimation (MLE)
• Check whether a true data point has high mass density under the learned model

max_q (1/|D|) Σ_{x∈D} log q(x) ≈ max_q E_{x∼p(x)}[log q(x)]
Problems of MLE
• Training/evaluation: max_q E_{x∼p(x)}[log q(x)]
  • Check whether a true data point has high mass density under the learned model
  • Approximated by max_q (1/|D|) Σ_{x∈D} log q(x)
• Use (Turing Test): max_q E_{x∼q(x)}[log p(x)]
  • Check whether a model-generated data point is considered as true as possible
  • More straightforward, but it is hard or impossible to directly calculate p(x)
• Inconsistency of evaluation and use
Problems of MLE
• MLE is equivalent to minimizing the asymmetric KL divergence:

KL(p(x) || q(x)) = ∫_X p(x) log (p(x)/q(x)) dx

• When p(x) > 0 but q(x) → 0, the integrand inside the KL grows quickly to infinity, which means MLE assigns an extremely high cost to such a situation (missing a mode of the true data)
• When p(x) → 0 but q(x) > 0, the integrand inside the KL divergence goes to 0, which means MLE pays an extremely low cost for generating fake-looking samples

[Arjovsky, M., Bottou, L. Towards principled methods for training generative adversarial networks. arXiv, 2017.]
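This asymmetry can be checked numerically on a toy discrete example (the distributions below are made up for illustration):

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions, in nats; terms with p(x)=0 contribute 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p  = np.array([0.5, 0.5, 0.0])      # true data distribution
q1 = np.array([0.5, 0.499, 0.001])  # covers p, but also puts mass where p ~ 0
q2 = np.array([0.5, 0.001, 0.499])  # drops a mode of p (q -> 0 where p > 0)

print(kl(p, q1))  # small: fake-looking mass where p ~ 0 is barely punished
print(kl(p, q2))  # large: missing a true mode is punished heavily
```

So under KL(p||q), a model that hallucinates samples is almost free, while one that drops a mode pays dearly — the opposite of what the "use" criterion cares about.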
Problems of MLE
• Exposure bias in sequential data generation:
• Training: update the model by maximizing the likelihood of real sequences, conditioned on the real prefix:

max_θ E_{Y∼p_true} Σ_t log G_θ(y_t | Y_{1:t−1})

• Inference: when generating the next token y_t, sample from G_θ(y_t | Y_{1:t−1}), conditioned on the guessed prefix
• Mismatch: the model is trained on real prefixes but run on its own guessed prefixes, so early mistakes compound
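The train/inference mismatch above can be sketched with a made-up bigram table standing in for the model (all names and numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
V = 5                                 # toy vocabulary size
model = rng.normal(size=(V, V))       # hypothetical bigram "parameters"

def step_probs(prev_token):
    """Next-token distribution G_theta(y_t | y_{t-1}) as a softmax over logits."""
    logits = model[prev_token]
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Training (teacher forcing): always conditioned on the REAL prefix
real_seq = [0, 2, 1, 3]
nll = -sum(float(np.log(step_probs(real_seq[t - 1])[real_seq[t]]))
           for t in range(1, len(real_seq)))

# Inference: conditioned on the model's OWN samples (the guessed prefix),
# so one early mistake leads into prefixes never seen during training
gen_seq = [0]
for _ in range(3):
    gen_seq.append(int(rng.choice(V, p=step_probs(gen_seq[-1]))))
```

The training loss only ever scores the model on real prefixes, while generation feeds the model its own outputs — the gap between the two loops is the exposure bias.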
Generative Adversarial Nets (GANs)
• What we really want: max_q E_{x∼q(x)}[log p(x)]
• But we cannot directly calculate p(x)
• Idea: what if we build a discriminator to judge whether a data instance is true or fake (artificially generated)?
• Leverage the strong power of deep-learning-based discriminative models
Generative Adversarial Nets (GANs)
• The discriminator tries to correctly distinguish the true data and the fake model-generated data
• The generator tries to generate high-quality data to fool the discriminator
• G & D can be implemented via neural networks
• Ideally, when D cannot distinguish the true and generated data, G nicely fits the true underlying data distribution

[Diagram: the Generator G produces fake data; the Discriminator D distinguishes real-world Data from generated data]
GANs: A Minimax Game
• The most general form, where G(x) is the probability density function of the generator:

min_G max_D V(D,G) = E_{x∼p_data(x)}[log D(x)] + E_{x∼G(x)}[log(1−D(x))]
                   = ∫_x [p_data(x)·log D(x) + G(x)·log(1−D(x))] dx
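In practice both expectations are estimated from sample batches. A minimal sketch of such a Monte Carlo estimate (the discriminator outputs below are made-up numbers):

```python
import numpy as np

def value_estimate(d_real, d_fake):
    """Estimate V(D,G) = E_pdata[log D(x)] + E_G[log(1 - D(x))] from samples."""
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

d_real = np.array([0.9, 0.8, 0.95])  # D's outputs on real data (D pushes these up)
d_fake = np.array([0.1, 0.2, 0.05])  # D's outputs on generated data (D pushes these down)

v_confident = value_estimate(d_real, d_fake)
# A fully confused discriminator, D = 0.5 everywhere, gives V = -2 log 2
v_confused = value_estimate(np.full(3, 0.5), np.full(3, 0.5))
```

D plays max over this value while G plays min; the −2 log 2 floor reached by a confused discriminator reappears later as the −log 4 constant in the equilibrium analysis.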
Generating continuous data
• The generative model is a differentiable mapping G : Z → X from the prior noise space to the data space
• First sample z ∼ p(z) from a simple prior, then apply a deterministic function x = G(z)
• No explicit probability density function for the data x

• The most general form:

min_G max_D V(D,G) = E_{x∼p_data(x)}[log D(x)] + E_{x∼G(x)}[log(1−D(x))]
                   = ∫_x [p_data(x)·log D(x) + G(x)·log(1−D(x))] dx

• Without an explicit P.D.F., we can rewrite the minimax game as:

min_G max_D V(D,G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1−D(G(z)))]
                   = ∫_x p_data(x)·log D(x) dx + ∫_z p_z(z)·log(1−D(G(z))) dz
GANs for continuous data
• Directly optimize the differentiable mapping!
• In order to take the gradient on the generator parameter, x has to be continuous

J^(D) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1−D(G(z)))]

• Generator: min_G max_D J^(D), with x = G(z; θ^(G))
• Discriminator: max_D J^(D), with P(True|x) = D(x; φ^(D))

1. Generation: sample z, compute x = G(z; θ^(G))
2. Discrimination: compute D(x; φ^(D))
3. Gradient on the generated data: ∂L/∂x
4. Further gradient on the generator: chain it through ∂x/∂θ^(G)
GANs for continuous data
[Diagram: the Generator maps noise z to data x; the Discriminator scores real Data against Generator samples]
Directly optimize the differentiable mapping!
Ideal Final Equilibrium
• The generator generates the perfect data distribution
• The discriminator cannot distinguish the true and generated data

Training GANs
• Training the discriminator

Training GANs
• Training the generator
Optimal Strategy for the Discriminator
• The optimal discriminator for any p_data(x) and p_G(x) is always

D(x) = p_data(x) / (p_data(x) + p_G(x))
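This can be verified numerically: for fixed densities, D*(x) = p_data(x)/(p_data(x)+p_G(x)) pointwise maximizes the value function (the toy distributions below are made up):

```python
import numpy as np

p_data = np.array([0.7, 0.2, 0.1])
p_g    = np.array([0.1, 0.2, 0.7])

d_star = p_data / (p_data + p_g)  # the claimed optimal discriminator

def v(D):
    """V(D,G) for discrete x: sum_x [p_data(x) log D(x) + p_G(x) log(1 - D(x))]."""
    return float(np.sum(p_data * np.log(D) + p_g * np.log(1.0 - D)))

# No perturbed discriminator beats D* on this value function
rng = np.random.default_rng(1)
best = v(d_star)
for _ in range(100):
    other = np.clip(d_star + rng.normal(scale=0.05, size=3), 1e-6, 1 - 1e-6)
    assert v(other) <= best
# Where p_G already matches p_data (index 1 here), D* can do no better than 0.5
```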
Reformulate the Minimax Game
• At the optimum, D(x) = p_data(x)/(p_data(x)+p_G(x)) is something between the data distribution and the model distribution

J^(D) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1−D(G(z)))]
      = E_{x∼p_data(x)}[log D(x)] + E_{x∼p_G(x)}[log(1−D(x))]
      = E_{x∼p_data(x)}[log (p_data(x) / (p_data(x)+p_G(x)))] + E_{x∼p_G(x)}[log (p_G(x) / (p_data(x)+p_G(x)))]
      = −log 4 + KL(p_data || (p_data+p_G)/2) + KL(p_G || (p_data+p_G)/2)

[Huszár, Ferenc. "How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?" arXiv, 2015.]
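The identity above, including the −log 4 constant, can be checked numerically on toy distributions (made up for illustration):

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions, in nats."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p_data = np.array([0.7, 0.2, 0.1])
p_g    = np.array([0.1, 0.2, 0.7])
m = 0.5 * (p_data + p_g)                     # the mixture (p_data + p_G) / 2

d_star = p_data / (p_data + p_g)             # optimal discriminator
v_star = float(np.sum(p_data * np.log(d_star) + p_g * np.log(1.0 - d_star)))

rhs = -np.log(4.0) + kl(p_data, m) + kl(p_g, m)
# The KL sum is 2 * JSD(p_data || p_G) >= 0, so the minimum value -log 4
# is reached exactly when p_G = p_data
```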
Case Study
Generating discrete data
• The generative model computes a probability distribution over the candidate choices
• First compute the probability density function of the data x (e.g. a softmax over the vocabulary), then sample from it
• With an explicit P.D.F., we can simply start with the most general form of the minimax game:
• Now, instead of optimizing a transformation, we can directly optimize the P.D.F. with the guidance of the discriminator
• Note that even for discrete data, G(x) is differentiable!

GANs for discrete data
• Here G(x) is the probability density function:

min_G max_D V(D,G) = E_{x∼p_data(x)}[log D(x)] + E_{x∼G(x)}[log(1−D(x))]
                   = ∫_x [p_data(x)·log D(x) + G(x)·log(1−D(x))] dx
GANs for Discrete Data
• We could directly build a parametric distribution for discrete data
• For example, for discrete data over {A1, A2, A3, A4, A5}, the data P.D.F. could be defined as

p(A_i) = e^{f(A_i)} / Σ_j e^{f(A_j)}

where the scoring function f could be defined based on domain knowledge, e.g., a neural network with the A_i embedding as the input

[Bar chart: an example probability distribution over A1–A5]
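A minimal sketch of such a parametric P.D.F., with made-up scores f(A_i) standing in for a learned scoring network:

```python
import numpy as np

def p_theta(scores):
    """p(A_i) = exp(f(A_i)) / sum_j exp(f(A_j)): a softmax over the categories."""
    e = np.exp(scores - scores.max())  # shift for numerical stability
    return e / e.sum()

f = np.array([1.0, 0.5, 0.0, -0.5, -1.0])  # hypothetical scores for A1..A5
p = p_theta(f)
# p is a valid distribution and is differentiable w.r.t. the scores f,
# which is what lets the discriminator's signal flow back into the generator
```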
Borrow the Idea from RL
• For a generator G(A_i) = P(A_i; θ^(G))
• Intuition:
  • lower the probability of the choice that leads to low value/reward
  • higher the probability of the choice that leads to high value/reward
• The one-field, 5-category example:
  1. Initialize θ
  2. Generate A2; observe a positive reward (from D)
  3. Update θ by policy gradient
  4. Generate A3; observe a negative reward (from D)
  5. Update θ by policy gradient

[Bar charts: the distribution over A1–A5 after each update — mass shifts toward A2 and away from A3]
Advances: GANs for Discrete Data
• Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. AAAI 2017.
• Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang and Dell Zhang. IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models. SIGIR 2017.
SeqGAN: Sequence Generation via GANs with Policy Gradient
[Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. AAAI 2017.]
Problem for Discrete Data
• On continuous data, there is a direct gradient

∇_{θ^(G)} (1/m) Σ_{i=1}^m log(1−D(G(z^(i))))

• It guides the generator to (slightly) modify the output
• No direct gradient on discrete data
• Text generation examples:
  • “床前明月光” (“Before my bed, the bright moonlight”)
  • “I caught a penguin in the park”
  • From Ian Goodfellow: “If you output the word ‘penguin’, you can't change that to ‘penguin + .001’ on the next step, because there is no such word as ‘penguin + .001’. You have to go all the way from ‘penguin’ to ‘ostrich’.”

[https://www.reddit.com/r/MachineLearning/comments/40ldq6/generative_adversarial_networks_for_text/]
SeqGAN
• The generator G(y_t | Y_{1:t−1}) is a reinforcement learning policy for generating a sequence
  • decide the next word to generate given the previous ones
• The discriminator D(Y^n_{1:T}) provides the reward (i.e. the probability of being true data) for the whole sequence
Sequence Generator
• Objective: maximize the expected reward

J(θ) = E[R_T | s_0, θ] = Σ_{y_1∈Y} G_θ(y_1|s_0) · Q^{G_θ}_{D_φ}(s_0, y_1)

• The state-action value function Q^{G_θ}_{D_φ}(s, a) is the expected accumulative reward when
  • starting from state s
  • taking action a
  • and following policy G until the end
• The reward comes only on the completed sequence (no immediate reward):

Q^{G_θ}_{D_φ}(s = Y_{1:T−1}, a = y_T) = D_φ(Y_{1:T})
State-Action Value Setting
• The reward comes only on the completed sequence
  • No immediate reward
• Then the last-step state-action value is

Q^{G_θ}_{D_φ}(s = Y_{1:T−1}, a = y_T) = D_φ(Y_{1:T})

• For the intermediate state-action value, use Monte Carlo search to estimate it, following a roll-out policy G_β:

Q^{G_θ}_{D_φ}(s = Y_{1:t−1}, a = y_t) =
  (1/N) Σ_{n=1}^N D_φ(Y^n_{1:T}),  Y^n_{1:T} ∈ MC^{G_β}(Y_{1:t}; N),  for t < T
  D_φ(Y_{1:t}),                                                       for t = T

where {Y^1_{1:T}, …, Y^N_{1:T}} = MC^{G_β}(Y_{1:t}; N)
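A sketch of this Monte Carlo roll-out estimate of Q, with a stand-in discriminator and a uniform roll-out policy (both made up for illustration; in SeqGAN the roll-out policy is the generator itself):

```python
import numpy as np

rng = np.random.default_rng(0)
V, T = 4, 6  # toy vocabulary size and full sequence length

def rollout_policy(prefix):
    """Stand-in roll-out policy G_beta: uniform sampling over the vocabulary."""
    return int(rng.integers(V))

def D_phi(seq):
    """Stand-in discriminator: reward = fraction of even tokens, in [0, 1]."""
    return float(np.mean([t % 2 == 0 for t in seq]))

def q_value(prefix, action, N=64):
    """Q(s = Y_{1:t-1}, a = y_t): average D_phi over N Monte Carlo roll-outs."""
    state = prefix + [action]
    if len(state) == T:          # last step: reward is D_phi on the finished sequence
        return D_phi(state)
    total = 0.0
    for _ in range(N):
        seq = list(state)
        while len(seq) < T:      # complete the sequence with the roll-out policy
            seq.append(rollout_policy(seq))
        total += D_phi(seq)
    return total / N

q_even = q_value([0, 2], 2)  # extending the prefix with an even token
q_odd  = q_value([0, 2], 1)  # extending the prefix with an odd token
```

Under this toy reward, the even continuation gets a visibly higher Q estimate — exactly the per-step credit signal the policy gradient needs.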
Training the Sequence Generator
• Policy gradient (REINFORCE):

θ ← θ + α ∇_θ J(θ)

∇_θ J(θ) = Σ_{t=1}^T E_{Y_{1:t−1}∼G_θ}[ Σ_{y_t∈Y} ∇_θ G_θ(y_t|Y_{1:t−1}) · Q^{G_θ}_{D_φ}(Y_{1:t−1}, y_t) ]
         ≈ Σ_{t=1}^T Σ_{y_t∈Y} ∇_θ G_θ(y_t|Y_{1:t−1}) · Q^{G_θ}_{D_φ}(Y_{1:t−1}, y_t)
         = Σ_{t=1}^T Σ_{y_t∈Y} G_θ(y_t|Y_{1:t−1}) ∇_θ log G_θ(y_t|Y_{1:t−1}) · Q^{G_θ}_{D_φ}(Y_{1:t−1}, y_t)
         = Σ_{t=1}^T E_{y_t∼G_θ(y_t|Y_{1:t−1})}[ ∇_θ log G_θ(y_t|Y_{1:t−1}) · Q^{G_θ}_{D_φ}(Y_{1:t−1}, y_t) ]

[Richard Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS 1999.]
Training the Sequence Discriminator
• Objective: standard binary classification

min_φ −E_{Y∼p_data}[log D_φ(Y)] − E_{Y∼G_θ}[log(1−D_φ(Y))]
Overall Algorithm

Sequence Generator Model
• RNN with LSTM cells
• Softmax sampling over the vocabulary at each step (e.g. “Shanghai is incredibly” → next token?)

[Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8): 1735–1780.]
Sequence Discriminator Model
[Kim, Y. 2014. Convolutional neural networks for sentence classification. EMNLP 2014.]
Experiments on Synthetic Data
• Evaluation measure with an oracle
  • An oracle model (e.g. a randomly initialized LSTM)
  • Firstly, the oracle model produces some sequences as training data for the generative model
  • Secondly, the oracle model can be considered as the human observer that accurately evaluates the perceptual quality of the generative model

NLL_oracle = −E_{Y_{1:T}∼G_θ}[ Σ_{t=1}^T log G_oracle(y_t | Y_{1:t−1}) ]
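A sketch of this oracle evaluation with stand-in bigram tables (random and purely illustrative): sampling from the oracle itself attains the lowest expected NLL_oracle, and any other generator scores worse.

```python
import numpy as np

rng = np.random.default_rng(0)
V, T = 5, 8

oracle = rng.normal(size=(V, V))   # stand-in oracle: a random bigram table
gen    = rng.normal(size=(V, V))   # stand-in generator of the same form

def next_probs(table, prev):
    e = np.exp(table[prev] - table[prev].max())
    return e / e.sum()

def nll_oracle(sampler, oracle, n_samples=200):
    """Estimate -E_{Y ~ G_theta}[sum_t log G_oracle(y_t | Y_{1:t-1})]."""
    total = 0.0
    for _ in range(n_samples):
        y = [0]  # fixed start token
        for _ in range(T - 1):
            y.append(int(rng.choice(V, p=next_probs(sampler, y[-1]))))
        total -= sum(float(np.log(next_probs(oracle, y[t - 1])[y[t]]))
                     for t in range(1, len(y)))
    return total / n_samples

score_gen    = nll_oracle(gen, oracle)     # generator judged by the oracle
score_oracle = nll_oracle(oracle, oracle)  # the oracle judging its own samples
```

The gap score_gen − score_oracle is (up to sampling noise) the KL divergence between the generator and the oracle, accumulated over the sequence — which is why NLL_oracle works as a quality measure.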
Experiments on Real-World Data
• Chinese poem generation
• Obama political speech text generation
• MIDI music generation
Experiments on Real-World Data
• Chinese poem generation
南陌春风早,东邻去日斜。
紫陌追随日,青门相见时。
胡风不开花,四气多作雪。
山夜有雪寒,桂里逢客时。
此时人且饮,酒愁一节梦。
四面客归路,桂花开青竹。
Can you distinguish which part is from a human and which from the machine?
Experiments on Real-World Data
• Chinese poem generation
南陌春风早,东邻去日斜。
紫陌追随日,青门相见时。
胡风不开花,四气多作雪。
山夜有雪寒,桂里逢客时。
此时人且饮,酒愁一节梦。
四面客归路,桂花开青竹。
Human Machine
Obama Speech Text Generation
• i stood here today i have one and most important thing that not on violence throughout the horizon is OTHERS american fire and OTHERS but we need you are a strong source
• for this business leadership will remember now i cant afford to start with just the way our european support for the right thing to protect those american story from the world and
• i want to acknowledge you were going to be an outstanding job times for student medical education and warm the republicans who like my times if he said is that brought the
• When he was told of this extraordinary honor that he was the most trusted man in America
• But we also remember and celebrate the journalism that Walter practiced -- a standard of honesty and integrity and responsibility to which so many of you have committed your careers. It's a standard that's a little bit harder to find today
• I am honored to be here to pay tribute to the life and times of the man who chronicled our time.
Human Machine
IRGAN: A Minimax Game for Information Retrieval
[Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang and Dell Zhang. IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models. SIGIR 2017.]
Two schools of thinking in IR modeling
• Generative Retrieval
  • Assume there is an underlying stochastic generative process between documents and queries
  • Generate/select relevant documents given a query
  [Figure: Query1 → document1, document2, document3, document4]
• Discriminative Retrieval
  • Learns from labeled relevance judgments
  • Predict the relevance given a query-document pair
  [Figure: (Query1, document1) → relevance]
Three paradigms in Learning to Rank (LTR)
• Pointwise: learn to approximate the relevance estimation of each document to the human rating
• Pairwise: distinguish the more-relevant document from a document pair
• Listwise: learn to optimise the (smoothed) loss function defined over the whole ranking list for each query
IRGAN: A minimax game unifying both models
• Take advantage of both schools of thinking:
  • The generative model learns to fit the relevance distribution over documents via the signal from the discriminative model.
  • The discriminative model is able to exploit the unlabeled data selected by the generative model to achieve a better estimation for document ranking.
IRGAN Formulation
• The underlying true relevance distribution p_true(d|q, r) depicts the user's relevance preference distribution over the candidate documents with respect to the submitted query
• Training set: a set of samples from p_true(d|q, r)
• Generative retrieval model p_θ(d|q, r)
  • Goal: approximate the true relevance distribution
• Discriminative retrieval model f_φ(q, d)
  • Goal: distinguish between relevant documents and non-relevant documents
IRGAN Formulation
• Overall objective:

J^{G*,D*} = min_θ max_φ Σ_{n=1}^N ( E_{d∼p_true(d|q_n,r)}[log D(d|q_n)] + E_{d∼p_θ(d|q_n,r)}[log(1−D(d|q_n))] )

where D(d|q) = σ(f_φ(d, q)) = exp(f_φ(d, q)) / (1 + exp(f_φ(d, q)))
IRGAN Formulation
• Optimizing the discriminative retrieval:

φ* = argmax_φ Σ_{n=1}^N ( E_{d∼p_true(d|q_n,r)}[log σ(f_φ(d, q_n))] + E_{d∼p_θ*(d|q_n,r)}[log(1−σ(f_φ(d, q_n)))] )
IRGAN Formulation
• Optimizing the generative retrieval
  • Samples documents from the whole document set to fool its opponent
  • REINFORCE (with an advantage function); the discriminator's score acts as the reward term
IRGAN Formulation
• Algorithm
IRGAN Formulation
• Extension to the pairwise case
  • It is common that the dataset is a set of ordered document pairs for each query rather than a set of relevant documents.
  • Capture relative preference judgements rather than absolute relevance judgements
  • Now, for each query q_n, we have a set of labelled document pairs R_n = {⟨d_i, d_j⟩ | d_i ≻ d_j}
IRGAN Formulation
• Extension to the pairwise case
  • The discriminator tries to predict whether a document pair is correctly ranked, which can be implemented with any pairwise ranking loss function, where z = f_φ(d_u, q) − f_φ(d_v, q):
  • RankNet: log(1 + exp(−z))
  • Ranking SVM (hinge loss): (1 − z)_+
  • RankBoost: exp(−z)
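The three losses, written as functions of the score margin z (a minimal sketch): all are decreasing in z, so incorrectly ranked pairs (z < 0) are penalized more than correctly ranked ones.

```python
import numpy as np

def ranknet(z):   return float(np.log1p(np.exp(-z)))  # RankNet (logistic) loss
def hinge(z):     return float(max(0.0, 1.0 - z))     # Ranking SVM (hinge) loss
def rankboost(z): return float(np.exp(-z))            # RankBoost (exponential) loss

# z = f_phi(d_u, q) - f_phi(d_v, q) is the score margin of the pair;
# z > 0 means the pair is ranked correctly
for z in (-1.0, 0.0, 2.0):
    print(z, ranknet(z), hinge(z), rankboost(z))
```

They differ in how hard they push: the hinge loss is exactly zero past margin 1, the logistic loss decays smoothly, and the exponential loss punishes large violations most aggressively.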
IRGAN Formulation
• Extension to the pairwise case
  • The generator tries to generate document pairs that are similar to those in R_n, i.e., with the correct ranking.
  • A softmax function over the Cartesian product of the document sets, where the logit is the advantage of d_i over d_j in a document pair (d_i, d_j)
An Intuitive Explanation of IRGAN
• The generative retrieval model is guided by the signal provided by the discriminative retrieval model, which makes it more favorable than non-learning methods or the maximum likelihood estimation (MLE) scheme.
• The discriminative retrieval model can be enhanced to better rank top documents via strategic negative sampling from the generator.
Experiments: Web Search
Experiments: Web Search
Experiments: Item Recommendation
Experiments: Item Recommendation
Experiments: Question Answering
Summary of IRGAN
• We proposed the IRGAN framework, which unifies two schools of information retrieval methodology via adversarial training in a minimax game, taking advantage of both schools of thinking.
• Significant performance gains were observed in three typical information retrieval tasks.
• Experiments suggest that different equilibria can be reached in the end, depending on the tasks and settings.
Summary of This Talk
• Generative Adversarial Networks (GANs) are highly popular in deep unsupervised learning research
• Training/evaluation: max_q E_{x∼p(x)}[log q(x)] vs. Use (Turing Test): max_q E_{x∼q(x)}[log p(x)]
• GANs provide a training framework closer to the target use of the generative model
• For discrete data generation, one can directly define the parametric distribution and optimize the P.D.F. by policy gradient methods, e.g. REINFORCE
Thank You
Lantao Yu, Shanghai Jiao Tong University
http://lantaoyu.com

• Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. AAAI 2017.
• Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang and Dell Zhang. IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models. SIGIR 2017.