Generative Adversarial Networks (GANs) for Discrete Data
Lantao Yu, Shanghai Jiao Tong University
http://lantaoyu.com
July 26, 2017
Self Introduction – Lantao Yu
• Position
  • Research Assistant at the CS Dept. of SJTU, 2015–now
  • Apex Data and Knowledge Management Lab
  • Research on deep learning, reinforcement learning and multi-agent systems
• Education
  • Third-year undergraduate student, Computer Science Dept., Shanghai Jiao Tong University, China, 2014–now
• Contact
  • lantaoyu@hotmail.com
  • yulantao@apex.sjtu.edu.cn
Apex Data & Knowledge Management Lab
• Students: 8 PhDs, 8 Masters, 24 Undergrads
• www.apexlab.org — Professor Yong Yu, A.P. Weinan Zhang
• Machine learning and data science, with applications to recommender systems, computational ads, social networks, crowdsourcing, urban computing, etc.
Content
• Fundamentals – Generative Adversarial Networks
  • Connection and difference between generating discrete data and continuous data with GANs
• Advances – GANs for Discrete Data
  • SeqGAN: Sequence Generation via GANs with Policy Gradient
  • IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models
Generative Adversarial Networks (GANs)
[Goodfellow, I., et al. Generative adversarial nets. NIPS 2014.]
Problem Definition
• Given a dataset D = {x} drawn from the true distribution p(x), build a model q(x) of the data distribution that fits the true one
• Traditional objective: maximum likelihood estimation (MLE)
• Check whether a true data point has high mass density under the learned model

max_q (1/|D|) Σ_{x∈D} log q(x) ≈ max_q E_{x∼p(x)}[log q(x)]
Problems of MLE
• Training/evaluation: max_q E_{x∼p(x)}[log q(x)]
  • Check whether a true data point has high mass density under the learned model
  • Approximated by max_q (1/|D|) Σ_{x∈D} log q(x)
• Use (Turing Test): max_q E_{x∼q(x)}[log p(x)]
  • Check whether a model-generated data point is considered as true as possible
  • More straightforward, but it is hard or impossible to directly calculate p(x)
• Inconsistency of evaluation and use
Problems of MLE
• MLE is equivalent to minimizing the asymmetric KL divergence:

KL(p(x) || q(x)) = ∫_X p(x) log (p(x)/q(x)) dx

• When p(x) > 0 but q(x) → 0, the integrand inside the KL grows quickly to infinity, which means MLE assigns an extremely high cost to such a situation (missing a mode of the true data)
• When p(x) → 0 but q(x) > 0, the integrand inside the KL divergence goes to 0, which means MLE pays an extremely low cost for generating fake-looking samples

[Arjovsky, M., Bottou, L. Towards principled methods for training generative adversarial networks. arXiv, 2017.]
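This asymmetry can be checked numerically on a toy discrete example (the distributions below are made up for illustration):

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions, in nats; terms with p(x)=0 contribute 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p  = np.array([0.5, 0.5, 0.0])      # true data distribution
q1 = np.array([0.5, 0.499, 0.001])  # covers p, but also puts mass where p ~ 0
q2 = np.array([0.5, 0.001, 0.499])  # drops a mode of p (q -> 0 where p > 0)

print(kl(p, q1))  # small: fake-looking mass where p ~ 0 is barely punished
print(kl(p, q2))  # large: missing a true mode is punished heavily
```

So under KL(p||q), a model that hallucinates samples is almost free, while one that drops a mode pays dearly — the opposite of what the "use" criterion cares about.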
Problems of MLE
• Exposure bias in sequential data generation:
• Training: update the model by maximizing the likelihood of real sequences, conditioned on the real prefix:

max_θ E_{Y∼p_true} Σ_t log G_θ(y_t | Y_{1:t−1})

• Inference: when generating the next token y_t, sample from G_θ(y_t | Y_{1:t−1}), conditioned on the guessed prefix
• Mismatch: the model is trained on real prefixes but run on its own guessed prefixes, so early mistakes compound
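The train/inference mismatch above can be sketched with a made-up bigram table standing in for the model (all names and numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
V = 5                                 # toy vocabulary size
model = rng.normal(size=(V, V))       # hypothetical bigram "parameters"

def step_probs(prev_token):
    """Next-token distribution G_theta(y_t | y_{t-1}) as a softmax over logits."""
    logits = model[prev_token]
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Training (teacher forcing): always conditioned on the REAL prefix
real_seq = [0, 2, 1, 3]
nll = -sum(float(np.log(step_probs(real_seq[t - 1])[real_seq[t]]))
           for t in range(1, len(real_seq)))

# Inference: conditioned on the model's OWN samples (the guessed prefix),
# so one early mistake leads into prefixes never seen during training
gen_seq = [0]
for _ in range(3):
    gen_seq.append(int(rng.choice(V, p=step_probs(gen_seq[-1]))))
```

The training loss only ever scores the model on real prefixes, while generation feeds the model its own outputs — the gap between the two loops is the exposure bias.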
Generative Adversarial Nets (GANs)
• What we really want: max_q E_{x∼q(x)}[log p(x)]
• But we cannot directly calculate p(x)
• Idea: what if we build a discriminator to judge whether a data instance is true or fake (artificially generated)?
• Leverage the strong power of deep-learning-based discriminative models
Generative Adversarial Nets (GANs)
• The discriminator tries to correctly distinguish the true data and the fake model-generated data
• The generator tries to generate high-quality data to fool the discriminator
• G & D can be implemented via neural networks
• Ideally, when D cannot distinguish the true and generated data, G nicely fits the true underlying data distribution

[Diagram: the Generator G produces fake data; the Discriminator D distinguishes real-world Data from generated data]
GANs: A Minimax Game
• The most general form, where G(x) is the probability density function of the generator:

min_G max_D V(D,G) = E_{x∼p_data(x)}[log D(x)] + E_{x∼G(x)}[log(1−D(x))]
                   = ∫_x [p_data(x)·log D(x) + G(x)·log(1−D(x))] dx
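In practice both expectations are estimated from sample batches. A minimal sketch of such a Monte Carlo estimate (the discriminator outputs below are made-up numbers):

```python
import numpy as np

def value_estimate(d_real, d_fake):
    """Estimate V(D,G) = E_pdata[log D(x)] + E_G[log(1 - D(x))] from samples."""
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

d_real = np.array([0.9, 0.8, 0.95])  # D's outputs on real data (D pushes these up)
d_fake = np.array([0.1, 0.2, 0.05])  # D's outputs on generated data (D pushes these down)

v_confident = value_estimate(d_real, d_fake)
# A fully confused discriminator, D = 0.5 everywhere, gives V = -2 log 2
v_confused = value_estimate(np.full(3, 0.5), np.full(3, 0.5))
```

D plays max over this value while G plays min; the −2 log 2 floor reached by a confused discriminator reappears later as the −log 4 constant in the equilibrium analysis.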
Generating continuous data
• The generative model is a differentiable mapping G : Z → X from the prior noise space to the data space
• First sample z ∼ p(z) from a simple prior, then apply a deterministic function x = G(z)
• No explicit probability density function for the data x

• The most general form:

min_G max_D V(D,G) = E_{x∼p_data(x)}[log D(x)] + E_{x∼G(x)}[log(1−D(x))]
                   = ∫_x [p_data(x)·log D(x) + G(x)·log(1−D(x))] dx

• Without an explicit P.D.F., we can rewrite the minimax game as:

min_G max_D V(D,G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1−D(G(z)))]
                   = ∫_x p_data(x)·log D(x) dx + ∫_z p_z(z)·log(1−D(G(z))) dz
GANs for continuous data
• Directly optimize the differentiable mapping!
• In order to take the gradient on the generator parameter, x has to be continuous

J^(D) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1−D(G(z)))]

• Generator: min_G max_D J^(D), with x = G(z; θ^(G))
• Discriminator: max_D J^(D), with P(True|x) = D(x; φ^(D))

1. Generation: sample z, compute x = G(z; θ^(G))
2. Discrimination: compute D(x; φ^(D))
3. Gradient on the generated data: ∂L/∂x
4. Further gradient on the generator: chain it through ∂x/∂θ^(G)
GANs for continuous data
[Diagram: the Generator maps noise z to data x; the Discriminator scores real Data against Generator samples]
Directly optimize the differentiable mapping!
Ideal Final Equilibrium
• The generator generates the perfect data distribution
• The discriminator cannot distinguish the true and generated data

Training GANs
• Training the discriminator

Training GANs
• Training the generator
Optimal Strategy for the Discriminator
• The optimal discriminator for any p_data(x) and p_G(x) is always

D(x) = p_data(x) / (p_data(x) + p_G(x))
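This can be verified numerically: for fixed densities, D*(x) = p_data(x)/(p_data(x)+p_G(x)) pointwise maximizes the value function (the toy distributions below are made up):

```python
import numpy as np

p_data = np.array([0.7, 0.2, 0.1])
p_g    = np.array([0.1, 0.2, 0.7])

d_star = p_data / (p_data + p_g)  # the claimed optimal discriminator

def v(D):
    """V(D,G) for discrete x: sum_x [p_data(x) log D(x) + p_G(x) log(1 - D(x))]."""
    return float(np.sum(p_data * np.log(D) + p_g * np.log(1.0 - D)))

# No perturbed discriminator beats D* on this value function
rng = np.random.default_rng(1)
best = v(d_star)
for _ in range(100):
    other = np.clip(d_star + rng.normal(scale=0.05, size=3), 1e-6, 1 - 1e-6)
    assert v(other) <= best
# Where p_G already matches p_data (index 1 here), D* can do no better than 0.5
```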
Reformulate the Minimax Game
• At the optimum, D(x) = p_data(x)/(p_data(x)+p_G(x)) is something between the data distribution and the model distribution

J^(D) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1−D(G(z)))]
      = E_{x∼p_data(x)}[log D(x)] + E_{x∼p_G(x)}[log(1−D(x))]
      = E_{x∼p_data(x)}[log (p_data(x) / (p_data(x)+p_G(x)))] + E_{x∼p_G(x)}[log (p_G(x) / (p_data(x)+p_G(x)))]
      = −log 4 + KL(p_data || (p_data+p_G)/2) + KL(p_G || (p_data+p_G)/2)

[Huszár, Ferenc. "How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?" arXiv, 2015.]
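The identity above, including the −log 4 constant, can be checked numerically on toy distributions (made up for illustration):

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions, in nats."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p_data = np.array([0.7, 0.2, 0.1])
p_g    = np.array([0.1, 0.2, 0.7])
m = 0.5 * (p_data + p_g)                     # the mixture (p_data + p_G) / 2

d_star = p_data / (p_data + p_g)             # optimal discriminator
v_star = float(np.sum(p_data * np.log(d_star) + p_g * np.log(1.0 - d_star)))

rhs = -np.log(4.0) + kl(p_data, m) + kl(p_g, m)
# The KL sum is 2 * JSD(p_data || p_G) >= 0, so the minimum value -log 4
# is reached exactly when p_G = p_data
```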
Case Study
Generating discrete data
• The generative model computes a probability distribution over the candidate choices
• First compute the probability density function of the data x (e.g. a softmax over the vocabulary), then sample from it
• With an explicit P.D.F., we can simply start with the most general form of the minimax game:
• Now, instead of optimizing a transformation, we can directly optimize the P.D.F. with the guidance of the discriminator
• Note that even for discrete data, G(x) is differentiable!

GANs for discrete data
• Here G(x) is the probability density function:

min_G max_D V(D,G) = E_{x∼p_data(x)}[log D(x)] + E_{x∼G(x)}[log(1−D(x))]
                   = ∫_x [p_data(x)·log D(x) + G(x)·log(1−D(x))] dx
GANs for Discrete Data
• We could directly build a parametric distribution for discrete data
• For example, for discrete data over {A1, A2, A3, A4, A5}, the data P.D.F. could be defined as

p(A_i) = e^{f(A_i)} / Σ_j e^{f(A_j)}

where the scoring function f could be defined based on domain knowledge, e.g., a neural network with the A_i embedding as the input

[Bar chart: an example probability distribution over A1–A5]
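A minimal sketch of such a parametric P.D.F., with made-up scores f(A_i) standing in for a learned scoring network:

```python
import numpy as np

def p_theta(scores):
    """p(A_i) = exp(f(A_i)) / sum_j exp(f(A_j)): a softmax over the categories."""
    e = np.exp(scores - scores.max())  # shift for numerical stability
    return e / e.sum()

f = np.array([1.0, 0.5, 0.0, -0.5, -1.0])  # hypothetical scores for A1..A5
p = p_theta(f)
# p is a valid distribution and is differentiable w.r.t. the scores f,
# which is what lets the discriminator's signal flow back into the generator
```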
Borrow the Idea from RL
• For a generator G(A_i) = P(A_i; θ^(G))
• Intuition:
  • lower the probability of the choice that leads to low value/reward
  • higher the probability of the choice that leads to high value/reward
• The one-field, 5-category example:
  1. Initialize θ
  2. Generate A2; observe a positive reward (from D)
  3. Update θ by policy gradient
  4. Generate A3; observe a negative reward (from D)
  5. Update θ by policy gradient

[Bar charts: the distribution over A1–A5 after each update — mass shifts toward A2 and away from A3]
Advances: GANs for Discrete Data
• Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. AAAI 2017.
• Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang and Dell Zhang. IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models. SIGIR 2017.
SeqGAN: Sequence Generation via GANs with Policy Gradient
[Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. AAAI 2017.]
Problem for Discrete Data
• On continuous data, there is a direct gradient

∇_{θ^(G)} (1/m) Σ_{i=1}^m log(1−D(G(z^(i))))

• It guides the generator to (slightly) modify the output
• No direct gradient on discrete data
• Text generation examples:
  • “床前明月光” (“Before my bed, the bright moonlight”)
  • “I caught a penguin in the park”
  • From Ian Goodfellow: “If you output the word ‘penguin’, you can't change that to ‘penguin + .001’ on the next step, because there is no such word as ‘penguin + .001’. You have to go all the way from ‘penguin’ to ‘ostrich’.”

[https://www.reddit.com/r/MachineLearning/comments/40ldq6/generative_adversarial_networks_for_text/]
SeqGAN
• The generator G(y_t | Y_{1:t−1}) is a reinforcement learning policy for generating a sequence
  • decide the next word to generate given the previous ones
• The discriminator D(Y^n_{1:T}) provides the reward (i.e. the probability of being true data) for the whole sequence
Sequence Generator
• Objective: maximize the expected reward

J(θ) = E[R_T | s_0, θ] = Σ_{y_1∈Y} G_θ(y_1|s_0) · Q^{G_θ}_{D_φ}(s_0, y_1)

• The state-action value function Q^{G_θ}_{D_φ}(s, a) is the expected accumulative reward when
  • starting from state s
  • taking action a
  • and following policy G until the end
• The reward comes only on the completed sequence (no immediate reward):

Q^{G_θ}_{D_φ}(s = Y_{1:T−1}, a = y_T) = D_φ(Y_{1:T})
State-Action Value Setting
• The reward comes only on the completed sequence
  • No immediate reward
• Then the last-step state-action value is

Q^{G_θ}_{D_φ}(s = Y_{1:T−1}, a = y_T) = D_φ(Y_{1:T})

• For the intermediate state-action value, use Monte Carlo search to estimate it, following a roll-out policy G_β:

Q^{G_θ}_{D_φ}(s = Y_{1:t−1}, a = y_t) =
  (1/N) Σ_{n=1}^N D_φ(Y^n_{1:T}),  Y^n_{1:T} ∈ MC^{G_β}(Y_{1:t}; N),  for t < T
  D_φ(Y_{1:t}),                                                       for t = T

where {Y^1_{1:T}, …, Y^N_{1:T}} = MC^{G_β}(Y_{1:t}; N)
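A sketch of this Monte Carlo roll-out estimate of Q, with a stand-in discriminator and a uniform roll-out policy (both made up for illustration; in SeqGAN the roll-out policy is the generator itself):

```python
import numpy as np

rng = np.random.default_rng(0)
V, T = 4, 6  # toy vocabulary size and full sequence length

def rollout_policy(prefix):
    """Stand-in roll-out policy G_beta: uniform sampling over the vocabulary."""
    return int(rng.integers(V))

def D_phi(seq):
    """Stand-in discriminator: reward = fraction of even tokens, in [0, 1]."""
    return float(np.mean([t % 2 == 0 for t in seq]))

def q_value(prefix, action, N=64):
    """Q(s = Y_{1:t-1}, a = y_t): average D_phi over N Monte Carlo roll-outs."""
    state = prefix + [action]
    if len(state) == T:          # last step: reward is D_phi on the finished sequence
        return D_phi(state)
    total = 0.0
    for _ in range(N):
        seq = list(state)
        while len(seq) < T:      # complete the sequence with the roll-out policy
            seq.append(rollout_policy(seq))
        total += D_phi(seq)
    return total / N

q_even = q_value([0, 2], 2)  # extending the prefix with an even token
q_odd  = q_value([0, 2], 1)  # extending the prefix with an odd token
```

Under this toy reward, the even continuation gets a visibly higher Q estimate — exactly the per-step credit signal the policy gradient needs.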
Training the Sequence Generator
• Policy gradient (REINFORCE):

θ ← θ + α ∇_θ J(θ)

∇_θ J(θ) = Σ_{t=1}^T E_{Y_{1:t−1}∼G_θ}[ Σ_{y_t∈Y} ∇_θ G_θ(y_t|Y_{1:t−1}) · Q^{G_θ}_{D_φ}(Y_{1:t−1}, y_t) ]
         ≈ Σ_{t=1}^T Σ_{y_t∈Y} ∇_θ G_θ(y_t|Y_{1:t−1}) · Q^{G_θ}_{D_φ}(Y_{1:t−1}, y_t)
         = Σ_{t=1}^T Σ_{y_t∈Y} G_θ(y_t|Y_{1:t−1}) ∇_θ log G_θ(y_t|Y_{1:t−1}) · Q^{G_θ}_{D_φ}(Y_{1:t−1}, y_t)
         = Σ_{t=1}^T E_{y_t∼G_θ(y_t|Y_{1:t−1})}[ ∇_θ log G_θ(y_t|Y_{1:t−1}) · Q^{G_θ}_{D_φ}(Y_{1:t−1}, y_t) ]

[Richard Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS 1999.]
Training the Sequence Discriminator
• Objective: standard binary classification

min_φ −E_{Y∼p_data}[log D_φ(Y)] − E_{Y∼G_θ}[log(1−D_φ(Y))]
Overall Algorithm

Sequence Generator Model
• RNN with LSTM cells
• Softmax sampling over the vocabulary at each step (e.g. “Shanghai is incredibly” → next token?)

[Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8): 1735–1780.]
Sequence Discriminator Model
[Kim, Y. 2014. Convolutional neural networks for sentence classification. EMNLP 2014.]
Experiments on Synthetic Data
• Evaluation measure with an oracle
  • An oracle model (e.g. a randomly initialized LSTM)
  • Firstly, the oracle model produces some sequences as training data for the generative model
  • Secondly, the oracle model can be considered as the human observer that accurately evaluates the perceptual quality of the generative model

NLL_oracle = −E_{Y_{1:T}∼G_θ}[ Σ_{t=1}^T log G_oracle(y_t | Y_{1:t−1}) ]
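A sketch of this oracle evaluation with stand-in bigram tables (random and purely illustrative): sampling from the oracle itself attains the lowest expected NLL_oracle, and any other generator scores worse.

```python
import numpy as np

rng = np.random.default_rng(0)
V, T = 5, 8

oracle = rng.normal(size=(V, V))   # stand-in oracle: a random bigram table
gen    = rng.normal(size=(V, V))   # stand-in generator of the same form

def next_probs(table, prev):
    e = np.exp(table[prev] - table[prev].max())
    return e / e.sum()

def nll_oracle(sampler, oracle, n_samples=200):
    """Estimate -E_{Y ~ G_theta}[sum_t log G_oracle(y_t | Y_{1:t-1})]."""
    total = 0.0
    for _ in range(n_samples):
        y = [0]  # fixed start token
        for _ in range(T - 1):
            y.append(int(rng.choice(V, p=next_probs(sampler, y[-1]))))
        total -= sum(float(np.log(next_probs(oracle, y[t - 1])[y[t]]))
                     for t in range(1, len(y)))
    return total / n_samples

score_gen    = nll_oracle(gen, oracle)     # generator judged by the oracle
score_oracle = nll_oracle(oracle, oracle)  # the oracle judging its own samples
```

The gap score_gen − score_oracle is (up to sampling noise) the KL divergence between the generator and the oracle, accumulated over the sequence — which is why NLL_oracle works as a quality measure.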
Experiments on Real-World Data
• Chinese poem generation
• Obama political speech text generation
• MIDI music generation
Experiments on Real-World Data
• Chinese poem generation
南陌春风早,东邻去日斜。
紫陌追随日,青门相见时。
胡风不开花,四气多作雪。
山夜有雪寒,桂里逢客时。
此时人且饮,酒愁一节梦。
四面客归路,桂花开青竹。
Can you distinguish which part is from a human and which from the machine?
Experiments on Real-World Data
• Chinese poem generation
南陌春风早,东邻去日斜。
紫陌追随日,青门相见时。
胡风不开花,四气多作雪。
山夜有雪寒,桂里逢客时。
此时人且饮,酒愁一节梦。
四面客归路,桂花开青竹。
Human Machine
Obama Speech Text Generation
• i stood here today i have one and most important thing that not on violence throughout the horizon is OTHERS american fire and OTHERS but we need you are a strong source
• for this business leadership will remember now i cant afford to start with just the way our european support for the right thing to protect those american story from the world and
• i want to acknowledge you were going to be an outstanding job times for student medical education and warm the republicans who like my times if he said is that brought the
• When he was told of this extraordinary honor that he was the most trusted man in America
• But we also remember and celebrate the journalism that Walter practiced -- a standard of honesty and integrity and responsibility to which so many of you have committed your careers. It's a standard that's a little bit harder to find today
• I am honored to be here to pay tribute to the life and times of the man who chronicled our time.
Human Machine
IRGAN: A Minimax Game for Information Retrieval
[Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang and Dell Zhang. IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models. SIGIR 2017.]
Two schools of thinking in IR modeling
• Generative Retrieval
  • Assume there is an underlying stochastic generative process between documents and queries
  • Generate/select relevant documents given a query
  [Figure: Query1 → document1, document2, document3, document4]
• Discriminative Retrieval
  • Learns from labeled relevance judgments
  • Predict the relevance given a query-document pair
  [Figure: (Query1, document1) → relevance]
Three paradigms in Learning to Rank (LTR)
• Pointwise: learn to approximate the relevance estimation of each document to the human rating
• Pairwise: distinguish the more-relevant document from a document pair
• Listwise: learn to optimise the (smoothed) loss function defined over the whole ranking list for each query
IRGAN: A minimax game unifying both models
• Take advantage of both schools of thinking:
  • The generative model learns to fit the relevance distribution over documents via the signal from the discriminative model.
  • The discriminative model is able to exploit the unlabeled data selected by the generative model to achieve a better estimation for document ranking.
IRGAN Formulation
• The underlying true relevance distribution p_true(d|q, r) depicts the user's relevance preference distribution over the candidate documents with respect to the submitted query
• Training set: a set of samples from p_true(d|q, r)
• Generative retrieval model p_θ(d|q, r)
  • Goal: approximate the true relevance distribution
• Discriminative retrieval model f_φ(q, d)
  • Goal: distinguish between relevant documents and non-relevant documents
IRGAN Formulation
• Overall objective:

J^{G*,D*} = min_θ max_φ Σ_{n=1}^N ( E_{d∼p_true(d|q_n,r)}[log D(d|q_n)] + E_{d∼p_θ(d|q_n,r)}[log(1−D(d|q_n))] )

where D(d|q) = σ(f_φ(d, q)) = exp(f_φ(d, q)) / (1 + exp(f_φ(d, q)))
IRGAN Formulation
• Optimizing the discriminative retrieval:

φ* = argmax_φ Σ_{n=1}^N ( E_{d∼p_true(d|q_n,r)}[log σ(f_φ(d, q_n))] + E_{d∼p_θ*(d|q_n,r)}[log(1−σ(f_φ(d, q_n)))] )
IRGAN Formulation
• Optimizing the generative retrieval
  • Samples documents from the whole document set to fool its opponent
  • REINFORCE (with an advantage function); the discriminator's score acts as the reward term
IRGAN Formulation
• Algorithm
IRGAN Formulation
• Extension to the pairwise case
  • It is common that the dataset is a set of ordered document pairs for each query rather than a set of relevant documents.
  • Capture relative preference judgements rather than absolute relevance judgements
  • Now, for each query q_n, we have a set of labelled document pairs R_n = {⟨d_i, d_j⟩ | d_i ≻ d_j}
IRGAN Formulation
• Extension to the pairwise case
  • The discriminator tries to predict whether a document pair is correctly ranked, which can be implemented with any pairwise ranking loss function, where z = f_φ(d_u, q) − f_φ(d_v, q):
  • RankNet: log(1 + exp(−z))
  • Ranking SVM (hinge loss): (1 − z)_+
  • RankBoost: exp(−z)
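The three losses, written as functions of the score margin z (a minimal sketch): all are decreasing in z, so incorrectly ranked pairs (z < 0) are penalized more than correctly ranked ones.

```python
import numpy as np

def ranknet(z):   return float(np.log1p(np.exp(-z)))  # RankNet (logistic) loss
def hinge(z):     return float(max(0.0, 1.0 - z))     # Ranking SVM (hinge) loss
def rankboost(z): return float(np.exp(-z))            # RankBoost (exponential) loss

# z = f_phi(d_u, q) - f_phi(d_v, q) is the score margin of the pair;
# z > 0 means the pair is ranked correctly
for z in (-1.0, 0.0, 2.0):
    print(z, ranknet(z), hinge(z), rankboost(z))
```

They differ in how hard they push: the hinge loss is exactly zero past margin 1, the logistic loss decays smoothly, and the exponential loss punishes large violations most aggressively.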
IRGAN Formulation
• Extension to the pairwise case
  • The generator tries to generate document pairs that are similar to those in R_n, i.e., with the correct ranking.
  • A softmax function over the Cartesian product of the document sets, where the logit is the advantage of d_i over d_j in a document pair (d_i, d_j)
An Intuitive Explanation of IRGAN
• The generative retrieval model is guided by the signal provided by the discriminative retrieval model, which makes it more favorable than non-learning methods or the maximum likelihood estimation (MLE) scheme.
• The discriminative retrieval model can be enhanced to better rank top documents via strategic negative sampling from the generator.
Experiments: Web Search
Experiments: Web Search
Experiments: Item Recommendation
Experiments: Item Recommendation
Experiments: Question Answering
Summary of IRGAN
• We proposed the IRGAN framework, which unifies two schools of information retrieval methodology via adversarial training in a minimax game, taking advantage of both schools of thinking.
• Significant performance gains were observed in three typical information retrieval tasks.
• Experiments suggest that different equilibria can be reached in the end, depending on the tasks and settings.
Summary of This Talk
• Generative Adversarial Networks (GANs) are highly popular in deep unsupervised learning research
• Training/evaluation: max_q E_{x∼p(x)}[log q(x)] vs. Use (Turing Test): max_q E_{x∼q(x)}[log p(x)]
• GANs provide a training framework closer to the target use of the generative model
• For discrete data generation, one can directly define the parametric distribution and optimize the P.D.F. by policy gradient methods, e.g. REINFORCE
Thank You
Lantao Yu, Shanghai Jiao Tong University
http://lantaoyu.com

• Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. AAAI 2017.
• Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang and Dell Zhang. IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models. SIGIR 2017.