Multi-objective training of Generative Adversarial Networks with multiple discriminators
Isabela Albuquerque∗, Joao Monteiro∗, Thang Doan, Breandan Considine, Tiago Falk, and Ioannis Mitliagkas
∗Equal contribution
1 / 11
The multiple discriminators GAN setting
- Recent literature proposed to tackle GAN training instability* issues with multiple discriminators (Ds)
1. Generative multi-adversarial networks, Durugkar et al. (2016)
2. Stabilizing GANs training with multiple random projections, Neyshabur et al. (2017)
3. Online Adaptative Curriculum Learning for GANs, Doan et al. (2018)
4. Domain Partitioning Network, Csaba et al. (2019)
*Mode-collapse or vanishing gradients
2 / 11
The multiple discriminators GAN setting
3 / 11
Our work
4 / 11
Our work
$$\min \mathcal{L}_G(z) = [l_1(z), l_2(z), \dots, l_K(z)]^T$$

- Each $l_k = -\mathbb{E}_{z \sim p_z} \log D_k(G(z))$ is the loss provided by the $k$-th discriminator
4 / 11
Our work
$$\min \mathcal{L}_G(z) = [l_1(z), l_2(z), \dots, l_K(z)]^T$$

- Multiple gradient descent (MGD) is a natural choice to solve this problem
- But it might be too costly
- Alternative: maximize the hypervolume (HV) of a single solution
4 / 11
Multiple gradient descent
- Seeks a Pareto-stationary solution
- Two steps:
  1. Find a common descent direction for all $l_k$
     - Minimum-norm element within the convex hull of all $\nabla l_k(x)$
  2. Update the parameters with $x_{t+1} = x_t - \lambda \frac{w_t^*}{\|w_t^*\|}$, where

$$w_t^* = \operatorname{argmin} \|w\|^2, \quad w = \sum_{k=1}^{K} \alpha_k \nabla l_k(x_t), \quad \text{s.t. } \sum_{k=1}^{K} \alpha_k = 1, \; \alpha_k \geq 0 \;\; \forall k$$
5 / 11
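The min-norm step above can be sketched numerically. The following is a minimal illustration (not the authors' implementation): it solves the simplex-constrained min-norm problem with Frank-Wolfe iterations over the Gram matrix of the gradients, then applies the normalized update. Function names and the iteration count are illustrative choices.

```python
import numpy as np

def min_norm_direction(grads, iters=200):
    """Approximate argmin ||w||^2 over w = sum_k alpha_k g_k, alpha in the simplex,
    via Frank-Wolfe (illustrative sketch)."""
    G = np.stack(grads)          # (K, d) stacked per-objective gradients
    K = G.shape[0]
    M = G @ G.T                  # Gram matrix of pairwise inner products
    alpha = np.full(K, 1.0 / K)  # start from uniform weights
    for t in range(iters):
        g = M @ alpha            # gradient of 0.5 * ||sum_k alpha_k g_k||^2 w.r.t. alpha
        k = int(np.argmin(g))    # best simplex vertex (linear minimization oracle)
        step = 2.0 / (t + 2.0)   # standard Frank-Wolfe step size
        vertex = np.zeros(K)
        vertex[k] = 1.0
        alpha = (1.0 - step) * alpha + step * vertex
    return alpha, alpha @ G      # weights alpha and common descent direction w*

def mgd_step(x, grads, lr=0.1):
    """One MGD parameter update: x_{t+1} = x_t - lr * w* / ||w*||."""
    _, w = min_norm_direction(grads)
    norm = np.linalg.norm(w)
    return x if norm == 0.0 else x - lr * w / norm
```

For two orthogonal unit gradients, the method converges toward equal weights, so the common direction decreases both objectives at once.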
Hypervolume maximization for training GANs
[Figure: loss space $(l_1, l_2)$ for discriminators $D_1, D_2$, showing the solution $\mathcal{L}_G$ and the nadir points $\eta$ and $\eta^*$]
6 / 11
Hypervolume maximization for training GANs
$$\mathcal{L}_G = -\log\left(\prod_{k=1}^{K}(\eta - l_k)\right) = -\sum_{k=1}^{K}\log(\eta - l_k)$$

[Figure: loss space $(l_1, l_2)$ for discriminators $D_1, D_2$, showing the solution $\mathcal{L}_G$ and the nadir points $\eta$ and $\eta^*$]

$$\frac{\partial \mathcal{L}_G}{\partial \theta} = \sum_{k=1}^{K}\frac{1}{\eta - l_k}\frac{\partial l_k}{\partial \theta}$$
6 / 11
Hypervolume maximization for training GANs
$$\mathcal{L}_G = -\log\left(\prod_{k=1}^{K}(\eta - l_k)\right) = -\sum_{k=1}^{K}\log(\eta - l_k)$$

[Figure: loss space $(l_1, l_2)$ for discriminators $D_1, D_2$, showing the solution $\mathcal{L}_G$ and the nadir points $\eta$ and $\eta^*$]

$$\frac{\partial \mathcal{L}_G}{\partial \theta} = \sum_{k=1}^{K}\frac{1}{\eta - l_k}\frac{\partial l_k}{\partial \theta}$$

$$\eta_t = \delta \max_k \{l_k^t\}, \quad \delta > 1$$
6 / 11
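The hypervolume-maximization loss and its adaptive nadir point can be sketched as follows. This is a minimal numpy illustration of the formulas above, not the authors' training code; the function name and the example `delta` are illustrative.

```python
import numpy as np

def hv_generator_loss(losses, delta=1.1):
    """Hypervolume-maximization generator loss with adaptive nadir point.

    losses: per-discriminator generator losses l_k.
    eta_t = delta * max_k l_k with delta > 1 keeps the nadir point strictly
    above every loss, so each log(eta - l_k) is well defined.
    Returns the scalar loss L_G = -sum_k log(eta - l_k) and the weight
    1 / (eta - l_k) that multiplies each dl_k/dtheta in the gradient.
    """
    l = np.asarray(losses, dtype=float)
    eta = delta * l.max()
    loss = -np.sum(np.log(eta - l))
    weights = 1.0 / (eta - l)
    return loss, weights
```

Because the weight on each $\partial l_k / \partial \theta$ grows as $l_k$ approaches $\eta$, the discriminator the generator is doing worst against dominates the update.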
MGD vs. HV maximization vs. Average loss minimization
- MGD seeks a Pareto-stationary solution
  - $x_{t+1} \prec x_t$
- HV maximization seeks Pareto-optimal solutions
  - $HV(x_{t+1}) > HV(x_t)$
  - For the single-solution case, central regions of the Pareto-front are preferred
- Average loss minimization does not enforce equally good individual losses
  - Might be problematic when there is a trade-off between discriminators
7 / 11
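The trade-off point can be made concrete with a small numeric sketch (hypothetical loss values, chosen for illustration): under average-loss minimization every discriminator gets the same weight, while the HV gradient weights $1/(\eta - l_k)$ concentrate on the discriminator with the worst loss.

```python
import numpy as np

# Hypothetical per-discriminator generator losses with a clear trade-off
l = np.array([0.5, 3.0])
eta = 1.1 * l.max()                      # nadir point, eta_t = delta * max_k l_k

avg_w = np.full_like(l, 1.0 / len(l))    # average-loss weights: uniform [0.5, 0.5]
hv_w = 1.0 / (eta - l)                   # HV weights: 1 / (eta - l_k)
hv_w = hv_w / hv_w.sum()                 # normalize to compare with the average

# hv_w puts most of the gradient mass on the second (worse) discriminator,
# whereas avg_w treats both equally even though l_2 >> l_1.
```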
MNIST
- Same architecture, hyperparameters, and initialization for all methods
- 8 Ds, 100 epochs
- FID was calculated using a LeNet trained on MNIST to 98% test accuracy
[Figures: FID on MNIST for AVG, GMAN, HV, and MGD; best FID achieved during training vs. wall-clock time until best FID (minutes)]
8 / 11
Upscaled CIFAR-10 - Computational cost
- Different GANs with both 1 and 24 Ds + HV
- Same architecture and initialization for all methods
- Comparison of minimum FID obtained during training, along with computational cost in terms of time and space
| Model | # Disc. | FID-ResNet | FLOPS∗ | Memory |
| --- | --- | --- | --- | --- |
| DCGAN | 1 | 4.22 | 8e10 | 1292 |
| DCGAN | 24 | 1.89 | 5e11 | 5671 |
| LSGAN | 1 | 4.55 | 8e10 | 1303 |
| LSGAN | 24 | 1.91 | 5e11 | 5682 |
| HingeGAN | 1 | 6.17 | 8e10 | 1303 |
| HingeGAN | 24 | 2.25 | 5e11 | 5682 |

∗Floating point operations per second
- The additional computational cost translates into a performance improvement
9 / 11
Cats 256 × 256
10 / 11
Thank you!
Questions? Come to our poster! #4
Code: https://github.com/joaomonteirof/hGAN
11 / 11