multimodal unsupervised image-to-image translation · 2019-09-22 · •multimodal unit (munit)...

Multimodal Unsupervised Image-to-Image Translation

Ming-Yu Liu

NVIDIA

Supervised/Paired/Aligned/Registered Unsupervised/Unpaired/Unaligned/Unregistered

Supervised vs UnsupervisedPaired vs UnpairedAligned vs UnalignedSupervised vs Unsupervised

Image Domain Transfer

Example Applications

Generator

……

Discriminator real/fake

Real data

Goodfellow et al. 2014

Generative Adversarial Networks (GANs)

Sample

TranslationNetwork𝑭𝟏→𝟐

……

Discriminator real/fake

Domain 2

……Domain 1

Plain GAN for Unsupervised Image-to-Image Translation

CycleGAN and UNIT

• CycleGAN (cycle consistency) [Zhu et al. 2017]

• UNIT (shared latent space) [Liu et al. 2017]

shared latent space

cycleconsistency

shared latent space ⟹ cycle consistency

Unimodality

Towards Multimodality

Cycle consistency does not allow multimodality

Domain 𝒳1 Domain 𝒳2

𝑥2′

𝑥2′′

Cycle consistency:𝒳1 → 𝒳2 →𝒳1

Cycle consistency:𝒳2 → 𝒳1 →𝒳2

𝑥2′ = 𝑥2

′′

𝑝(𝑥2|𝑥1) = 𝛿 𝑥2 − 𝑥2′

𝑝(𝑥2|𝑥1) = 𝛿(𝑥2 − 𝑥2′′)

Shared latent space does not allow multimodality

shared latent space

cycleconsistency

Disentangling the Latent Space

•UNIT • A single shared, domain-invariant latent space 𝒵

Disentangling the Latent Space

•Multimodal UNIT (MUNIT)• A content space 𝒞 that is shared, domain-invariant• Two style spaces 𝒮1, 𝒮2 that are unshared, domain-specific

Training

•Notations:• 𝑥: images• 𝑐: content• 𝑠: style

• Loss:• Bidirectional reconstruction loss

• Image reconstruction loss

• Latent reconstruction loss

• GAN lossWithin-domain reconstructionCross-domain translation

Bidirectional Reconstruction Loss:Image Reconstruction

Notations:•𝑥: images• 𝑐: content• 𝑠: style

Bidirectional Reconstruction Loss:Latent Reconstruction

GAN Loss

remove style

Background: Instance Normalization (IN)

Content feature: 𝑐normalization affine

transformation

“Instance Normalization: The Missing Ingredient for Fast Stylization”, Ulyanov et al. 2017

apply style

Feedforward transfer of a single style Content Output

Adaptive Instance Normalization (AdaIN)

Style feature: 𝑠

Content feature: 𝑐

Feedforward transfer of arbitrary styles

Content

AdaIN Decoder Output

remove style

apply style

AdaIN in a Generative Network

AdaIN in style transfer AdaIN in a generative network

AdaIN in a Generative Network

AdaIN in style transfer AdaIN in a generative network

Architectural Implementation

Sketches <-> Photo

Input Outputs

Cats ↔ Dogs

Input Outputs

Synthetic ↔ Real

Input Outputs

Summer ↔ Winter

Input Outputs

Example-guided Translation

Conclusion

• Translate one input image to multiple corresponding images in the target domain.

• Content and style decomposition via the AdaIN design

• ECCV 2018

• MUNIT code: https://github.com/nvlabs/munit/

• Paper: https://arxiv.org/abs/1804.04732

Xun HuangNVIDIA, Cornell

Serge BelongieCornell

Jan KautzNVIDIA

Ming-Yu LiuNVIDIA

multimodal unsupervised image-to-image translation · 2019-09-22 · •multimodal unit (munit)...

Documents

automatic image colorization via multimodal predictions

preview multimodal analysis image student edition

multimodal unsupervised image-to-image …translation...

multimodal named entity recognition with image attributes

mule munit

geometric feature-based multimodal image registration of...

multilabel image annotation using multimodal analysis

a multimodal interface for image...

multimodal task-driven dictionary learning for image

the method of multimodal mri brain image segmentation

installation instruction munit pole clamp...munit pole clamp...

recurrent multimodal interaction for referring image...

the multimodal brain tumor image segmentation benchmark...

toward multimodal image-to-image translation - arxiv ·...

a novel approach for multimodal medical image fusion using...

proceedings of the multimodal brain tumor image...

super-resolution line scan image sensor for multimodal...

munit junit test case

recurrent multimodal interaction for referring image

the multimodal brain tumor image segmentation benchmark