

Source: alanwags/DLAI2016/(Kwak+) IJCAI-16 DLAI WS.pdf

Generating Images Part by Part with Composite Generative Adversarial Networks

Hanock Kwak and Byoung-Tak Zhang
Department of Computer Science and Engineering, Seoul National University, {hnkwak, btzhang}@bi.snu.ac.kr

Biointelligence Lab, Seoul National University | Seoul 151-744, Korea (http://bi.snu.ac.kr)

Backgrounds

• Images are composed of several different objects forming a hierarchical structure with various styles and shapes.

• Deep learning models are used to implicitly disentangle complex underlying patterns of data, forming distributed feature representations.

• Generative adversarial networks (GANs) are successful unsupervised learning models that can generate samples of natural images generalized from the training data.

• It has been proven that, given enough capacity, the data distribution formed by a GAN can converge to the distribution over the real data.

Key Ideas

• The composite generative adversarial network (CGAN) can generate images part by part.

• CGAN uses an alpha channel for opacity, along with the RGB channels, to stack images iteratively through an alpha blending process.

• The alpha blending process keeps the previous image in some areas and overlays the new image in others.

Methods

• Alpha blending combines two translucent images, producing a new blended image.

• The objective of the generator (G) is to fit the true data distribution by deceiving the discriminator (D) in the following minimax game:

min_G max_D V(D, G) = 𝔼_{x∼p_data}[log D(x)] + 𝔼_{z∼p_z}[log(1 − D(G(z)))]

• The structure of CGAN: the generated images are combined sequentially by the alpha blending process to form the final output 𝑂ₙ.

Experimental Results

• Examples of generated images from CGAN with three generators.

• Samples drawn from CGAN after training on the CelebA, Pororo, Oxford-102 Flowers, and MS COCO datasets, respectively.
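The GAN minimax objective referred to above, min_G max_D 𝔼[log D(x)] + 𝔼[log(1 − D(G(z)))], can be illustrated numerically. A minimal NumPy sketch with made-up discriminator outputs (not the poster's actual model): D maximizes this value, G minimizes it, and a fully confused discriminator (outputting 0.5 everywhere) yields the equilibrium value −2 log 2.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Standard GAN value V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: D's probabilities on real images, d_fake: D's probabilities
    on generated images (both in (0, 1)); expectations become sample means.
    """
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# Toy example: a maximally confused discriminator outputs 0.5 everywhere,
# giving V = log(0.5) + log(0.5) = -2 log 2.
d_real = np.full(4, 0.5)
d_fake = np.full(4, 0.5)
v = gan_value(d_real, d_fake)
```

A discriminator that separates real from fake (e.g. D(x) = 0.9, D(G(z)) = 0.1) attains a larger value, which is what the generator's updates push back against.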

[Figure: generator outputs 𝐶1, 𝐶2, 𝐶3 are blended step by step into intermediate and final composites 𝑂2 and 𝑂3.]
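The step-by-step composition can be sketched as standard "over" alpha blending: each generator emits an RGBA layer 𝐶ₖ, and the running composite is 𝑂ₖ = αₖ·RGBₖ + (1 − αₖ)·𝑂ₖ₋₁. A minimal NumPy sketch, assuming 1×1 single-channel toy layers as placeholders for CGAN outputs:

```python
import numpy as np

def composite(layers):
    """Blend a list of (rgb, alpha) layers back-to-front:
    later layers are painted over the running composite."""
    rgb0, a0 = layers[0]
    out = a0 * rgb0                        # first layer over a black background
    for rgb, a in layers[1:]:
        out = a * rgb + (1.0 - a) * out    # O_k = a_k * C_k + (1 - a_k) * O_{k-1}
    return out

# Toy 1x1 "images": an opaque white background, then a half-transparent black part.
C1 = (np.array([[1.0]]), np.array([[1.0]]))   # white, alpha = 1
C2 = (np.array([[0.0]]), np.array([[0.5]]))   # black, alpha = 0.5
O2 = composite([C1, C2])                      # gray: 0.5*0.0 + 0.5*1.0 = 0.5
```

An opaque third layer would fully replace the composite in its region, which is how a later generator can overwrite earlier parts of the image.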

Conclusion & Discussion

• We found that hierarchical structure can be learned implicitly from images, without any labels, by constructing the hierarchical structure of the images.

• Our model could be extended to other domains such as video, text, and audio, or combinations of them.

• Since most data has a hierarchical structure, studies on decomposing combined data are essential for finding correlations between multimodal data.

• Beyond the empirical results, theoretical analysis and quantitative evaluation are needed to validate this work and to extend it to other generation tasks.