Generating Images Part by Part with Composite Generative Adversarial Networks
Hanock Kwak and Byoung-Tak Zhang, Department of Computer Science and Engineering, Seoul National University, {hnkwak, btzhang}@bi.snu.ac.kr
Biointelligence Lab, Seoul National University | Seoul 151-744, Korea (http://bi.snu.ac.kr)

Backgrounds
• Images are composed of several different objects forming a hierarchical structure with various styles and shapes.
• Deep learning models are used to implicitly disentangle complex underlying patterns of data, forming distributed feature representations.
• Generative adversarial networks (GAN) are successful unsupervised learning models that can generate samples of natural images generalized from the training data.
• It is proven that if the GAN has enough capacity, the data distribution formed by the GAN can converge to the distribution over the real data.
Key Ideas
• Composite generative adversarial networks (CGAN) can generate images part by part.
• CGAN uses an alpha channel for opacity along with the RGB channels to stack images iteratively through an alpha blending process.
• The alpha blending process keeps the previous image in some areas and overlays the new image in others.
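The alpha blending step described above can be sketched as follows. This is a minimal sketch, not the authors' code; the RGBA layout and array shapes are assumptions.

```python
import numpy as np

def alpha_blend(prev_rgb, new_rgba):
    """Blend a new RGBA image onto the previous RGB canvas.

    Where alpha is 0 the previous image is kept; where alpha is 1
    the new image fully replaces it. Arrays are hypothetical
    H x W x 3 (canvas) and H x W x 4 (new layer) floats in [0, 1].
    """
    rgb, alpha = new_rgba[..., :3], new_rgba[..., 3:4]
    return alpha * rgb + (1.0 - alpha) * prev_rgb

# A fully opaque white layer replaces a black canvas entirely.
canvas = np.zeros((2, 2, 3))
layer = np.concatenate([np.ones((2, 2, 3)), np.ones((2, 2, 1))], axis=-1)
out = alpha_blend(canvas, layer)
```

Because the blend is a per-pixel convex combination, opaque regions of the new layer overwrite the canvas while transparent regions leave it untouched, which is exactly the "keep here, overlay there" behavior the bullet describes.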
Methods
• The structure of CGAN: the component images are combined sequentially by the alpha blending process to form the final output O(n).
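The sequential combination into the final output O(n) can be sketched as an iterative fold over the component images, assuming the recursion O(i) = alpha_i * C_i + (1 - alpha_i) * O(i-1); the generator stubs and shapes here are hypothetical, not from the poster.

```python
import numpy as np

def compose(rgba_layers):
    """Iteratively alpha-blend a list of H x W x 4 layers into O(n).

    O(1) is the RGB part of the first layer; each later layer C_i
    is blended on top of the running composite.
    """
    out = rgba_layers[0][..., :3]
    for layer in rgba_layers[1:]:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = alpha * rgb + (1.0 - alpha) * out
    return out

# Three hypothetical component images: the third is fully transparent,
# so the final composite keeps the second layer's gray color.
c1 = np.concatenate([np.zeros((2, 2, 3)), np.ones((2, 2, 1))], axis=-1)
c2 = np.concatenate([np.full((2, 2, 3), 0.5), np.ones((2, 2, 1))], axis=-1)
c3 = np.concatenate([np.ones((2, 2, 3)), np.zeros((2, 2, 1))], axis=-1)
o3 = compose([c1, c2, c3])
```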
• Examples of generated images from CGAN with three generators.
• The alpha blending combines two translucent images, producing a new blended image.
• The objective of the generator (G) is to fit the true data distribution by deceiving the discriminator (D), playing the following minimax game:
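The minimax equation itself was lost in extraction; the standard GAN objective (Goodfellow et al.), which the bullet describes, is:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The poster's exact notation may differ, but any GAN-style objective of this form pits D, which scores real versus generated samples, against G, which minimizes D's success.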
Experimental Results
• Samples drawn from CGAN after training on the CelebA, Pororo, Oxford-102 Flowers, and MS COCO datasets, respectively.
(Figure: component images C1, C2, C3 blended into composite outputs O(2) and O(3).)
Conclusion & Discussion
• We found implicit possibilities of structure learning from images without any labels by constructing the hierarchical structures of the images.
• Our model could be extended to other domains such as video, text, and audio, or a combination of them.
• Since most data has hierarchical structure, studies on decomposing combined data are essential to finding correlations between multimodal data.
• In addition to the empirical results, theoretical analysis and quantitative evaluation are needed to verify this and other generation tasks.