
Page 1

Non-Adversarial Video Synthesis with Learned Priors

Abhishek Aich†,∗, Akash Gupta†,∗, Rameswar Panda‡, Rakib Hyder†, M. Salman Asif†, Amit K. Roy-Chowdhury†

University of California, Riverside†, IBM Research AI, Cambridge‡

Non-adversarial video reconstruction

Abhishek Aich, Electrical and Computer Engineering

[email protected]

∗Joint first authors

Page 2

Outline
  Problem Overview ............ 3
  Related Works ............... 4
  Proposed Approach ........... 6
    Architecture .............. 8
    Triplet Condition ......... 11
  Results ..................... 13
    Qualitative Results ....... 13
    Quantitative Results ...... 14
  Application ................. 15
  Conclusions ................. 17
  References .................. 18


Page 3

Problem Overview

Brief statement: How can we synthesize videos from random noise vectors, without visual cues?

Why is the problem important?
• Understanding how videos are generated can help us decompose their spatial and temporal behavior.
• It can serve as a building block for choosing and designing priors for different problems.
• It will help with difficult problems such as future frame prediction and feature learning for videos.


Page 4

Related Works
Prior works on this problem:

• VGAN [1] demonstrates that a video can be divided into foreground and background using deep neural networks.

• TGAN [2] proposes a temporal generator that captures temporal dynamics by generating correlated latent codes for each video frame, and an image generator that maps each of these latent codes to a single frame of the video.

• MoCoGAN [3] presents a simple approach to separating the content and motion latent codes of a video using adversarial learning.

• Video-VAE [4] extends image generation to video generation using a Variational Auto-Encoder (VAE), proposing a structured latent space in conjunction with the VAE architecture for video synthesis from a given/generated input frame.


Page 5

Related Works

  Methods         Adversarial learning?   Input frame?   Input latent vectors?
  VGAN [1]        ✓                       ✗              ✓ (random)
  TGAN [2]        ✓                       ✗              ✓ (random)
  MoCoGAN [3]     ✓                       ✗              ✓ (random)
  Video-VAE [4]   ✗                       ✓              ✓ (random)
  Ours            ✗                       ✗              ✓ (learned)

Table 1: Categorization of prior works in video synthesis. Unlike existing methods, our model requires neither a discriminator nor a reference input frame. Moreover, since the latent vectors are learned, we have control over the kind of videos the model should generate.


Page 6

Proposed Approach: Motivation
Adversarial approaches [1–3] involve training a generative network using a discriminator. These are great for learning complex data distributions, but come with the following drawbacks:

• They are difficult to train, given the saddle-point-based learning surface.

• They suffer from the mode collapse problem.

• They suffer from the vanishing gradient problem due to the adversarial process.

We also draw inspiration from the following: non-adversarial learning of generative networks for image generation [5, 6] has shown that properties of GANs (convolutional networks) can be mimicked using simple reconstruction losses while discarding the discriminator.


Page 7

Proposed Approach: Notation
Define a video clip $V$ represented by $L$ frames as $V = [v_1, v_2, \cdots, v_L]$. Corresponding to each frame, let there be a point in latent space $Z_V \in \mathbb{R}^{D \times L}$ such that

$$Z_V = [z_1, z_2, \cdots, z_L] \tag{1}$$

• We propose to disentangle a video into two parts: a static constituent, which captures the constant portion of the video common to all frames, and a transient constituent, which represents the temporal dynamics between all the frames in the video.

• Assuming that the video is of short length, we can fix $z^{(s)}_i = z^{(s)}$ for all frames after sampling only once. Therefore, (1) can be expressed as

$$Z_V = \left[\begin{bmatrix} z^{(s)} \\ z^{(t)}_1 \end{bmatrix}, \begin{bmatrix} z^{(s)} \\ z^{(t)}_2 \end{bmatrix}, \cdots, \begin{bmatrix} z^{(s)} \\ z^{(t)}_L \end{bmatrix}\right] \tag{2}$$
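To make the notation concrete, here is a minimal NumPy sketch of how the latent matrix $Z_V$ in (2) could be assembled from one shared static code and $L$ per-frame transient codes. The sizes d_s and d_t are illustrative assumptions (D is simply their sum here).

```python
import numpy as np

L = 16               # number of frames (illustrative)
d_s, d_t = 64, 64    # static / transient code sizes (assumed; D = d_s + d_t)

z_static = np.random.randn(d_s)         # z^(s): sampled once per video
z_transient = np.random.randn(d_t, L)   # z^(t)_1, ..., z^(t)_L: one per frame

# Each column of Z_V stacks the shared static code on top of the
# frame-specific transient code, as in Eq. (2).
Z_V = np.vstack([np.tile(z_static[:, None], (1, L)), z_transient])
assert Z_V.shape == (d_s + d_t, L)      # Z_V ∈ R^(D×L)
```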


Page 8

Proposed Architecture

[Figure: the transient latent codes $[z^{(t)}_1, \cdots, z^{(t)}_L]$, drawn from the transient latent space under a low-rank condition, are passed through a recurrent neural network to produce $[r^{(t)}_1, \cdots, r^{(t)}_L]$; each $r^{(t)}_i$ is combined with the shared static code $z^{(s)}$ from the static latent space and fed to the generative network $\mathrm{G}$, which outputs the frames $[\hat{v}_1, \cdots, \hat{v}_L]$. The latent codes and the network weights are all updated with the gradient of the loss $\ell_{\text{rec}} + \lambda_s\,\ell_{\text{static}} + \lambda_t\,\ell_{\text{triplet}}$.]

Figure 1: Overview of Proposed Architecture. We map a video (with an L-frame sequence) into two learnable latent spaces. We jointly learn the static latent space and the transient latent space along with the network weights.
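For readers who prefer code, a schematic PyTorch sketch of the pipeline in Figure 1 follows: a recurrent network runs over the transient codes, each output is concatenated with the shared static code, and the generative network G decodes each pair into a frame. The GRU cell, the MLP generator, and all layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class VideoSynthesizer(nn.Module):
    """Schematic of Figure 1: an RNN over transient codes feeding a generator G.
    The GRU cell, MLP generator, and all sizes are illustrative assumptions."""
    def __init__(self, d_s=64, d_t=64, d_r=64, frame_dim=3 * 64 * 64):
        super().__init__()
        self.rnn = nn.GRU(d_t, d_r, batch_first=True)   # recurrent network (weights theta)
        self.G = nn.Sequential(                         # generative network G (weights gamma)
            nn.Linear(d_s + d_r, 512), nn.ReLU(),
            nn.Linear(512, frame_dim), nn.Tanh(),
        )

    def forward(self, z_static, z_transient):
        # z_static: (B, d_s); z_transient: (B, L, d_t)
        r, _ = self.rnn(z_transient)                    # r^(t)_1..r^(t)_L: (B, L, d_r)
        L = r.shape[1]
        z_s = z_static.unsqueeze(1).expand(-1, L, -1)   # repeat shared z^(s) per frame
        return self.G(torch.cat([z_s, r], dim=-1))      # frames v_hat: (B, L, frame_dim)
```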


Page 9

Learning Loss I

Learning the weights of the network:

Let the weights of $\mathrm{G}$ and the RNN be $\gamma$ and $\theta$, respectively. We jointly optimize $\theta$, $\gamma$, and $\{Z_{V_j}\}_{j=1}^{N}$ (sampled once at the beginning of training) in every epoch, in two stages:

$$\text{Stage 1:} \quad \min_{\gamma}\ \ell\big(V_j,\ \mathrm{G}(Z_{V_j}) \mid (Z_{V_j}, \theta)\big) \tag{3.1}$$

$$\text{Stage 2:} \quad \min_{Z_V,\,\theta}\ \ell\big(V_j,\ \mathrm{G}(Z_{V_j}) \mid \gamma\big) \tag{3.2}$$

The index $j$ represents a random video out of the $N$ videos chosen from the dataset. $\ell(\cdot)$ can be chosen to be any distance-based loss. We will refer to (3.1) and (3.2) together as $\min_{Z_V,\,\theta,\,\gamma} \ell_{\text{rec}}$.
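A minimal sketch of this alternating scheme, reusing the hypothetical VideoSynthesizer sketched after Figure 1; the Adam optimizer, learning rates, and the dictionary layout of the latent codes are assumptions, and loss_fn stands in for the distance-based loss ℓ(·).

```python
import torch

def train_epoch(model, Z_list, videos, loss_fn):
    # Z_list[j] holds the learnable codes of video j:
    # {"static": (B, d_s), "transient": (B, L, d_t)}, with requires_grad=True.
    opt_gamma = torch.optim.Adam(model.G.parameters(), lr=1e-3)
    opt_z_theta = torch.optim.Adam(
        list(model.rnn.parameters())
        + [z for Z in Z_list for z in (Z["static"], Z["transient"])],
        lr=1e-3)
    for j in torch.randperm(len(videos)).tolist():      # random video index j
        Z = Z_list[j]
        # Stage 1 (3.1): update generator weights gamma, with (Z_Vj, theta) fixed.
        loss = loss_fn(videos[j], model(Z["static"], Z["transient"]))
        opt_gamma.zero_grad(); loss.backward(); opt_gamma.step()
        # Stage 2 (3.2): update latent codes Z_Vj and RNN weights theta, gamma fixed.
        loss = loss_fn(videos[j], model(Z["static"], Z["transient"]))
        opt_z_theta.zero_grad(); loss.backward(); opt_z_theta.step()
```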


Page 10

Learning Loss II

To capture the static portion of the video equally well, we randomly choose a frame from the video and compare it with its corresponding generated frame during training. For this, we update the above loss as follows:

$$\min_{Z_V,\,\theta,\,\gamma}\ \big(\ell_{\text{rec}} + \lambda_s\,\ell_{\text{static}}\big) \tag{4}$$

where $\ell_{\text{static}} = \ell(\hat{v}_k, v_k)$, $k \in \{1, 2, \cdots, L\}$ is a randomly chosen index, $v_k$ is the ground-truth frame, $\hat{v}_k = \mathrm{G}(z_k)$, and $\lambda_s$ is the regularization constant.
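As a sketch, ℓ_static simply applies the frame-level distance at one randomly drawn index k; using mean-squared error for ℓ(·) is an assumption, since the slides only require a distance-based loss.

```python
import random
import torch.nn.functional as F

def static_loss(video, generated):
    """l_static = l(v_hat_k, v_k) for one uniformly sampled frame index k.
    `video` and `generated` have the frame index as the leading dimension;
    MSE stands in for the distance-based loss l(.)."""
    k = random.randrange(video.shape[0])   # uniform over the L frames (0-based)
    return F.mse_loss(generated[k], video[k])
```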


Page 11

Triplet Condition
Learning the latent space:

[Figure: a one-dimensional view of the transient latent axis around an anchor code $z^{(t),a}_i$: positives $z^{(t),p}_i$ lie within the positive range around the anchor, while negatives $z^{(t),n}_i$ lie in the negative ranges on either side.]

Figure 2: Triplet condition in the transient latent space. Latent code representations of short video clips may lie very near each other in the transient subspace. Using the proposed triplet condition, our model learns to explain the dynamics of similar-looking frames and map them to distinct latent vectors.


Page 12

Triplet Condition
Positive frames are randomly sampled within a margin range $\alpha$ of the anchor, and negatives are chosen outside of this margin range. Defining a triplet set of transient latent code vectors $\{z^{(t),a}_i, z^{(t),p}_i, z^{(t),n}_i\}$, we aim to learn the transient embedding space $z^{(t)}$ such that

$$\|z^{(t),a}_i - z^{(t),p}_i\|_2^2 + \alpha < \|z^{(t),a}_i - z^{(t),n}_i\|_2^2$$

$\forall\ \{z^{(t),a}_i, z^{(t),p}_i, z^{(t),n}_i\} \in \Gamma$, where $\Gamma$ is the set of all possible triplets in $z^{(t)}$. With the above regularization, the loss in (4) can be written as

$$\min_{Z_V,\,\theta,\,\gamma}\ \big(\ell_{\text{rec}} + \lambda_s\,\ell_{\text{static}}\big) \quad \text{s.t.} \quad \|z^{(t),a}_i - z^{(t),p}_i\|_2^2 + \alpha < \|z^{(t),a}_i - z^{(t),n}_i\|_2^2 \tag{5}$$

where $\alpha$ is a hyperparameter that controls the margin while selecting positives and negatives.
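In practice, a hard constraint like the one in (5) is typically relaxed into a hinge penalty ℓ_triplet added to the objective, which is consistent with the ℓ_rec + λ_s ℓ_static + λ_t ℓ_triplet loss shown in Figure 1. A minimal sketch with squared Euclidean distances; the default margin value is illustrative.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, alpha=0.5):
    """Hinge relaxation of the constraint in (5), averaged over triplets:
    max(0, ||z_a - z_p||_2^2 + alpha - ||z_a - z_n||_2^2).
    alpha = 0.5 is an illustrative value, not the paper's setting."""
    d_pos = (anchor - positive).pow(2).sum(dim=-1)   # ||z^(t),a - z^(t),p||_2^2
    d_neg = (anchor - negative).pow(2).sum(dim=-1)   # ||z^(t),a - z^(t),n||_2^2
    return F.relu(d_pos + alpha - d_neg).mean()
```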


Page 13

Qualitative Results

[Figure: generated frame sequences comparing MoCoGAN vs. Ours, and VGAN vs. Ours, on (a) Chair-CAD [7] and (b) Weizmann Human Action [8].]

Figure 3: Qualitative results. The proposed method produces visually sharper and consistently better videos using the non-adversarial training protocol.


Page 14

Quantitative Results

(a) Chair-CAD [7]

  Method                           MCS ↓   FCS ↑
  Bound                            0.0     0.91
  MoCoGAN [3]                      4.11    0.85
  Ours (−ℓ_triplet, −ℓ_static)     3.83    0.77
  Ours (+ℓ_triplet, +ℓ_static)     3.32    0.89

(b) Weizmann Human Action [8]

  Method                           MCS ↓   FCS ↑
  Bound                            0.0     0.95
  MoCoGAN [3]                      3.41    0.85
  Ours (−ℓ_triplet, −ℓ_static)     3.87    0.79
  Ours (+ℓ_triplet, +ℓ_static)     2.63    0.90

Table 2: Quantitative results. The proposed method obtains better scores than the adversarial approaches (MoCoGAN, and VGAN¹) on both the Chair-CAD [7] and Weizmann Human Action [8] datasets. Best scores are highlighted in bold.

¹Not shown here.


Page 15

Application: Action exchange

  Actions   Identities P1–P9
  run       • • • •
  walk      • • • •
  jump      • • • • • • •
  skip      • • • • • • •

Table 3: Generating videos by exchanging unseen actions across identities. Each cell in this table indicates a video in the dataset; only cells containing the symbol • indicate that the video was part of the training set. We randomly generated videos corresponding to the rest of the cells, indicated by colored • symbols, visualized in Fig. 4.


Page 16

Application: Action exchange

[Figure: generated frame sequences for the actions jump, run, skip, walk, and wave2.]

Figure 4: Examples of action exchange to generate unseen videos. This figure demonstrates the effectiveness of our method in disentangling the static and transient portions of a video, as well as its efficacy in generating videos unseen during training. The colored bounding boxes indicate the unseen generated videos referred to in Tab. 3.


Page 17

Conclusions

• We presented a non-adversarial approach for synthesizing videos by jointly optimizing both the network weights and the input latent space.

• Our approach allows us to generate videos from any mode of the data distribution, perform frame interpolation², and generate videos unseen during training.

• Experiments on three³ standard datasets show the efficacy of our proposed approach over state-of-the-art methods.

Acknowledgment
This work was partially supported by NSF grant 1664172 & ONR grant N00014-19-1-2264.

²Not shown in slides. ³Only two shown in slides.


Page 18

References I

[1] Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics". In: Advances in Neural Information Processing Systems. 2016, pp. 613–621.

[2] Masaki Saito, Eiichi Matsumoto, and Shunta Saito. "Temporal generative adversarial nets with singular value clipping". In: Proceedings of the IEEE International Conference on Computer Vision. 2017, pp. 2830–2839.

[3] Sergey Tulyakov et al. "MoCoGAN: Decomposing motion and content for video generation". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, pp. 1526–1535.

[4] Jiawei He et al. "Probabilistic video generation using holistic attribute control". In: Proceedings of the European Conference on Computer Vision (ECCV). 2018, pp. 452–467.


Page 19

References II

[5] Piotr Bojanowski et al. "Optimizing the Latent Space of Generative Networks". In: International Conference on Machine Learning. 2018, pp. 599–608.

[6] Yedid Hoshen, Ke Li, and Jitendra Malik. "Non-Adversarial Image Synthesis with Generative Latent Nearest Neighbors". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 5811–5819.

[7] Mathieu Aubry et al. "Seeing 3D chairs: Exemplar part-based 2D-3D alignment using a large dataset of CAD models". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014, pp. 3762–3769.

[8] Lena Gorelick et al. "Actions as space-time shapes". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 29.12 (2007), pp. 2247–2253.


Page 20

Thank you!
