
Page 1

Non-Adversarial Video Synthesis with Learned Priors

Abhishek Aich†,∗, Akash Gupta†,∗, Rameswar Panda‡, Rakib Hyder†, M. Salman Asif†, Amit K. Roy-Chowdhury†

University of California, Riverside†, IBM Research AI, Cambridge‡

Non-adversarial video reconstruction

Abhishek Aich, Electrical and Computer Engineering

[email protected]

∗Joint first authors

Page 2

Outline
  Problem Overview ............ 3
  Related Works ............... 4
  Proposed Approach ........... 6
    Architecture .............. 8
    Triplet Condition ......... 11
  Results ..................... 13
    Qualitative Results ....... 13
    Quantitative Results ...... 14
  Application ................. 15
  Conclusions ................. 17
  References .................. 18


Page 3

Problem Overview

Brief statement: How can we synthesize videos from random noise vectors, without visual cues?

Why is the problem important?
• Understanding how videos are generated can help us decompose their spatial and temporal behavior.
• It can serve as a building block for choosing and designing priors for different problems.
• It will help with difficult problems such as future frame prediction and feature learning for videos.


Page 4

Related Works
Prior works on this problem:

• VGAN [1] demonstrates that a video can be divided into foreground and background using deep neural networks.

• TGAN [2] proposes a temporal generator that captures temporal dynamics by generating correlated latent codes for each video frame, and an image generator that maps each of these latent codes to a single frame of the video.

• MoCoGAN [3] presents a simple approach to separating the content and motion latent codes of a video using adversarial learning.

• Video-VAE [4] extends image generation to video generation using a Variational Auto-Encoder (VAE), proposing a structured latent space in conjunction with the VAE architecture for video synthesis from a given/generated input frame.


Page 5

Related Works

  Methods         Adversarial learning?   Input frame?   Input latent vectors?
  VGAN [1]        ✓                       ✗              ✓ (random)
  TGAN [2]        ✓                       ✗              ✓ (random)
  MoCoGAN [3]     ✓                       ✗              ✓ (random)
  Video-VAE [4]   ✗                       ✓              ✓ (random)
  Ours            ✗                       ✗              ✓ (learned)

Table 1: Categorization of prior works in video synthesis. Unlike existing methods, our model requires neither a discriminator nor a reference input frame. Moreover, since the latent vectors are learned, we have control over the kind of videos the model should generate.


Page 6

Proposed Approach: Motivation
Adversarial approaches [1–3] involve training a generative network using a discriminator. These are great for learning complex data distributions, but come with the following drawbacks:

• They are difficult to train, given the saddle-point-based learning surface.

• They suffer from the mode collapse problem.

• They suffer from the vanishing gradient problem due to the adversarial process.

We also draw inspiration from the following: non-adversarial learning of generative networks for image generation [5, 6] has shown that properties of GANs (convolutional networks) can be mimicked using simple reconstruction losses while discarding the discriminator.


Page 7

Proposed Approach: Notation
Define a video clip $V$ represented by $L$ frames as $V = [v_1, v_2, \cdots, v_L]$. Corresponding to each frame, let there be a point in latent space $Z_V \in \mathbb{R}^{D \times L}$ such that

$$Z_V = [z_1, z_2, \cdots, z_L] \tag{1}$$

• We propose to disentangle a video into two parts: a static constituent, which captures the constant portion of the video common to all frames, and a transient constituent, which represents the temporal dynamics between all the frames in the video.

• Assuming that the video is of short length, we can fix $z^{(s)}_i = z^{(s)}$ for all frames after sampling only once. Therefore, (1) can be expressed as

$$Z_V = \left[\begin{bmatrix} z^{(s)} \\ z^{(t)}_1 \end{bmatrix}, \begin{bmatrix} z^{(s)} \\ z^{(t)}_2 \end{bmatrix}, \cdots, \begin{bmatrix} z^{(s)} \\ z^{(t)}_L \end{bmatrix}\right] \tag{2}$$
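To make the notation concrete, here is a minimal NumPy sketch of how the latent matrix $Z_V$ in (2) could be assembled from one shared static code and $L$ per-frame transient codes. The sizes d_s and d_t are illustrative assumptions (D is simply their sum here).

```python
import numpy as np

L = 16               # number of frames (illustrative)
d_s, d_t = 64, 64    # static / transient code sizes (assumed; D = d_s + d_t)

z_static = np.random.randn(d_s)         # z^(s): sampled once per video
z_transient = np.random.randn(d_t, L)   # z^(t)_1, ..., z^(t)_L: one per frame

# Each column of Z_V stacks the shared static code on top of the
# frame-specific transient code, as in Eq. (2).
Z_V = np.vstack([np.tile(z_static[:, None], (1, L)), z_transient])
assert Z_V.shape == (d_s + d_t, L)      # Z_V ∈ R^(D×L)
```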


Page 8

Proposed Architecture

[Figure: the transient latent codes $[z^{(t)}_1, \cdots, z^{(t)}_L]$, drawn from the transient latent space under a low-rank condition, are passed through a recurrent neural network to produce $[r^{(t)}_1, \cdots, r^{(t)}_L]$; each $r^{(t)}_i$ is combined with the shared static code $z^{(s)}$ from the static latent space and fed to the generative network $\mathrm{G}$, which outputs the frames $[\hat{v}_1, \cdots, \hat{v}_L]$. The latent codes and the network weights are all updated with the gradient of the loss $\ell_{\text{rec}} + \lambda_s\,\ell_{\text{static}} + \lambda_t\,\ell_{\text{triplet}}$.]

Figure 1: Overview of Proposed Architecture. We map a video (with an L-frame sequence) into two learnable latent spaces. We jointly learn the static latent space and the transient latent space along with the network weights.
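For readers who prefer code, a schematic PyTorch sketch of the pipeline in Figure 1 follows: a recurrent network runs over the transient codes, each output is concatenated with the shared static code, and the generative network G decodes each pair into a frame. The GRU cell, the MLP generator, and all layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class VideoSynthesizer(nn.Module):
    """Schematic of Figure 1: an RNN over transient codes feeding a generator G.
    The GRU cell, MLP generator, and all sizes are illustrative assumptions."""
    def __init__(self, d_s=64, d_t=64, d_r=64, frame_dim=3 * 64 * 64):
        super().__init__()
        self.rnn = nn.GRU(d_t, d_r, batch_first=True)   # recurrent network (weights theta)
        self.G = nn.Sequential(                         # generative network G (weights gamma)
            nn.Linear(d_s + d_r, 512), nn.ReLU(),
            nn.Linear(512, frame_dim), nn.Tanh(),
        )

    def forward(self, z_static, z_transient):
        # z_static: (B, d_s); z_transient: (B, L, d_t)
        r, _ = self.rnn(z_transient)                    # r^(t)_1..r^(t)_L: (B, L, d_r)
        L = r.shape[1]
        z_s = z_static.unsqueeze(1).expand(-1, L, -1)   # repeat shared z^(s) per frame
        return self.G(torch.cat([z_s, r], dim=-1))      # frames v_hat: (B, L, frame_dim)
```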


Page 9

Learning Loss I

Learning the weights of the network:

Let the weights of $\mathrm{G}$ and the RNN be $\gamma$ and $\theta$, respectively. We jointly optimize $\theta$, $\gamma$, and $\{Z_{V_j}\}_{j=1}^{N}$ (sampled once at the beginning of training) in every epoch, in two stages:

$$\text{Stage 1:} \quad \min_{\gamma}\ \ell\big(V_j,\ \mathrm{G}(Z_{V_j}) \mid (Z_{V_j}, \theta)\big) \tag{3.1}$$

$$\text{Stage 2:} \quad \min_{Z_V,\,\theta}\ \ell\big(V_j,\ \mathrm{G}(Z_{V_j}) \mid \gamma\big) \tag{3.2}$$

The index $j$ represents a random video out of the $N$ videos chosen from the dataset. $\ell(\cdot)$ can be chosen to be any distance-based loss. We will refer to (3.1) and (3.2) together as $\min_{Z_V,\,\theta,\,\gamma} \ell_{\text{rec}}$.
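A minimal sketch of this alternating scheme, reusing the hypothetical VideoSynthesizer sketched after Figure 1; the Adam optimizer, learning rates, and the dictionary layout of the latent codes are assumptions, and loss_fn stands in for the distance-based loss ℓ(·).

```python
import torch

def train_epoch(model, Z_list, videos, loss_fn):
    # Z_list[j] holds the learnable codes of video j:
    # {"static": (B, d_s), "transient": (B, L, d_t)}, with requires_grad=True.
    opt_gamma = torch.optim.Adam(model.G.parameters(), lr=1e-3)
    opt_z_theta = torch.optim.Adam(
        list(model.rnn.parameters())
        + [z for Z in Z_list for z in (Z["static"], Z["transient"])],
        lr=1e-3)
    for j in torch.randperm(len(videos)).tolist():      # random video index j
        Z = Z_list[j]
        # Stage 1 (3.1): update generator weights gamma, with (Z_Vj, theta) fixed.
        loss = loss_fn(videos[j], model(Z["static"], Z["transient"]))
        opt_gamma.zero_grad(); loss.backward(); opt_gamma.step()
        # Stage 2 (3.2): update latent codes Z_Vj and RNN weights theta, gamma fixed.
        loss = loss_fn(videos[j], model(Z["static"], Z["transient"]))
        opt_z_theta.zero_grad(); loss.backward(); opt_z_theta.step()
```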


Page 10

Learning Loss II

To capture the static portion of the video equally well, we randomly choose a frame from the video and compare it with its corresponding generated frame during training. For this, we update the above loss as follows:

$$\min_{Z_V,\,\theta,\,\gamma}\ \big(\ell_{\text{rec}} + \lambda_s\,\ell_{\text{static}}\big) \tag{4}$$

where $\ell_{\text{static}} = \ell(\hat{v}_k, v_k)$, $k \in \{1, 2, \cdots, L\}$ is a randomly chosen index, $v_k$ is the ground-truth frame, $\hat{v}_k = \mathrm{G}(z_k)$, and $\lambda_s$ is the regularization constant.
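As a sketch, ℓ_static simply applies the frame-level distance at one randomly drawn index k; using mean-squared error for ℓ(·) is an assumption, since the slides only require a distance-based loss.

```python
import random
import torch.nn.functional as F

def static_loss(video, generated):
    """l_static = l(v_hat_k, v_k) for one uniformly sampled frame index k.
    `video` and `generated` have the frame index as the leading dimension;
    MSE stands in for the distance-based loss l(.)."""
    k = random.randrange(video.shape[0])   # uniform over the L frames (0-based)
    return F.mse_loss(generated[k], video[k])
```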


Page 11

Triplet Condition
Learning the latent space:

[Figure: a one-dimensional view of the transient latent axis around an anchor code $z^{(t),a}_i$: positives $z^{(t),p}_i$ lie within the positive range around the anchor, while negatives $z^{(t),n}_i$ lie in the negative ranges on either side.]

Figure 2: Triplet condition in the transient latent space. Latent code representations of short video clips may lie very near each other in the transient subspace. Using the proposed triplet condition, our model learns to explain the dynamics of similar-looking frames and map them to distinct latent vectors.


Page 12

Triplet Condition
Positive frames are randomly sampled within a margin range $\alpha$ of the anchor, and negatives are chosen outside of this margin range. Defining a triplet set of transient latent code vectors $\{z^{(t),a}_i, z^{(t),p}_i, z^{(t),n}_i\}$, we aim to learn the transient embedding space $z^{(t)}$ such that

$$\|z^{(t),a}_i - z^{(t),p}_i\|_2^2 + \alpha < \|z^{(t),a}_i - z^{(t),n}_i\|_2^2$$

$\forall\ \{z^{(t),a}_i, z^{(t),p}_i, z^{(t),n}_i\} \in \Gamma$, where $\Gamma$ is the set of all possible triplets in $z^{(t)}$. With the above regularization, the loss in (4) can be written as

$$\min_{Z_V,\,\theta,\,\gamma}\ \big(\ell_{\text{rec}} + \lambda_s\,\ell_{\text{static}}\big) \quad \text{s.t.} \quad \|z^{(t),a}_i - z^{(t),p}_i\|_2^2 + \alpha < \|z^{(t),a}_i - z^{(t),n}_i\|_2^2 \tag{5}$$

where $\alpha$ is a hyperparameter that controls the margin while selecting positives and negatives.
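In practice, a hard constraint like the one in (5) is typically relaxed into a hinge penalty ℓ_triplet added to the objective, which is consistent with the ℓ_rec + λ_s ℓ_static + λ_t ℓ_triplet loss shown in Figure 1. A minimal sketch with squared Euclidean distances; the default margin value is illustrative.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, alpha=0.5):
    """Hinge relaxation of the constraint in (5), averaged over triplets:
    max(0, ||z_a - z_p||_2^2 + alpha - ||z_a - z_n||_2^2).
    alpha = 0.5 is an illustrative value, not the paper's setting."""
    d_pos = (anchor - positive).pow(2).sum(dim=-1)   # ||z^(t),a - z^(t),p||_2^2
    d_neg = (anchor - negative).pow(2).sum(dim=-1)   # ||z^(t),a - z^(t),n||_2^2
    return F.relu(d_pos + alpha - d_neg).mean()
```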


Page 13

Qualitative Results

[Figure: generated frame sequences comparing MoCoGAN vs. Ours, and VGAN vs. Ours, on (a) Chair-CAD [7] and (b) Weizmann Human Action [8].]

Figure 3: Qualitative results. The proposed method produces visually sharper and consistently better videos using the non-adversarial training protocol.


Page 14

Quantitative Results

(a) Chair-CAD [7]

  Method                           MCS ↓   FCS ↑
  Bound                            0.0     0.91
  MoCoGAN [3]                      4.11    0.85
  Ours (−ℓ_triplet, −ℓ_static)     3.83    0.77
  Ours (+ℓ_triplet, +ℓ_static)     3.32    0.89

(b) Weizmann Human Action [8]

  Method                           MCS ↓   FCS ↑
  Bound                            0.0     0.95
  MoCoGAN [3]                      3.41    0.85
  Ours (−ℓ_triplet, −ℓ_static)     3.87    0.79
  Ours (+ℓ_triplet, +ℓ_static)     2.63    0.90

Table 2: Quantitative results. The proposed method obtains better scores than the adversarial approaches (MoCoGAN, and VGAN¹) on both the Chair-CAD [7] and Weizmann Human Action [8] datasets. Best scores are highlighted in bold.

¹Not shown here.


Page 15

Application: Action exchange

  Actions   Identities P1–P9
  run       • • • •
  walk      • • • •
  jump      • • • • • • •
  skip      • • • • • • •

Table 3: Generating videos by exchanging unseen actions across identities. Each cell in this table indicates a video in the dataset; only cells containing the symbol • indicate that the video was part of the training set. We randomly generated videos corresponding to the rest of the cells, indicated by colored • symbols, visualized in Fig. 4.


Page 16

Application: Action exchange

[Figure: generated frame sequences for the actions jump, run, skip, walk, and wave2.]

Figure 4: Examples of action exchange to generate unseen videos. This figure demonstrates the effectiveness of our method in disentangling the static and transient portions of a video, as well as its efficacy in generating videos unseen during training. The colored bounding boxes indicate the unseen generated videos referred to in Tab. 3.


Page 17

Conclusions

• We presented a non-adversarial approach for synthesizing videos by jointly optimizing both the network weights and the input latent space.

• Our approach allows us to generate videos from any mode of the data distribution, perform frame interpolation², and generate videos unseen during training.

• Experiments on three³ standard datasets show the efficacy of our proposed approach over state-of-the-art methods.

Acknowledgment
This work was partially supported by NSF grant 1664172 & ONR grant N00014-19-1-2264.

²Not shown in slides. ³Only two shown in slides.


Page 18

References I

[1] Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics". In: Advances in Neural Information Processing Systems. 2016, pp. 613–621.

[2] Masaki Saito, Eiichi Matsumoto, and Shunta Saito. "Temporal generative adversarial nets with singular value clipping". In: Proceedings of the IEEE International Conference on Computer Vision. 2017, pp. 2830–2839.

[3] Sergey Tulyakov et al. "MoCoGAN: Decomposing motion and content for video generation". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, pp. 1526–1535.

[4] Jiawei He et al. "Probabilistic video generation using holistic attribute control". In: Proceedings of the European Conference on Computer Vision (ECCV). 2018, pp. 452–467.


Page 19

References II

[5] Piotr Bojanowski et al. "Optimizing the Latent Space of Generative Networks". In: International Conference on Machine Learning. 2018, pp. 599–608.

[6] Yedid Hoshen, Ke Li, and Jitendra Malik. "Non-Adversarial Image Synthesis with Generative Latent Nearest Neighbors". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 5811–5819.

[7] Mathieu Aubry et al. "Seeing 3D chairs: Exemplar part-based 2D-3D alignment using a large dataset of CAD models". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014, pp. 3762–3769.

[8] Lena Gorelick et al. "Actions as space-time shapes". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 29.12 (2007), pp. 2247–2253.


Page 20

Thank you!
