understanding video with deep convolubonal neural networks · computer vision victoria dean, carl...

1
Computer Vision Victoria Dean, Carl Vondrick, Antonio Torralba (Electrical Engineering and Computer Science) Cisco Undergraduate Research and InnovaBon Scholar Understanding Video with Deep ConvoluBonal Neural Networks Neural Networks Unsupervised Learning for Frame PredicBon Current Frame PredicBon Models Current status Next Steps and References Given past frames, predict future frame The previous frames are inputs to the predictor, which a6empts to generate the next frame x t+1 Cost func<on computes how close predicted frame pixels are to actual next frame pixels Problem: Predictor tends to output mean frame Neural network computes a differen<able func<on of its input A convolu<onal neural network (CNN) applies the same func<on to each sec<on of the input (good for computer vision tasks) Useful for many tasks, including pedestrian detec<on for self-driving cars Figure from [3] Figures and defini<ons from [2] Photo from [1] Try out different network architectures (number of layers, dimensions, etc.) and determine which is the most successful for predic<on Experiment with different cost func<ons: do any yield predic<ons that are be6er than just mean? Implement LSTM (Long Short Term Memory) network for learning greater context in videos References [1] h6p://www.nvidia.com/content/tegra/automo<ve/images/driver- assistance/pedestrian-detec<on-large.jpg [2] h6p://www.image-net.org/challenges/LSVRC/2012/supervision.pdf [3] h6p://arxiv.org/abs/1412.6604 [4] h6p://caffe.berkeleyvision.org/ [5] h6p://arxiv.org/abs/1311.2524 Implementa<on of simple predic<on model in Caffe, a popular open-source library for CNNs [4] Current model generates future frame given one past frame Simplified problem to predic<on of cropped frame based on object detector bounding boxes [5] Considering cost func<ons that will hopefully improve issue with outpu_ng mean frame Much successful work on images, but less done on videos (labeling videos is expensive)

Upload: others

Post on 24-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

  • ComputerVision

    VictoriaDean,CarlVondrick,AntonioTorralba(ElectricalEngineeringandComputerScience)CiscoUndergraduateResearchandInnovaBonScholar

    UnderstandingVideowithDeepConvoluBonalNeuralNetworks

    NeuralNetworks

    UnsupervisedLearningforFramePredicBon CurrentFramePredicBonModels

    Currentstatus NextStepsandReferences

    Givenpastframes,predictfutureframe•  Thepreviousframesareinputstothepredictor,

    whicha6emptstogeneratethenextframext+1•  Costfunc