video synopsis by heterogeneous multi-source correlation problem: how to generate semantic synopsis...

Download Video Synopsis by Heterogeneous Multi-Source Correlation Problem: How to generate semantic synopsis given long video streams by exploiting information

If you can't read please download the document

Upload: sheena-watts

Post on 13-Dec-2015

221 views

Category:

Documents


0 download

TRANSCRIPT

  • Slide 1

Video Synopsis by Heterogeneous Multi-Source Correlation Problem: How to generate semantic synopsis given long video streams by exploiting information beyond low-level visual features? Introduction Input: a long video sequence Output: a concise semantic video synopsis event 1 event 2 event 3 Learning a multi-source video synopsis model Visual Features Event calendar Sensor-based traffic data Weather forecast Non-Visual Auxiliary Data Complement Xiatian Zhu Queen Mary, University of London [email protected] Chen Change Loy The Chinese University of Hong Kong [email protected] Shaogang Gong Queen Mary, University of London [email protected] 1 Motivation 2 Structure-driven tag inference Non-trivial problem that requires joint learning to discover latent associations between heterogeneous multiple data sources: Heteroscedasticity problem, e.g. very different representations Individual data sources can be inaccurate and incomplete Non-visual data is not always available, nor synchronised with visual data Clustering evaluation Tag inference evaluation Semantic video synopsis Capture the common physical phenomenon, thus intrinsically correlated 3 What content is meaningful? Contributions: Generate semantic video synopsis by jointly learning heterogeneous data sources in an unsupervised manner Handle missing non-visual data Existing video synopsis methods: typically rely on visual cues alone, this is inherently unreliable difficult to bridge the semantic gap between low-level visual features and high-level semantic content interpretation required for better summarisation 4 Joint optimisation of individual information gain Isolate different characteristics of different sources Accommodate partial or completely missing non-visual data Step (a): Constrained Clustering Forest (CC-Forest) where : the total information gain : gain in individual sources : inherent source impurity : source weights, with Merits of the proposed CC-Forest: Handle missing non-visual data An adaptive source weighting method: 1.Reweight the -th non-visual source as: with the missing ratio 2.Renormalise all source weights to ensure: Infer non-visual tag of a test sample Step (a): trace the target leaf of tree - search for the leaf of each tree falls into Step (b): retrieve leaf level clusters - derived from training samples sharing the same leaf node - search for nearest clusters whose tag distribution is used as tree-level prediction Step (c): average tree-level predictions - yield a smooth prediction Datasets Two datasets collected from publicly available webcams: TIme Square Intersection (TISI) and Educational Resource Centre (ERCe) dataset ERCeTISI Non-visual auxiliary data: TISI: weather, traffic speed ERCe: campus event calendar Weather Traffic speed Event calendar DatasetTISIERCe Methodtraffic speedweatherevent VO-Forest [1]0.86751.06760.0616 VNV-Kmeans0.91971.49941.2519 VNV-AASC [2]0.72170.70390.0691 VNV-CC-Forest*0.72620.60710.0024 VPNV10-CC-Forest*0.71900.62610.0024 VPNV20-CC-Forest*0.72830.64970.0090 Table 1. Mean entropy of cluster NV tag distribution (Red: the best) 5 6 7 (1)Student Orientation, (2)Career Fair, (3)Cleaning, (4)Group Studying, (5) Gun Forum, (6)Scholarship Competition. MethodVO- Forest [1] VNV- Kmeans VNV- AASC [2] VNV-CC- Forest VPNV10- CC-Forest VPNV20- CC-Forest traffic speed27.6237.8036.1335.7737.9938.05 weather50.6543.1444.3761.0555.9954.97 Table 2. TISI: tag inference accuracy comparison (Red: the best) MethodVO- Forest [1] VNV- Kmeans VNV- AASC [2] VNV-CC- Forest VPNV10- CC-Forest VPNV20- CC-Forest No Schd. Event79.4887.9148.5155.9847.9655.57 Cleaning39.5019.3345.8041.2846.6446.22 Career Fair94.4159.3879.77100.0 Gun Forum74.8244.3084.9383.8285.29 Group Studying92.9746.2596.8897.66 95.78 Schlr Comp.82.7416.7189.4099.4699.7399.59 Accom. Service00.00 21.1537.26 37.02 Stud. Orient.60.949.7738.8788.0992.3888.09 Average65.6135.4563.1675.6975.8775.95 Table 3. ERCe: tag inference accuracy comparison (Red: the best) * Our methods; VO = visual only; VNV = visual + non-visual; VPNVxx = xx% missing ratio of the training non-visual data. ERCe: tag inference confusion matrices comparison TISI: tag inference confusion matrices comparison 8 Source association 9 Visual-Visual Vehicle detection and traffic speed ERCe: summarisation of some key events TISI: A synopsis of weather+trafc changes TISI: discovered latent correlations among visual and non-visual sources Training a synopsis model (overview) Step (b-c): Multi-Source Latent Cluster Discovery (1) Derive a multi-source-aware affinity matrix from a learned CC-Forest: (2) Symmetrically normalise the affinity matrix, obtain (3) Perform spectral clustering [3] on, with automatically estimated cluster number (4) Predict a unique distribution of each non-visual data for a cluster where is a tree-level affinity, with element defined as: with where denotes a diagonal matrix with elements Each training sample is then assigned to a cluster where refers to the training sample set in [1] L. Breiman. Random forests. ML, 2001 [2] H.-C. Huang, Y.-Y. Chuang, C.-S. Chen. Afnity aggregation for spectral clustering. CVPR, 2012 [3] L. Zelnik-manor and P. Perona. Self-tuning spectral clustering. NIPS, 2004 Project page: http://www.eecs.qmul.ac.uk/~xz303/ TISI: cluster purity example sunny (Red box: errors) Tree 1 Leaves (a) (b) (c) Nearest Clusters Tag Distribution Tree Visual Data Non-Visual Data Constrained Clustering Forest Tree 1 (a) (b) Tree Cluster 1 Cluster Non-visual tag distribution Affinity matrix (c) Graph partition Non-visual tag distribution VNV-Kmeans (14/75) VNV-AASC [2] (372/1324) VO-Forest [1] (43/45) VNV-CC-Forest (58/58) VPNV10-CC-Forest (50/73) VPNV20-CC-Forest (29/31) Methods Samples in a cluster VO-Forest [1]VNV-Kmeans VNV-AASC [2] VPNV10-CC-ForestVPNV20-CC-Forest VNV-CC-Forest No Schd. Event Cleaning Career FairGun Forum Group Studying Schlr Comp. Accom. Service Stud. Orient. No Schd. Event Cleaning Career FairGun Forum Group Studying Schlr Comp. Accom. ServiceNo Schd. Event Cleaning Career FairGun Forum Group Studying Schlr Comp. Accom. Service Stud. Orient. No Schd. Event Cleaning Career Fair Gun Forum Group Studying Accom. Service Stud. Orient. Schlr Comp. No Schd. Event Cleaning Career Fair Gun Forum Group Studying Accom. Service Stud. Orient. Schlr Comp. Sunny Cloudy Rainy Sunny Cloudy Rainy Sunny Cloudy Rainy Sunny Cloudy Rainy Sunny Cloudy Rainy Sunny Cloudy Rainy Sunny Cloudy Rainy VNV- Kmean s VO- Forest VNV-CC- Forest VPNV10- CC-Forest VPNV20- CC-Forest VNV- AASC W = Cloudy, T = Fast Day 1 06am10am 17pm22pm W = Sunny, T = Slow W = Cloudy, T = V.Slow Day 2 Day 3 Day 6 W = Cloudy, T = Fast 10am06am 17pm19pm W = Sunny, T = Slow W = Cloudy, T = Slow W = Sunny, T = Slow 06am10am 16pm22pm W = Cloudy, T = FastW = Sunny, T = Slow W = Cloudy,T = SlowW = Cloudy,T = V.Slow W = Cloudy, T = Fast W = Sunny, T = Slow W = Cloudy, T = V.Slow 06am11am 16pm22pm W = Cloudy, T = Slow 01-0901-27 02-07 03-01 16pm13pm16pm11am 14pm10am 15pm13pm Career Fair Group Studying Stud. Orient. Schlr. Comp. person detection in regions 1-16 vehicle detection in regions 1-16