Semantic Content Analysis for Advanced Video Processing and Understanding

Dr. Jinchang Ren

Topics of Interest

1. Semantic Video Content Analysis

2. Image analysis, fusion and recognition

3. Motion estimation and image registration

4. Content-based video annotation & retrieval

5. Archive film restoration

6. Video surveillance and 3-D vision

7. Hyperspectral Imaging & Machine Learning

Main Workflow

Diagram components: raw data; video analysis and understanding; objects, texts and structures; multimedia database (data and indexes) with structure, object, text and event indexing; content delivery; users (query, browsing, abstract, skimming).

Using learning and recognition techniques for content analysis and understanding based on segmentation and classification

Extracting (offline)

Feature extraction

Detection stages and their outputs:
• Semantic concepts detection: semantic concepts
• Shot detection: shot events
• Human object detection: human objects
• Detection of camera motion & moving objects: camera motion events & moving objects

Semantic video indexing and annotation

Semantic Video Content Analysis

Applications (online)

Semantic video indexing and annotation

Enabled apps:
• Semantic video retrieval
• Content-adaptive video summarisation
• Content delivery
• Sports video analysis and reconstruction
• Archived video restoration

Other diagram labels: query, quality, index

Model-based Shot Events Detection

Feature extraction and selection (via AdaBoost)

Cut detection

Pre-filtering, model-based detection, validation

Detect gradual transitions

Combined shots; fade/dissolve

Fusion

List of shots

Location of current shot

Current shot

Ref: J. Ren et al., Shot boundary detection in MPEG videos using local and global indicators, IEEE Trans. CSVT, 19(8): 1234-1238, 2009

Results: ranked No. 1 in cut detection and No. 3 in overall performance at TRECVID 2007.
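
To make the notion of cut detection concrete, here is a minimal sketch. It is not the model-based, AdaBoost-driven detector from the referenced paper: it swaps in a plain histogram-difference threshold, and the function names, bin count and threshold are all illustrative assumptions.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalised grey-level histogram of a frame given as a flat list of ints."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]


def detect_cuts(frames, threshold=0.5):
    """Return indices i where a cut is declared between frame i-1 and frame i.

    The score is half the L1 distance between consecutive histograms,
    so it lies in [0, 1]. The threshold value is an assumption.
    """
    cuts = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        score = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
        if score > threshold:
            cuts.append(i)
        previous = current
    return cuts


# Toy sequence: three dark frames followed by three bright frames.
dark = [20] * 64
bright = [220] * 64
frames = [dark, dark, dark, bright, bright, bright]
print(detect_cuts(frames))
```

A fixed threshold like this would likely miss gradual transitions such as fades and dissolves, which is presumably why the slide lists them as a separate detection path.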

Detection of Motion and Moving Objects

• Robust motion detection under various conditions via spatial/luminance normalisation;

• Global motion estimation for image registration and camera motion detection;

• Robust image registration using gradient-based subspace phase correlation;

• High-accuracy sub-pixel motion detection in spatial and frequency domain.

Figure panels: two original images; normalised results; detected objects

Invariant Moving Object Detection

Ref: J. Ren et al., A General Framework for Vision-Based Interactive Board Games. Proc. 4th Int. Conf. on Intelligent Games and Simulation, pp. 238-242, London, Nov. 2003

Ref: J. Ren et al., High-accuracy Sub-pixel Motion Estimation from Noisy Images in Fourier Domain, IEEE Trans. Image Processing, 19(5): 1379-1394, 2010

Original two frames from coast-guard sequence

Raw difference and motion compensated result

Subpixel Motion Estimation

1) Subspace phase correlation is more robust to additive noise;

2) Subspace correlation using 1D FFT is more efficient;

3) Interpolation using the main peak and its two side-peaks yields more accurate estimates;

4) Good results are achieved from video frames, general images, MRI and remote sensing images.
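
The idea of 1-D phase correlation with a peak refined from its two side-peaks can be sketched as follows. This is a sketch under assumptions, not the method of the cited paper: the parabolic three-point fit is one common interpolation choice, the naive DFT stands in for an FFT, and the function names are hypothetical.

```python
import cmath


def dft(signal, inverse=False):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(signal)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(
            signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t in range(n)
        )
        result.append(total / n if inverse else total)
    return result


def phase_correlate_1d(reference, shifted, eps=1e-12):
    """Estimate the circular shift d such that shifted[n] ~ reference[n - d]."""
    spectrum_ref = dft(reference)
    spectrum_shifted = dft(shifted)
    cross = []
    for a, b in zip(spectrum_ref, spectrum_shifted):
        product = b * a.conjugate()
        magnitude = abs(product)
        # Keep only the phase; drop bins with negligible energy.
        cross.append(product / magnitude if magnitude > eps else 0j)
    surface = [value.real for value in dft(cross, inverse=True)]
    n = len(surface)
    peak = max(range(n), key=lambda i: surface[i])
    left = surface[(peak - 1) % n]
    centre = surface[peak]
    right = surface[(peak + 1) % n]
    # Parabolic fit through the main peak and its two side-peaks (assumed model).
    denominator = left - 2.0 * centre + right
    offset = 0.5 * (left - right) / denominator if abs(denominator) > eps else 0.0
    shift = peak + offset
    return shift - n if shift > n / 2 else shift


reference = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
shifted = reference[-3:] + reference[:-3]  # circular shift by 3 samples
print(round(phase_correlate_1d(reference, shifted), 3))
```

For a two-dimensional frame, the same routine would be run on one-dimensional profiles of the image; how those profiles are formed is outside this sketch.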

Ref: J. Ren et al., Extracting Objects and Events from MPEG Sequences for Video Highlights Indexing and Retrieval, LNCS, 2007

Human Object Detection

• Modelling of skin pixels for human object detection
– Statistical modelling
– YCbCr space
– Supervised learning
– Compressed domain
– Adaptive thresholding

• Bayesian classification: $\dfrac{p(e_b \mid \text{skin})}{p(e_b \mid \text{nonskin})} > \eta \;\rightarrow\; \text{skin}$
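
The Bayesian skin/non-skin decision named above is a likelihood-ratio test against a threshold η. The sketch below illustrates that test on chrominance values. The independent Gaussian class models and their parameters are purely hypothetical placeholders, not values from the referenced work, which learns its models from training data.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical class models: (mean, std) per chrominance channel.
SKIN = {"cb": (110.0, 10.0), "cr": (150.0, 10.0)}
NONSKIN = {"cb": (128.0, 40.0), "cr": (128.0, 40.0)}


def likelihood(cb, cr, model):
    """Class-conditional likelihood, assuming Cb and Cr are independent."""
    return gaussian_pdf(cb, *model["cb"]) * gaussian_pdf(cr, *model["cr"])


def is_skin(cb, cr, eta=1.0):
    """Label as skin when p(x | skin) / p(x | nonskin) exceeds eta."""
    return likelihood(cb, cr, SKIN) > eta * likelihood(cb, cr, NONSKIN)


print(is_skin(110, 150))  # chrominance at the centre of the assumed skin model
print(is_skin(128, 128))  # neutral grey chrominance
```

Raising `eta` trades missed skin pixels for fewer false alarms. The "adaptive thresholding" item in the slide suggests the threshold is tuned rather than fixed, though the slide does not say how.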

Ref: J. Ren et al., Extracting Objects and Events from MPEG Sequences for Video Highlights Indexing and Retrieval, Journal of Multimedia (JMM), Academy, vol. 5, no. 2, 2010

Semantic Video Content Retrieval

Promising results have been achieved in query by video highlights.

Activity-Driven Video Summarisation

Main difficulties in summarising rush videos:
• Accurate modelling of several kinds of junk frames;
• Determining retakes (varying from 1 to more than 20);
• Extracting content of interest (COI) for effective summarisation;
• How to achieve objective evaluation remains unsolved.

• Hierarchical modelling via formal language descriptions and adaptive clustering of retakes;
• Excitement modelling is used to determine COIs.

The original video is summarised to less than 3% of its frames whilst keeping over 80% of the key content, at a processing speed more than 5 times faster than real-time playback.

Ref: J. Ren et al., Hierarchical modeling and adaptive clustering for real-time summarization of rush videos, IEEE T-Multimedia, vol. 11, no. 5, pp. 906-917, Aug. 2009.

Video summarisation in TRECVID'08
• 39 clips; >1.5 million frames (17.2 h at 25 fps), MPEG-1 format
• 44 teams registered; 32 had results submitted in 43 groups

Evaluation criteria
• 9 criteria in 3 groups covering objective, subjective and usability measures
• Under a combined measurement, our result was ranked 2nd or 3rd best.

Archived Video Restoration

Based on the work at Surrey; see papers published in IEEE T-SMCB, Signal Processing and SPIE Optical Engineering, among others.

• Segmentation-assisted effective detection of film dirt;

• Global motion compensated robust detection of dirt in colour images;

• Detection-supported concealment of dirt;

• Improved motion estimation for refined coding and segmentation applications.

Quantitative Performance

ROC analysis of several dirt-detection methods; GMCC and Conf refer to our methods with or without global motion compensation.

Restored Frames

a) Original image b) SDIp (grey) c) ML3Dex (global) d) Our method

Because of missed detections or false alarms, the recovered images of the other methods appear to be of poor quality, even with the over-smoothing visible in image c), produced by global ML3Dex filtering.

Ref: J. Ren et al., Missing-Data Recovery from Dirt Sparkles on Degraded Color Films. SPIE Journal of Optical Engineering, 46(7), DOI: 10.1117/1.2751162, 2007

Sports Video Analysis & Reconstruction

Based on the work at Kingston; see papers published in IEEE T-CSVT, CVIU, and Machine Vision and Applications, among others.

• Video surveillance via multiple fixed cameras;
• Background modelling (GMM + running average);
• Tracking players and the ball for soccer game reconstruction;
• Trajectory-based modelling and tracking of multiple objects in multi-view sports scenarios;
• Geometric modelling for 3-D ball positioning;
• Modelling and classifying of motion phases and events for semantic analysis.
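
The background-modelling item above mentions a GMM combined with a running average. The sketch below covers only the running-average half, on a frame given as a flat list of grey levels. The learning rate `alpha`, the difference threshold and the function names are illustrative assumptions, and the GMM component is not modelled here.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background estimate (exponential running average)."""
    return [(1.0 - alpha) * b + alpha * f for b, f in zip(background, frame)]


def foreground_mask(background, frame, threshold=30.0):
    """Mark pixels whose difference from the background exceeds the threshold."""
    return [abs(f - b) > threshold for b, f in zip(background, frame)]


# Toy example: a flat background with a bright object over pixels 3 and 4.
background = [100.0] * 6
frame = [100, 102, 99, 200, 210, 101]

mask = foreground_mask(background, frame)
background = update_background(background, frame)
print(mask)
print(background)
```

A small `alpha` lets the model absorb slow lighting changes while keeping briefly visible objects, such as players or the ball, in the foreground.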

Improved Tracking

Motion correction results in four consecutive frames (left to right) when the ball (ID 10) merges with a player (ID 8); our tracking-plus-matching method is used to overcome the occlusion.

Ref: J. Ren et al., Tracking the Soccer Ball using Multiple Fixed Cameras. Computer Vision and Image Understanding, vol. 113, no. 5, pp. 633-642, May 2009.

detected foreground

Without correction

With correction

Ref: J. Ren et al., Real-time Modeling of 3-D Soccer Ball Trajectories from Multiple Fixed Cameras. IEEE Trans. Circuits Syst. Video Techn. (T-CSVT), 18(3): 350-362, 2008

Reconstructed Soccer Game

Motion Phase Classification and Phase-Specific Tracking

Four phases are defined: rolling, flying, possessed and out of play. Ball motion is modelled as phase transition cycles starting from possessed and ending at out of play. For different phases, a linear or non-linear model is applied to estimate the trajectory.
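
As a rough illustration of phase-specific trajectory models, the sketch below fits a straight line for a rolling ball and a ballistic curve for a flying ball. The pairing of phases with models, the known-gravity simplification and all function names are assumptions made for this example; the slide does not say which model goes with which phase. The possessed and out-of-play phases are not modelled.

```python
G = 9.81  # gravitational acceleration in m/s^2


def fit_line(ts, ys):
    """Least-squares fit of y = intercept + slope * t; returns (intercept, slope)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = cov / var_t
    return mean_y - slope * mean_t, slope


def fit_ballistic(ts, zs):
    """Fit z = z0 + v0*t - 0.5*G*t^2 by adding back the known gravity term,
    which reduces the problem to a line fit; returns (z0, v0)."""
    adjusted = [z + 0.5 * G * t * t for t, z in zip(ts, zs)]
    return fit_line(ts, adjusted)


def predict(phase, params, t):
    """Evaluate the fitted model for the given phase at time t."""
    intercept, slope = params
    if phase == "flying":
        return intercept + slope * t - 0.5 * G * t * t
    return intercept + slope * t  # "rolling": linear motion


# Synthetic flying-phase heights: z0 = 1 m, v0 = 5 m/s.
ts = [0.0, 0.1, 0.2, 0.3, 0.4]
zs = [1.0 + 5.0 * t - 0.5 * G * t * t for t in ts]
params = fit_ballistic(ts, zs)
print(params, predict("flying", params, 0.5))
```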

Ball trajectory in a whole phase transition cycle

Recognised motion phases compared with manual ground truth (GT)

Hyperspectral Imaging

Able to identify changes in moisture and temperature, and even differences in chemical composition, thanks to the continuous spectral band images captured;

• 5-10 nm spectral resolution covering the visible and near-infrared range

• Desktop analysis beyond remote sensing

Typical applications:

• Food quality control and assessment (fruit, vegetable, meat, tea, wine…)

• Pharmaceutical for tablet analysis and material analysis

• Security and forensics (fingerprint extraction, fake document/stamps identification)

• Environmental monitoring and land usage evaluation (city planning…)

Machine Learning & Pattern Recognition

Using SVM, ANN, HMM, decision trees, AdaBoost, clustering, Bayesian classifiers etc. for various recognition/classification applications;

• Classification of MCCs in mammogram imaging

• Handwritten text recognition

• Decision making in extraction, detection and classification of image/video contents/events

Potential Applications

1. Entertainment, Training and Education
• Sports: automatic game analysis and performance evaluation;
• Media: immersive experience of digital media content for museums, film, TV and entertainment (with restored original quality);

2. Health, Safety and Security
• Intelligent surveillance for monitoring and health care;
• Machine learning in medical imaging;
• Crime prevention and prediction of accidental events;

3. Business for improved efficiency and productivity
• Automatic systems to assist or replace human beings

Possible Projects for Collaborations

1. Media Understanding
• Extraction of high-level semantics
• Surveillance event detection
• Objective quality evaluation of coded images and summarised videos
• Image and video mining from social websites

2. Information Retrieval
• Semantic video content retrieval with support from multiple clues
• Generic model for video copy detection
• Evaluation and retrieval of medical images
• From face detection to clustering-based recognition in interpreting image/video

3. Archive Restoration and Digital Preservation
• Archived image/video restoration via learning using spatial (and temporal) consistency
• Improved coding
• Applications in the protection of historical Chinese video assets

4. Hyperspectral Imaging and Machine Learning
• Food quality control and assessment
• Chinese painting analysis
• Quality monitoring and verification of museum collections

Thank you for your attention!

Any Questions?
