Semantic Content Analysis for Advanced Video Processing and Understanding
Dr. Jinchang Ren
Topics of Interest
1. Semantic Video Content Analysis
2. Image analysis, fusion and recognition
3. Motion estimation and image registration
4. Content-based video annotation & retrieval
5. Archive film restoration
6. Video surveillance and 3-D vision
7. Hyperspectral Imaging & Machine Learning
Main Workflow
[Workflow diagram] Raw data (video) passes through analysis and understanding, which extract objects, texts and structures.
Multimedia database (data and indexes): structure indexing, object indexing, text indexing, event indexing.
Content delivery to users: query, browsing, abstract, skimming.
Using learning and recognition techniques for content analysis and understanding based on segmentation and classification
Extracting (offline)
[Diagram] Feature extraction feeds shot detection, detection of camera motion and moving objects, human object detection, and semantic concepts detection.
Outputs: shot events, camera motion events and moving objects, human objects, semantic concepts.
These outputs feed semantic video indexing and annotation.
Semantic Video Content Analysis
Applications (online)
[Diagram] Semantic video indexing and annotation (query, index, quality) enables the following applications:
• Semantic video retrieval
• Content-adaptive video summarisation
• Content delivery
• Sports video analysis and reconstruction
• Archived video restoration
Semantic Video Content Analysis
Model-based Shot Events Detection
[Diagram] Feature extraction and selection (via AdaBoost).
Cut detection: pre-filtering, model-based detection, validation.
Detect gradual transitions: fade/dissolve, combined shots.
Fusion: list of shots, location of current shot, current shot.
Ref: J. Ren et al., Shot boundary detection in MPEG videos using local and global indicators, IEEE Trans. CSVT, 19(8): 1234-1238, 2009
Results: No. 1 in cut detection and No. 3 in overall performance in TRECVID 2007.
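As a rough illustration of what cut detection involves, here is a minimal sketch that flags a cut when the grey-level histograms of consecutive frames differ strongly. It is not the model-based detector from the cited paper; the function names, the 16-bin histogram and the 0.5 threshold are all assumptions chosen for the example.

```python
from typing import List, Sequence


def grey_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalised histogram of 8-bit grey levels for one frame (flat pixel list)."""
    hist = [0.0] * bins
    for value in frame:
        hist[min(value * bins // 256, bins - 1)] += 1.0
    total = float(len(frame))
    return [h / total for h in hist]


def detect_cuts(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return indices i where a cut is declared between frame i-1 and frame i."""
    cuts = []
    previous = grey_histogram(frames[0])
    for i in range(1, len(frames)):
        current = grey_histogram(frames[i])
        # Half the L1 distance between two normalised histograms lies in [0, 1].
        distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            cuts.append(i)
        previous = current
    return cuts


# Toy "video": three dark frames followed by two bright frames (hypothetical data).
dark = [20] * 64
bright = [220] * 64
video = [dark, dark, dark, bright, bright]
print(detect_cuts(video))  # [3]
```

A plain histogram threshold like this misses gradual transitions such as fades and dissolves, which is why they are usually handled by a separate stage.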
• Robust motion detection under various conditions via spatial/luminance normalisation;
• Global motion estimation for image registration and camera motion detection;
• Robust image registration using gradient-based subspace phase correlation;
• High-accuracy sub-pixel motion detection in spatial and frequency domain.
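To show why luminance normalisation helps motion detection, the sketch below normalises each frame to zero mean and unit variance before differencing, so a global gain or offset change produces no detections. This is only one simple way to normalise, assumed here for illustration; the names `normalise` and `changed_pixels`, the threshold and the toy frames are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def normalise(frame: Sequence[float]) -> List[float]:
    """Map a frame (flat list of intensities) to zero mean and unit variance."""
    m = mean(frame)
    s = pstdev(frame) or 1.0  # a flat frame has zero spread; avoid dividing by zero
    return [(v - m) / s for v in frame]


def changed_pixels(frame_a: Sequence[float], frame_b: Sequence[float],
                   threshold: float = 0.5) -> List[int]:
    """Indices whose normalised intensities differ by more than the threshold."""
    a, b = normalise(frame_a), normalise(frame_b)
    return [i for i, (x, y) in enumerate(zip(a, b)) if abs(x - y) > threshold]


# Textured background, then the same scene under a global gain/offset change.
background = [0.0, 100.0] * 32
brighter = [0.5 * v + 20.0 for v in background]
# Same lighting change, plus one pixel altered by a moving object.
with_object = list(brighter)
with_object[10] = 70.0

print(changed_pixels(background, brighter))     # lighting change alone: no detections
print(changed_pixels(background, with_object))  # only the object pixel is flagged
```

Differencing the raw frames would flag every pixel in both cases, because the lighting change alone shifts each intensity by at least 20 levels.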
Detection of Motion and Moving Object
[Figure panels: two original images; normalised results; detected objects]
Invariant Moving Object Detection
Ref: J. Ren et al., A General Framework for Vision-Based Interactive Board Games. Proc. 4th Int. Conf. on Intelligent Games and Simulation, pp. 238-42, London, Nov. 2003
Ref: J. Ren et al., High-accuracy Sub-pixel Motion Estimation from Noisy Images in Fourier Domain, IEEE Trans. Image Processing, 19(5): 1379-94, 2010
Original two frames from coast-guard sequence
Raw difference and motion compensated result
1) Subspace phase correlation is more robust to additive noise;
2) Subspace correlation using 1D FFT is more efficient;
3) Interpolation using the main peak and its two side-peaks yields more accurate estimates.
4) Good results are achieved from video frames, general images, MRI and remote sensing images.
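The points above can be illustrated with a small 1D phase correlation sketch. It assumes the two input signals are 1D profiles of the two frames (for example, sums along one image axis) and refines the peak with a parabola fitted through the main peak and its two neighbours. Both choices are assumptions for illustration and may differ from the interpolation used in the cited paper. A plain DFT is written out so that the example needs only the standard library.

```python
import cmath
import math
import random
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Direct O(n^2) discrete Fourier transform (adequate for short profiles)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def phase_correlation_1d(ref: Sequence[float], moved: Sequence[float]) -> List[float]:
    """Correlation surface whose peak index is the circular shift of `moved` vs `ref`."""
    cross = []
    for a, b in zip(dft(ref), dft(moved)):
        c = b * a.conjugate()
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)  # keep phase only
    return [v.real for v in dft(cross, inverse=True)]


def estimate_shift(ref: Sequence[float], moved: Sequence[float]) -> float:
    """Shift estimate refined by a parabola through the peak and its two neighbours."""
    surface = phase_correlation_1d(ref, moved)
    n = len(surface)
    peak = max(range(n), key=lambda i: surface[i])
    left, centre, right = surface[(peak - 1) % n], surface[peak], surface[(peak + 1) % n]
    denom = left - 2.0 * centre + right
    offset = 0.5 * (left - right) / denom if abs(denom) > 1e-12 else 0.0
    shift = peak + offset
    return shift - n if shift > n / 2 else shift  # report large shifts as negative


# Hypothetical profile and a copy circularly shifted by 3 samples.
rng = random.Random(0)
profile = [rng.random() for _ in range(32)]
shifted = profile[-3:] + profile[:-3]
print(round(estimate_shift(profile, shifted), 3))
```

For a real sub-pixel shift the side peaks become non-zero and the parabola moves the estimate away from the integer peak position.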
Subpixel Motion Estimation
Ref: J. Ren et al., Extracting Objects and Events from MPEG Sequences for Video Highlights Indexing and Retrieval, LNCS, 2007
Human Object Detection
• Modelling of skin pixels for human object detection
– Statistical modelling
– YCbCr space
– Supervised learning
– Compressed domain
– Adaptive thresholding
• Bayesian classification
\[ \frac{p(e_b \mid \mathrm{skin})}{p(e_b \mid \mathrm{nonskin})} > \eta \;\rightarrow\; \mathrm{skin} \]
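The Bayesian classification step can be read as a likelihood-ratio test: a sample is labelled skin when its likelihood under the skin model exceeds η times its likelihood under the non-skin model. The sketch below assumes independent Gaussian models on the Cb and Cr chrominance values. The model parameters are invented for illustration and are not taken from the cited work.

```python
import math
from typing import Tuple

# Hypothetical class models in CbCr space: (mean_cb, std_cb, mean_cr, std_cr).
SKIN: Tuple[float, float, float, float] = (110.0, 10.0, 150.0, 10.0)
NONSKIN: Tuple[float, float, float, float] = (128.0, 30.0, 128.0, 30.0)


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a 1D normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def likelihood(cb: float, cr: float, model: Tuple[float, float, float, float]) -> float:
    """Class-conditional likelihood, assuming Cb and Cr are independent."""
    mean_cb, std_cb, mean_cr, std_cr = model
    return gaussian_pdf(cb, mean_cb, std_cb) * gaussian_pdf(cr, mean_cr, std_cr)


def is_skin(cb: float, cr: float, eta: float = 1.0) -> bool:
    """Likelihood-ratio test: skin if p(e|skin) / p(e|nonskin) > eta."""
    return likelihood(cb, cr, SKIN) > eta * likelihood(cb, cr, NONSKIN)


print(is_skin(110.0, 150.0))  # True: sample at the centre of the assumed skin model
print(is_skin(128.0, 128.0))  # False: neutral chrominance
```

Raising `eta` makes the classifier stricter, which is where an adaptive threshold would plug in.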
Ref: J. Ren et al., Extracting Objects and Events from MPEG Sequences for Video Highlights Indexing and Retrieval, Journal of Multimedia (JMM), Academy, vol. 5, no. 2, 2010
Semantic Video Content Retrieval
Promising results have been achieved in query by video highlights.
Activity-Driven Video Summarisation
Main difficulties in summarising rush videos:
• Accurate modelling of several kinds of junk frames;
• Determining retakes (varying from 1 to more than 20);
• Extracting content of interest (COI) for effective summarisation;
• How to achieve objective evaluation is unsolved.
Hierarchical modelling via formal language descriptions and adaptive clustering of retakes.
Excitement modelling is used to determine COIs.
Activity-Driven Video Summarisation
The original video is summarised to less than 3% of its frames whilst keeping over 80% of key contents, at a processing speed more than 5 times faster than real-time playback.
Ref: J. Ren et al., Hierarchical modeling and adaptive clustering for real-time summarization of rush videos, IEEE T-Multimedia, vol. 11, no. 5, pp. 906-917, Aug. 2009.
Video summarisation in TRECVID’08
• 39 clips; >1.5 million frames (17.2 h at 25 fps), MPEG-1 format
• 44 teams registered; 32 submitted results in 43 groups
Evaluation criteria
• 9 criteria in 3 groups covering objective/subjective/usability measures
• Under a combined measurement, our result was ranked the 2nd or the 3rd best.
Based on the work in Surrey; see papers published in IEEE T-SMCB, Signal Processing and SPIE Optical Engineering, among others.
• Segmentation-assisted effective detection of film dirt;
• Global motion compensated robust detection of dirt in colour images;
• Detection supported concealment of dirt;
• Improved motion estimation for refined coding and segmentation applications.
Archived Video Restoration
Quantitative Performance
ROC analysis of several dirt-detection methods; GMCC and Conf refer to our methods with and without global motion compensation, respectively.
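For readers unfamiliar with ROC analysis, this sketch shows how ROC points are obtained: the detection threshold is swept over the detector scores, and the false-positive rate and true-positive rate are recorded at each setting. The scores and labels are made-up toy data, not results from the dirt-detection experiments, and `roc_points` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, true-positive rate) pairs, one per threshold.

    labels use 1 for a true dirt pixel and 0 for a clean pixel; a pixel is
    detected when its score is at or above the current threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Toy detector outputs (hypothetical): higher score means "more likely dirt".
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A method whose curve lies closer to the top-left corner detects more true dirt for the same false-alarm rate.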
Restored Frames
a) Original image b) SDIp (grey) c) ML3Dex (global) d) Our method
Owing to missed detections or false alarms, the images recovered by the other methods are of poor quality, and over-smoothing is also visible in c) from global ML3Dex filtering.
Ref: J. Ren et al., Missing-Data Recovery from Dirt Sparkles on Degraded Color Films. SPIE Journal of Optical Engineering, 46(7), DOI: 10.1117/1.2751162, 2007
Based on the work in Kingston; see papers published in IEEE T-CSVT, CVIU, and Machine Vision and Applications, among others.
• Video surveillance via multiple fixed cameras;
• Background modelling (GMM + running average);
• Tracking players and the ball for soccer game reconstruction;
• Trajectory-based modelling and tracking of multiple objects in multi-view sports scenarios;
• Geometric modelling for 3-D ball positioning;
• Modelling and classification of motion phases and events for semantic analysis.
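The background modelling item can be illustrated with its simpler half, a running-average background with thresholded differencing. The GMM component is deliberately omitted, and the learning rate `alpha`, the threshold and the toy frame are assumptions for this sketch rather than values from the cited work.

```python
from typing import List, Sequence


def update_background(background: Sequence[float], frame: Sequence[float],
                      alpha: float = 0.05) -> List[float]:
    """Blend the new frame into the background with learning rate alpha."""
    return [(1.0 - alpha) * b + alpha * f for b, f in zip(background, frame)]


def foreground_mask(background: Sequence[float], frame: Sequence[float],
                    threshold: float = 25.0) -> List[bool]:
    """Mark pixels that differ from the background by more than the threshold."""
    return [abs(f - b) > threshold for b, f in zip(background, frame)]


# Toy 1D "frame": a static scene with one bright foreground pixel (hypothetical data).
background = [100.0] * 5
frame = [100.0, 100.0, 200.0, 100.0, 100.0]

mask = foreground_mask(background, frame)
background = update_background(background, frame)
print(mask)        # [False, False, True, False, False]
print(background)  # the foreground pixel pulls its background value up slightly
```

A small `alpha` keeps moving players out of the background while still following slow lighting changes. In practice the update is often skipped for pixels flagged as foreground.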
Sports Video Analysis & Reconstruction
Improved Tracking
Motion correction results in four consecutive frames (left to right) when the ball (ID 10) merged with a player (ID 8); our tracking-plus-matching method overcomes the occlusion.
Ref: J. Ren et al., Tracking the Soccer Ball using Multiple Fixed Cameras. Computer Vision and Image Understanding, vol. 113, no. 5, pp. 633-642, May 2009.
detected foreground
Without correction
With correction
Ref: J. Ren et al., Real-time Modeling of 3-D Soccer Ball Trajectories from Multiple Fixed Cameras. IEEE Trans. Circuits Syst. Video Techn. (T-CSVT), 18(3): 350-362, 2008
Reconstructed Soccer Game
Motion Phase Classification and Phase-Specific Tracking
Four phases are defined: rolling, flying, possessed and out of play. Ball motion is modelled as phase transition cycles starting from possessed and ending at out of play.
For each phase, a linear or non-linear model is applied to estimate the trajectory.
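The phase description can be sketched as a small data model: an enumeration of the four phases, a check that a phase sequence forms one complete cycle, and a phase-specific position predictor. Which model goes with which phase is an assumption made here (a ballistic, non-linear model for flying and a constant-velocity, linear model otherwise), as is the rule that out of play may appear only at the end of a cycle. The source does not specify either.

```python
from enum import Enum
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
GRAVITY = 9.81  # m/s^2


class Phase(Enum):
    ROLLING = "rolling"
    FLYING = "flying"
    POSSESSED = "possessed"
    OUT_OF_PLAY = "out of play"


def is_complete_cycle(phases: Sequence[Phase]) -> bool:
    """True if the sequence starts at possessed and ends at out of play.

    Assumption: out of play may only appear as the final phase of a cycle.
    """
    if len(phases) < 2:
        return False
    return (phases[0] is Phase.POSSESSED
            and phases[-1] is Phase.OUT_OF_PLAY
            and Phase.OUT_OF_PLAY not in phases[:-1])


def predict_position(phase: Phase, p0: Vec3, v0: Vec3, t: float) -> Vec3:
    """Predict the ball position t seconds ahead with a phase-specific model."""
    x, y, z = (p + v * t for p, v in zip(p0, v0))  # linear (constant velocity)
    if phase is Phase.FLYING:
        z -= 0.5 * GRAVITY * t * t  # non-linear term: free fall under gravity
    return (x, y, z)


cycle = [Phase.POSSESSED, Phase.FLYING, Phase.ROLLING, Phase.OUT_OF_PLAY]
print(is_complete_cycle(cycle))  # True
print(predict_position(Phase.ROLLING, (0.0, 0.0, 0.0), (2.0, 1.0, 0.0), 2.0))
print(predict_position(Phase.FLYING, (0.0, 0.0, 1.0), (10.0, 0.0, 5.0), 1.0))
```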
Ball trajectory in a whole phase transition cycle
Recognised motion phases compared with manual ground truth (GT)
Hyperspectral Imaging
Able to identify changes in moisture and temperature, and even differences in chemical components, owing to the continuous spectral band images captured;
• 5-10 nm spectral resolution covering the visible and near-infrared range
• Desktop analysis beyond remote sensing
Typical applications:
• Food quality control and assessment (fruit, vegetable, meat, tea, wine…)
• Pharmaceuticals: tablet analysis and material analysis
• Security and forensics (fingerprint extraction, fake document/stamps identification)
• Environmental monitoring and land usage evaluation (city planning…)
Machine Learning & Pattern Recognition
Using SVM, ANN, HMM, decision trees, AdaBoost, clustering, Bayesian classifiers, etc., for various recognition/classification applications;
• Classification of microcalcification clusters (MCCs) in mammogram imaging
• Handwritten text recognition
• Decision making in extraction, detection and classification of image/video contents/events
1. Entertainment, Training and Education
• Sports: automatic game analysis and performance evaluation;
• Media: immersive experience of digital media content for museums/film/TV/entertainment (with restored original quality);
2. Health, Safety and Security
• Intelligent surveillance for monitoring and health care;
• Machine learning in medical imaging;
• Crime prevention and accidental event prediction;
3. Business for improved efficiency and productivity
• Automatic systems to assist or replace human beings
Potential Applications
1. Media Understanding
• Extraction of high-level semantics
• Surveillance event detection
• Objective quality evaluation of coded images and summarised videos
• Image and video mining from social websites
2. Information Retrieval
• Semantic video content retrieval with support from multiple cues
• Generic model for video copy detection
• Evaluation and retrieval of medical images
• From face detection to clustering-based recognition in interpreting image/video
Possible Projects for Collaborations
3. Archive Restoration and Digital Preservation
• Archived image/video restoration via learning using spatial (and temporal) consistency;
• Improved coding
• Applications in protection of historical Chinese video assets
4. Hyperspectral Imaging and Machine Learning
• Food quality control and assessment
• Chinese painting analysis
• Quality monitoring and verification of museum collections
Possible Projects for Collaborations
Thank you for your attention!
Any Questions?