Post on 19-Dec-2015
Video Table-of-Contents: Construction and Matching
Master of Philosophy, 3rd Term Presentation
- Presented by Ng Chung Wing
Outline
  Overview of Research
  Previous Work
  ADVISE: Advanced Digital Video Information Segmentation Engine
  Future Work and Conclusion
Overview of Research
Situation
  A large volume of video contents on the Internet
Problems
  Not enough information to describe the video contents
  Difficult to search for videos with similar contents
Overview of Research: Web-based Video Retrieval System
Provides a Video Table-of-Contents to describe the structure of video
Applies Tree Matching Algorithms to measure the similarity between videos
Allows retrieval of similar videos according to tree matching results
Overview of Research (Cont’d)
In the last semester
  Definition of Video Tree Structure
  Video Tree Matching Algorithms
In this semester
  ADVISE
    Generation of Video Tree Structure
    Web-based presentation of the structure as a Video Table-of-Contents
In the coming semester
  Video Retrieval
Review on Previous Work: Video Tree Structure
Decompose a video into 5 levels:
  Video Frames
  Video Shots
  Video Groups
  Video Scenes
  Whole Video
Hierarchical Representation of a Video
Review on Previous Work: Video Tree Structure (Cont’d)
Example:
[Figure: video tree with the Video as root, Scene 1 and Scene 2 below it, Groups 1 to 3 below the scenes, and the shots as leaves]
Review on Previous Work: Video Tree Structure (Cont’d)
4-level tree structure
Regarded as: Video Table-of-Contents (V-ToC)
Review on Previous Work: Video Tree Matching Algorithms
Measure the similarity of videos
  Matching on their video tree structures
Two approaches:
  Ordered Tree Matching Algorithm
    Constrained by temporal ordering
  Non-ordered Tree Matching Algorithm
    Not constrained by temporal ordering
Video feature used
  Color histograms of video frames
ADVISE: Advanced Digital Video Information Segmentation Engine
3 modules:
Generates V-ToC to describe videos
Presents V-ToC on the Internet using XML
Allows video customization according to the V-ToC with SMIL
ADVISE
[Figure: system architecture. The Web Server hosts three modules: Video Structure Construction (scene, group and shot detection), Presentation of V-ToC in XML, and Generation of SMIL presentation. Interaction with the Online User Terminal: 1. Describe the video content to the online user; 2. Submit selection request; 3. Return customized SMIL video to the user.]
ADVISE: Video Structure Construction
Video Shots Detection
  Color histogram based method with weighted regions
  5 regional and 1 overall color histograms for each video frame
  Catch local color features in the video frame
  Different weights to regions according to importance
ADVISE: Video Structure Construction (Cont’d)
Calculate the frame-to-frame color difference.
Difference in region i:
$$RD_i(t) = \sum_{k} \left| Hist_{i,t}(k) - Hist_{i,t-1}(k) \right|, \quad \text{for all color values } k$$
Frame color difference:
$$FD(t) = RD_1(t)\, W_{RD1} + \bigl( RD_2(t) + RD_3(t) + RD_4(t) + RD_5(t) \bigr)\, W_{RD2toRD5} + RD_6(t)\, W_{RD6}$$
where $Hist_{i,t}(k)$ denotes the k-th color value in the histogram for region i in frame t, and the $W_{RDx}$ are the weights of the regions.
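The weighted frame-to-frame difference described on this slide could be computed roughly as follows. This is only a sketch: the slides name the weights W_RD1, W_RD2toRD5 and W_RD6 but give no values, so the constants below are hypothetical, and the region difference is assumed to be a sum of absolute bin-wise differences.

```python
from typing import Sequence

# Hypothetical weight values; the slides name the weights but do not state them.
W_RD1 = 2.0
W_RD2_TO_RD5 = 1.0
W_RD6 = 1.0


def region_difference(hist_a: Sequence[float], hist_b: Sequence[float]) -> float:
    """Sum of absolute bin-wise differences between two histograms of one region."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def frame_difference(hists_t: Sequence[Sequence[float]],
                     hists_prev: Sequence[Sequence[float]]) -> float:
    """Weighted difference between two frames.

    Each argument holds six histograms; index 0..5 corresponds to regions 1..6.
    """
    if len(hists_t) != 6 or len(hists_prev) != 6:
        raise ValueError("expected six histograms per frame")
    rd = [region_difference(a, b) for a, b in zip(hists_t, hists_prev)]
    # Region 1 and region 6 have their own weights; regions 2 to 5 share one.
    return rd[0] * W_RD1 + sum(rd[1:5]) * W_RD2_TO_RD5 + rd[5] * W_RD6
```

Calling `frame_difference` on each pair of consecutive frames yields the series FD(t) that the thresholding step works on.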
ADVISE: Video Structure Construction (Cont’d)
Find the sudden changes in color content as the video shot boundaries
Need a threshold to determine the shot boundaries:
  Frame color difference FD(t) > Threshold: shot boundary occurs
  Frame color difference FD(t) ≤ Threshold: not a shot boundary
Not suitable to assign a fixed threshold to different videos
Use adaptive threshold
ADVISE: Video Structure Construction (Cont’d)
Employed an entropic thresholding method
  Divide the frame-to-frame histogram differences into two populations at a threshold point
  Measure the entropies of the two populations
  Find the threshold point that maximizes the sum of the two entropies
[Figure: distribution of histogram differences from 0 to the maximum difference; the optimal threshold (with the most informative entropies) separates non-shot breaks from shot breaks]
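A minimal sketch of the entropic thresholding idea follows: try every cut point over a histogram of the frame differences and keep the one with the largest summed entropy. The bin count, the function names and the choice to return the lower edge of the selected bin are assumptions made for illustration, not details from the slides.

```python
import math
from typing import Sequence


def _entropy(counts: Sequence[int]) -> float:
    """Shannon entropy of one population, given its per-bin counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def entropic_threshold(differences: Sequence[float], bins: int = 64) -> float:
    """Return the cut that maximizes the summed entropy of the two populations.

    Assumes the difference values are non-negative.
    """
    if not differences:
        raise ValueError("need at least one difference value")
    max_diff = max(differences)
    if max_diff <= 0:
        return 0.0
    # Histogram of the difference values over [0, max_diff].
    counts = [0] * bins
    for d in differences:
        counts[min(int(d / max_diff * bins), bins - 1)] += 1
    best_cut, best_score = 0, float("-inf")
    for cut in range(1, bins):
        low, high = counts[:cut], counts[cut:]
        if sum(low) == 0 or sum(high) == 0:
            continue  # one population would be empty
        score = _entropy(low) + _entropy(high)
        if score > best_score:
            best_cut, best_score = cut, score
    # Convert the bin index back to a difference value.
    return best_cut * max_diff / bins
```

Frames whose difference exceeds the returned value would then be treated as shot boundaries, so the threshold adapts to each video instead of being fixed in advance.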
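ADVISE: Video Structure Construction (Cont’d)
Video Groups Formation
  For each shot s, compare its key frame (the first frame) with the key frame of the most recent shot in group g.
$$RD_i(t_s, t_g) = \sum_{k} \left| Hist_{i,t_s}(k) - Hist_{i,t_g}(k) \right|, \quad \text{for all color values } k$$
$$FD(t_s, t_g) = RD_1(t_s, t_g)\, W_{RD1} + \bigl( RD_2(t_s, t_g) + RD_3(t_s, t_g) + RD_4(t_s, t_g) + RD_5(t_s, t_g) \bigr)\, W_{RD2toRD5} + RD_6(t_s, t_g)\, W_{RD6}$$
where $t_s$ and $t_g$ are the key frames of shot s and of the most recent shot in group g.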
ADVISE: Video Structure Construction (Cont’d)
After comparing with all groups, we assign the shot to a group if:
  The difference is the smallest amongst all groups
  The difference is smaller than the calculated threshold
  The shot is not temporally far apart from the group
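The three conditions can be sketched as a single assignment function. Several details here are assumptions rather than statements from the slides: groups are plain lists of shot indices, temporal distance is measured in shot indices (`max_gap`), the `difference` callback stands in for the key-frame comparison, and a shot that fails any condition starts a new group.

```python
from typing import Callable, List


def assign_shot(shot: int,
                groups: List[List[int]],
                difference: Callable[[int, int], float],
                threshold: float,
                max_gap: int) -> List[int]:
    """Add `shot` to the best matching group, or start a new group."""
    best_group = None
    best_diff = float("inf")
    for group in groups:
        # Compare against the most recent shot of each group.
        d = difference(shot, group[-1])
        if d < best_diff:
            best_group, best_diff = group, d
    if (best_group is not None
            and best_diff < threshold            # similar enough
            and shot - best_group[-1] <= max_gap):  # close enough in time
        best_group.append(shot)
        return best_group
    new_group = [shot]  # assumption: unmatched shots open a new group
    groups.append(new_group)
    return new_group


# Usage with a toy scalar "feature" per shot in place of key-frame histograms.
features = [0.0, 10.0, 1.0, 11.0, 0.5]
groups: List[List[int]] = []
for s in range(len(features)):
    assign_shot(s, groups, lambda a, b: abs(features[a] - features[b]),
                threshold=3.0, max_gap=3)
print(groups)
```

In the toy run, alternating shots with similar features end up in the same group, which is the interleaving pattern that groups are meant to capture.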
ADVISE: Video Structure Construction (Cont’d)
Video Scenes Formation
  Construct continuous video sequences (video scenes) from the video groups
  For each group with first and last shots m and n:
    If m is within a scene, add the group to the scene and extend the scene to n if necessary. Cases (i) & (ii)
    If m is not within any scene, add the group to a new scene. Case (iii)
[Figure: time line with Scene A spanning a to b, Scene B spanning c to d, and a group G1 spanning m to n]
  Case (i): c<m<n<d. Assign G1 to B.
  Case (ii): c<m<d and d<n. Assign G1 to B and extend B to n.
  Case (iii): d<m. Create Scene C and assign G1 to C.
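The scene formation rule can be sketched as a single pass over the groups. The sketch assumes that groups are lists of shot indices in temporal order, that groups are processed in order of their first shot, and that "within a scene" includes the scene's end points; none of these details is stated on the slides.

```python
from typing import Dict, List


def form_scenes(groups: List[List[int]]) -> List[Dict]:
    """Merge groups into scenes, each covering a continuous range of shots."""
    scenes: List[Dict] = []
    for group in sorted(groups, key=lambda g: g[0]):
        m, n = group[0], group[-1]  # first and last shot of the group
        for scene in scenes:
            if scene["start"] <= m <= scene["end"]:
                # Cases (i) and (ii): m lies inside an existing scene.
                scene["groups"].append(group)
                scene["end"] = max(scene["end"], n)  # extend only if necessary
                break
        else:
            # Case (iii): m lies outside every scene, so open a new one.
            scenes.append({"start": m, "end": n, "groups": [group]})
    return scenes


example = form_scenes([[0, 2, 4], [1, 3], [5, 7], [6, 8], [9]])
print([(s["start"], s["end"]) for s in example])
```

In the example, the group `[1, 3]` falls under case (i), `[6, 8]` under case (ii), and `[9]` under case (iii).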
ADVISE: Video Structure Construction (Cont’d)
User Interface of implemented system
ADVISE: Video Structure Construction (Cont’d)
Experiments:
  Evaluate 4 different settings of video structure construction
    Single color histogram and fixed threshold
    Single color histogram and adaptive threshold
    Weighted regional color histograms and fixed threshold
    Weighted regional color histograms and adaptive threshold
  Compare the generated video structure with human judgments
ADVISE: Video Structure Construction (Cont’d)
The setting with
  Weighted regional color histograms
  Adaptive threshold
generates the most accurate video structure.
ADVISE: XML Presentation
4 benefits of storing the video structure in XML
  Nested hierarchy
    Fits our video tree structure
  Plain-text format
    Easy to search and modify
  Extensibility
    The video structure can be extended
  Application to the Internet
ADVISE: XML Presentation (Cont’d)
Defined XML grammar for video structure in DTD

DTD for XML Video Tree Structure:

    <?xml version="1.0"?>
    <!ELEMENT advise (video+)>
    <!ELEMENT video (scene+)>
      <!ATTLIST video src CDATA #REQUIRED>
    <!ELEMENT scene (group+)>
      <!ATTLIST scene id CDATA #REQUIRED>
    <!ELEMENT group (shot+)>
      <!ATTLIST group id CDATA #REQUIRED>
    <!ELEMENT shot (time+, keyframe+)>
      <!ATTLIST shot id CDATA #REQUIRED>
    <!ELEMENT time EMPTY>
      <!ATTLIST time value CDATA #REQUIRED>
    <!ELEMENT keyframe EMPTY>
      <!ATTLIST keyframe img CDATA #REQUIRED>

XML Video Tree Structure:

    <?xml version="1.0"?>
    <!DOCTYPE advise SYSTEM "./toc.dtd">
    <advise>
      <video src="rtsp:// source video on server">
        <scene id="1">
          <group id="1">
            <shot id="1">
              <time value="0"/>
              <keyframe img="./sh_1.jpg"/>
            </shot>
            <shot id="2">
              <time value="11"/>
              <keyframe img="./sh_2.jpg"/>
            </shot>
          </group>
        </scene>
      </video>
    </advise>
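A short sketch of how a detected video structure might be serialized into this XML layout, using Python's standard `xml.etree.ElementTree`. The function name, the input shape (nested lists of `(shot_id, start_seconds, keyframe_path)` tuples), the per-scene group numbering and the example URL are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Shot = Tuple[int, int, str]  # (shot id, start time in seconds, key frame path)


def build_vtoc(src: str, scenes: List[List[List[Shot]]]) -> ET.Element:
    """Build an <advise> tree: video > scene > group > shot > time/keyframe."""
    advise = ET.Element("advise")
    video = ET.SubElement(advise, "video", src=src)
    for scene_id, groups in enumerate(scenes, start=1):
        scene = ET.SubElement(video, "scene", id=str(scene_id))
        # Assumption: group ids restart at 1 inside every scene.
        for group_id, shots in enumerate(groups, start=1):
            group = ET.SubElement(scene, "group", id=str(group_id))
            for shot_id, start, keyframe in shots:
                shot = ET.SubElement(group, "shot", id=str(shot_id))
                ET.SubElement(shot, "time", value=str(start))
                ET.SubElement(shot, "keyframe", img=keyframe)
    return advise


# Hypothetical source URL; the slide only gives a placeholder.
vtoc = build_vtoc("rtsp://example.invalid/video.rm",
                  [[[(1, 0, "./sh_1.jpg"), (2, 11, "./sh_2.jpg")]]])
print(ET.tostring(vtoc, encoding="unicode"))
```

The nesting of the element tree mirrors the scene, group and shot levels of the V-ToC, which is the "nested hierarchy" benefit the slides attribute to XML.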
ADVISE: XML Presentation (Cont’d)
Web-based presentation of XML using XSL
  Transformation to HTML
  Sorting and filtering of XML data

    <xsl:for-each select="advise/video/scene/group/shot" order-by="../@id">
      <tr class="nfont">
        <th><xsl:value-of select="../../@id"/></th>
        <th><xsl:value-of select="../@id"/></th>
        <th><xsl:value-of select="@id"/></th>
        <th align="left">
          <img width="55" height="45">
            <xsl:attribute name="src"><xsl:value-of select="keyframe/@img"/></xsl:attribute>
          </img>
          at <xsl:value-of select="time/@value"/> sec
        </th>
      </tr>
    </xsl:for-each>
ADVISE: XML Presentation (Cont’d)
An example XML presentation
V-ToC describes the structure of a video
ADVISE: SMIL Generation
Video customization
  Allow users to pick the video segments that interest them from the V-ToC
SMIL
  Designed for performing synchronized multimedia presentations on the Internet
  Use RealPlayer to browse
  Benefits
    Easy to generate because of the XML plain-text property
    Dynamically adapts to different network and client conditions
ADVISE: SMIL Generation (Cont’d)
Defined a SMIL template:

    <?xml version="1.0"?>
    <smil>
      <head> ... Define the Layout </head>
      <body>
        <seq>
          <par>
            <video src="rtsp:// source video on server" clip-begin="3s" clip-end="15s" region="video" fill="freeze"/>
            <textstream src="desc.rt" clip-begin="3s" clip-end="15s" region="description" fill="freeze"/>
            <img src="./sh_2.jpg" region="keyframe" fill="freeze"/>
          </par>
          <par>
            <video src="rtsp:// source video on server" clip-begin="35s" clip-end="50s" region="video" fill="freeze"/>
            <textstream src="desc.rt" clip-begin="35s" clip-end="50s" region="description" fill="freeze"/>
            <img src="./sh_4.jpg" region="keyframe" fill="freeze"/>
          </par>
        </seq>
      </body>
    </smil>
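A sketch of how a server-side script might fill in this template for the segments a user selects. The function name, the segment tuple shape and the example URL are hypothetical; the layout in the head is left empty because the slide elides it.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Segment = Tuple[int, int, str]  # (begin in seconds, end in seconds, key frame path)


def build_smil(src: str, segments: List[Segment], text_src: str = "desc.rt") -> str:
    """Return a SMIL document that plays the selected segments in sequence."""
    smil = ET.Element("smil")
    ET.SubElement(smil, "head")  # layout definition omitted, as on the slide
    body = ET.SubElement(smil, "body")
    seq = ET.SubElement(body, "seq")
    for begin, end, keyframe in segments:
        # Each <par> shows the video clip, its text stream and its key frame together.
        par = ET.SubElement(seq, "par")
        clip = {"clip-begin": f"{begin}s", "clip-end": f"{end}s"}
        ET.SubElement(par, "video",
                      {"src": src, **clip, "region": "video", "fill": "freeze"})
        ET.SubElement(par, "textstream",
                      {"src": text_src, **clip, "region": "description", "fill": "freeze"})
        ET.SubElement(par, "img",
                      {"src": keyframe, "region": "keyframe", "fill": "freeze"})
    return ET.tostring(smil, encoding="unicode")


# Hypothetical source URL; segment times follow the template on the slide.
smil_text = build_smil("rtsp://example.invalid/video.rm",
                       [(3, 15, "./sh_2.jpg"), (35, 50, "./sh_4.jpg")])
print(smil_text)
```

Because SMIL is plain XML, the customized presentation is just another small element tree, which matches the "easy to generate" benefit noted on the earlier slide.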
ADVISE: SMIL Generation (Cont’d)
Customized SMIL Video Presentation
[Figure: the User Interface for Customization submits a request to the script on the Web Server, which returns the SMIL presentation]
Script on Web Server
  1. Interpret request
  2. Select video segments according to XML video structure
  3. Generate customized SMIL presentation
Future Work
Video retrieval system framework
  Integrates ADVISE and video tree matching algorithms
  Explore the capability of using video tree matching for video retrieval
  Video clustering
Efficient retrieval of video using XML
  Hierarchy of V-ToC
  Textual search of video information
Conclusion
Overview of the research on video retrieval system
  Based on the structure of video (V-ToC)
Described ADVISE
  Generates video tree structure (V-ToC)
  Provides V-ToC in XML as descriptions of videos on the Internet
  Enables video customization based on V-ToC using SMIL