lecture 3 video syntax analysis - national chung cheng
Post on 02-May-2022
Wei-Ta Chu
2010/9/30
Video Syntax Analysis
Multimedia Content Analysis, CSIE, CCU
Types of Shot Change
Abrupt change (hard cut)
  A cut occurs in a single frame, when the camera is stopped and restarted.
Gradual transition
  Fade-in: gradual increase in intensity, starting from a black frame
  Fade-out: gradual decrease in intensity, resulting in a black frame
  Dissolve: transition from the end of one clip to the beginning of another
  Wipe: one image is replaced by another with a distinct edge that forms a shape. …
Examples of Shot Changes
Li and Lee. “Effective detection of various wipe transitions” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 6, pp. 663-673, 2007.
Cut
Dissolve
Wipe
Examples of Fade
Cernekova, et al., “Information theory-based shot cut/fade detection and video summarization” IEEE Trans. on Circuits and Systems for Video Technology, vol. 16, no. 1, pp. 82-91, 2006.
Fade out
Fade in
Different Types of Wipe
Li and Lee. “Effective detection of various wipe transitions” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 6, pp. 663-673, 2007.
Video example: http://en.wikipedia.org/wiki/Wipe_%28transition%29
Detection Process
Video -> Extract features -> Calculate similarity -> Decide boundaries -> Shot 1, Shot 2, Shot 3, Shot 4
Features
Pixel difference
Statistical difference
Histograms
Compression differences
Edge
Motion
Pixel Difference
Count the number of pixels whose value changes by more than some threshold.
May be sensitive to camera motion.
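As a rough illustration of the pixel-difference idea, the sketch below counts the pixels whose intensity changes by more than a per-pixel threshold and declares a shot change when the changed fraction is large. The function names, the nested-list frame format, and both default thresholds are assumptions made for this example; they do not come from the lecture.

```python
def changed_pixel_fraction(frame_a, frame_b, pixel_threshold):
    """Return the fraction of pixels whose intensity differs by more than pixel_threshold.

    Frames are assumed to be equally sized 2D lists of gray-level intensities.
    """
    total = 0
    changed = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    return changed / total


def is_cut_by_pixels(frame_a, frame_b, pixel_threshold=20, fraction_threshold=0.5):
    """Declare a shot change when enough pixels changed (both thresholds are illustrative)."""
    return changed_pixel_fraction(frame_a, frame_b, pixel_threshold) > fraction_threshold


# Tiny hypothetical frames: small noise versus a completely different picture.
dark = [[10, 12], [11, 9]]
noisy = [[12, 10], [9, 11]]
bright = [[200, 210], [190, 205]]

print(is_cut_by_pixels(dark, noisy))   # small changes only
print(is_cut_by_pixels(dark, bright))  # every pixel changed strongly
```

Because every pixel is compared with the pixel at the same position, a camera pan that shifts the whole picture can change most pixels without any shot change, which is the sensitivity noted above.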
1. Pair-wise comparison
Compare the corresponding pixels in two frames.
Problem: sensitive to camera movement, e.g. camera panning.
Improvement: smooth each frame with a 3x3 window before comparison.
Zhang, et al., “Automatic partitioning of full-motion video” Multimedia Systems Journal, vol. 1, pp. 10-28, 1993.
2. Histogram Comparison
Less sensitive to object motion, since it ignores the spatial changes in a frame.
Hi(j): the histogram value for the ith frame, where j is one of the G grey levels.
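The difference formula shown on this slide did not survive in the transcript. A common choice, assumed here, is the sum of absolute bin-wise differences between Hi(j) and Hi+1(j) over the G grey levels. The sketch below uses that assumption; the helper names and the toy frames are hypothetical.

```python
def gray_histogram(frame, levels):
    """Build H(j) for j in 0..levels-1 from a 2D list of integer gray levels."""
    hist = [0] * levels
    for row in frame:
        for value in row:
            hist[value] += 1
    return hist


def histogram_difference(hist_a, hist_b):
    """Sum of absolute bin-wise differences (an assumed form of the slide's measure)."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


G = 4  # number of grey levels in this toy example
frame_1 = [[0, 1], [2, 3]]
frame_2 = [[3, 2], [1, 0]]  # same pixels as frame_1, spatially rearranged
frame_3 = [[3, 3], [3, 3]]  # different content

print(histogram_difference(gray_histogram(frame_1, G), gray_histogram(frame_2, G)))
print(histogram_difference(gray_histogram(frame_1, G), gray_histogram(frame_3, G)))
```

The first comparison gives zero even though every pixel moved, which illustrates why histograms are less sensitive to object motion than pair-wise pixel comparison.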
2. Histogram Comparison–Example
Example video sequence
The intensity histograms of the first three frames
2. Histogram Comparison
Color histogram difference
pi(r,g,b) is the number of pixels of color (r,g,b) in frame Ii of N pixels. Each color component is discretized to 2^B different values.
3. Likelihood Ratio
Compare corresponding regions (blocks) in two successive frames based on second-order statistical characteristics of their intensity values.
A camera break is then declared whenever the total number of sample areas whose likelihood ratio exceeds the threshold is sufficiently large.
This raises the tolerance to slow and small object motion from frame to frame.
mi: mean intensity value for a given region
Si: variance for a given region
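The likelihood-ratio formula itself is missing from the transcript. The sketch below assumes the form usually attributed to Zhang et al., [(Si + Si+1)/2 + ((mi - mi+1)/2)^2]^2 / (Si * Si+1), and counts the regions whose ratio exceeds a threshold. Function names, the flat-list region format, and the small `eps` guard against zero variance are assumptions for this example.

```python
from statistics import mean, pvariance


def likelihood_ratio(region_a, region_b, eps=1e-9):
    """Likelihood ratio of two corresponding regions, given as flat lists of intensities.

    Assumed form: [ (S_a + S_b)/2 + ((m_a - m_b)/2)^2 ]^2 / (S_a * S_b).
    eps avoids division by zero for perfectly flat regions (an assumption of this sketch).
    """
    m_a, m_b = mean(region_a), mean(region_b)
    s_a = max(pvariance(region_a), eps)
    s_b = max(pvariance(region_b), eps)
    numerator = ((s_a + s_b) / 2 + ((m_a - m_b) / 2) ** 2) ** 2
    return numerator / (s_a * s_b)


def count_changed_regions(regions_a, regions_b, ratio_threshold):
    """Count corresponding regions whose likelihood ratio exceeds ratio_threshold."""
    return sum(
        1
        for a, b in zip(regions_a, regions_b)
        if likelihood_ratio(a, b) > ratio_threshold
    )


same = [0, 2, 0, 2]
brighter = [10, 12, 10, 12]
print(likelihood_ratio(same, same))      # identical statistics
print(likelihood_ratio(same, brighter))  # same variance, very different mean
```

A camera break would then be declared when `count_changed_regions` is large relative to the number of regions; that final count threshold is left to the caller.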
4. Edge Change Ratio
Zabih, et al., “A feature-based algorithm for detecting and classifying scene breaks” Proc. Of ACM Multimedia, pp. 189-200,1995.
Edge change ratio
5. Motion Vectors
Use the direction of motion prediction as a cue for shot change detection.
Pei, et al., “Scene-effect detection and insertion MPEG encoding scheme for video browsing and error concealment” IEEE Trans. on Multimedia, vol. 7, no. 4, pp. 606-614, 2005.
5. Motion Vectors
Use motion vector information to filter out false positives.
Zhang, et al., “Automatic partitioning of full-motion video” Multimedia Systems Journal, vol. 1, pp. 10-28, 1993.
6. Differences in DCT domain
Discrete Cosine Transform (DCT) coefficients
1. Select a subset of blocks
2. Select a subset of the DCT coefficients of these blocks
3. Concatenate the selected coefficients of the selected blocks into a vector
4. Calculate the similarity of the two coefficient vectors
Arman, et al., “Image processing on encoded video sequences” Multimedia Systems Journal, vol. 1, no. 5, pp. 211-219, 1994.
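Step 4 can be sketched as one minus the normalized inner product of the two coefficient vectors, which is the form the comparison later in this lecture describes as "1 - inner product". The sketch assumes the vectors have already been assembled in steps 1 to 3; the function name and the handling of all-zero vectors are assumptions for this example.

```python
import math


def coefficient_vector_difference(vec_a, vec_b):
    """Return 1 - |a . b| / (|a| |b|) for two DCT coefficient vectors of equal length."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm = math.sqrt(sum(a * a for a in vec_a)) * math.sqrt(sum(b * b for b in vec_b))
    if norm == 0:
        # Assumption: treat two all-zero vectors as identical, otherwise as different.
        return 0.0 if list(vec_a) == list(vec_b) else 1.0
    return 1.0 - abs(dot) / norm


print(coefficient_vector_difference([3, 4], [3, 4]))  # identical vectors
print(coefficient_vector_difference([1, 0], [0, 1]))  # orthogonal vectors
```

The normalization is undefined when a vector is all zeros, which is one place where very dark frames need special care.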
Gradual Transition Detection
Cuts or abrupt change
Gradual transition
1. Twin-Comparison Approach
Zhang, et al., “Automatic partitioning of full-motion video” Multimedia Systems Journal, vol.1, pp. 10-28, 1993.
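The slide body for the twin-comparison approach is not preserved here, so the following is only a simplified sketch of the general idea as it is commonly described: a high threshold detects cuts directly, while a low threshold marks the possible start of a gradual transition, whose differences are accumulated until they fall back below the low threshold. Summing consecutive frame differences stands in for the accumulated comparison; the function name, thresholds, and data are hypothetical.

```python
def twin_comparison(diffs, t_high, t_low):
    """Classify frame-to-frame differences into cuts and gradual transitions.

    diffs[i] is the difference between frame i and frame i + 1.
    Returns a list of (kind, first_index, last_index) tuples.
    """
    boundaries = []
    start = None          # index where a candidate gradual transition began
    accumulated = 0.0     # summed differences since the candidate start
    for i, d in enumerate(diffs):
        if d >= t_high:
            boundaries.append(("cut", i, i))
            start, accumulated = None, 0.0
        elif d >= t_low:
            if start is None:
                start = i
            accumulated += d
        else:
            if start is not None and accumulated >= t_high:
                boundaries.append(("gradual", start, i - 1))
            start, accumulated = None, 0.0
    if start is not None and accumulated >= t_high:
        boundaries.append(("gradual", start, len(diffs) - 1))
    return boundaries


example_diffs = [1, 1, 12, 1, 4, 4, 4, 1, 4, 1]
print(twin_comparison(example_diffs, t_high=10, t_low=3))
```

In this toy sequence the single large difference is reported as a cut, the run of three moderate differences accumulates past the high threshold and is reported as a gradual transition, and the isolated moderate difference is ignored.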
2. Edge Change Ratio
Lienhart, R., “Comparison of automatic shot boundary detection algorithms” Proc. of SPIE Storage and Retrieval for Image and Video Databases VII, vol. 3656, pp. 290-301, 1999.
3. Characterizing a Wipe Transition
Evaluation
Precision: the percentage of retrieved items that are desired items.
Recall: the percentage of desired items that are retrieved.

$$\text{Precision} = \frac{\#\ \text{correctly retrieved items}}{\#\ \text{all retrieved items}} = \frac{\#\ \text{correctly retrieved items}}{\#\ \text{correctly retrieved items} + \#\ \text{falsely retrieved items}}$$

$$\text{Recall} = \frac{\#\ \text{correctly retrieved items}}{\#\ \text{all relevant items}} = \frac{\#\ \text{correctly retrieved items}}{\#\ \text{correctly retrieved items} + \#\ \text{items that are not retrieved}}$$
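A minimal sketch of these two definitions for shot boundary detection, assuming detected and ground-truth boundaries are given as sets of frame indices and that a detection counts only on an exact match. The function name and the example indices are hypothetical.

```python
def precision_recall(detected, ground_truth):
    """Return (precision, recall) for two collections of boundary frame indices."""
    detected, ground_truth = set(detected), set(ground_truth)
    correctly_retrieved = len(detected & ground_truth)
    falsely_retrieved = len(detected - ground_truth)
    not_retrieved = len(ground_truth - detected)

    retrieved = correctly_retrieved + falsely_retrieved
    relevant = correctly_retrieved + not_retrieved
    precision = correctly_retrieved / retrieved if retrieved else 0.0
    recall = correctly_retrieved / relevant if relevant else 0.0
    return precision, recall


detected_cuts = {10, 50, 80, 120}
true_cuts = {10, 50, 90}
print(precision_recall(detected_cuts, true_cuts))
```

Here two of the four detections are correct and two of the three true boundaries are found. In practice a small tolerance window around each true boundary is often allowed; that is not modeled in this sketch.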
Evaluation–Other Terms
Miss: # items that are not retrieved
True positive (TP): # correctly retrieved items
False positive (FP): # falsely retrieved items
True negative (TN): # correctly missed items
False negative (FN): # items that are not retrieved

                     Actual positive   Actual negative
Predicted positive   TP                FP
Predicted negative   FN                TN
Evaluation
Figure: the confusion matrix (TP, FP, FN, TN) shown next to two overlapping sets, Detected (retrieved) and Relevant (ground truth). TP is the overlap, FP lies only in the detected set, FN lies only in the relevant set, and TN lies outside both.
Relationship between Precision & Recall
Precision-Recall (PR) curve
Relationship between True Positive and False Positive
Receiver Operator Characteristic (ROC) curve
Using PR or ROC Curves?
ROC curves can present an overly optimistic view of an algorithm’s performance if there is a large skew in the class distribution.
When the number of true negative examples greatly exceeds the number of positive examples, a large change in the number of false positives leads to only a small change in the false positive rate.
Precision compares false positives to true positives rather than to true negatives, and so better captures the algorithm’s performance.
Davis, et al., “The relationship between precision-recall and ROC curves” Proc. of International Conference on Machine Learning, pp. 233-240, 2006.
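A small numeric sketch of this effect, using made-up counts: 100 actual boundaries among 100,000 non-boundary frames, with 90 true positives in both cases and the number of false positives growing tenfold. All numbers and function names are hypothetical.

```python
def false_positive_rate(false_positives, actual_negatives):
    """FPR = FP / (FP + TN), where FP + TN is the number of actual negatives."""
    return false_positives / actual_negatives


def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


ACTUAL_NEGATIVES = 100_000  # hypothetical number of non-boundary frames
TRUE_POSITIVES = 90         # hypothetical number of correctly detected boundaries

for false_positives in (100, 1000):
    fpr = false_positive_rate(false_positives, ACTUAL_NEGATIVES)
    prec = precision(TRUE_POSITIVES, false_positives)
    print(f"FP={false_positives}: FPR={fpr:.3f}, precision={prec:.3f}")
```

The false positive rate moves only from 0.001 to 0.010, which is barely visible on an ROC plot, while precision falls from about 0.47 to about 0.08.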
Comparison of Shot Boundary Detection Techniques
Methods: histograms, region histograms, running histograms, motion-compensated pixel differences, DCT coefficient differences
Evaluation data

Video type    # Frames   Cuts   Gradual transitions
TV            133204     831    42
News          81595      293    99
Movie         142507     564    95
Commercial    51733      755    254
Misc.         10706      64     16
Total         419745     2507   506
Methods Compared
Histogram
  64-bin gray-level histogram difference, single threshold
Region (block) histogram
  16 blocks, 64-bin gray-scale histograms, a difference threshold for each block, and a count threshold for changed blocks
Running histogram (twin method)
  64-bin gray-scale histogram for each frame, twin thresholds
  Compute motion vectors; if there is excessive motion, reject gradual changes
Motion-compensated pixel difference
  12 blocks per frame, a motion vector for each block
  Compute average residual errors; if larger than a high threshold, a cut is detected
  Use cumulative errors to detect gradual changes (similar to the running histogram method)
  Use motion vectors to reject false gradual changes
DCT difference
  Concatenate 15 coefficients at the same locations from different blocks to form a vector
  Compute (1 - inner product of the two vectors from consecutive frames)
PR Curve for TV program
PR Curve for News program
PR Curve for Movie Videos
PR Curve for Commercials
PR Curve for All Data
PR Curve for All Data–Cut Only
Observations
Histogram-based method is consistent
  Produced the first or second best precision
  Simple and straightforward
Region algorithm seems to be the best
  Where recall is not the highest priority
Running algorithm seems to be the best
  Where recall is important
  Motion vectors help to reduce false positives
DCT is the worst
  Large number of false positives in black frames
References
J.S. Boreczky, et al., "Comparison of video shot boundary detection techniques" Proc. of SPIE Conference on Storage and Retrieval for Image and Video Databases, vol. 2670, 1996. (must read)
R. Lienhart, "Comparison of automatic shot boundary detection algorithms" Proc. of SPIE Storage and Retrieval for Image and Video Databases VII, vol. 3656, pp. 290-301, 1999.
J. Yuan, et al., "A formal study of shot boundary detection" IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 2, pp. 168-186, 2007.
A. Hanjalic, "Shot-boundary detection: unraveled or resolved?" IEEE Trans. on Circuits and Systems for Video Technology, vol. 12, no. 2, pp. 90-105, 2002.
Edge
An edge is a set of connected pixels that lie on the boundary between two regions.
Chapter 10 of “Digital Image Processing” by R.C. Gonzalez and R.E. Woods, Prentice Hall, 2nd edition, 2001
Edge
Gradient Operators
Roberts cross-gradient operators:
Prewitt operators:
Sobel operators:
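The kernel figures for these operators are not preserved in the transcript. As an illustration, the sketch below applies the widely used 3x3 Sobel kernels and approximates the gradient magnitude by |gx| + |gy|. The orientation and sign convention of the kernels varies between textbooks, so the exact layout here is an assumption, as are the function names and the toy image.

```python
# Common 3x3 Sobel kernels (sign and axis conventions differ between sources).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to horizontal intensity change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to vertical intensity change


def apply_kernel(image, kernel, row, col):
    """Weighted sum of the 3x3 neighbourhood centred at (row, col)."""
    return sum(
        kernel[i][j] * image[row - 1 + i][col - 1 + j]
        for i in range(3)
        for j in range(3)
    )


def gradient_magnitude(image):
    """Approximate gradient magnitude |gx| + |gy|; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = apply_kernel(image, SOBEL_X, r, c)
            gy = apply_kernel(image, SOBEL_Y, r, c)
            result[r][c] = abs(gx) + abs(gy)
    return result


# A vertical step edge: dark on the left, bright on the right.
step_image = [[0, 0, 10, 10] for _ in range(4)]
flat_image = [[5, 5, 5, 5] for _ in range(4)]
print(gradient_magnitude(step_image))
print(gradient_magnitude(flat_image))
```

The interior pixels next to the step respond strongly while the flat image gives zero everywhere; thresholding this magnitude is one simple way to obtain an edge map.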
Edge Examples
Edge Examples–after smoothing
Edge Examples
Canny Edge Detectors
Step 1: the image is smoothed by Gaussian convolution.
Step 2: a 2D first derivative operator is applied to the smoothed image.
Step 3: non-maximal suppression. Edges give rise to ridges in the gradient magnitude image. The algorithm tracks along the top of these ridges and sets to zero all pixels that are not actually on the ridge.
http://homepages.inf.ed.ac.uk/rbf/HIPR2/canny.htm
Very Brief Introduction to the Discrete Cosine Transform
Spatial Frequency and DCT
Definition of DCT
2D DCT
1D DCT
DCT Basis
Example
Discrete Cosine Transform
DCT converts a block of pixels into a block of transform coefficients, which represent the spatial frequency.
Each coefficient is a weight applied to an appropriate basis function.
Any gray-scale 8x8 pixel block can be fully represented by a weighted sum of these 64 basis functions.
Figure labels: increasing horizontal frequency; increasing vertical frequency; “DC” basis function.
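The defining formulas on the DCT slides were images and are not in this transcript. The sketch below uses the standard orthonormal DCT-II as an assumed definition and computes the 2D transform by applying the 1D transform to rows and then to columns. Function names and the test block are hypothetical.

```python
import math


def dct_1d(signal):
    """Orthonormal 1D DCT-II of a list of numbers (assumed definition)."""
    n = len(signal)
    coefficients = []
    for u in range(n):
        scale = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
        total = sum(
            x * math.cos((2 * i + 1) * u * math.pi / (2 * n))
            for i, x in enumerate(signal)
        )
        coefficients.append(scale * total)
    return coefficients


def dct_2d(block):
    """2D DCT of a square block: 1D DCT over every row, then over every column."""
    row_transformed = [dct_1d(row) for row in block]
    column_transformed = [dct_1d(list(col)) for col in zip(*row_transformed)]
    # Transpose back so that result[v][u] has vertical frequency v, horizontal frequency u.
    return [list(row) for row in zip(*column_transformed)]


flat_block = [[100] * 8 for _ in range(8)]
coefficients = dct_2d(flat_block)
print(round(coefficients[0][0], 6))  # the "DC" coefficient
```

For a perfectly flat block all the energy ends up in the “DC” coefficient and every other coefficient is zero up to rounding, which matches the idea that higher-index basis functions represent higher spatial frequencies.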
Intra-Frame Encoding (JPEG Compression)
Scene Transition Graph
Yeung, et al. “Segmentation of video by clustering and graph analysis” Computer Vision and Image Understanding, vol. 71, no. 1, pp. 94-109, 1998.
Observations
Shots in a scene are often repetitive. We are able to classify shots by grouping shots of similar visual content.
Often, a scene is made up of temporally adjacent shots indicating their interrelationships.
Similarity of Video Shots
D(.,.) measures the dissimilarity between two image frames.
Similarity of Video Shots
Dissimilarity based on color histogram intersection
Dissimilarity based on luminance projection
Yeung and Liu, “Efficient matching and clustering of video shots” Proc. of IEEE International Conference on Image Processing, vol. 1, pp. 338-341, 1995.
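The formulas for these measures are not preserved in the transcript. As a sketch of the histogram-intersection variant, the code below takes D as one minus the normalized histogram intersection, and takes the dissimilarity of two shots as the minimum D over all pairs of their representative images. Both choices are assumptions for illustration, as are the function names and toy histograms.

```python
def histogram_intersection_dissimilarity(hist_a, hist_b):
    """D = 1 - sum(min(a_k, b_k)) / sum(a_k), assuming both histograms cover the same pixel count."""
    overlap = sum(min(a, b) for a, b in zip(hist_a, hist_b))
    return 1.0 - overlap / sum(hist_a)


def shot_dissimilarity(rep_hists_a, rep_hists_b):
    """Smallest frame dissimilarity over all pairs of representative images (assumed rule)."""
    return min(
        histogram_intersection_dissimilarity(a, b)
        for a in rep_hists_a
        for b in rep_hists_b
    )


# Hypothetical 3-bin color histograms of representative images.
shot_1 = [[4, 0, 0], [2, 2, 0]]
shot_2 = [[0, 4, 0], [3, 1, 0]]
print(shot_dissimilarity(shot_1, shot_2))
```

Taking the minimum means two shots are considered similar as soon as any pair of their representative images matches well.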
Representative Image Set for a Video Shot
Selection of the representative set is achieved by nonlinear temporal sampling.
Only 2 to 5% of frames are needed in the comparison to achieve good matching results.
In addition to temporal subsampling, spatial subsampling can also be used to improve matching efficiency.
Clustering of Video Shots
Shots in the same cluster are similar.
Any other shot outside of the cluster must have a dissimilarity greater than the dissimilarity between any two shots in the cluster.
Ci: the ith cluster
Clustering of Video Shots
Dissimilarity between two clusters:
Use the pair of shots, one from each of the two clusters, that has the largest dissimilarity value.
The dissimilarity between two clusters should be updated at each iteration.
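The rule above corresponds to what is usually called complete-link agglomerative clustering. The sketch below repeatedly merges the two closest clusters and stops when the smallest cluster dissimilarity exceeds a threshold `delta`. The stopping rule, the function name, and the one-dimensional toy data are assumptions for this example.

```python
def complete_link_clustering(items, dissimilarity, delta):
    """Merge clusters greedily; cluster dissimilarity is the largest pairwise item dissimilarity."""
    clusters = [[item] for item in items]

    def cluster_dissimilarity(cluster_a, cluster_b):
        return max(dissimilarity(a, b) for a in cluster_a for b in cluster_b)

    while len(clusters) > 1:
        # Recompute all cluster dissimilarities at each iteration and pick the closest pair.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_dissimilarity(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > delta:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy "shots" described by a single number; dissimilarity is the absolute difference.
shots = [0, 1, 10, 11, 30]
print(complete_link_clustering(shots, lambda a, b: abs(a - b), delta=3))
```

With the complete-link rule every pair of shots inside a resulting cluster is within `delta` of each other, which matches the requirement that shots in the same cluster are similar.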
Clustering of Video Shots
Time-Constrained Clustering
Any two shots that are far apart in time, even if they share similar visual content, potentially represent different content or occur in different scenes.
Temporal distance between two shots: the distance in number of frames from the end of the earlier shot to the beginning of the later one.
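One way to express the time constraint, sketched here under assumptions, is to treat two shots as infinitely dissimilar whenever their temporal distance exceeds a window T, so that a clustering step can never merge them. Shots are represented as hypothetical `(start_frame, end_frame)` tuples, and the window of 600 frames corresponds to 20 s only at an assumed 30 frames per second.

```python
import math


def temporal_distance(shot_a, shot_b):
    """Frames from the end of the earlier shot to the beginning of the later one.

    Each shot is a (start_frame, end_frame) tuple.
    """
    earlier, later = sorted([shot_a, shot_b])
    return later[0] - earlier[1]


def time_constrained_dissimilarity(shot_a, shot_b, visual_dissimilarity, max_frames):
    """Visual dissimilarity inside the time window, infinity outside it."""
    if temporal_distance(shot_a, shot_b) > max_frames:
        return math.inf
    return visual_dissimilarity(shot_a, shot_b)


first = (0, 99)
second = (100, 199)
much_later = (1000, 1099)


def always_similar(shot_a, shot_b):
    """Stand-in visual measure for the example: every pair looks identical."""
    return 0.0


print(time_constrained_dissimilarity(first, second, always_similar, max_frames=600))
print(time_constrained_dissimilarity(first, much_later, always_similar, max_frames=600))
```

Even with a visual measure that finds all shots identical, the distant pair is kept apart, which is the intended effect of the constraint.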
Scene Transition Graph
A scene transition graph is a directed graph G=(V,E,F).
V: each node represents a cluster of shots.
E: a directed edge is drawn from node U to node W if there is a shot represented by node U that immediately precedes any shot represented by node W.
F: a mapping that partitions the set of shots into clusters.
An STG is able to represent compactly the structure of the shots and the temporal flow of the story for many video programs.
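A minimal sketch of building V and E from the mapping F, assuming F is given as a list holding the cluster label of each shot in temporal order. Skipping self-loops between consecutive shots of the same cluster is a choice made for this example, and the labels are hypothetical.

```python
def build_stg(cluster_of_shot):
    """Return (nodes, edges) of a scene transition graph.

    cluster_of_shot[i] is the cluster label of the i-th shot in temporal order.
    An edge (U, W) is added when a shot in U immediately precedes a shot in W.
    """
    nodes = set(cluster_of_shot)
    edges = set()
    for current, following in zip(cluster_of_shot, cluster_of_shot[1:]):
        if current != following:  # assumption: ignore transitions within one cluster
            edges.add((current, following))
    return nodes, edges


# Hypothetical shot sequence: a dialogue alternating between clusters, then two single shots.
labels = ["B12", "A", "B12", "A", "B34", "A", "B34", "C", "D"]
stg_nodes, stg_edges = build_stg(labels)
print(sorted(stg_edges))
```

Alternating shots such as a dialogue produce edges in both directions between the same two nodes, while a change of scene typically leaves a single one-way edge.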
Example of STG
3 scenes of 9 shots
Sample clustering results
Scene transition graph
Cut Edges
An edge is a cut edge if removing it results in two disconnected graphs.
Each partitioned STG Gi represents the interactions of shots in a story unit.
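A brute-force sketch of this step: remove each edge in turn and test whether the graph, with edge directions ignored, falls apart. The story units are then the connected pieces left after all cut edges are removed. The function names, the assumption that the input graph is connected, and the small five-node graph are hypothetical.

```python
def connected_components(nodes, edges):
    """Connected components of the graph, ignoring edge direction."""
    neighbours = {node: set() for node in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            for node in neighbours[stack.pop()]:
                if node not in component:
                    component.add(node)
                    stack.append(node)
        seen |= component
        components.append(sorted(component))
    return components


def find_cut_edges(nodes, edges):
    """Edges whose removal disconnects the graph (assumes the graph starts connected)."""
    edges = set(edges)
    return sorted(e for e in edges if len(connected_components(nodes, edges - {e})) > 1)


def story_units(nodes, edges):
    """Groups of nodes that remain connected after every cut edge is removed."""
    return connected_components(nodes, set(edges) - set(find_cut_edges(nodes, edges)))


graph_nodes = {"A", "B12", "B34", "C", "D"}
graph_edges = {
    ("B12", "A"), ("A", "B12"), ("A", "B34"), ("B34", "A"), ("B34", "C"), ("C", "D"),
}
print(find_cut_edges(graph_nodes, graph_edges))
print(story_units(graph_nodes, graph_edges))
```

Pairs of nodes linked in both directions survive the removal of either edge, so only the one-way transitions are reported as cut edges. Trying every edge is quadratic in the worst case; a linear-time bridge-finding algorithm would be the usual choice for large graphs.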
STG After Time Confining and Cut Edges Finding
Framework
Shot segmentation -> Time-constrained clustering -> Building of scene transition graph -> Scene segmentation
Influences of Parameters
Without knowledge of how long each individual scene lasts, T cannot be approximated well.
If T is too large, shots from different scenes are clustered together.
If T is too small, shots in the same scene may be separated into different scenes.
It’s less detrimental to have several story units represent a scene than to have one story unit represent several scenes.
Influences of Time Constraints
T = 20s. dt(B1,B3) > T
Clustering results are {B1,B2},{A1,A2,A3},{B3,B4},{C1},{D1}
Story unit results are {B1,A1,B2,A2,B3,A3,B4},{C1},{D1}
STG nodes: {B1,B2}, {A1,A2,A3}, {B3,B4}, {C1}, {D1}
The {Bi} shots are not grouped into one cluster because there is at least one pair of shots, one from each cluster, that has a temporal distance dt > T.
Influences of Time Constraints
T = 20s.
Clustering results are {B1},{B2,B3},{A1,A2,A3},{B4},{C1},{D1}
Story unit results are {B1},{A1,B2,A2,B3,A3},{B4},{C1},{D1}
STG nodes: {B1}, {A1,A2,A3}, {B2,B3}, {C1}, {D1}, {B4}
Refined Analysis
Make the time window more elastic.
  Compute the duration of each story unit and adjust the window.
Given a story unit, examine the next story unit by relaxing the temporal windows and reclustering the shots in these two units.
  If there exists at least one new cluster that contains shots from the two units, the two story units are merged into one.
Refined Analysis
Example
{B1,B2},{A1,A2,A3},{B3,B4},{C1},{D1}
{B1},{B2,B3},{A1,A2,A3},{B4},{C1},{D1}
Results
STG constructed from the sitcom “Friends”. There are 35575 frames, each at a spatial resolution of 320x240. There are 313 shots.
Results
Time-constrained clustering of video shots is able to identify individual story units.
The resulting STG permits rapid nonlinear browsing of long video programs.
Variations of Clustering Parameters
Smaller delta values result in more clusters and thus more story units. Users often prefer over-segmentation to under-segmentation.
Refining the Segmentation Results
The first two story units in Scene 1 are merged into one. The number of story units in Scene 6 is reduced from 4 to 2.
Conclusion
Analysis based on time-constrained clustering and scene transition graph analysis has contributed to the extraction of story units.
The building of story structure provides nonlinear access to video content.
Identification, integration, and application of domain-dependent and semantic features tend to improve segmentation accuracy.