lecture 3 video syntax analysis - national chung cheng
![Page 1: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/1.jpg)
Wei-Ta Chu
2010/9/30
Video Syntax Analysis
Multimedia Content Analysis, CSIE, CCU
![Page 2: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/2.jpg)
Types of Shot Change
Abrupt change (hard cut): the cut occurs in a single frame, when the camera is stopped and restarted.
Gradual transition:
Fade-in: gradual increase in intensity, starting from a black frame.
Fade-out: gradual decrease in intensity, ending in a black frame.
Dissolve: transition from the end of one clip to the beginning of another.
Wipe: one image is replaced by another with a distinct edge that forms a shape.…
![Page 3: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/3.jpg)
Examples of Shot Changes
Li and Lee. “Effective detection of various wipe transitions” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 6, pp. 663-673, 2007.
Cut
Dissolve
Wipe
![Page 4: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/4.jpg)
Examples of Fade
Cernekova, et al., “Information theory-based shot cut/fade detection and video summarization” IEEE Trans. on Circuits and Systems for Video Technology, vol. 16, no. 1, pp. 82-91, 2006.
Fade out
Fade in
![Page 5: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/5.jpg)
Different Types of Wipe
Li and Lee. “Effective detection of various wipe transitions” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 6, pp. 663-673, 2007.
Video example: http://en.wikipedia.org/wiki/Wipe_%28transition%29
![Page 6: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/6.jpg)
Detection Process
(Figure: Video → Extract features → Similarity calculation → Boundary decision → Shot 1, Shot 2, Shot 3, Shot 4)
![Page 7: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/7.jpg)
Features
Pixel difference
Statistical difference
Histograms
Compression differences
Edge
Motion
![Page 8: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/8.jpg)
Pixel Difference
Count the number of pixels whose value changes by more than some threshold.
May be sensitive to camera motion.
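A minimal sketch of the pixel-difference measure, assuming gray-scale frames given as equal-sized lists of rows of integer intensities. The per-pixel threshold `t` and the fraction threshold `cut_fraction` are hypothetical values chosen for illustration, not values from the lecture.

```python
def changed_pixel_fraction(frame_a, frame_b, t):
    """Fraction of pixels whose intensity changes by more than t between two frames."""
    total = 0
    changed = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) > t:
                changed += 1
    return changed / total if total else 0.0


def is_cut(frame_a, frame_b, t=20, cut_fraction=0.5):
    """Declare a cut when more than cut_fraction of the pixels changed (illustrative thresholds)."""
    return changed_pixel_fraction(frame_a, frame_b, t) > cut_fraction


dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 10]]
print(changed_pixel_fraction(dark, bright, 20), is_cut(dark, bright))
```

Three of the four pixels change by more than 20 levels, so the fraction is 0.75 and the pair is flagged as a cut.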
![Page 9: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/9.jpg)
1. Pair-wise comparison
Compare the corresponding pixels in two frames.
Problem: sensitive to camera movement, e.g. camera panning.
Improvement: smooth each frame with a 3x3 window before comparison.
Zhang, et al., “Automatic partitioning of full-motion video” Multimedia Systems Journal, vol. 1, pp. 10-28, 1993.
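A sketch of the smoothing improvement, assuming a 3x3 mean (box) filter that averages only the neighbors inside the frame at the borders; the lecture does not specify the filter weights or border handling. The example shifts a thin vertical line by one pixel, a crude stand-in for a small pan.

```python
def smooth3x3(frame):
    """Replace each pixel by the mean of its 3x3 neighbourhood (clipped at the frame borders)."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [frame[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def pairwise_difference(frame_a, frame_b, t, smooth=True):
    """Fraction of corresponding pixels differing by more than t, optionally after smoothing."""
    if smooth:
        frame_a, frame_b = smooth3x3(frame_a), smooth3x3(frame_b)
    flags = [abs(a - b) > t
             for ra, rb in zip(frame_a, frame_b)
             for a, b in zip(ra, rb)]
    return sum(flags) / len(flags)


# A bright vertical line at column 3, then the same line moved to column 4.
a = [[90 if x == 3 else 0 for x in range(7)] for _ in range(5)]
b = [[90 if x == 4 else 0 for x in range(7)] for _ in range(5)]
print(pairwise_difference(a, b, 40, smooth=False), pairwise_difference(a, b, 40))
```

Without smoothing, 10 of the 35 pixels exceed the threshold; after smoothing, the largest per-pixel difference is 30, so none do. The trade-off is that smoothing also weakens genuine small-scale changes.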
![Page 10: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/10.jpg)
2. Histogram Comparison
Less sensitive to object motion, since it ignores the spatial changes in a frame.
H_i(j): the histogram value for the i-th frame, where j is one of the G gray levels.
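The standard bin-wise form of this difference is SD_i = Σ_{j=1}^{G} |H_i(j) − H_{i+1}(j)|; the slide's own formula survives only in the image, so treat this as our reading. A minimal sketch, assuming 8-bit gray-scale frames given as lists of rows and G equal-width bins. It also shows why the measure ignores spatial rearrangement: two frames with the same pixels in different places have a difference of zero.

```python
def gray_histogram(frame, g=64, levels=256):
    """H(j): number of pixels falling into each of g equal-width gray-level bins."""
    hist = [0] * g
    for row in frame:
        for p in row:
            hist[p * g // levels] += 1
    return hist


def histogram_difference(h_i, h_next):
    """SD_i = sum over bins j of |H_i(j) - H_{i+1}(j)|."""
    return sum(abs(a - b) for a, b in zip(h_i, h_next))


f1 = [[0, 255], [0, 255]]      # dark left, bright right
f2 = [[255, 0], [255, 0]]      # same pixels, mirrored
f3 = [[128, 128], [128, 128]]  # uniform mid-gray
print(histogram_difference(gray_histogram(f1, 4), gray_histogram(f2, 4)))
print(histogram_difference(gray_histogram(f1, 4), gray_histogram(f3, 4)))
```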
![Page 11: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/11.jpg)
2. Histogram Comparison–Example
Example video sequence
The intensity histograms of the first three frames
![Page 12: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/12.jpg)
2. Histogram Comparison
Color histogram difference
p_i(r,g,b) is the number of pixels of color (r,g,b) in frame I_i of N pixels. Each color component is discretized to 2^B different values.
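A sketch of the color histogram difference under stated assumptions: 8-bit RGB pixels, quantization by keeping the top B bits of each channel (2^B values per component), and a difference normalized by N, D = (1/N) Σ |p_i(r,g,b) − p_{i+1}(r,g,b)|. The exact normalization is an assumption, since the formula itself appears only in the slide image.

```python
from collections import Counter


def color_histogram(frame, bits=2):
    """p(r, g, b): pixel counts after keeping the top `bits` bits of each 8-bit channel."""
    shift = 8 - bits
    return Counter((r >> shift, g >> shift, b >> shift)
                   for row in frame for (r, g, b) in row)


def color_histogram_difference(frame_a, frame_b, bits=2):
    """(1/N) * sum over quantized colors of |p_a(c) - p_b(c)|, with N pixels per frame."""
    ha, hb = color_histogram(frame_a, bits), color_histogram(frame_b, bits)
    n = sum(ha.values())
    # Counter returns 0 for colors that are absent from one of the frames.
    return sum(abs(ha[c] - hb[c]) for c in set(ha) | set(hb)) / n


red = [[(255, 0, 0), (255, 0, 0)]]
half_blue = [[(255, 0, 0), (0, 0, 255)]]
near_red = [[(250, 5, 5), (255, 0, 0)]]
print(color_histogram_difference(red, half_blue), color_histogram_difference(red, near_red))
```

With B = 2, the slightly different shade (250, 5, 5) falls into the same bin as pure red, so it contributes no difference; coarser quantization trades sensitivity for robustness to small color shifts.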
![Page 13: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/13.jpg)
3. Likelihood Ratio
Compare corresponding regions (blocks) in two successive frames based on second-order statistical characteristics of their intensity values.
A camera break is then declared whenever the total number of regions whose likelihood ratio exceeds the threshold is sufficiently large.
This raises the tolerance to slow and small object motion from frame to frame.
m_i: mean intensity value of a given region; S_i: variance of the intensity values of that region.
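The likelihood ratio is commonly written (following Zhang et al.) as λ = [ (S_i + S_{i+1})/2 + ((m_i − m_{i+1})/2)^2 ]^2 / (S_i · S_{i+1}). A sketch under assumptions: each region is a flat list of intensities, a small epsilon guards against zero variance, and both thresholds are illustrative values rather than ones from the lecture.

```python
def block_stats(block):
    """Mean m and (population) variance S of a flat list of intensities."""
    n = len(block)
    m = sum(block) / n
    s = sum((v - m) ** 2 for v in block) / n
    return m, s


def likelihood_ratio(block_a, block_b, eps=1e-6):
    """lambda = [((S_a + S_b)/2 + ((m_a - m_b)/2)**2)**2] / (S_a * S_b); eps guards flat blocks."""
    m_a, s_a = block_stats(block_a)
    m_b, s_b = block_stats(block_b)
    num = ((s_a + s_b) / 2 + ((m_a - m_b) / 2) ** 2) ** 2
    return num / max(s_a * s_b, eps)


def is_camera_break(blocks_a, blocks_b, ratio_t=3.0, count_t=0.5):
    """Break if the fraction of block pairs whose ratio exceeds ratio_t is above count_t."""
    exceeded = sum(likelihood_ratio(a, b) > ratio_t for a, b in zip(blocks_a, blocks_b))
    return exceeded / len(blocks_a) > count_t


same = [[1, 2, 3, 4]] * 4
shifted = [[101, 102, 103, 104]] * 4
print(likelihood_ratio([1, 2, 3, 4], [1, 2, 3, 4]), is_camera_break(same, shifted))
```

Two regions with identical statistics give λ = 1, and λ grows as the means or variances diverge.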
![Page 14: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/14.jpg)
4. Edge Change Ratio
Zabih, et al., “A feature-based algorithm for detecting and classifying scene breaks” Proc. of ACM Multimedia, pp. 189-200, 1995.
![Page 15: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/15.jpg)
4. Edge Change Ratio
![Page 16: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/16.jpg)
4. Edge Change Ratio
Edge change ratio
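A minimal sketch of the edge change ratio as it is commonly stated after Zabih et al. Let σ_n be the number of edge pixels in frame n. Let X_n^in be the edge pixels of frame n that lie farther than r pixels from every edge pixel of frame n-1 (entering edges). Let X_{n-1}^out be the edge pixels of frame n-1 that lie farther than r from every edge pixel of frame n (exiting edges). Then ECR_n = max(X_n^in / σ_n, X_{n-1}^out / σ_{n-1}). Edge maps are assumed to be sets of (row, column) coordinates from any edge detector. The distance r = 2, the square neighborhood, and the handling of empty edge maps are assumptions.

```python
def dilate(edges, r):
    """All pixels within Chebyshev distance r of some edge pixel (square structuring element)."""
    return {(y + dy, x + dx) for (y, x) in edges
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)}


def edge_change_ratio(prev_edges, curr_edges, r=2):
    """ECR_n = max(X_in / sigma_n, X_out / sigma_{n-1}) for edge-pixel sets of frames n-1 and n."""
    if not prev_edges or not curr_edges:
        # Assumption: two empty maps are unchanged; one empty map is a total change.
        return 0.0 if prev_edges == curr_edges else 1.0
    entering = len(curr_edges - dilate(prev_edges, r))  # new edges far from old ones
    exiting = len(prev_edges - dilate(curr_edges, r))   # old edges far from new ones
    return max(entering / len(curr_edges), exiting / len(prev_edges))


line = {(y, 5) for y in range(10)}
moved = {(y, 6) for y in range(10)}   # small motion: within r of the old edges
far = {(y, 50) for y in range(10)}    # unrelated content
print(edge_change_ratio(line, moved), edge_change_ratio(line, far))
```

The dilation by r is what makes the ratio tolerant of small motion: an edge that moves by one pixel is neither entering nor exiting.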
![Page 17: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/17.jpg)
5. Motion Vectors
Using the direction of motion prediction as a cue for shot change detection.
Pei, et al., “Scene-effect detection and insertion MPEG encoding scheme for video browsing and error concealment” IEEE Trans. on Multimedia, vol. 7, no. 4, pp. 606-614, 2005.
![Page 18: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/18.jpg)
5. Motion Vectors
Using motion vector information to filter out false positives.
Zhang, et al., “Automatic partitioning of full-motion video” Multimedia Systems Journal, vol. 1, pp. 10-28, 1993.
![Page 19: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/19.jpg)
6. Differences in DCT domain
Discrete Cosine Transform (DCT) coefficients:
1. Select a subset of blocks.
2. Select a subset of the DCT coefficients of these blocks.
3. Concatenate the selected coefficients of the selected blocks into a vector.
4. Calculate the similarity of the two coefficient vectors.
Arman, et al., “Image processing on encoded video sequences” Multimedia Systems Journal, vol. 1, no. 5, pp. 211-219, 1994.
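The four steps can be sketched as follows. The sketch assumes the per-block DCT coefficients are already available, for example decoded from the compressed stream, as a list of blocks, each a flat list of 64 coefficients. The block and coefficient selections are hypothetical parameters. Step 4 uses 1 minus the normalized inner product (cosine similarity), in line with the "1 - inner product" DCT difference listed in the comparison study. A zero-length vector is treated as maximally dissimilar, which is also an assumption.

```python
import math


def dct_feature_vector(frame_blocks, block_ids, coeff_ids):
    """Steps 1-3: pick blocks, pick coefficients within them, concatenate into one vector."""
    return [frame_blocks[b][c] for b in block_ids for c in coeff_ids]


def dct_dissimilarity(v1, v2):
    """Step 4: 1 - normalized inner product (0 for the same direction, 1 for orthogonal)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical frame: 4 blocks whose coefficient c in block b has value c + b.
frame = [[float(c + b) for c in range(64)] for b in range(4)]
v = dct_feature_vector(frame, block_ids=[0, 2], coeff_ids=[0, 1, 8])
print(v, dct_dissimilarity(v, v))
```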
![Page 20: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/20.jpg)
Gradual Transition Detection
Cuts or abrupt change
Gradual transition
![Page 21: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/21.jpg)
1. Twin-Comparison Approach
Zhang, et al., “Automatic partitioning of full-motion video” Multimedia Systems Journal, vol. 1, pp. 10-28, 1993.
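The slide text only names the method, so this sketch follows the commonly described twin-comparison scheme. A high threshold T_b detects cuts, and a lower threshold T_s marks the possible start of a gradual transition. Two simplifications are assumptions. First, the accumulated difference is approximated by summing consecutive-frame differences rather than comparing the candidate start frame with each later frame. Second, a candidate ends as soon as a difference drops below T_s, with no tolerance for brief dips.

```python
def twin_comparison(diffs, t_b, t_s):
    """Twin-threshold boundary detection over consecutive-frame differences.

    diffs[i] is the difference between frame i and frame i+1 (e.g. a histogram
    difference). A value >= t_b is a cut. A value in [t_s, t_b) opens a candidate
    gradual transition; differences are summed while they stay in [t_s, t_b), and
    the candidate is accepted if the sum reaches t_b.
    Returns (cuts, gradual), gradual transitions as (start, end) with end exclusive.
    """
    cuts, gradual = [], []
    i, n = 0, len(diffs)
    while i < n:
        if diffs[i] >= t_b:
            cuts.append(i)
            i += 1
        elif diffs[i] >= t_s:
            start, accumulated = i, 0.0
            while i < n and t_s <= diffs[i] < t_b:
                accumulated += diffs[i]
                i += 1
            if accumulated >= t_b:
                gradual.append((start, i))
        else:
            i += 1
    return cuts, gradual


print(twin_comparison([1, 1, 9, 1, 3, 3, 3, 1], t_b=8, t_s=2))
```

In the example, the single jump of 9 is a cut, while three moderate differences of 3 never cross T_b individually but add up to 9 and are reported as one gradual transition.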
![Page 22: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/22.jpg)
2. Edge Change Ratio
Lienhart, R., “Comparison of automatic shot boundary detection algorithms” Proc. of SPIE Storage and Retrieval for Image and Video Databases VII, vol. 3656, pp. 290-301, 1999.
![Page 23: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/23.jpg)
2. Edge Change Ratio
![Page 24: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/24.jpg)
3. Characterizing a Wipe Transition
![Page 25: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/25.jpg)
Evaluation
Precision: the percentage of retrieved items that are desired items.
Recall: the percentage of desired items that are retrieved.
$$\text{Precision} = \frac{\#\text{ correctly retrieved items}}{\#\text{ all retrieved items}} = \frac{\#\text{ correctly retrieved items}}{\#\text{ correctly retrieved items} + \#\text{ falsely retrieved items}}$$
$$\text{Recall} = \frac{\#\text{ correctly retrieved items}}{\#\text{ all relevant items}} = \frac{\#\text{ correctly retrieved items}}{\#\text{ correctly retrieved items} + \#\text{ relevant items that are not retrieved}}$$
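Precision and recall for shot boundaries can be computed from sets of detected and ground-truth boundary positions. This sketch assumes a detection counts as correct only if it matches a ground-truth frame index exactly; an evaluation could instead allow a small tolerance window.

```python
def precision_recall(detected, ground_truth):
    """Precision = TP/(TP+FP) and recall = TP/(TP+FN) for sets of boundary frame indices."""
    detected, ground_truth = set(detected), set(ground_truth)
    tp = len(detected & ground_truth)   # correctly retrieved
    fp = len(detected - ground_truth)   # falsely retrieved
    fn = len(ground_truth - detected)   # relevant but not retrieved
    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall


p, r = precision_recall(detected=[10, 20, 30, 40], ground_truth=[10, 20, 50])
print(p, r)
```

Here two of the four detections are correct (precision 0.5), and two of the three true boundaries are found (recall 2/3).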
![Page 26: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/26.jpg)
Evaluation–Other Terms
Miss: # relevant items that are not retrieved
True positive (TP): # correctly retrieved items
False positive (FP): # falsely retrieved items
True negative (TN): # irrelevant items that are correctly not retrieved
False negative (FN): # relevant items that are not retrieved (misses)

| | Actual positive | Actual negative |
|---|---|---|
| Predicted positive | TP | FP |
| Predicted negative | FN | TN |
![Page 27: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/27.jpg)
Evaluation
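| | Actual positive | Actual negative |
|---|---|---|
| Predicted positive | TP | FP |
| Predicted negative | FN | TN |

(Figure: Venn diagram of the detected (retrieved) set and the relevant (ground-truth) set; their overlap is TP, detected-only items are FP, relevant-only items are FN, and items outside both sets are TN.)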
![Page 28: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/28.jpg)
Relationship between Precision & Recall
Precision-Recall (PR) curve
![Page 29: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/29.jpg)
Relationship between True Positive and False Positive
Receiver Operating Characteristic (ROC) curve
![Page 30: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/30.jpg)
Using PR or ROC Curves?
ROC curves can present an overly optimistic view of an algorithm’s performance if there is a large skew in the class distribution.
When the number of negative examples greatly exceeds the number of positive examples, a large change in the number of false positives leads to only a small change in the false positive rate.
Precision compares false positives to true positives and therefore better captures the algorithm’s performance.
Davis, et al., “The relationship between precision-recall and ROC curves” Proc. of International Conference on Machine Learning, pp. 233-240, 2006.
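The skew argument can be checked with a little arithmetic. The counts below are hypothetical: 100 true boundaries among 100,000 non-boundary frame transitions, and two detectors that each find 80 of the boundaries but produce 100 and 1,000 false positives, respectively.

```python
def rates(tp, fp, positives, negatives):
    """Return (TPR/recall, FPR, precision) from raw counts."""
    tpr = tp / positives
    fpr = fp / negatives           # FP / (FP + TN)
    precision = tp / (tp + fp)
    return tpr, fpr, precision


for fp in (100, 1000):
    tpr, fpr, prec = rates(tp=80, fp=fp, positives=100, negatives=100_000)
    print(f"FP={fp:5d}  TPR={tpr:.2f}  FPR={fpr:.3f}  precision={prec:.3f}")
```

Multiplying the false positives by ten moves the false positive rate only from 0.001 to 0.010, so the two ROC points nearly coincide, while precision drops from about 0.44 to about 0.07.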
![Page 31: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/31.jpg)
Comparison of Shot Boundary Detection Techniques
Methods: histograms, region histograms, running histograms, motion-compensated pixel differences, DCT coefficient differences

Evaluation data:

| Video type | # Frames | Cuts | Gradual transitions |
|---|---|---|---|
| TV | 133204 | 831 | 42 |
| News | 81595 | 293 | 99 |
| Movie | 142507 | 564 | 95 |
| Commercial | 51733 | 755 | 254 |
| Misc. | 10706 | 64 | 16 |
| Total | 419745 | 2507 | 506 |
![Page 32: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/32.jpg)
Methods Compared
Histogram: 64-bin gray-level histogram difference, single threshold.
Region (block) histogram: 16 blocks, a 64-bin gray-scale histogram per block, a difference threshold for each block, and a count threshold for the number of changed blocks.
Running histogram (twin method): a 64-bin gray-scale histogram for each frame and twin thresholds. Motion vectors are computed, and gradual changes are rejected if there is excessive motion.
Motion-compensated pixel difference: 12 blocks per frame, with a motion vector for each block. The average residual error is computed; if it is larger than a high threshold, a cut is detected. Cumulative errors are used to detect gradual changes (as in the running histogram method), and motion vectors are used to reject false gradual changes.
DCT difference: 15 coefficients at the same locations in different blocks are concatenated to form a vector, and the difference is computed as 1 - (inner product of the vectors from consecutive frames).
![Page 33: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/33.jpg)
PR Curve for TV program
![Page 34: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/34.jpg)
PR Curve for News program
![Page 35: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/35.jpg)
PR Curve for Movie Videos
![Page 36: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/36.jpg)
PR Curve for Commercials
![Page 37: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/37.jpg)
PR Curve for All Data
![Page 38: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/38.jpg)
PR Curve for All Data–Cut Only
![Page 39: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/39.jpg)
Observations
The histogram-based method is consistent: it produced the first or second best precision, and it is simple and straightforward.
The region algorithm seems to be the best where recall is not the highest priority.
The running algorithm seems to be the best where recall is important; motion vectors are helpful for reducing false positives.
DCT is the worst: it produces a large number of false positives on black frames.
![Page 40: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/40.jpg)
References
J. S. Boreczky, et al., “Comparison of video shot boundary detection techniques” Proc. of SPIE Conference on Storage and Retrieval for Image and Video Databases, vol. 2670, 1996. (must read)
R. Lienhart, “Comparison of automatic shot boundary detection algorithms” Proc. of SPIE Storage and Retrieval for Image and Video Databases VII, vol. 3656, pp. 290-301, 1999.
J. Yuan, et al., “A formal study of shot boundary detection” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 2, pp. 168-186, 2007.
A. Hanjalic, “Shot-boundary detection: unraveled or resolved?” IEEE Trans. on Circuits and Systems for Video Technology, vol. 12, no. 2, pp. 90-105, 2002.
![Page 41: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/41.jpg)
Edge
![Page 42: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/42.jpg)
Edge
An edge is a set of connected pixels that lie on the boundary between two regions.
Chapter 10 of “Digital Image Processing” by R. C. Gonzalez and R. E. Woods, Prentice Hall, 2nd edition, 2001.
![Page 43: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/43.jpg)
Edge
![Page 44: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/44.jpg)
Gradient Operators
Roberts cross-gradient operators:
Prewitt operators:
Sobel operators:
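The operator masks survive only in the slide image. As a sketch, the code below uses the standard 3x3 Sobel kernels to compute the gradient magnitude sqrt(gx² + gy²) and thresholds it into an edge map of (row, column) coordinates. Leaving border pixels at zero and the threshold value are assumptions.

```python
import math

SOBEL_ROWS = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to intensity change down the rows
SOBEL_COLS = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to intensity change across columns


def sobel_magnitude(img):
    """Gradient magnitude at interior pixels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_ROWS[i][j] * img[y + i - 1][x + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_COLS[i][j] * img[y + i - 1][x + j - 1]
                     for i in range(3) for j in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out


def edge_map(img, threshold):
    """Set of (row, col) pixels whose gradient magnitude exceeds the threshold."""
    return {(y, x) for y, row in enumerate(sobel_magnitude(img))
            for x, m in enumerate(row) if m > threshold}


# Vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
print(sorted(edge_map(step, 200)))
```

Only the two columns on either side of the step respond (magnitude 400); flat regions give zero. Swapping in the Prewitt masks only changes the kernel weights.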
![Page 45: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/45.jpg)
Edge Examples
![Page 46: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/46.jpg)
Edge Examples–after smoothing
![Page 47: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/47.jpg)
Edge Examples
![Page 48: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/48.jpg)
Canny Edge Detector
Step 1: the image is smoothed by Gaussian convolution.
Step 2: a 2D first-derivative operator is applied to the smoothed image.
Step 3: non-maximal suppression. Edges give rise to ridges in the gradient magnitude image; the algorithm tracks along the top of these ridges and sets to zero all pixels that are not actually on a ridge.
http://homepages.inf.ed.ac.uk/rbf/HIPR2/canny.htm
![Page 49: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/49.jpg)
A Very Brief Introduction to the Discrete Cosine Transform
![Page 50: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/50.jpg)
Spatial Frequency and DCT
![Page 51: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/51.jpg)
Definition of DCT
![Page 52: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/52.jpg)
2D DCT
![Page 53: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/53.jpg)
1D DCT
![Page 54: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/54.jpg)
DCT Basis
![Page 55: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/55.jpg)
DCT Basis
![Page 56: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/56.jpg)
Example
![Page 57: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/57.jpg)
Example
![Page 58: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/58.jpg)
Example
![Page 59: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/59.jpg)
Example
![Page 60: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/60.jpg)
Discrete Cosine Transform
DCT converts a block of pixels into a block of transform coefficients, which represent spatial frequencies.
Each coefficient is a weight applied to the corresponding basis function.
Any 8x8 gray-scale pixel block can be fully represented by a weighted sum of these 64 basis functions.
(Figure: the 64 basis functions of the 8x8 DCT, arranged by increasing horizontal frequency and increasing vertical frequency; the constant one is the “DC” basis function.)
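As a sketch, the orthonormal 2D DCT-II used on 8x8 blocks can be computed directly from its definition, F(u,v) = (2/N) C(u) C(v) Σ_x Σ_y f(x,y) cos((2x+1)uπ/2N) cos((2y+1)vπ/2N), with C(0) = 1/√2 and C(k) = 1 otherwise. The block is indexed as block[x][y]. This naive loop is for illustration only; codecs use fast factorizations.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of an N x N block indexed as block[x][y]."""
    n = len(block)
    c = [1 / math.sqrt(2)] + [1.0] * (n - 1)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = 2 / n * c[u] * c[v] * s
    return out


flat = [[100] * 8 for _ in range(8)]            # constant block
ramp = [[y for y in range(8)] for _ in range(8)]  # varies only along y
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))
```

A constant block puts all of its energy in the DC coefficient (8 × 100 = 800 here); a block that varies only along y has nonzero coefficients only in the u = 0 row.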
![Page 61: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/61.jpg)
Intra-Frame Encoding (JPEG Compression)
![Page 62: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/62.jpg)
Scene Transition Graph
Yeung, et al. “Segmentation of video by clustering and graph analysis” Computer Vision and Image Understanding, vol. 71, no. 1, pp. 94-109, 1998.
![Page 63: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/63.jpg)
Observations
Shots in a scene are often repetitive. We are able to classify shots by grouping shots with similar visual content.
Often, a scene is made up of temporally adjacent shots, indicating their interrelationships.
![Page 64: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/64.jpg)
Similarity of Video Shots
D(.,.) measures the dissimilarity between two image frames.
![Page 65: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/65.jpg)
Similarity of Video Shots
Dissimilarity based on color histogram intersection
Dissimilarity based on luminance projection
Yeung and Liu, “Efficient matching and clustering of video shots” Proc. of IEEE International Conference on Image Processing, vol. 1, pp. 338-341, 1995.
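A sketch of shot-level matching. It assumes, as we read the formula that survives only in the slide image, that the dissimilarity between two shots is the minimum frame dissimilarity D over all pairs of representative frames. D is taken here as one minus the normalized color histogram intersection. Histograms are plain lists with equal total counts.

```python
def intersection_dissimilarity(h1, h2):
    """1 - histogram intersection, normalized by the pixel count of h1."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2)) / sum(h1)


def shot_dissimilarity(shot_a, shot_b, frame_d=intersection_dissimilarity):
    """Minimum frame dissimilarity over all pairs of representative frames of two shots."""
    return min(frame_d(f, g) for f in shot_a for g in shot_b)


# Each shot is a list of representative-frame histograms (hypothetical 4-bin data).
shot_a = [[4, 0, 0, 0], [2, 2, 0, 0]]
shot_b = [[0, 0, 4, 0], [2, 1, 1, 0]]
print(shot_dissimilarity(shot_a, shot_b))
```

Taking the minimum means two shots match if any pair of their representative frames looks alike, which suits shots that revisit the same view only briefly.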
![Page 66: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/66.jpg)
Representative Image Set for a Video Shot
Selection of the representative set is achieved by nonlinear temporal sampling.
![Page 67: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/67.jpg)
Representative Image Set for a Video Shot
Only 2 to 5% of the frames need to be compared to achieve good matching results.
In addition to temporal subsampling, spatial subsampling can also be used to improve matching efficiency.
![Page 68: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/68.jpg)
Clustering of Video Shots
Shots in the same cluster are similar: any shot outside the cluster must have a dissimilarity greater than the dissimilarity between any two shots in the cluster.
C_i: the i-th cluster
![Page 69: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/69.jpg)
Clustering of Video Shots
Dissimilarity between two clusters: the dissimilarity of the shot pair, one shot from each cluster, with the largest dissimilarity value (complete linkage).
The dissimilarity between two clusters should be updated at each iteration.
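A minimal sketch of this complete-linkage clustering. It assumes a precomputed symmetric shot dissimilarity matrix `d` and a stopping threshold `delta`, presumably the δ whose variation is discussed in the results. Cluster dissimilarities are recomputed after every merge, which costs quadratic work per merge, so this is meant for small examples.

```python
def complete_link_clusters(d, delta):
    """Agglomerative clustering of shots 0..n-1 from a symmetric dissimilarity matrix d.

    Cluster dissimilarity is the largest dissimilarity over shot pairs across the two
    clusters (complete linkage). Merging stops once the closest pair of clusters is
    farther apart than delta.
    """
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = max(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        if dist > delta:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


d = [[0.0, 0.1, 0.9, 0.8],
     [0.1, 0.0, 0.85, 0.9],
     [0.9, 0.85, 0.0, 0.2],
     [0.8, 0.9, 0.2, 0.0]]
print(complete_link_clusters(d, delta=0.5))
```

A smaller `delta` stops merging earlier and leaves more clusters, matching the observation that smaller delta values yield more story units.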
![Page 70: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/70.jpg)
Clustering of Video Shots
![Page 71: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/71.jpg)
Time-Constrained Clustering
Two shots that are far apart in time, even if they share similar visual content, potentially represent different content or occur in different scenes.
Temporal distance between two shots: the distance, in number of frames, from the end of the earlier shot to the beginning of the later one.
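One way to impose the time constraint, assumed here rather than taken from the slides, is to fold it into the dissimilarity matrix before clustering. Shots are (start_frame, end_frame) pairs. Any pair of shots more than `t_frames` apart gets infinite dissimilarity, so a complete-linkage merge across them is never accepted. A constraint given in seconds, such as T = 20 s, would be converted using the video's frame rate.

```python
import math


def temporal_distance(shot_a, shot_b):
    """Frames from the end of the earlier shot to the start of the later one."""
    first, second = sorted([shot_a, shot_b])
    return max(0, second[0] - first[1])


def time_constrained(d, shots, t_frames):
    """Copy of matrix d with entries set to infinity for shots more than t_frames apart."""
    n = len(shots)
    return [[d[i][j] if temporal_distance(shots[i], shots[j]) <= t_frames else math.inf
             for j in range(n)] for i in range(n)]


shots = [(0, 99), (100, 199), (900, 999)]
d = [[0.0, 0.1, 0.1], [0.1, 0.0, 0.1], [0.1, 0.1, 0.0]]
print(time_constrained(d, shots, t_frames=300))
```

All three shots look alike, but only the first two are close enough in time to be grouped; the third is 801 frames after the first and 701 after the second.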
![Page 72: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/72.jpg)
Scene Transition Graph
A scene transition graph (STG) is a directed graph G = (V, E, F), where:
V: each node represents a cluster of shots.
E: a directed edge is drawn from node U to node W if there is a shot represented by node U that immediately precedes a shot represented by node W.
F: a mapping that partitions the set of shots into clusters.
An STG can compactly represent the structure of the shots and the temporal flow of the story for many video programs.
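A sketch of building the STG, assuming time-constrained clustering has already given every shot a cluster label, with the labels listed in temporal order. Edges from a shot to the next shot in the same cluster are dropped; this is an assumption rather than part of the definition. In the example labels, `B12` stands for a cluster {B1, B2} and `B34` for {B3, B4}.

```python
def build_stg(labels):
    """Scene transition graph from per-shot cluster labels in temporal order.

    Returns (V, E, F): V = clusters in order of first appearance,
    E = directed edges (U, W) where a shot in U immediately precedes a shot in W,
    F = mapping from cluster to the indices of its shots.
    """
    nodes = list(dict.fromkeys(labels))
    edges = list(dict.fromkeys((u, w) for u, w in zip(labels, labels[1:]) if u != w))
    mapping = {}
    for shot, label in enumerate(labels):
        mapping.setdefault(label, []).append(shot)
    return nodes, edges, mapping


# Shot order B1, A1, B2, A2, B3, A3, B4, C1, D1 with clusters {B1,B2}, {A1,A2,A3}, {B3,B4}, {C1}, {D1}.
labels = ["B12", "A", "B12", "A", "B34", "A", "B34", "C", "D"]
V, E, F = build_stg(labels)
print(E)
```

Repeated transitions between the same two clusters collapse into one edge, which is what makes the graph compact.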
![Page 73: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/73.jpg)
Example of STG
3 scenes of 9 shots
Sample clustering results
Scene transition graph
![Page 74: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/74.jpg)
Cut Edges
An edge is a cut edge if removing it splits the graph into two disconnected graphs.
Each partitioned STG Gi represents the interactions of shots in a story unit.
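A sketch of story-unit extraction by removing cut edges, using Tarjan's bridge-finding depth-first search. Edge direction is ignored for connectivity, but U→W and W→U are kept as two distinct edges, so two clusters that alternate (U→W→U) are never separated. This multigraph reading is our assumption, chosen because it reproduces the story units listed in this lecture's time-constraint examples. The recursion depth grows with graph size, so a large STG may need an iterative version.

```python
from collections import defaultdict


def story_units(edges):
    """Split an STG, given as directed (U, W) edges, into story units at its cut edges."""
    adj = defaultdict(list)  # node -> [(neighbor, edge_id)], direction ignored
    for eid, (u, w) in enumerate(edges):
        adj[u].append((w, eid))
        adj[w].append((u, eid))
    disc, low, cut = {}, {}, set()

    def dfs(u, parent_edge):
        disc[u] = low[u] = len(disc)  # discovery order
        for v, eid in adj[u]:
            if eid == parent_edge:
                continue
            if v in disc:
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, eid)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:  # v's subtree has no other way back: a cut edge
                    cut.add(eid)

    for node in list(adj):
        if node not in disc:
            dfs(node, None)

    units, seen = [], set()
    for node in adj:  # connected components once cut edges are ignored
        if node in seen:
            continue
        unit, stack = set(), [node]
        seen.add(node)
        while stack:
            u = stack.pop()
            unit.add(u)
            for v, eid in adj[u]:
                if eid not in cut and v not in seen:
                    seen.add(v)
                    stack.append(v)
        units.append(unit)
    return units, [edges[e] for e in sorted(cut)]


edges = [("B12", "A"), ("A", "B12"), ("A", "B34"), ("B34", "A"), ("B34", "C"), ("C", "D")]
print(story_units(edges))
```

For this graph, the clusters {B1,B2}, {A1,A2,A3}, and {B3,B4} form one story unit, and {C1} and {D1} are separate units.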
![Page 75: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/75.jpg)
STG After Time Constraining and Cut-Edge Finding
![Page 76: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/76.jpg)
Framework
Shot segmentation → Time-constrained clustering → Building of scene transition graph → Scene segmentation
![Page 77: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/77.jpg)
Influences of Parameters
Without knowledge of how long each individual scene lasts, T cannot be approximated well.
If T is too large, shots from different scenes are clustered together.
If T is too small, shots in the same scene may be separated into different scenes.
It’s less detrimental to have several story units represent a scene than to have one story unit represent several scenes.
![Page 78: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/78.jpg)
Influences of Time Constraints
T = 20 s; d_t(B1, B3) > T
Clustering results are {B1,B2},{A1,A2,A3},{B3,B4},{C1},{D1}
Story unit results are {B1,A1,B2,A2,B3,A3,B4},{C1},{D1}
(Figure: STG with nodes {B1, B2}, {A1, A2, A3}, {B3, B4}, {C1}, {D1})
The {B_i} are not grouped into one cluster because there is at least one pair of shots, one from each cluster, whose temporal distance d_t exceeds T.
![Page 79: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/79.jpg)
Influences of Time Constraints
T = 20 s
Clustering results are {B1},{B2,B3},{A1,A2,A3},{B4},{C1},{D1}
Story unit results are {B1},{A1,B2,A2,B3,A3},{B4},{C1},{D1}
(Figure: STG with nodes {B1}, {A1, A2, A3}, {B2, B3}, {B4}, {C1}, {D1})
![Page 80: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/80.jpg)
Refined Analysis
Make the time window more elastic: compute the duration of each story unit and adjust the window accordingly.
Given a story unit, examine the next story unit by relaxing the temporal window and reclustering the shots in these two units. If there exists at least one new cluster that contains shots from both units, the two story units are merged into one.
![Page 81: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/81.jpg)
Refined Analysis
![Page 82: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/82.jpg)
Example
{B1,B2},{A1,A2,A3},{B3,B4},{C1},{D1}
{B1},{B2,B3},{A1,A2,A3},{B4},{C1},{D1}
![Page 83: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/83.jpg)
Results
STG constructed from the sitcom “Friends”: 35575 frames, each at a spatial resolution of 320x240, making up 313 shots.
![Page 84: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/84.jpg)
Results
Time-constrained clustering of video shots is able to identify individual story units.
The resulting STG permits rapid nonlinear browsing of long video programs.
![Page 85: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/85.jpg)
Variations of Clustering Parameters
Smaller delta values result in more clusters and thus more story units. Users often prefer over-segmentation to under-segmentation.
![Page 86: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/86.jpg)
Refining the Segmentation Results
The first two story units in Scene 1 are merged into one. The number of story units in Scene 6 is reduced from 4 to 2.
![Page 87: Lecture 3 Video Syntax Analysis - National Chung Cheng](https://reader033.vdocument.in/reader033/viewer/2022050302/626f06518061141b09534af5/html5/thumbnails/87.jpg)
Conclusion
Analysis based on time-constrained clustering and scene transition graph analysis has contributed to the extraction of story units.
The building of story structure provides nonlinear access to video contents.
Identification, integration, and application of domain-dependent and semantic features tend to improve segmentation accuracy.