lecture 3 video syntax analysis - national chung cheng

Wei-Ta Chu

2010/9/30

Video Syntax Analysis

Multimedia Content Analysis, CSIE, CCU

Types of Shot Change

Abrupt change (hard cut)
  A cut occurs in a single frame when the camera is stopped and restarted
Gradual transition
  Fade-in: gradual increase in intensity starting from a black frame
  Fade-out: gradual decrease in intensity resulting in a black frame
  Dissolve: transition from the end of one clip to the beginning of another
  Wipe: one image is replaced by another with a distinct edge that forms a shape
  …

Examples of Shot Changes

Li and Lee. “Effective detection of various wipe transitions” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 6, pp. 663-673, 2007.

Dissolve

Examples of Fade

Cernekova, et al., “Information theory-based shot cut/fade detection and video summarization” IEEE Trans. on Circuits and Systems for Video Technology, vol. 16, no. 1, pp. 82-91, 2006.

Fade out

Fade in

Different Types of Wipe

Li and Lee. “Effective detection of various wipe transitions” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 6, pp. 663-673, 2007.

Video example: http://en.wikipedia.org/wiki/Wipe_%28transition%29

Detection Process

Extract features -> Similarity calculation -> Boundary decision

Shot 1 Shot 2 Shot 3 Shot 4

Features

Pixel difference
Statistical difference
Histograms
Compression differences
Edge
Motion

Pixel Difference

Count the number of pixels that change in value more than some threshold.

May be sensitive to camera motion.

1. Pair-wise comparison

Compare the corresponding pixels in two frames.

Problem: sensitive to camera movement, e.g. camera panning
Improvement: smoothing by a 3x3 window before comparison

Zhang, et al., “Automatic partitioning of full-motion video” Multimedia Systems Journal, vol. 1, pp. 10-28, 1993.
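A minimal sketch of pair-wise comparison with 3x3 smoothing, assuming gray-level frames stored as lists of rows. The function names and the threshold value `pixel_thresh=20` are illustrative assumptions, not values from the lecture.

```python
def smooth3x3(frame):
    """Mean-filter a 2D list of gray values with a 3x3 window.
    Border pixels average over the neighbors that exist."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [frame[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def changed_pixel_ratio(frame_a, frame_b, pixel_thresh=20, smooth=True):
    """Fraction of corresponding pixels whose values differ by more than pixel_thresh."""
    if smooth:
        frame_a, frame_b = smooth3x3(frame_a), smooth3x3(frame_b)
    h, w = len(frame_a), len(frame_a[0])
    changed = sum(1 for y in range(h) for x in range(w)
                  if abs(frame_a[y][x] - frame_b[y][x]) > pixel_thresh)
    return changed / (h * w)


dark = [[10] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
ratio_same = changed_pixel_ratio(dark, dark)
ratio_cut = changed_pixel_ratio(dark, bright)
```

A shot change would then be declared when the ratio exceeds a second, frame-level threshold.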

2. Histogram Comparison

Less sensitive to object motion, since it ignores the spatial changes in a frame.

Hi(j): the histogram value for the ith frame, where j is one of the G grey levels.
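The difference formula itself is not in the text above. The sketch below assumes the usual bin-wise absolute difference, the sum over j of |Hi(j) - Hi+1(j)|. It also shows why the measure ignores spatial changes. Function names are illustrative.

```python
def gray_histogram(frame, levels=256):
    """H(j): number of pixels with gray level j."""
    hist = [0] * levels
    for row in frame:
        for v in row:
            hist[v] += 1
    return hist


def histogram_difference(frame_a, frame_b, levels=256):
    """Sum over all gray levels of |H_a(j) - H_b(j)| (assumed form)."""
    ha = gray_histogram(frame_a, levels)
    hb = gray_histogram(frame_b, levels)
    return sum(abs(a - b) for a, b in zip(ha, hb))


top_bright = [[255, 255], [0, 0]]
bottom_bright = [[0, 0], [255, 255]]   # same content, moved within the frame
all_dark = [[0, 0], [0, 0]]

moved = histogram_difference(top_bright, bottom_bright)
changed = histogram_difference(top_bright, all_dark)
```

Here `moved` is zero because the two frames contain the same gray levels in different positions, while `changed` is non-zero.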

2. Histogram Comparison–Example

Example video sequence

The intensity histogram of the first three frames

2. Histogram Comparison

Color histogram difference

p_i(r,g,b) is the number of pixels of color (r,g,b) in frame I_i of N pixels. Each color component is discretized to 2^B different values.
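A sketch of a color histogram difference under these definitions. It assumes 8-bit color components, keeps the top B bits of each one, and normalizes the summed absolute bin differences by N. The normalization and the function names are assumptions, since the formula is not in the text above.

```python
from collections import Counter


def color_histogram(frame, bits=2):
    """frame: list of (r, g, b) tuples with 8-bit components.
    Each component is quantized to 2**bits values by keeping its top bits."""
    shift = 8 - bits
    return Counter((r >> shift, g >> shift, b >> shift) for r, g, b in frame)


def color_histogram_difference(frame_a, frame_b, bits=2):
    """(1/N) * sum over colors of |p_a(r,g,b) - p_b(r,g,b)| (assumed form)."""
    ha, hb = color_histogram(frame_a, bits), color_histogram(frame_b, bits)
    n = len(frame_a)
    return sum(abs(ha[c] - hb[c]) for c in set(ha) | set(hb)) / n


red = [(255, 0, 0)] * 4
almost_red = [(250, 5, 3)] * 4   # falls into the same quantized bin as red
blue = [(0, 0, 255)] * 4

d_similar = color_histogram_difference(red, almost_red)
d_different = color_histogram_difference(red, blue)
```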

3. Likelihood Ratio

Compare corresponding regions (blocks) in two successive frames based on second-order statistical characteristics of their intensity values.

A camera break can then be declared whenever the total number of sample areas whose likelihood ratio exceeds the threshold is sufficiently large.

Raises the tolerance of slow and small object motion from frame to frame.

mi: mean intensity value for a given region
Si: variance for a given region
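The ratio itself is not in the text above. The sketch assumes the commonly cited form λ = [ (S1 + S2)/2 + ((m1 - m2)/2)^2 ]^2 / (S1 · S2). The thresholds, the small `eps` guard against zero variance, and the function names are illustrative assumptions.

```python
def mean_var(block):
    """Mean and (population) variance of a flat list of intensities."""
    n = len(block)
    m = sum(block) / n
    return m, sum((v - m) ** 2 for v in block) / n


def likelihood_ratio(block_a, block_b, eps=1e-9):
    """Assumed form: [ (S1+S2)/2 + ((m1-m2)/2)^2 ]^2 / (S1*S2)."""
    m1, s1 = mean_var(block_a)
    m2, s2 = mean_var(block_b)
    num = ((s1 + s2) / 2 + ((m1 - m2) / 2) ** 2) ** 2
    return num / (s1 * s2 + eps)


def is_camera_break(blocks_a, blocks_b, ratio_thresh=3.0, count_thresh=0.5):
    """Declare a break when enough corresponding blocks exceed the ratio threshold."""
    changed = sum(1 for a, b in zip(blocks_a, blocks_b)
                  if likelihood_ratio(a, b) > ratio_thresh)
    return changed / len(blocks_a) >= count_thresh


block = [10, 20, 30, 40]
shifted = [110, 120, 130, 140]   # same variance, very different mean
```

Identical blocks give a ratio close to 1, and the ratio grows as the means or variances diverge.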

4. Edge Change Ratio

Zabih, et al., “A feature-based algorithm for detecting and classifying scene breaks” Proc. of ACM Multimedia, pp. 189-200, 1995.

4. Edge Change Ratio

4. Edge Change Ratio

Edge change ratio
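A rough sketch of an edge change ratio, assuming binary edge maps are already available and that the ratio is the larger of the entering-edge and exiting-edge fractions. An edge pixel counts as entering or exiting when it lies farther than `r` pixels from every edge pixel of the other frame. The global motion compensation used by Zabih et al. is omitted, and all names are illustrative.

```python
def dilate(edges, r=1):
    """Mark every pixel within r (chessboard distance) of an edge pixel."""
    h, w = len(edges), len(edges[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if edges[y][x]:
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        out[yy][xx] = 1
    return out


def edge_change_ratio(prev_edges, cur_edges, r=1):
    """max(entering / #edges in current, exiting / #edges in previous)."""
    n_prev = sum(map(sum, prev_edges))
    n_cur = sum(map(sum, cur_edges))
    if n_prev == 0 or n_cur == 0:
        return 0.0 if n_prev == n_cur else 1.0
    prev_d, cur_d = dilate(prev_edges, r), dilate(cur_edges, r)
    h, w = len(prev_edges), len(prev_edges[0])
    entering = sum(1 for y in range(h) for x in range(w)
                   if cur_edges[y][x] and not prev_d[y][x])
    exiting = sum(1 for y in range(h) for x in range(w)
                  if prev_edges[y][x] and not cur_d[y][x])
    return max(entering / n_cur, exiting / n_prev)


def vertical_edge(col, size=5):
    return [[1 if x == col else 0 for x in range(size)] for _ in range(size)]


small_shift = edge_change_ratio(vertical_edge(0), vertical_edge(1))
new_content = edge_change_ratio(vertical_edge(0), vertical_edge(4))
```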

5. Motion Vectors

Using the direction of motion prediction as the cue for shot change detection

Pei, et al., “Scene-effect detection and insertion MPEG encoding scheme for video browsing and error concealment” IEEE Trans. on Multimedia, vol. 7, no. 4, pp. 606-614, 2005.

5. Motion Vectors

Using motion vector information to filter out false positives

Zhang, et al., “Automatic partitioning of full-motion video” Multimedia Systems Journal, vol. 1, pp. 10-28, 1993.

6. Differences in DCT domain

Discrete Cosine Transform (DCT) coefficients
  1. Select a subset of blocks
  2. Select a subset of DCT coefficients of these blocks
  3. Concatenate the selected coefficients of the selected blocks as a vector
  4. Calculate the similarity of two coefficient vectors
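The four steps can be sketched as follows, assuming the DCT coefficients of each block are already available (for example from the compressed stream) as small 2D lists. The choice of blocks and coefficient positions, and the use of one minus the normalized inner product as the difference, are illustrative assumptions.

```python
import math


def coefficient_vector(dct_blocks, block_ids, coeff_positions):
    """Steps 1-3: pick blocks, pick coefficients, concatenate into one vector."""
    return [dct_blocks[b][u][v] for b in block_ids for (u, v) in coeff_positions]


def dct_difference(vec_a, vec_b):
    """Step 4: 1 - |a . b| / (|a| |b|); 0 means identical direction."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm = (math.sqrt(sum(a * a for a in vec_a))
            * math.sqrt(sum(b * b for b in vec_b)))
    return 1.0 if norm == 0 else 1.0 - abs(dot) / norm


# Two toy "frames", each with two 2x2 coefficient blocks (hypothetical data).
frame_a = [[[8, 1], [2, 0]], [[4, 0], [1, 0]]]
frame_b = [[[8, 1], [2, 0]], [[4, 0], [1, 0]]]
positions = [(0, 0), (0, 1)]

va = coefficient_vector(frame_a, [0, 1], positions)
vb = coefficient_vector(frame_b, [0, 1], positions)
```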

Arman, et al., “Image processing on encoded video sequences” Multimedia Systems Journal, vol. 1, no. 5, pp. 211-219, 1994.

Gradual Transition Detection

Cuts or abrupt change

Gradual transition

1. Twin-Comparison Approach

Zhang, et al., “Automatic partitioning of full-motion video” Multimedia Systems Journal, vol. 1, pp. 10-28, 1993.
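A simplified sketch of the twin-comparison idea, assuming a list of consecutive-frame differences. A high threshold detects cuts, a low threshold marks a possible start of a gradual transition, and the accumulated difference over the candidate span is compared with the high threshold. Summing consecutive differences is a simplification of comparing the start frame with each later frame. Names and threshold values are illustrative.

```python
def twin_comparison(diffs, t_high, t_low):
    """Return (cuts, graduals): cut indices and (start, end) index pairs."""
    cuts, graduals = [], []
    i = 0
    while i < len(diffs):
        if diffs[i] >= t_high:
            cuts.append(i)               # abrupt change
            i += 1
        elif diffs[i] >= t_low:
            start, acc = i, 0.0          # potential start of a gradual transition
            while i < len(diffs) and t_low <= diffs[i] < t_high:
                acc += diffs[i]
                i += 1
            if acc >= t_high:            # accumulated change is as large as a cut
                graduals.append((start, i - 1))
        else:
            i += 1
    return cuts, graduals


diffs = [1, 1, 50, 1, 8, 9, 10, 9, 8, 1, 6, 1]
cuts, graduals = twin_comparison(diffs, t_high=30, t_low=5)
```

The isolated value 6 near the end exceeds the low threshold but never accumulates enough difference, so it is not reported.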

Lienhart, R., “Comparison of automatic shot boundary detection algorithms” Proc. of SPIE Storage and Retrieval for Image and Video Databases VII, vol. 3656, pp. 290-301, 1999.

3. Characterizing a Wipe Transition

Evaluation

Precision: the percentage of retrieved items that are desired items.

Recall: the percentage of desired items that are retrieved.

$$\text{Precision} = \frac{\#\,\text{correctly retrieved items}}{\#\,\text{all retrieved items}} = \frac{\#\,\text{correctly retrieved items}}{\#\,\text{correctly retrieved items} + \#\,\text{falsely retrieved items}}$$

$$\text{Recall} = \frac{\#\,\text{correctly retrieved items}}{\#\,\text{all relevant items}} = \frac{\#\,\text{correctly retrieved items}}{\#\,\text{correctly retrieved items} + \#\,\text{items that are not retrieved}}$$
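A small sketch of both measures for shot boundary detection, assuming detected and ground-truth boundaries are given as frame indices and must match exactly. Real evaluations usually allow a tolerance of a few frames. The function name is illustrative.

```python
def precision_recall(detected, ground_truth):
    """Precision and recall of detected boundaries against ground truth."""
    detected, ground_truth = set(detected), set(ground_truth)
    tp = len(detected & ground_truth)    # correctly retrieved
    fp = len(detected - ground_truth)    # falsely retrieved
    fn = len(ground_truth - detected)    # not retrieved
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


p, r = precision_recall(detected=[10, 50, 90, 120], ground_truth=[10, 50, 70])
```

Two of the four detections are correct and two of the three true boundaries are found, so precision is 0.5 and recall is 2/3.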

Evaluation–Other Terms

Miss: # items that are not retrieved
True positive (TP): # correctly retrieved items
False positive (FP): # falsely retrieved items
True negative (TN): # correctly missed items
False negative (FN): # items that are not retrieved

                     Actual positive   Actual negative
Predicted positive   TP                FP
Predicted negative   FN                TN

Evaluation

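[Figure: actual positive/negative versus predicted positive/negative table, and a diagram of the detected (retrieved) set overlapping the relevant (ground truth) set. The overlap is TP, detected but not relevant is FP, relevant but not detected is FN.]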

Relationship between Precision & Recall

Precision-Recall (PR) curve

Relationship between True Positive and False Positive

Receiver Operator Characteristic (ROC) curve

Using PR or ROC Curves?

ROC curves can present an overly optimistic view of an algorithm’s performance if there is a large skew in the class distribution.

When the number of true negative examples greatly exceeds the number of positive examples, a large change in the number of false positives leads to only a small change in the false positive rate.

Precision compares false positives to true positives and better captures the algorithm’s performance.
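A worked example with made-up counts shows the effect. Both detectors find 90 of 100 true boundaries among roughly 100,000 frames, but the second produces ten times as many false positives. All numbers are hypothetical.

```python
def rates(tp, fp, fn, tn):
    """True positive rate, false positive rate, and precision from raw counts."""
    return {
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }


a = rates(tp=90, fp=10, fn=10, tn=99890)
b = rates(tp=90, fp=100, fn=10, tn=99800)

fpr_change = b["fpr"] - a["fpr"]                    # about 0.0009
precision_change = a["precision"] - b["precision"]  # about 0.43
```

The two detectors are nearly indistinguishable on a ROC curve (same true positive rate, false positive rates differing by less than 0.001), while precision drops from 0.9 to below 0.5.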

Davis, et al., “The relationship between precision-recall and ROC curves” Proc. of International Conference on Machine Learning, pp. 233-240, 2006.

Comparison of Shot Boundary Detection Techniques

Methods
  Histograms, region histograms, running histograms, motion-compensated pixel differences, DCT coefficient differences

Evaluation data

Video type    # Frames   Cuts   Gradual transitions
TV              133204    831     42
News             81595    293     99
Movie           142507    564     95
Commercial       51733    755    254
Misc.            10706     64     16
Total           419745   2507    506

Methods Compared

Histogram (64-bin gray-level) difference, single threshold
Region (block) histogram
  16 blocks, 64-bin gray-scale histograms, a difference threshold for each block, and a count threshold for changed blocks
Running histogram (twin method)
  64-bin gray-scale histogram for each frame, twin thresholds
  Compute motion vectors; if there is excessive motion, reject gradual changes
Motion-compensated pixel difference
  12 blocks per frame, a motion vector for each block
  Compute average residual errors; if larger than a high threshold, detect a cut
  Use cumulative errors to detect gradual changes (similar to above)
  Use motion vectors to reject false gradual changes
DCT difference
  Concatenate 15 coefficients at the same locations from different blocks to form a vector
  Compute (1 - inner product of two vectors from consecutive frames)
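As one concrete instance, the region histogram method can be sketched as below, assuming 8-bit gray-level frames stored as lists of rows and a 4x4 grid of blocks. The threshold values passed in are illustrative, not those used in the comparison.

```python
def block_histograms(frame, grid=4, bins=64):
    """One gray-level histogram per block of a grid x grid partition."""
    h, w = len(frame), len(frame[0])
    hists = [[0] * bins for _ in range(grid * grid)]
    for y in range(h):
        for x in range(w):
            block = (y * grid // h) * grid + (x * grid // w)
            hists[block][frame[y][x] * bins // 256] += 1
    return hists


def region_histogram_cut(frame_a, frame_b, block_thresh, count_thresh,
                         grid=4, bins=64):
    """Cut if at least count_thresh blocks differ by more than block_thresh."""
    ha = block_histograms(frame_a, grid, bins)
    hb = block_histograms(frame_b, grid, bins)
    changed = sum(1 for a, b in zip(ha, hb)
                  if sum(abs(p - q) for p, q in zip(a, b)) > block_thresh)
    return changed >= count_thresh


dark = [[0] * 8 for _ in range(8)]
bright = [[255] * 8 for _ in range(8)]
local = [row[:] for row in dark]
for y in range(2):
    for x in range(2):
        local[y][x] = 255            # only the top-left block changes

full_change = region_histogram_cut(dark, bright, block_thresh=4, count_thresh=8)
local_change = region_histogram_cut(dark, local, block_thresh=4, count_thresh=8)
```

A change confined to one block, such as a small moving object, does not trigger a cut.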

PR Curve for TV program

PR Curve for News program

PR Curve for Movie Videos

PR Curve for Commercials

PR Curve for All Data

PR Curve for All Data–Cut Only

Observations

Histogram-based method is consistent
  Produced the first or second best precision
  Simple and straightforward
Region algorithm seems to be the best where recall is not the highest priority
Running algorithm seems to be the best where recall is important
  Motion vectors help reduce false positives
DCT is the worst
  Large number of false positives in black frames

References

J.S. Boreczky, et al., "Comparison of video shot boundary detection techniques" Proc. of SPIE Conference on Storage and Retrieval for Image and Video Databases, vol. 2670, 1996. (must read)

R. Lienhart, "Comparison of automatic shot boundary detection algorithms" Proc. of SPIE Storage and Retrieval for Image and Video Databases VII, vol. 3656, pp. 290-301, 1999.

J. Yuan, et al., "A formal study of shot boundary detection" IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 2, pp. 168-186, 2007.

A. Hanjalic, "Shot-boundary detection: unraveled or resolved?" IEEE Trans. on Circuits and Systems for Video Technology, vol. 12, no. 2, pp. 90-105, 2002.

Edge

An edge is a set of connected pixels that lie on the boundary between two regions.

Chapter 10 of “Digital Image Processing” by R.C. Gonzalez and R.E. Woods, Prentice Hall, 2nd edition, 2001

Gradient Operators

Roberts cross-gradient operators:

Prewitt operators:

Sobel operators:
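The operator masks are not reproduced in the text above. As an illustration, the sketch below applies the standard 3x3 Sobel masks and approximates the gradient magnitude by |Gx| + |Gy|. Border pixels are left at zero for simplicity, and the function name is illustrative.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))    # responds to vertical edges
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))    # responds to horizontal edges


def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy| for interior pixels."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out


step = [[0, 0, 100, 100] for _ in range(4)]   # vertical step edge
flat = [[7] * 4 for _ in range(4)]

step_response = sobel_magnitude(step)
flat_response = sobel_magnitude(flat)
```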

Edge Examples

Edge Examples–after smoothing

Edge Examples

Canny Edge Detectors

Step 1: the image is smoothed by Gaussian convolution
Step 2: a 2D first derivative operator is applied to the smoothed image
Step 3: non-maximal suppression
  Edges give rise to ridges in the gradient magnitude image. The algorithm tracks along the top of these ridges and sets to zero all pixels that are not actually on the ridge.

http://homepages.inf.ed.ac.uk/rbf/HIPR2/canny.htm

Very Brief Introduction of Discrete Cosine Transform

Spatial Frequency and DCT

Definition of DCT

2D DCT

1D DCT

DCT Basis

Example

Discrete Cosine Transform

DCT converts a block of pixels into a block of transform coefficients, which represent the spatial frequency.

Each coefficient is a weight applied to an appropriate basis function.

Any gray-scale 8x8 pixel block can be fully represented by a weighted sum of these 64 basis functions.
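A direct, unoptimized sketch of the 2D DCT of a square block, assuming the orthonormal DCT-II normalization (scale factor sqrt(1/N) for index 0 and sqrt(2/N) otherwise). For N = 8 this matches the scaling used in JPEG. The function name is illustrative.

```python
import math


def dct_2d(block):
    """2D DCT-II of an N x N block; result[u][v] weights basis function (u, v)."""
    n = len(block)

    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    return [[c(u) * c(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


flat_block = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat_block)
dc = coeffs[0][0]
largest_ac = max(abs(coeffs[u][v])
                 for u in range(8) for v in range(8) if (u, v) != (0, 0))
```

A constant block has no spatial variation, so all of its energy ends up in the “DC” coefficient and every other coefficient is numerically zero.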

Increasing horizontal frequency

Increasing vertical frequency

“DC” basis function

Intra-Frame Encoding (JPEG Compression)

Scene Transition Graph

Yeung, et al. “Segmentation of video by clustering and graph analysis” Computer Vision and Image Understanding, vol. 71, no. 1, pp. 94-109, 1998.

Observations

Shots in a scene are often repetitive. We are able to classify shots by grouping shots of similar visual contents.

Often, a scene is made up of temporally adjacent shots indicating their interrelationships.

Similarity of Video Shots

D(.,.) measures the dissimilarity between two image frames.

Similarity of Video Shots

Dissimilarity based on color histogram intersection

Dissimilarity based on luminance projection

Yeung and Liu, “Efficient matching and clustering of video shots” Proc. of IEEE International Conference on Image Processing, vol. 1, pp. 338-341, 1995.
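A sketch of a histogram-intersection dissimilarity between two frames and of a shot-level dissimilarity built on it. It assumes that both histograms cover the same number of pixels and that two shots are compared through the closest pair of their representative frames. The exact formulas are not in the text above, so both choices are assumptions.

```python
def intersection_dissimilarity(hist_a, hist_b):
    """D = 1 - (sum of bin-wise minima) / (number of pixels)."""
    total = sum(hist_a)
    return 1.0 - sum(min(a, b) for a, b in zip(hist_a, hist_b)) / total


def shot_dissimilarity(rep_a, rep_b, frame_dissimilarity=intersection_dissimilarity):
    """Smallest frame dissimilarity over all pairs of representative frames."""
    return min(frame_dissimilarity(f, g) for f in rep_a for g in rep_b)


h1 = [4, 0, 0, 0]
h2 = [0, 4, 0, 0]
h3 = [2, 2, 0, 0]

d_same = intersection_dissimilarity(h1, h1)
d_disjoint = intersection_dissimilarity(h1, h2)
d_half = intersection_dissimilarity(h1, h3)
d_shots = shot_dissimilarity([h1, h2], [h2])
```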

Representative Image Set for a Video Shot

Selection of the representative set is achieved by nonlinear temporal sampling

Representative Image Set for a Video Shot

Only 2 to 5% of frames are needed in comparison to achieve good matching results.

In addition to temporal subsampling, spatial subsampling can also be used to improve matching efficiency.

Clustering of Video Shots

Shots in the same cluster are similar
Any shot outside the cluster must have a dissimilarity greater than the dissimilarity between any two shots in the cluster.

Ci: the ith cluster

Dissimilarity between two clusters:

The largest dissimilarity value over all shot pairs in which the two shots are in the two different clusters.

Dissimilarity between two clusters should be updated at each iteration.

Time-Constrained Clustering

Two shots that are far apart in time, even if they share similar visual contents, potentially represent different contents or occur in different scenes.

Temporal distance between two shots: the distance in number of frames from the end of the earlier shot to the beginning of the later one.
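A sketch of time-constrained clustering under these definitions. Treating pairs farther apart than T as infinitely dissimilar, merging greedily by the largest-pair (complete-link) cluster dissimilarity, and stopping at a threshold `delta` are assumptions about how the constraint is applied. The shot records and the `visual` function are hypothetical.

```python
import math


def temporal_distance(shot_a, shot_b):
    """Frames from the end of the earlier shot to the start of the later one."""
    first, second = sorted((shot_a, shot_b), key=lambda s: s["start"])
    return max(0, second["start"] - first["end"])


def constrained_dissimilarity(shot_a, shot_b, visual, T):
    """Visual dissimilarity, or infinity when the shots are more than T apart."""
    if temporal_distance(shot_a, shot_b) > T:
        return math.inf
    return visual(shot_a, shot_b)


def cluster_shots(shots, visual, T, delta):
    """Merge the closest clusters until the smallest dissimilarity reaches delta."""
    clusters = [[i] for i in range(len(shots))]

    def link(ca, cb):   # largest pairwise dissimilarity between two clusters
        return max(constrained_dissimilarity(shots[i], shots[j], visual, T)
                   for i in ca for j in cb)

    while len(clusters) > 1:
        d, a, b = min((link(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        if d >= delta:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


def by_label(a, b):
    """Toy visual dissimilarity: 0 for the same label, 1 otherwise."""
    return 0.0 if a["label"] == b["label"] else 1.0


shots = [
    {"label": "A", "start": 0, "end": 99},
    {"label": "B", "start": 100, "end": 199},
    {"label": "A", "start": 200, "end": 299},
    {"label": "B", "start": 300, "end": 399},
    {"label": "A", "start": 2000, "end": 2099},   # looks like A, but far away in time
]
clusters = cluster_shots(shots, by_label, T=500, delta=0.5)
```

The last shot stays in its own cluster even though it looks like the other A shots, because it is more than T frames away from them.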

Scene Transition Graph

A scene transition graph is a directed graph G = (V, E, F)
  V: each node represents a cluster of shots
  E: a directed edge is drawn from node U to node W if there is a shot represented by node U that immediately precedes any shot represented by node W
  F: a mapping that partitions the set of shots into clusters
STG is able to represent compactly the structure of shots and the temporal flow of the story for many video programs.
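Building the graph from a clustering is short. The sketch assumes the mapping F is given as one cluster label per shot, in temporal order, and skips self-loops between consecutive shots of the same cluster (an assumption). The label names are hypothetical.

```python
def build_stg(labels):
    """Nodes are cluster labels; an edge (U, W) means some shot in U
    is immediately followed by a shot in W."""
    nodes = sorted(set(labels))
    edges = {(u, w) for u, w in zip(labels, labels[1:]) if u != w}
    return nodes, edges


# Shots in temporal order with their cluster labels (hypothetical example).
labels = ["A", "B", "A", "B", "C"]
nodes, edges = build_stg(labels)
```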

Example of STG

3 scenes of 9 shots

Sample clustering results

Scene transition graph

Cut Edges

An edge is a cut edge if removing it results in two disconnected graphs.

Each partitioned STG Gi represents the interactions of shots in a story unit.
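A brute-force sketch of cut-edge finding and story-unit extraction. It assumes one cluster label per shot in temporal order, checks connectivity while ignoring edge direction, and treats U->W and W->U as two separate edges, so a back-and-forth pair is never a cut edge. These are assumptions about the procedure, and the label names are hypothetical.

```python
def weak_components(nodes, edges):
    """Connected components of the graph, ignoring edge direction."""
    adj = {n: set() for n in nodes}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def story_units(labels):
    """Return (cut_edges, units); each unit is a list of shot indices."""
    nodes = list(dict.fromkeys(labels))             # keep first-seen order
    edges = {(u, w) for u, w in zip(labels, labels[1:]) if u != w}
    base = len(weak_components(nodes, edges))
    cut_edges = {e for e in edges
                 if len(weak_components(nodes, edges - {e})) > base}
    comps = weak_components(nodes, edges - cut_edges)
    units = [[i for i, lab in enumerate(labels) if lab in c] for c in comps]
    return cut_edges, units


# Nine shots whose clusters alternate before moving on to new content.
labels = ["B12", "A", "B12", "A", "B34", "A", "B34", "C", "D"]
cut_edges, units = story_units(labels)
```

Removing the two cut edges leaves three subgraphs, that is, three story units.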

STG After Time Confining and Cut Edges Finding

Framework

Shot segmentation -> Time-constrained clustering -> Building of scene transition graph -> Scene segmentation

Influences of Parameters

Without knowledge of how long each individual scene lasts, T cannot be approximated well.
  If T is too large, shots from different scenes are clustered together.
  If T is too small, shots in the same scene may be separated into different scenes.

It’s less detrimental to have several story units represent a scene than to have one story unit represent several scenes.

Influences of Time Constraints

T = 20s. dt(B1,B3) > T

Clustering results are {B1,B2},{A1,A2,A3},{B3,B4},{C1},{D1}

Story unit results are {B1,A1,B2,A2,B3,A3,B4},{C1},{D1}


{Bi} are not grouped into one cluster because there is at least one pair of shots, one from each cluster, that has a temporal distance dt > T*.

Influences of Time Constraints

T = 20s.

Clustering results are {B1},{B2,B3},{A1,A2,A3},{B4},{C1},{D1}

Story unit results are {B1},{A1,B2,A2,B3,A3},{B4},{C1},{D1}


Refined Analysis

Make the time window more elastic
  Compute the duration of each story unit and adjust
Given a story unit, examine the next story unit by relaxing the temporal window and reclustering the shots in these two units.
  If there exists at least one new cluster that contains shots from both units, the two story units are merged into one.

Refined Analysis

Example

{B1,B2},{A1,A2,A3},{B3,B4},{C1},{D1}

{B1},{B2,B3},{A1,A2,A3},{B4},{C1},{D1}

Results

STG constructed from the sitcom “Friends”. There are 35575 frames, each at a spatial resolution of 320x240. There are 313 shots.

Results

Time-constrained clustering of video shots is able to identify individual story units.

The resulting STG permits rapid nonlinear browsing of long video programs.

Variations of Clustering Parameters

Smaller delta values result in more clusters and thus more story units. Users often prefer over-segmentation to under-segmentation.

Refining the Segmentation Results

The first two story units in Scene 1 are merged into one. The number of story units in Scene 6 is reduced from 4 to 2.

Conclusion

Analysis based on time-constrained clustering and scene transition graph analysis has contributed to the extraction of story units.

The building of story structure provides nonlinear access to video contents.

Identification, integration, and application of domain-dependent and semantic features tend to improve segmentation accuracy.
