
Key Frame Extraction On MPEG by using Threshold Algorithm

CHAPTER 1

INTRODUCTION

1.1 LITERATURE REVIEW

Recent years have witnessed an enormous increase in video data on the

internet. This rapid increase demands efficient techniques for management and

storage of video data. Video summarization is one of the commonly used mechanisms

to build an efficient video archiving system. The video summarization methods

generate summaries of the videos which are the sequences of stationary or moving

images (Money and Agius, 2008). Key frame extraction is a widely used method for

video summarization. The key frames are the characteristic frames of the video which

render limited, but meaningful information about the contents of the video (Li et al.,

2001).

The researchers have attempted to exploit various features for the extraction

of key frames in videos. These features have been utilized in a variety of different

ways. Commonly used low level features include color

histogram, frame correlation, motion information and edge histogram (Jiang et al.,

2009). Zhang et al. (1997) used the color histogram difference between the current

frame and the last extracted key frame to draw out key frames from the video. Gunsel

and Tekalp (1998) compared the histogram of current frame with the average color

histograms of the previous frames to compute the discontinuity value.

A thorough survey of existing techniques reveals that researchers have
used many different visual features for the problem of key frame extraction. In our
project we use three frame difference measures for the extraction of key frames:
color histogram, frame correlation and edge orientation histogram.

1.2 OVERVIEW OF PROJECT

Efficient key frame extraction enables efficient cataloguing and retrieval of
large video collections. Video is rich in content, which results in a tremendous amount
of data to process. Processing can be made easier by handling only some frames, such as
the key frames of the video. In general, a key frame extraction technique must be fully
automated and must use the contents of the video to generate a summary.

Department of ECE, MRITS


Theoretically, key frames should be extracted using high level features such as
objects, actions and events. However, key frame extraction based on high level
features is mostly specific to certain applications, so low level features are usually
employed instead. Commonly used low level features include
colour histogram, correlation, moments, edges and motion features. These low level
features can then be employed to derive high level features for domain
specific applications.

A common methodology is to compare consecutive frames based on some
low level Frame Difference Measures (FDMs) and extract a key frame if the
difference satisfies a certain threshold condition. The low level features used in our
project are:

(1) Colour histogram

(2) Frame correlation

(3) Edge orientation histogram

The basic block diagram for the elicitation of key frames in sports video based on
multiple frame difference features is shown in Fig.1.1. It consists of frame
extraction, colour histogram, correlation, edge orientation histogram and threshold
logic modules. The frame extraction module extracts all the frames from the given
input video, and the key frames are identified by the colour histogram, correlation and
edge orientation histogram methods using threshold logic. In our work,
the results from these three methods are compared for a sample video (football)
and for cricket, hockey and football videos.

The colour histogram of each frame is calculated in HSV colour space. HSV
stands for Hue, Saturation and Value. The HSV colour model is based on how colours
appear to a human observer. From the colour histograms of these three channels
in two frames, the colour histogram difference measure is calculated. This measure
lies between -64 and 0.

Frame correlation is measured using Pearson’s distance, which is
defined as one minus Pearson’s correlation coefficient. Pearson’s correlation
coefficient between two variables is defined as the covariance of the two variables
divided by the product of their standard deviations. The Pearson correlation
coefficient falls in [-1, 1], so the Pearson distance lies in [0, 2].

The third measure is the histogram of edge orientation, which is computed
using the Sobel operator. The edges are first computed
using horizontal and vertical Sobel operators, whose outputs are then used to find the
gradient magnitude and angle of the edges. The angles are then used to build a histogram
of edge orientation. The range of values for the edge orientation measure is 0 to 82.
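The edge orientation steps can be sketched as follows. This is a minimal illustration in Python (standard library only), not the project's MATLAB code; the function name, the number of bins and the magnitude cut-off are assumptions made for the example.

```python
import math

# 3x3 Sobel kernels for the horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def edge_orientation_histogram(gray, bins=8, min_magnitude=1e-9):
    """Histogram of gradient angles over the interior pixels of a 2-D grid.

    `bins` and `min_magnitude` are illustrative choices, not project values.
    """
    rows, cols = len(gray), len(gray[0])
    hist = [0] * bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    p = gray[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * p
                    gy += SOBEL_Y[dr][dc] * p
            if math.hypot(gx, gy) <= min_magnitude:
                continue  # flat region: no edge, so no orientation
            angle = math.atan2(gy, gx) % (2 * math.pi)  # angle in [0, 2*pi)
            hist[min(int(angle / (2 * math.pi) * bins), bins - 1)] += 1
    return hist

# A vertical step edge: dark left half, bright right half.
step = [[0, 0, 255, 255] for _ in range(4)]
print(edge_orientation_histogram(step))  # [4, 0, 0, 0, 0, 0, 0, 0]
```

All four interior pixels of the step image have a purely horizontal gradient, so they fall into the first orientation bin. Two such histograms, one per frame, can then be compared with any histogram distance.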


[Block diagram: Input video, Extraction of all frames in the video, Feature extraction (colour histogram, correlation, edge orientation histogram); each feature feeds its own Threshold logic block, which outputs Key frames]

Fig 1.1: Basic Block Diagram

Input Video

Cricket, football and hockey videos are taken as inputs to this work. The video
can be in formats such as .avi, .flv, .mov, .mp4, .mpg or .rm. To process the video, its
frames have to be extracted. The following is a brief explanation of the commonly
found video file formats:

AVI (.avi)

The AVI (Audio Video Interleave) format was developed by Microsoft. The

AVI format is supported by all computers running Windows, and by all the most

popular web browsers.

WMV (.wmv)

The Windows Media format was developed by Microsoft. Windows Media is a
common format on the Internet, but Windows Media movies cannot be played on
non-Windows computers without an extra (free) component installed. Some later
Windows Media movies cannot play at all on non-Windows computers because no
player is available.

MPEG (.mpg/.mpeg)

The MPEG (Moving Picture Experts Group) format is the most popular format

on the Internet. It is cross-platform, and supported by all the most popular web

browsers.

QuickTime (.mov)

The QuickTime format was developed by Apple. QuickTime is a common

format on the Internet, but QuickTime movies cannot be played on a Windows

computer without an extra (free) component installed.

Flash (.flv/.swf)




The Flash (Shockwave) format was developed by Macromedia. The
Shockwave format requires an extra component to play, but this component comes
preinstalled with web browsers like Firefox and Internet Explorer.

3GP (.3gp)

The 3GP format is both an audio and a video format that was designed as a
multimedia format for transmitting audio and video files between 3G cell phones and
the internet. It is a 3G streaming video format, mainly used to suit the high
transmission speed of 3G networks, and is currently the most common type of mobile
phone video format.

RealMedia (.rm)

RealMedia is a format which was created by RealNetworks. It contains
both audio and video data and is typically used for streaming media files over the
internet. RealMedia can play on a wide variety of media players for both Mac and
Windows platforms. RealPlayer is the most compatible.

MPEG-4 (.mp4)

MPEG-4 is the new format for the internet. In fact, YouTube recommends
using MP4. YouTube accepts multiple formats, and then converts them all to .flv
or .mp4 for distribution. More and more online video publishers are moving to MP4
as the internet sharing format for both Flash players and HTML5.

Advanced Streaming Format (.asf)

ASF is the container format used by Windows Media files and was developed by Microsoft. It is
intended for streaming and is used to support playback from digital media and HTTP
servers, and to support storage devices such as hard disks. Its contents can be compressed using
a variety of video codecs. The most common file types contained within an
ASF file are Windows Media Audio and Windows Media Video.

Frame Extraction

In this stage, the input video is divided into frames. To do this,
we used the MATLAB function mmreader to read the video and extract its frames. The
input to mmreader can be any of the formats mentioned above.



Feature extraction

The features of the extracted frames can be colour, edge, motion or textual
features. Low level features such as the colour histogram, frame correlation and edge
histogram are compared using frame difference measures. The frame
difference values are then calculated for all extracted frames of the different videos.

Key frame extraction

To start the extraction process, the first frame is declared as a key frame. Then the

frame difference is computed between the current frame and the last extracted key

frame. If the frame difference satisfies a certain threshold condition, then the current

frame is selected as key frame. This process is repeated for all frames in the video.
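The selection loop described above can be sketched as follows. This is a minimal Python illustration, not the project's MATLAB code; the function name is hypothetical, and "satisfies the threshold condition" is assumed here to mean "difference greater than the threshold".

```python
def extract_key_frames(frames, frame_difference, threshold):
    """Return the indices of the selected key frames.

    The first frame is always a key frame; every later frame is compared
    with the most recently selected key frame.
    """
    if not frames:
        return []
    key_indices = [0]
    for i in range(1, len(frames)):
        last_key = frames[key_indices[-1]]
        # Assumed threshold condition: difference strictly above threshold.
        if frame_difference(last_key, frames[i]) > threshold:
            key_indices.append(i)
    return key_indices

# Toy "frames": one number per frame, difference = absolute change.
toy_frames = [10, 11, 12, 30, 31, 55]
print(extract_key_frames(toy_frames, lambda a, b: abs(a - b), threshold=10))
# [0, 3, 5]
```

Passing the difference measure as a function keeps the loop independent of whether the colour histogram, correlation or edge orientation measure is used.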

1.3 ORGANISATION OF THESIS

In view of the proposed research work, explanation of theoretical aspects used

in this work is presented as per the sequence described below.

Chapter 2 Describes the need for key frames and introduction to frame difference

measures for the extraction of key frames.

Chapter 3 Deals with the basic colour models and different frame difference

measures for the extraction of key frames based on colour histogram.

Chapter 4 Explains different correlation coefficients for the extraction of key

frames.

Chapter 5 Describes fundamentals of edge detection and different edge detection

operators for the extraction of key frames.

Chapter 6 Deals with results and discussions.

Chapter 7 has conclusions and future scope.

CHAPTER 2



VIDEO COMPACTION USING KEY FRAME EXTRACTION

2.1 DEFINITION

A key frame is a frame that represents salient content and distinct
information compared to the previous frame. Key frame extraction is a widely used
method for video summarization; that is, the extracted key frames summarize the
characteristics of the video. Video summarization is a method of generating a succinct
version of a video by eliminating the redundant frames. The method for video
summarization is shown in Fig. 2.1. The effective way of generating key frames is
shown in Fig. 2.2.

Fig. 2.1: Scheme For Video Summary

Fig. 2.2: The Basic Framework Of The Key Frame Extraction Algorithm

[Figure labels: Video stream, Frame sequence, Key frame extraction]

2.2 NEED FOR KEY FRAMES

Key frame extraction is an essential part of video analysis and management,
providing a suitable video summary for video indexing, browsing and retrieval.
General video is rich in content and typically runs at 24 frames per second. Hence a one
hour video contains around 24 x 60 x 60 = 86,400 frames. Most of these frames contain
redundant information, so key frame extraction is essential. The use of key
frames reduces the amount of data required in video indexing and provides the
framework for dealing with the video content. A basic rule of key frame extraction is
that it is better to select a wrong key frame than to select too few. It is still necessary to
discard the frames with repetitive or redundant information during the extraction.

More and more attention is being paid to video processing technology in order to
extract valid information from video, process video data efficiently and
reduce the transfer load on the network. Video segmentation and key frame extraction
significantly reduce the amount of data to be processed. To reduce network load and
the transmission of invalid information, techniques for the transmission, storage and
management of video information are becoming more and more important.

2.3 EXTRACTION OF KEY FRAMES USING FRAME

DIFFERENCE MEASURES (FDMs)

2.3.1 INTRODUCTION TO FDMs

A common methodology for the extraction of key frames is to compare
consecutive frames based on some low level Frame Difference Measures (FDMs).
The frame difference is measured, and if it exceeds a certain threshold
the frame is selected as a key frame; otherwise the frame is discarded. Low level
features commonly used for this purpose include colour
histogram, frame correlation, motion information and edge histogram.

2.3.2 KEY FRAME EXTRACTION

To start the extraction process, the first frame is declared as a key frame.
Instead of computing one histogram for the entire image, we divide the image shown
in Fig 2.3(a) into a total of Ts sections, each of size m x m, as shown in Fig 2.3(b). This
allows the level of difference between the two frames to be measured effectively. Then the frame
difference is computed between the current frame and the last extracted key frame.
This frame difference is computed using the colour histogram, correlation or edge
orientation histogram. The obtained frame difference is then compared with a certain
threshold; if the difference satisfies the threshold condition, the current
frame is selected as a key frame. Repeating this procedure for all
frames extracts the key frames.

Fig 2.3(a): Original image

Fig 2.3(b): Division of image into sections



CHAPTER 3

COLOUR HISTOGRAM DIFFERENCE

3.1 INTRODUCTION

The colour histograms have been commonly used for key frame extraction in

frame difference based techniques. This is because the colour is one of the most

important visual features to describe an image. Colour histograms are easy to compute

and are robust in case of small camera motions. An image histogram is a type

of histogram that acts as a graphical representation of the tonal distribution in

an image. It plots the number of pixels for each tonal value. By looking at the

histogram for a specific image a viewer will be able to judge the entire tonal

distribution at a glance.

The horizontal axis of the graph represents the tonal variations, while the vertical

axis represents the number of pixels in that particular tone. The left side of the

horizontal axis represents the black and dark areas, the middle represents medium

grey and the right hand side represents light and pure white areas. The vertical axis

represents the size of the area that is captured in each one of these zones. Thus, the

histogram for a very dark image will have the majority of its data points on the left

side and centre of the graph. Conversely, the histogram for a very bright image with

few dark areas and/or shadows will have most of its data points on the right side and

centre of the graph. The representation of color histogram is shown in the Fig.3.1.

Fig 3.1: Colour Histogram representation of image
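As a small illustration of what such a histogram contains, the following Python sketch counts the pixels at each tonal value of a tiny greyscale image. The function name and the 256-level assumption are choices made for this example only.

```python
def tonal_histogram(gray, levels=256):
    """Count how many pixels take each tonal value (0 = black, levels-1 = white)."""
    counts = [0] * levels
    for row in gray:
        for value in row:
            counts[value] += 1
    return counts

# A 2 x 3 image with three black, two mid-grey and one white pixel.
image = [[0, 0, 128], [255, 128, 0]]
hist = tonal_histogram(image)
print(hist[0], hist[128], hist[255])  # 3 2 1
```

A dark image concentrates its counts at low indices (the left of the plot), a bright image at high indices.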




3.2 COLOR IMAGE PROCESSING

3.2.1 Color fundamentals

Basically, the colors that humans and some other animals perceive in an object
are determined by the nature of the light reflected from the object. Visible light is
composed of a relatively narrow band of frequencies in the electromagnetic spectrum.

A body that reflects light that is balanced in all visible wavelengths appears
white to the observer. However, a body that favors reflectance in a limited range of
the visible spectrum exhibits some shades of color. For example, green objects
reflect light with wavelengths primarily in the 500-570 nm range while absorbing most
of the energy at other wavelengths.

Characterization of light is central to the science of color. If the light is
achromatic (void of color), its only attribute is its intensity. Achromatic light is what
viewers see on a black and white television set, and it has been an implicit component
of the discussion of image processing thus far. The term gray level refers to a scalar
measure of intensity that ranges from black to grays and finally to white.

Chromatic light spans the electromagnetic spectrum from approximately 400 to
700 nm. Three basic quantities are used to describe the quality of a chromatic light
source: radiance, luminance and brightness. Radiance is the total amount of energy that
flows from the light source and is usually measured in watts (W). Luminance, measured in
lumens (lm), gives a measure of the amount of energy an observer perceives from a light
source. For example, light emitted from a source operating in the far infrared region
of the spectrum could have significant energy (radiance), but an observer would
hardly perceive it; its luminance would be almost zero. Finally, brightness is a
subjective descriptor that is practically impossible to measure. It embodies the
achromatic notion of intensity and is one of the key factors in describing color
sensation.



3.2.2 Primary colors

Cones are the sensors in the eye responsible for color vision. Cones in the
human eye can be divided into three principal sensing categories corresponding
roughly to red, green and blue. Approximately 65% of all cones are sensitive to red
light, 33% are sensitive to green light and only 2% are sensitive to blue (but the blue
cones are the most sensitive). Due to these absorption characteristics of the human eye,
colors are seen as variable combinations of the so called primary colors red (R),
green (G) and blue (B). The wavelength values assigned to the three primary colors are:
blue = 435.8 nm, green = 546.1 nm and red = 700 nm.

The primary colors can be added to produce the secondary colors of light:
magenta (red plus blue), cyan (green plus blue) and yellow (red plus green). Mixing the
three primaries, or a secondary with its opposite primary color, in the right intensities produces
white light. Differentiating between the primary colors of light and the primary colors of
pigments or colorants is important. In the latter, a primary color is defined as one that
subtracts or absorbs a primary color of light and reflects or transmits the other two.
Therefore, the primary colors of pigments are magenta, cyan and yellow, and the
secondary colors are red, green and blue. A proper combination of the three pigment
primaries, or a secondary with its opposite primary, produces black.

3.2.3 Hue and saturation

The characteristics generally used to distinguish one color from another are
brightness, hue and saturation. Brightness embodies the achromatic notion of intensity.
Hue represents the dominant color as perceived by an observer. Saturation refers to
the relative purity, or the amount of white light mixed with a hue. The pure spectrum colors
are fully saturated. Colors such as pink (red and white) and lavender (violet and white)
are less saturated, with the degree of saturation being inversely proportional to the
amount of white light added. Hue and saturation taken together are called
chromaticity, and therefore a color may be characterized by its brightness and
chromaticity.



3.2.4 Importance of color image processing

The use of color in image processing is motivated by two principal factors:

(1) First, color is a powerful descriptor that often simplifies object identification
and extraction from a scene.

(2) Second, humans can discern thousands of color shades and intensities, compared
to only about two dozen shades of gray. This second factor is particularly
important in manual (i.e., when performed by humans) image analysis.

3.3 COLOR MODELS

3.3.1 Introduction to Color models

The purpose of a color model is to facilitate the specification of colors in some
standard, generally accepted way. In essence, a color model is the specification of a
coordinate system and a subspace within that system where each color is represented
by a single point.

Most color models in use today are oriented either towards hardware or
towards applications where color manipulation is a goal. In terms of digital image
processing, the hardware oriented models most commonly used in practice are the
RGB (red, green, blue) model for color monitors and a broad class of color video
cameras; the CMY (cyan, magenta and yellow) and CMYK (cyan, magenta, yellow and
black) models for color printing; and the HSI (hue, saturation and intensity) model,
which corresponds closely with the way humans describe and interpret colors. The
HSI model also has the advantage that it decouples the color and gray-scale
information in an image, making it suitable for many of the gray-scale techniques
developed. There are numerous color models in use today because color
science is a broad field that encompasses many areas of application.

3.3.2 RGB color model

The RGB color model is an additive color model in which red, green and blue

light are added together in various ways to reproduce a broad array of colors. The

name of model comes from the initials of the three additive primary colors red, green

and blue.



The main purpose of the RGB color model is the sensing, representation
and display of images in electronic systems such as televisions and computers, though
it has also been used in conventional photography. Before the electronic age, the RGB
color model already had a solid theory behind it, based on human perception of colors.

Typical RGB input devices are color TV and video cameras, image scanners
and digital cameras. Typical RGB output devices are TV sets of various technologies
(CRT, LCD, plasma, etc.), computer and mobile phone displays, video
projectors, multicolor LED displays and large screens such as the Jumbotron. Color printers,
on the other hand, are not RGB devices but subtractive color devices (typically using the CMYK
color model).

Fig. 3.2: RGB Colour Model

Fig.3.2 shows the RGB colour model. To form a color with RGB, three colored light
beams (one red, one green and one blue) must be superimposed (for example, by emission
from a black screen, or by reflection from a white screen). Each of the three beams is
called a component of that color, and each of them can have an arbitrary intensity,
from fully off to fully on, in the mixture.

3.3.2.1 Representation of RGB

We can represent the RGB model by using a unit cube. Each point in the cube
(or vector from the origin to that point) represents a specific color. This model
is the best for setting the electron guns of a CRT. Note that for complementary
colours the sum of the values equals white light (1,1,1). For example:

Red (1,0,0) + cyan (0,1,1) = white (1,1,1)

Green (0,1,0) + magenta (1,0,1) = white (1,1,1)

Blue (0,0,1) + yellow (1,1,0) = white (1,1,1)

Fig. 3.3 Cartesian coordinates (3D)

MATLAB code for extraction of a particular component:

R = RGB(:,:,1); % extracting red component

G = RGB(:,:,2); % extracting green component

B = RGB(:,:,3); % extracting blue component

3.3.3 HSV color model

The characteristics generally used to distinguish one color from another are
brightness, hue and saturation. Brightness embodies the achromatic notion of intensity.

(1) Hue represents the dominant wavelength of the light wave. Thus, when we
call an object red, orange or yellow, we are specifying its hue.

(2) Saturation refers to the relative purity, or the amount of white light mixed
with the hue. The pure spectrum colors are fully saturated.

The HSV (hue, saturation and value) color model is more intuitive than the RGB
color model. The user specifies a color (hue) and then adds white or black. There are
three color parameters: hue, saturation and value. Changing the saturation parameter
corresponds to adding or subtracting white, and changing the value parameter
corresponds to adding or subtracting black. The HSV model is shown in Fig.3.4.

Fig. 3.4: HSV Color Model

HSV improves on the color cube representation of RGB by arranging colors of

each hue in a radial slice around a simple axis of neutral colors which ranges from

black at the bottom to white at the top. The fully saturated colors of each hue lie in a

circle, a color wheel.

Matlab code for extraction of a particular component:

H = HSV(:,:,1); % extracting hue component

S = HSV(:,:,2); % extracting saturation component

V = HSV(:,:,3); % extracting value component

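Conversion from RGB to HSV:

Let $r, g, b \in [0, 1]$ be the red, green and blue coordinates, respectively, of a color in
RGB space. Let $\max$ be the greatest of $r$, $g$ and $b$, and $\min$ the least.

To find the hue angle $h \in [0, 360)$, compute:

$$
h = \begin{cases}
0, & \text{if } \max = \min \\
\left(60 \cdot \dfrac{g-b}{\max-\min} + 360\right) \bmod 360, & \text{if } \max = r \\
60 \cdot \dfrac{b-r}{\max-\min} + 120, & \text{if } \max = g \\
60 \cdot \dfrac{r-g}{\max-\min} + 240, & \text{if } \max = b
\end{cases}
$$

The values for $s$ and $v$ of an HSV color are defined as follows:

$$
s = \begin{cases}
0, & \text{if } \max = 0 \\
\dfrac{\max-\min}{\max} = 1 - \dfrac{\min}{\max}, & \text{otherwise}
\end{cases}
\qquad v = \max
$$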
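A minimal Python sketch of the standard RGB to HSV conversion is given below as an illustration; it is not the project's MATLAB code. It uses the usual definitions: the hue for the red case is wrapped into [0, 360), saturation is (max - min)/max, and value is max. The function name is chosen for this example.

```python
def rgb_to_hsv(r, g, b):
    """Convert r, g, b in [0, 1] to (h in degrees within [0, 360), s, v)."""
    mx, mn = max(r, g, b), min(r, g, b)
    delta = mx - mn
    if delta == 0:
        h = 0.0                                  # grey: hue is undefined, use 0
    elif mx == r:
        h = (60 * (g - b) / delta + 360) % 360   # wrap negative angles
    elif mx == g:
        h = 60 * (b - r) / delta + 120
    else:
        h = 60 * (r - g) / delta + 240
    s = 0.0 if mx == 0 else delta / mx
    return h, s, mx

print(rgb_to_hsv(1.0, 0.0, 0.0))  # pure red  -> (0.0, 1.0, 1.0)
print(rgb_to_hsv(0.0, 0.0, 1.0))  # pure blue -> (240.0, 1.0, 1.0)
```

Python's built-in `colorsys.rgb_to_hsv` performs the same conversion but returns the hue as a fraction of a full turn rather than in degrees.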

3.3.4 CMYK Color model

It is possible to achieve a large range of the colors seen by humans by combining cyan,
magenta and yellow transparent dyes or inks on a white substrate. These are the
subtractive primary colors. Often a fourth ink, black, is added to improve the reproduction of
some dark colors. This is called the "CMY" or "CMYK" colour space. The cyan ink will
reflect all but the red light, the yellow ink will reflect all but the blue light and the
magenta ink will reflect all but the green light. This is because cyan light is an equal
mixture of green and blue, yellow light is an equal mixture of red and green, and magenta light is an
equal mixture of red and blue.

Cyan = green + blue, so light reflected from a cyan pigment has no red component,
i.e., the red is absorbed by cyan. Similarly, magenta subtracts green and yellow
subtracts blue. Printers usually use four colors: cyan, yellow, magenta and black. This
is because cyan, yellow and magenta together produce a dark gray rather than a true
black. The conversion between RGB and CMY is easily computed as below:

C=1-R; M=1-G; Y=1-B

R=1-C; G=1-M; B=1-Y
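The two conversions are inverses of each other, which the following short Python sketch illustrates (function names are chosen for this example; components are assumed to lie in [0, 1]).

```python
def rgb_to_cmy(r, g, b):
    """Each subtractive component is the complement of an additive one."""
    return 1 - r, 1 - g, 1 - b

def cmy_to_rgb(c, m, y):
    return 1 - c, 1 - m, 1 - y

print(rgb_to_cmy(1, 0, 0))               # red -> (0, 1, 1): magenta + yellow ink
print(cmy_to_rgb(*rgb_to_cmy(1, 0, 0)))  # round trip -> (1, 0, 0)
```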



3.3.5 YIQ Color Model

This model was designed to separate chrominance from luminance. This was a
requirement in the early days of color television, when black and white sets were
expected to pick up and display what were originally color pictures. The Y channel
contains the luminance information (sufficient for black and white television sets), while the I
and Q channels (in-phase and in quadrature) carry the color information. A color
television set takes these three channels, Y, I and Q, and maps the information back to
R, G and B levels for display on a screen.

3.3.6 HSI color model

In this color model, as in the YIQ model, luminance or intensity (I) is decoupled
from the color information, which is described by a hue channel and a saturation
channel. Hue and saturation correspond closely to the way humans perceive
color, and thus this model is suited for interactive manipulation of color images, where
each shift of a variable produces the change that the operator expects.

3.3.7 L*a*b* Colour Space

The L*a*b* (brightness, red-green content and yellow-blue content) system gives
quantitative expression to the Munsell system of colour classification. The L*a*b* colour
space performs best with respect to perceptual similarity. It is not dependent on any particular
device. Colours can be set as they are perceived when operating a repro system. In
the analysis, L*a*b* is divided into 7 L* levels, 5 a* levels and 5 b* levels. The
problem with the L*a*b* colour space is quantization. As can be seen from Fig. 2, the
quantization should be coarser at each edge, because the volume should be the same
for each subspace. In our tests the volume is smaller for values near the edges.

3.4 COLOUR HISTOGRAM DISCRIMINATION

There are several distance formulas for measuring the similarity of colour
histograms. In general, the techniques for comparing probability distributions, such as
the Kolmogorov-Smirnov test, are not appropriate for colour histograms. This is
because visual perception determines similarity, rather than closeness of the
probability distributions. Essentially, the colour distance formulas arrive at a measure
of similarity between images based on the perception of colour content. Three
distance formulas that have been used for image retrieval are the histogram
Euclidean distance, histogram intersection and histogram quadratic (cross) distance.

3.4.1 Histogram Euclidean distance

Let h and g represent two colour histograms with M bins each. The squared Euclidean
distance between the colour histograms h and g can be computed as:

$$d^2(h, g) = \sum_{m=1}^{M} \big(h(m) - g(m)\big)^2$$

In this distance formula, there is only comparison between the identical bins in

the respective histograms. Two different bins may represent perceptually similar

colours but are not compared crosswise. All bins contribute equally to the distance.
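A minimal Python sketch of this bin-by-bin comparison follows; the function name is chosen for the example and the histograms are plain lists of bin counts.

```python
def histogram_euclidean_sq(h, g):
    """Squared Euclidean distance: only identical bins are compared."""
    if len(h) != len(g):
        raise ValueError("histograms must have the same number of bins")
    return sum((hm - gm) ** 2 for hm, gm in zip(h, g))

h = [4, 0, 2, 2]
g = [1, 3, 2, 2]
print(histogram_euclidean_sq(h, g))  # 18
```

Only the first two bins differ, by 3 each, giving 9 + 9 = 18; whether those two bins hold perceptually similar colours has no effect on the result.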

3.4.2 Histogram quadratic (cross) distance

The colour histogram quadratic distance was used by the QBIC system.
The cross distance formula is given by:

$$d(h, g) = (h - g)^{T} A \,(h - g)$$

The cross distance formula considers the cross-correlation between histogram
bins based on the perceptual similarity of the colours represented by the bins. The
set of all cross-correlation values is represented by a matrix A, which is called a
similarity matrix. For RGB space, the element a(i, j) of the similarity matrix A is given by:

$$a_{ij} = 1 - \frac{d_{ij}}{\max_{i,j} d_{ij}}$$

where $d_{ij}$ is the L2 distance between the colours i and j in the RGB space. When the
quantization of the colour space is not perceptually uniform, the cross term
contributes to the perceptual distance between colour bins.
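The effect of the similarity matrix can be illustrated with a short Python sketch. It assumes the common form of the quadratic distance, (h - g)^T A (h - g), with similarity a(i, j) = 1 - d(i, j)/d_max for RGB bin colours; the function names and the three example bin colours are made up for this illustration.

```python
import math

def similarity_matrix(bin_colours):
    """a[i][j] = 1 - d_ij / d_max, with d_ij the L2 distance between bin colours."""
    n = len(bin_colours)
    dist = [[math.dist(bin_colours[i], bin_colours[j]) for j in range(n)]
            for i in range(n)]
    d_max = max(max(row) for row in dist)
    return [[1 - dist[i][j] / d_max for j in range(n)] for i in range(n)]

def histogram_quadratic(h, g, a):
    """Quadratic (cross) distance (h - g)^T A (h - g)."""
    diff = [hm - gm for hm, gm in zip(h, g)]
    n = len(diff)
    return sum(diff[i] * a[i][j] * diff[j] for i in range(n) for j in range(n))

# Three hypothetical bins: black, dark red and pure red.
a = similarity_matrix([(0, 0, 0), (0.5, 0, 0), (1, 0, 0)])
print(histogram_quadratic([1, 0, 0], [0, 1, 0], a))  # neighbouring bins
print(histogram_quadratic([1, 0, 0], [0, 0, 1], a))  # far-apart bins
```

The first pair differs in neighbouring, similar colours and the second in the two most dissimilar colours, so the second distance comes out larger, whereas a bin-by-bin Euclidean comparison would rate both pairs the same.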

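For HSV space it is given by:

$$a_{ij} = 1 - \frac{1}{\sqrt{5}}\Big[(v_i - v_j)^2 + (s_i\cos h_i - s_j\cos h_j)^2 + (s_i\sin h_i - s_j\sin h_j)^2\Big]^{1/2}$$

where $(h_i, s_i, v_i)$ is the HSV colour represented by bin i.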



3.4.3 Histogram intersection distance

The colour histogram intersection was proposed for colour image retrieval. The
intersection of histograms h and g is given by:

$$d(h, g) = \frac{\sum_{m} \min\big(h(m), g(m)\big)}{\min\big(|h|, |g|\big)}$$

where |h| and |g| give the magnitude of each histogram, which is equal to the
number of samples. Colours not present in the user's query image do not contribute to
the intersection distance. This reduces the contribution of background colours. The
sum is normalized by the histogram with the fewest samples.
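The following Python sketch illustrates the normalised intersection, assuming the usual form in which the bin-wise minima are summed and divided by the smaller of the two sample counts; the function name and example histograms are chosen for this illustration.

```python
def histogram_intersection(h, g):
    """Sum of bin-wise minima, normalised by the smaller histogram's sample count."""
    overlap = sum(min(hm, gm) for hm, gm in zip(h, g))
    smallest = min(sum(h), sum(g))
    return overlap / smallest if smallest else 0.0

query = [5, 0, 5, 0]   # colours in bins 1 and 3 are absent from the query
frame = [5, 10, 3, 2]
print(histogram_intersection(query, frame))  # 0.8
```

The 12 samples that the second histogram holds in bins absent from the query add nothing to the overlap, which is how background colours are suppressed.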

3.5 FORMULATION:

For computing the FDM, a colour histogram is built in HSV colour space after
a quantization step that reduces the number of distinct colours to 64. Instead
of computing one histogram for the entire image, we divide the image into a total of Ts
sections, each of size m x m. This allows the level of difference
between two frames to be measured effectively. Each section of one frame is compared with the
corresponding section of the other frame using the histogram intersection mechanism.
The histogram difference $HD_{i,j,s}$ between histogram $H_{is}$ of section s of frame i
and histogram $H_{js}$ of the corresponding section of frame j is defined as:

The histogram difference HD between two frames i and j is then calculated by
taking the average of the difference measure over all sections:

$$HD_{i,j} = \frac{1}{T_s} \sum_{s=1}^{T_s} HD_{i,j,s}$$
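The sectioning and averaging steps can be sketched in Python as below. This is only an illustration under stated assumptions: frames are 2-D grids of already quantised colour indices (0 to 63), the frame dimensions are divisible by m, and the per-section difference is assumed to be one minus the normalised histogram intersection, which may differ from the measure actually used in the project. All function names are hypothetical.

```python
def split_into_sections(image, m):
    """Yield the m x m blocks of a 2-D grid, row by row."""
    for r in range(0, len(image), m):
        for c in range(0, len(image[0]), m):
            yield [row[c:c + m] for row in image[r:r + m]]

def section_histogram(section, bins):
    """Count quantised colour indices (each must be below `bins`)."""
    hist = [0] * bins
    for row in section:
        for value in row:
            hist[value] += 1
    return hist

def section_difference(h, g):
    """Assumed per-section measure: 1 - normalised histogram intersection."""
    total = min(sum(h), sum(g))
    return 1 - sum(min(a, b) for a, b in zip(h, g)) / total

def histogram_difference(frame_i, frame_j, m, bins=64):
    """Average of the per-section differences between two frames."""
    diffs = [
        section_difference(section_histogram(a, bins), section_histogram(b, bins))
        for a, b in zip(split_into_sections(frame_i, m),
                        split_into_sections(frame_j, m))
    ]
    return sum(diffs) / len(diffs)

frame_i = [[0, 0, 1, 1], [0, 0, 1, 1]]
frame_j = [[0, 0, 2, 2], [0, 0, 2, 2]]
print(histogram_difference(frame_i, frame_j, m=2))  # 0.5
```

The left sections are identical (difference 0) and the right sections share no colours (difference 1), so the average over the two sections is 0.5.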



CHAPTER 4

CORRELATION DIFFERENCE HISTOGRAM

4.1 INTRODUCTION

Correlation coefficients are a very popular scheme for finding the similarity
between two data sets. The correlation coefficients are invariant to brightness. The
cross correlation is used to determine the degree of similarity between two similar
images, or, with the addition of a linear offset to one of the images, the spatial shift or
spatial correlation between the images. The degree of similarity between the two
images is determined by the correlation coefficient. The correlation coefficient has value
1 if the two images are identical, 0 if they are completely uncorrelated, and –1 if they
are completely anti-correlated.

4.2 TYPES OF CORRELATION COEFFICIENTS

4.2.1. Pearson’s Correlation Coefficient (PCC)

Pearson’s Correlation Coefficient, r, is widely used in statistical analysis,
pattern recognition and image processing. Applications in the latter include
comparing two images for image registration purposes, object recognition and
disparity measurement.

For monochrome digital images, Pearson’s Correlation Coefficient is described by

$$r = \frac{\sum_i (x_i - x_m)(y_i - y_m)}{\sqrt{\sum_i (x_i - x_m)^2}\,\sqrt{\sum_i (y_i - y_m)^2}}$$

where $x_i$ is the intensity of the ith pixel in the first image, $y_i$ is the intensity of the ith
pixel in the next image, and $x_m$ and $y_m$ are the mean intensities of the first and next images.

The correlation coefficient has the value 1 if the two images are identical, 0 if they are completely uncorrelated, and -1 if they are completely anti-correlated, for example if one image is the negative of the other. In theory, one would obtain a value of 1 for r if the object is intact and a value of less than 1 if alteration or movement has occurred. In practice, distortions in the imaging system, pixel noise, slight variations in the object’s position relative to the camera, and other factors produce an r value less than 1, even if the object has not been moved or physically altered in any manner. For security applications, typical r values for two digital images of the same scene, one recorded immediately after the other using the same imaging system and illumination, range from 0.95 to 0.98.
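As an illustration, Pearson's correlation coefficient for two monochrome images can be computed as in the sketch below. It assumes each image is a grid (list of rows) of intensities and that both images have the same size; the function name `pearson_r` is hypothetical, and returning 0.0 for a constant image is a convention chosen here rather than something the report specifies.

```python
import math


def pearson_r(image_x, image_y):
    """Pearson's r between two equally sized grayscale images."""
    xs = [p for row in image_x for p in row]
    ys = [p for row in image_y for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must contain the same number of pixels")
    xm = sum(xs) / len(xs)  # mean intensity of the first image
    ym = sum(ys) / len(ys)  # mean intensity of the second image
    num = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xm) ** 2 for x in xs)) * \
        math.sqrt(sum((y - ym) ** 2 for y in ys))
    # Assumption: a constant image carries no structure, so report 0.
    return num / den if den else 0.0
```

Comparing an image with itself gives r close to 1, and comparing it with its negative gives r close to -1, matching the interpretation in the text.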

The interpretation of the correlation coefficient r is shown in Fig.4.1. The value of r ranges from -1 to +1.

Case1: If r = +1, then the correlation between the two variables is said to be perfect

and positive.

Case2: If r = -1, then the correlation between the two variables is said to be perfect

and negative.

Case3: If r = 0, then there exists no correlation between the variables.

Fig.4.1: Coefficient(r) of Determination between x and y frames

An obvious advantage of Pearson’s correlation coefficient is that it condenses the comparison of two two-dimensional images down to a single scalar, r. Its most widely recognized disadvantage is that it is computationally intensive.

4.2.2 Point-Biserial

The point-biserial correlation coefficient, referred to as rpb, is a special case of Pearson in which one variable is quantitative and the other variable is dichotomous and nominal. The calculations simplify since typically the values 1 (presence) and 0 (absence) are used for the dichotomous variable. This simplification is sometimes expressed as follows:

rpb = (Y1 - Y0) • sqrt(pq) / σy,

where Y0 and Y1 are the Y score means for data pairs with an x score of 0 and 1, respectively, q = 1 - p and p are the proportions of data pairs with x scores of 0 and 1, respectively, and σy is the population standard deviation for the y data. An example usage might be to determine whether one gender accomplished some task significantly better than the other gender.
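A small worked sketch of the point-biserial coefficient follows. It assumes the dichotomous variable is coded as 0 and 1 and that both groups are non-empty; the function name `point_biserial` is hypothetical.

```python
import math


def point_biserial(x, y):
    """Point-biserial correlation for x in {0, 1} and quantitative y."""
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    n = len(y)
    p = len(y1) / n          # proportion of pairs with x = 1
    q = 1 - p                # proportion of pairs with x = 0
    mean_y = sum(y) / n
    # Population standard deviation of all y values.
    sigma_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    mean0 = sum(y0) / len(y0)
    mean1 = sum(y1) / len(y1)
    return (mean1 - mean0) * math.sqrt(p * q) / sigma_y
```

For x = [0, 0, 1, 1] and y = [1, 2, 3, 4] the group means are 1.5 and 3.5, p = q = 0.5 and σy = sqrt(1.25), which gives the same value as applying Pearson's formula directly to the two columns.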

4.2.3 Phi Coefficient

If both variables are instead nominal and dichotomous, the Pearson coefficient simplifies even further. First, we need to introduce contingency tables. A contingency table is a two-dimensional table containing frequencies by category. For this situation it will be two by two, since each variable can take on only two values, but each dimension will exceed two when the associated variable is not dichotomous. In addition, column and row headings and totals are frequently appended, so that the contingency table ends up being n + 2 by m + 2, where n and m are the number of values each variable can take on.
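The report does not state the phi formula itself. As a hedged illustration, the sketch below uses the standard textbook form for a 2×2 table with cell frequencies A, B (first row) and C, D (second row), phi = (AD - BC) / sqrt((A+B)(C+D)(A+C)(B+D)). The function name `phi_coefficient` and the table layout are assumptions made for this example.

```python
import math


def phi_coefficient(table):
    """Phi coefficient of a 2x2 contingency table [[A, B], [C, D]]."""
    (a, b), (c, d) = table
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if den == 0:
        raise ValueError("a row or column total is zero")
    return (a * d - b * c) / den
```

A table with all counts on the main diagonal gives 1, counts on the other diagonal give -1, and equal counts in every cell give 0.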

4.2.4 Biserial Correlation Coefficient

Another measure of association, the biserial correlation coefficient, termed rb, is similar to the point-biserial, but pits quantitative data against ordinal data that has an underlying continuity yet is measured discretely as two values (dichotomous). An example might be test performance vs. anxiety, where anxiety is designated as either high or low. Presumably, anxiety can take on any value in between, and perhaps beyond, but it may be difficult to measure. We further assume that anxiety is normally distributed. The formula is similar to that of the point-biserial:

rb = (Y1 - Y0) • (pq/Y) / σy,

where Y0 and Y1 are the Y score means for data pairs with an x score of 0 and 1, respectively, q = 1 - p and p are the proportions of data pairs with x scores of 0 and 1, respectively, σy is the population standard deviation for the y data, and Y is the height of the standardized normal distribution at the point z, where P(z' < z) = q and P(z' > z) = p. Since the factor involving p, q, and the height is always greater than 1, the biserial is always greater than the point-biserial.

4.2.5 Tetrachoric Correlation Coefficient

The tetrachoric correlation coefficient, rtet, is used when both variables are dichotomous, like the phi, but when we can also assume that both variables really are continuous and normally distributed. Thus it is applied to ordinal vs. ordinal data which has this characteristic. Ranks are discrete, so in this respect it differs from the Spearman coefficient. The formula involves the cosine function, which in its simplest form is the ratio of two side lengths in a right triangle: the length of the side adjacent to the reference angle divided by the length of the hypotenuse. The formula is:

rtet = cos(180° / (1 + sqrt(BC/AD))),

where A, B, C and D are the four cell frequencies of the two-by-two contingency table.

4.2.6 Rank-Biserial Correlation Coefficient

The rank-biserial correlation coefficient, rrb, is used for dichotomous nominal data vs. rankings (ordinal). The formula is usually expressed as rrb = 2 • (Y1 - Y0) / n, where n is the number of data pairs, and Y0 and Y1, again, are the Y score means for data pairs with an x score of 0 and 1, respectively. These Y scores are ranks. This formula assumes that no tied ranks are present. This may be the same as the Somers' D statistic.

4.3 FORMULATION

To compute the correlation measure, we divide each frame into Ts sections of size m×m. The correlation is measured separately for the three colour channels red, green and blue. The correlation difference CDi,j,s,c of a colour channel ‘c’ between two corresponding sections ‘s’ of frames i and j is defined using the correlation coefficient of Section 4.2.1, where s = 1, …, Ts; c = red, green, blue; fi,c is the mean value of channel c for frame i; and fj,c is the mean value of channel c for frame j.

The correlations of all sections of frames i and j are averaged to obtain the overall correlation CDi,j,c for a colour channel:

$$ CD_{i,j,c} = \frac{1}{T_s} \sum_{s=1}^{T_s} CD_{i,j,s,c} $$

Then the overall correlation difference measure CDi,j between frames i and j is obtained by averaging the values of the three colour channels:

$$ CD_{i,j} = \frac{1}{3} \sum_{c} CD_{i,j,c} $$
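The averaging over sections and colour channels can be sketched as follows. This is an illustration under stated assumptions, not the project's code: a frame is a grid in which `frame[y][x]` is an `(r, g, b)` tuple, the correlation of each section is computed with Pearson's coefficient about the section mean, constant sections are given a correlation of 0, and the names `pearson` and `correlation_measure` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson's r for two equal-length sequences (0.0 if either is constant)."""
    xm = sum(xs) / len(xs)
    ym = sum(ys) / len(ys)
    num = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xm) ** 2 for x in xs) *
                    sum((y - ym) ** 2 for y in ys))
    return num / den if den else 0.0


def correlation_measure(frame_i, frame_j, m):
    """Average per-section, per-channel correlation of two RGB frames."""
    height, width = len(frame_i), len(frame_i[0])
    corners = [(top, left)
               for top in range(0, height - m + 1, m)
               for left in range(0, width - m + 1, m)]
    channel_means = []
    for c in range(3):  # red, green, blue
        values = []
        for top, left in corners:
            xs = [frame_i[y][x][c]
                  for y in range(top, top + m) for x in range(left, left + m)]
            ys = [frame_j[y][x][c]
                  for y in range(top, top + m) for x in range(left, left + m)]
            values.append(pearson(xs, ys))
        channel_means.append(sum(values) / len(values))  # average over sections
    return sum(channel_means) / 3                         # average over channels
```

The function returns a similarity (1 for identical frames). If a dissimilarity is wanted, one option is to use 1 minus this value, which is an assumption made here and not a definition taken from the report.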



CHAPTER 5

EDGE ORIENTATION HISTOGRAM

5.1 INTRODUCTION

Edge detection is one of the most commonly used operations in image analysis, and there are probably more algorithms in the literature for enhancing and detecting edges than for any other single task. The reason for this is that edges form the outline of an object. An edge is the boundary between an object and the background, and indicates the boundary between overlapping objects. This means that if the edges in an image can be identified accurately, all of the objects can be located and basic properties such as area, perimeter, and shape can be measured. Edges define the boundaries between regions in an image, which helps with segmentation and object recognition. They can show where shadows fall in an image or any other distinct change in the intensity of an image. Edge detection is a fundamental part of low-level image processing, and good edges are necessary for higher-level processing. The problem is that, in general, edge detectors behave very poorly. The quality of edge detection is highly dependent on lighting conditions, the presence of objects of similar intensities, the density of edges in the scene, and noise. The detection of edges is shown in Fig.5.1.

Fig.5.1: Edge detection results



5.2 FUNDAMENTALS OF EDGE DETECTION

Edge detection refers to the process of identifying and locating sharp

discontinuities in an image. The discontinuities are abrupt changes in pixel intensity

which characterize boundaries of objects in a scene. Classical methods of edge

detection involve convolving the image with an operator (a 2-D filter), which is

constructed to be sensitive to large gradients in the image while returning values of

zero in uniform regions. There are an extremely large number of edge detection

operators available, each designed to be sensitive to certain types of edges. Variables

involved in the selection of an edge detection operator include:

(1) Edge orientation: The geometry of the operator determines a

characteristic direction in which it is most sensitive to edges. Operators

can be optimized to look for horizontal, vertical, or diagonal edges.

(2) Noise environment: Edge detection is difficult in noisy images, since both

the noise and the edges contain high-frequency content. Attempts to

reduce the noise result in blurred and distorted edges. Operators used on

noisy images are typically larger in scope, so they can average enough data

to discount localized noisy pixels. This results in less accurate localization

of the detected edges.

(3) Edge structure: Not all edges involve a step change in intensity. Effects such as refraction or poor focus can result in objects with boundaries defined by a gradual change in intensity. The operator needs to be chosen to be responsive to such a gradual change in those cases. Newer wavelet-based techniques actually characterize the nature of the transition for each edge in order to distinguish, for example, edges associated with hair from edges associated with a face.

5.3 EDGE DETECTION OPERATORS

5.3.1 Prewitt’s operator

The Prewitt operator is similar to the Sobel operator and is used for detecting vertical and horizontal edges in images. It is used in image processing, particularly within edge detection algorithms. Technically, it is a discrete differentiation operator, computing an approximation of the gradient of the image intensity function. At each point in the image, the result of the Prewitt operator is either the corresponding gradient vector or the norm of this vector. The Prewitt operator is based on convolving the image with a small, separable, and integer-valued filter in the horizontal and vertical directions and is therefore relatively inexpensive in terms of computations. On the other hand, the gradient approximation which it produces is relatively crude, in particular for high-frequency variations in the image.

5.3.2 Canny Operator

Another approach to edge detection using colour information is simply to

extend a traditional intensity based edge detector into the colour space. This method

seeks to take advantage of the known strengths of the traditional edge detector and

tries to overcome its weaknesses by providing more information in the form of three

colour channels rather than a single intensity channel. As the Canny edge detector is

the current standard for intensity based edge detection, it seemed logical to use this

operator as the basis for colour edge detection.

The algorithm runs in 5 separate steps:

1. Smoothing: Blurring of the image to remove noise.

2. Finding gradients: The edges should be marked where the gradients of the image have large magnitudes.

3. Non-maximum suppression: Only local maxima should be marked as edges.

4. Double thresholding: Potential edges are determined by thresholding.

5. Edge tracking by hysteresis: Final edges are determined by suppressing all edges that are not connected to a very certain (strong) edge.

5.3.3 Sobel operator

The Sobel operator is used in image processing, particularly within edge detection algorithms. Technically, it is a discrete differentiation operator, computing an approximation of the opposite of the gradient of the image intensity function. At each point in the image, the result of the Sobel operator is either the corresponding opposite of the gradient vector or the norm of this vector. The Sobel operator is based on convolving the image with a small, separable, and integer-valued filter in the horizontal and vertical directions and is therefore relatively inexpensive in terms of computations. On the other hand, the approximation of the opposite of the gradient that it produces is relatively crude, in particular for high-frequency variations in the image.

Mathematically, the operator uses two 3×3 kernels which are convolved with the original image to calculate approximations of the derivatives, one for horizontal changes and one for vertical. If we define A as the source image, and Gx and Gy as two images which at each point contain the horizontal and vertical derivative approximations, the computations are as follows:

$$ G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * A \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * A $$

where * denotes the 2-dimensional convolution operation.

The x-coordinate is defined here as increasing in the "right" direction, and the y-coordinate as increasing in the "down" direction. At each point in the image, the resulting gradient approximations can be combined to give the gradient magnitude, using

$$ G = \sqrt{G_x^2 + G_y^2} $$

Using this information, we can also calculate the opposite of the gradient's direction:

$$ \Theta = \arctan\left(\frac{G_y}{G_x}\right) $$

Fig. 5.2(a): Colour picture of a steam engine

Fig. 5.2(b): Sobel operator applied to that image

Fig. 5.2(b) shows the result of applying the Sobel operator to the original image shown in Fig. 5.2(a).
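The Sobel computation described in this section can be sketched in a few lines. The example is illustrative only and makes several assumptions: the image is a grid of grayscale intensities, the kernels are the commonly used 3×3 Sobel masks applied without flipping (so a positive Gx means intensity increases to the right), border pixels are left at zero, and the direction is computed with `atan2` so that it is defined when Gx is zero. The names `KX`, `KY` and `sobel` are hypothetical.

```python
import math

# Commonly used 3x3 Sobel masks (horizontal and vertical changes).
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(image):
    """Return (magnitude, direction) grids; border pixels stay 0."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    p = image[y + dy][x + dx]
                    gx += KX[dy + 1][dx + 1] * p
                    gy += KY[dy + 1][dx + 1] * p
            mag[y][x] = math.hypot(gx, gy)   # sqrt(Gx^2 + Gy^2)
            ang[y][x] = math.atan2(gy, gx)   # direction in radians
    return mag, ang
```

For a 3×3 patch whose right-hand column is brighter than the rest, the centre pixel has a purely horizontal response: Gy cancels to zero and the direction is 0 radians.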



5.4 FORMULATION

The purpose of edge detection in general is to significantly reduce the amount of data in an image while preserving the structural properties to be used for further image processing. Edges are robust to illumination changes. The edges are first computed using horizontal and vertical Sobel operators, whose outputs are then used to find the gradient magnitude and the angle of the edges. The angles are then used to build a histogram of edge orientation. For simplicity, we defined only 72 bins for the angles. As in the case of colour histograms, we compare the histograms of corresponding sections of the two frames. The edge histogram difference “ED” between two frames i and j is calculated by taking the average of the difference measure over all sections:

$$ ED_{i,j} = \frac{1}{T_s} \sum_{s=1}^{T_s} ED_{i,j,s} $$
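As a hedged illustration of how an edge orientation histogram can be built, the sketch below bins the gradient angle of every pixel whose gradient magnitude exceeds a small threshold. It assumes the horizontal and vertical gradients `gx` and `gy` are already available as grids, uses 72 bins as stated in this section (another bin count can be passed), uses a magnitude threshold of 3, and maps angles to the range 0° to 360° with `atan2`. The function name `orientation_histogram` and these parameter names are hypothetical.

```python
import math


def orientation_histogram(gx, gy, bins=72, min_magnitude=3.0):
    """Histogram of gradient directions for sufficiently strong edges."""
    hist = [0] * bins
    width = 360.0 / bins                      # angular width of one bin
    for row_x, row_y in zip(gx, gy):
        for dx, dy in zip(row_x, row_y):
            if math.hypot(dx, dy) <= min_magnitude:
                continue                      # weak gradient: not an edge
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(angle // width) % bins] += 1
    return hist
```

The resulting histograms of corresponding sections can then be compared in the same way as the colour histograms, and the per-section values averaged to give a single measure for the pair of frames.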



CHAPTER 6

RESULTS

6.1 FLOW CHART FOR THE EXTRACTION OF KEY FRAMES

[Flow chart boxes: Start; Video; Frames from input video; loop over the frames n of the video; First frame?; Current frame; Key frame database; Correlation difference (CD); Colour histogram difference (HD); Edge orientation histogram difference (ED); Threshold (True/False); Discard frame; Key frame = current frame; Last frame? (True/False); Stop]

Fig 6.1: Flow chart for the extraction of key frames


6.2 ALGORITHM FOR EXTRACTING KEY FRAMES BASED ON

CORRELATION

The key frame extraction method is composed of the following steps:

Step1: All the frames are extracted from the input sports video.

Step2: Consider the first frame as a key frame.

Step3: Select the next frame from the extracted frames and divide it into a total of ‘Ts’ sections, each of size m×m (8×8).

Step4: Histogram Creation

Step4.1: Correlation Histogram Creation: The correlation is measured for the three colour channels red, green and blue, and the correlation values of the sections are then averaged.

Step4.2: The correlation difference CDi,j,s,c of a colour channel ‘c’ between two corresponding sections ‘s’ of frames i and j is computed as defined in Section 4.3, where s = 1, …, Ts; c = red, green, blue; and f is the mean value of channel c of the frame.

Step4.3: The correlations of all sections of frames i and j are averaged to obtain the overall correlation CDi,j,c for a colour channel.

Step4.4: Then the overall correlation difference measure CDi,j between frames i and j is obtained by averaging the values of the three colour channels.

Step4.5: CDi,j is compared with the threshold value to detect a key frame. Frames whose CDi,j is higher than the threshold are treated as key frames.

Step5: To detect key frames based on the correlation difference measure in the entire video, repeat Step3 and Step4.
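The threshold loop in the steps above can be sketched generically. The sketch assumes that each frame is compared with the frame immediately before it (as in the reported difference values, where 20 frames give 19 differences) and that a frame becomes a key frame when the difference exceeds the threshold. The function name `extract_key_frames` and the callable `difference` are hypothetical; any frame difference measure can be plugged in.

```python
def extract_key_frames(frames, difference, threshold):
    """Return the indices of the key frames in `frames`.

    difference(a, b) is any frame difference measure; a frame is kept
    when its difference from the previous frame exceeds `threshold`.
    """
    if not frames:
        return []
    key_indices = [0]  # Step2: the first frame is always a key frame
    for n in range(1, len(frames)):
        # Step3 and Step4: compare the current frame with the previous one.
        if difference(frames[n - 1], frames[n]) > threshold:
            key_indices.append(n)
    return key_indices
```

For a measure where a lower value indicates a change, such as the colour histogram measure in this report, the comparison would be reversed.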

6.3 FLOW CHART FOR CORRELATION


[Flow chart boxes: Current frame; Key frame from the database; Division of each frame into Ts sections of size (m×m); Correlation differences of corresponding sections of the current frame and the previous frame (C1, C2, …, Cs) are calculated; Mean of correlation difference values; CD]

Fig 6.2: Flow chart for correlation difference


6.4 ALGORITHM FOR EXTRACTING KEY FRAMES BASED ON COLOUR DIFFERENCE MEASURE

Step1 and Step2: As in Section 6.2.

Step3: Select the next frame from the extracted frames, convert it from RGB to HSV colour space, and divide it into a total of ‘Ts’ sections, each of size m×m (8×8).

Step4: Histogram Creation

Step4.1: Colour Histogram Creation: A three-dimensional colour histogram is built by subdividing the HSV colour space into 8:2:4 bins (64 bins in total).

Step4.2: The histogram difference HDi,j,s between the histogram His of section ‘s’ of frame i and the histogram Hjs of the corresponding section of frame j is calculated as defined in Section 3.5.

Step4.3: The histogram difference “HD” between two frames i and j is then calculated by taking the average of the difference measure over all sections.

Step4.4: HDi,j is compared with the threshold value to detect a key frame. Frames whose HDi,j is lower than the threshold are treated as key frames.

Step5: To detect key frames based on the colour difference measure in the entire video, repeat Step3 and Step4.
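The 8:2:4 quantization of HSV space into 64 colours can be illustrated as follows. The text gives only the ratio, so assigning 8 bins to hue, 2 to saturation and 4 to value is an assumption made for this sketch, as are the function name `hsv_bin` and the way the three bin numbers are packed into one index. The conversion uses Python's standard `colorsys` module.

```python
import colorsys


def hsv_bin(r, g, b, h_bins=8, s_bins=2, v_bins=4):
    """Map an 8-bit RGB pixel to a colour index in 0 .. h*s*v - 1."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Each component lies in [0, 1]; clamp so that 1.0 falls in the last bin.
    hq = min(int(h * h_bins), h_bins - 1)
    sq = min(int(s * s_bins), s_bins - 1)
    vq = min(int(v * v_bins), v_bins - 1)
    return (hq * s_bins + sq) * v_bins + vq
```

Counting these indices over the pixels of a section gives the 64-bin histogram of that section.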



6.5 FLOW CHART FOR COLOUR HISTOGRAM


[Flow chart boxes: Current frame; Conversion of RGB to HSV; Colour histogram differences of corresponding sections of the current frame and the previous frame (ch1, ch2, …, chs); Mean of colour difference values; HD]

Fig 6.3: Flow chart of colour histogram difference


6.6 ALGORITHM FOR EXTRACTING KEY FRAMES BASED ON EDGE DIFFERENCE MEASURE

Step1 and Step2: As in Section 6.2.

Step3: Select the next frame from the extracted frames, convert it from RGB to a gray image, and divide it into a total of ‘Ts’ sections, each of size m×m (8×8).

Step4: Histogram Creation

Step4.1: Edge Histogram Creation: The edges are first computed using horizontal and vertical Sobel operators, whose outputs are then used to find the gradient magnitude and the angle of the edges. The gradient's magnitude is given by

$$ G = \sqrt{G_x^2 + G_y^2} $$

and the gradient's direction is given by

$$ \phi = \arctan\left(\frac{G_y}{G_x}\right) $$

Step4.2: The angles are computed only for those pixels where the gradient magnitude is above a certain threshold (>3). The angles are then used to build a histogram of edge orientation. We defined only 82 bins for the angles.

Step4.3: We compare the histograms of corresponding sections of the two frames. The edge histogram difference “ED” between two frames i and j is calculated by taking the average of the difference measure over all sections.

Step4.4: EDi,j is compared with the threshold value to detect a key frame. Frames whose EDi,j is higher than the threshold are treated as key frames.

Step5: To detect key frames based on the edge difference measure in the entire video, repeat Step3 and Step4.



6.7 FLOW CHART FOR EDGE ORIENTATION HISTOGRAM


[Flow chart boxes: Current frame; RGB to gray conversion; Calculate gradients (Gx and Gy); Evaluate gradient magnitude for all sections; If gradient magnitude < 3 (True/False); Eliminate edge; Evaluate gradient direction (ø = arctan(Gy/Gx)); Differences of corresponding sections of the current frame and the previous frame (e1, e2, …, es); Mean of edge orientation difference values; ED]

Fig 6.4: Flow chart for edge orientation histogram difference


6.8 COLOUR HISTOGRAM OUTPUT

Fig 6.5 : Reading the frames from the input video

Figure 6.5 indicates the reading of frames from the video as well as the comparisons of frame with the previous frame to find out the key frames.




Fig 6.6: Frames extracted from the (sample) football video



Fig 6.7: colour histogram difference values for the sample (football) video

Figure 6.7 shows the colour histogram difference values between each frame and the previous frame. A total of 19 colour histogram difference values are generated from the 20 frames of the football video. The colour histogram difference values range from -64 to 0. The absolute values of the colour histogram differences are compared with a set of threshold values to extract key frames based on the colour histogram; frames whose colour histogram difference value is greater than the threshold are discarded.



Fig 6.8: output graph of colour histogram

The above Fig 6.8 shows the graph between frames and colour difference value.



Fig 6.9 (a): key frames based on colour histogram for the sample (football) video with the threshold value as 35.

The above figure shows the number of key frames extracted with the colour histogram technique at a threshold value of 35. A total of 8 frames are obtained with this threshold value.



Fig 6.9 (b): set of key frames based on colour histogram for the sample (football) video with the threshold value as 35.

With 35 as the threshold value we obtained 8 frames as key frames based on colour histogram.




The above figure shows the number of key frames extracted based on colour histogram technique with the threshold value as 45. Total 12 frames are obtained with this threshold value.



Fig 6.10(b): set of key frames based on colour histogram for the sample(football) video with the threshold value as 45.





The above figure shows the number of key frames extracted based on colour histogram technique with the threshold value as 55. Total 13 frames are obtained with this threshold value.



Fig 6.11(b): set of key frames based on colour histogram for the sample (football) video with the threshold value as 55.




6.9 CORRELATION OUTPUT

Fig 6.12: Correlation difference values for the sample (football) video

Fig 6.12 shows the correlation difference values between each frame and the previous frame. A total of 19 correlation difference values are generated from the 20 frames of the football video. The correlation difference values range from 0 to 1. The absolute values of the correlation differences are compared with a set of threshold values to extract key frames based on correlation; frames whose correlation difference value is less than the threshold are discarded.



Fig 6.13: output graph of correlation

The above figure shows the graph between frames and correlation difference value.



Fig 6.14 (a): key frames based on correlation for the sample (football) video with the threshold value as 0.4.

The above figure shows the number of key frames extracted based on correlation technique with the threshold value as 0.4. Total 4 frames are obtained with this threshold value.



Fig 6.14(b): set of key frames based on correlation for the sample (football) video with the threshold value as 0.4.

With 0.4 as the threshold value we obtained 4 frames as key frames based on correlation.




The above figure shows the number of key frames extracted based on correlation technique with the threshold value as 0.6. Total 2 frames are obtained with this threshold value.



Fig 6.15(b) :set of key frames based on correlation for the sample (football) video with the threshold value as 0.6.

With 0.6 as the threshold value we obtained 2 frames as key frames based on correlation.




The above figure shows the number of key frames extracted based on correlation technique with the threshold value as 0.8. Only one frame is obtained as a key frame.



Fig 6.16(b): set of key frames based on correlation for the sample (football) video with the threshold value as 0.8.

With 0.8 as the threshold value we obtained 1 frame as key frames based on correlation.



6.10 EDGE ORIENTATION HISTOGRAM OUTPUT

Fig 6.17: edge orientation histogram difference values for the sample (football) video

Figure 6.17 shows the edge orientation histogram difference values between each frame and the previous frame. A total of 19 edge orientation histogram difference values are generated from the 20 frames of the football video. The edge orientation histogram difference values range from 0 to 82. The absolute values of the edge orientation histogram differences are compared with a set of threshold values to extract key frames based on the edge orientation histogram; frames whose edge orientation difference value is less than the threshold are discarded.



Fig 6.18: output graph of edge orientation histogram

The above figure shows the graph between frames and edge orientation difference value.



Fig 6.19 (a): key frames based on edge orientation histogram for the sample (football) video with the threshold value as 40.

The above figure shows the number of key frames extracted based on edge orientation histogram technique with the threshold value as 40. Total 11 frames are obtained with this threshold value.



Fig 6.19(b): set of key frames based on edge orientation histogram for the sample (football) video with the threshold value as 40.

With 40 as the threshold value we obtained 11 frames as key frames based on edge orientation histogram.



Fig 6.20(b): set of key frames based on edge orientation histogram for the sample (football) video with the threshold value as 50




Fig 6.21(b): set of key frames based on edge orientation histogram for the sample (football) video with the threshold value as 60.




6.11 OUTPUT

For different sports videos, the number of key frames obtained at different threshold values with the colour histogram, correlation and edge orientation techniques is shown below.

COLOUR HISTOGRAM

Type of video        Total no. of frames    Number of key frames for threshold value
                                            35        45        55
Sample (football)    20                     8         12        13
Cricket              455                    19        84        231
Football             121                    1         1         1
Hockey               476                    1         17        260

Table 6.1: Colour histogram key frames for different threshold values on different videos.

CORRELATION

Type of video        Total no. of frames    Number of key frames for threshold value
                                            0.4       0.6       0.8
Sample (football)    20                     4         2         1
Cricket              455                    192       72        2
Football             121                    89        27        1
Hockey               476                    292       101       1

Table 6.2: Correlation key frames for different threshold values on different videos.

EDGE ORIENTATION HISTOGRAM

Type of video        Total no. of frames    Number of key frames for threshold value
                                            40        50        60
Sample (football)    20                     11        6         4
Cricket              455                    106       28        3
Football             121                    22        2         1
Hockey               476                    49        6         1

Table 6.3: Edge orientation histogram key frames for different threshold values on different videos.

                     Colour histogram    Correlation    Edge orientation histogram
Exactly matched      64                  0              0
Partially matched    35                  0.6            50
Mismatch             0                   1              82

Table 6.4: Frame difference measures

Table 6.4 gives the behaviour of the different frame difference measures.

The above table 6.4 gives the behaviour of different frame difference measures.

6.12 PERFORMANCE MEASURES

6.12.1 Accuracy rate

Accuracy rate is defined as the ratio of the number of matched key frames from the automatic summary to the number of key frames from the user summary.

Accuracy rate = Nmas / Nus

6.12.2 Error rate

Error rate is defined as the ratio of the number of non-matched key frames from the automatic summary to the number of key frames from the user summary.

Error rate = Nnmas / Nus

where Nmas = number of matched key frames from the automatic summary,

Nnmas = number of non-matched key frames from the automatic summary,

Nus = number of key frames from the user summary.

The value of the Accuracy Rate varies from 0 to 1, with 1 being the best value, where all frames of the automatic summary match frames of the user summary. The value of the Error Rate ranges from 0 to Nas/Nus, where 0 is the best value (Nas is the number of frames in the automatic summary). The quality of a summary is superior if it has a high Accuracy Rate and a low Error Rate.
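The two ratios can be computed directly from the three counts defined above. The function name `summary_rates` and the example counts below are hypothetical and are used only to show the arithmetic.

```python
def summary_rates(n_matched, n_non_matched, n_user):
    """Return (accuracy rate, error rate) of an automatic summary.

    n_matched:     matched key frames from the automatic summary (Nmas)
    n_non_matched: non-matched key frames from the automatic summary (Nnmas)
    n_user:        key frames in the user summary (Nus)
    """
    if n_user <= 0:
        raise ValueError("the user summary must contain at least one key frame")
    return n_matched / n_user, n_non_matched / n_user
```

For example, with 8 matched and 2 non-matched key frames against a user summary of 10 key frames, the accuracy rate is 0.8 and the error rate is 0.2.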

                 Colour histogram    Correlation    Edge orientation histogram
Accuracy rate    0.8                 0.7            1.0
Error rate       0.2                 0.3            0.0

Table 6.5: Comparison of accuracy and error rates of a sport (football) video

Table 6.5 shows the accuracy and error rates for a soccer video. The accuracy rate and error rate of the colour histogram are 0.8 and 0.2 with a threshold of 35. Similarly, the accuracy rate and error rate are 0.7 and 0.3 for correlation with a threshold of 0.2, and 1 and 0 for the edge orientation histogram with a threshold of 40. The error of 0.2 for the colour histogram measure arises because in most sports videos the camera is mostly concentrated on the field. In such situations the colour histogram difference is almost the same for successive frames even though the content of the frame changes, so errors occur in extracting the key frames. The error in the correlation measure is due to the pixel-wise comparison. The edge orientation histogram feature therefore works well for extracting key frames from sports videos.



CHAPTER 7

CONCLUSION AND FUTURE SCOPE

7.1 CONCLUSION

Our proposed system is able to extract the key frames from most sports videos. The methods used are computationally simple and determine the number of key frames dynamically. Experiments on other types of videos, such as cartoons and documentaries, have shown that the method adapts to the video content. The experimental results show that the frame difference feature based on the edge orientation histogram has a high accuracy rate and a low error rate.

7.2 FUTURE SCOPE

In our project we extracted key frames by using multiple frame difference features individually. In general, however, one frame difference feature alone is not enough to capture all the visual content of an image. For instance, colour histograms have been a very popular feature for image representation and computation of key frames, but key frame methods that use colour histograms as the FDM tend to fail in scenes with illumination changes. In a video of a soccer game, where the camera is mostly focused on the field, edge orientation is an appropriate feature to capture the camera motion.

This means that for a particular genre of videos, different visual features must be combined with varying weights, giving more weight to the visual feature (or FDM) which provides more detail about the visual content of the video. Certain low-level features can therefore be combined to obtain an effective representation of a frame.



REFERENCES

[1] Automatic Video Classification: A Survey of the Literature Darin Brezeale and

Diane J. Cook, Senior Member, IEEE, 2007.

[2] Ciocca G, Schettini R (2006).Innovative Algorithm for Key Frame

Extraction in Video Summarization. J. Real Time Image Process, 1(1): 69-88.

[3] “Classification of sports videos using edge based features and auto associative

neural models”, C.Krishna Mohan, B. Yegnanarayana in Signal, image and video

processing.

[4] Combined Key-frame Extraction and Object-based Video Segmentation, Lijie

Liu, Student Member, IEEE, and Guoliang Fan, Member, IEEE.

[5] Gunsel B, Tekalp AM (1998). Content-based video abstraction. Proceedings of

IEEE International Conference of Image Processing, Chicago, USA, 1998, pp. 128–

132.

[6] “Integrating Pixel Cluster Indexing, Histogram Intersection and Discrete Wavelet Transform Methods for Colour Images Content Based Image Retrieval System”, International Journal of Computer and Electrical Engineering, Vol. 2, No. 2, April 2010, 1793-8163.

[7] Jianxin Wu and James M. Rehg, “Beyond the Euclidean distance: creating effective visual codebooks using the histogram intersection kernel”.

[8] J Sklansky, “Image Segmentation and Feature Extraction,” IEEE Trans on

Systems, Man and Cybernetics, vol8, pp237-247, 1978.

[9] Jiang RM, Sadka AH, Crooks D (2009). Advances in Video Summarization and

Skimming. In: Grgic M et al. (eds.) Recent Advances in Multimedia Signal

Processing and Communications, Springer, Berlin, 231: 27-50.

[10] Li Y, Zhang T, Tretter D (2001). An overview of video abstraction techniques.

Tech. Rep., HP-2001-191, HP Laboratory.



[11] Lin Mei and Gred “Kernel biased discriminate analysis using histogram

intersection kernel for content based image”.

[12] Money AG, Agius H (2008). Video summarisation: A conceptual framework and

survey of the state of the art. J. Visual Commun. Image Represent. 19(2): 121-143.

[13] Mundur P, Rao Y, Yesha Y (2006). Key frame-based video summarization using

Delaunay clustering. Int. J. Digital Lib., 6(2): 219-232

[14] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection”, in CVPR, volume 1, pages 886-893, 2005.

[15]. “Pearson's Correlation Coefficient for Discarding Redundant Information in

Real Time Autonomous Navigation System”, A. Miranda Neto, Member, IEEE, L.

Rittner, Member, IEEE, N. Leite, D. E. Zampieri, R. Lotufo and A. Mendeleck.

[16] Tianming L, Zhang HJ, Qi FH (2003). A novel video key-frame extraction algorithm based on perceived motion energy model. IEEE Trans. Circuits Syst. Video Technol., 13(10): 1006-1013.

[17] Y.K. Eugene and R.G. Johnston, “The Ineffectiveness of the Correlation Coefficient for Image Comparisons”, Technical Report LA-UR-96-2474, Los Alamos, 1996.

[18]. Zhang HJ,Wu J, Zhong D, Smoliar SW (1997). An integrated system for

content-based video retrieval and browsing. Pattern Recognit., 30(4): 643–658.

