IMPLEMENTATION OF CROSS SEARCH (CS) ALGORITHM FOR MOTION
ESTIMATION USING MATLAB
RAUDZATUL ADAWIAH BINTI YUNOS
This report is submitted in partial fulfillment of the requirements for the award of
Bachelor of Electronic Engineering (Telecommunication Electronics) With Honours
Faculty of Electronic and Computer Engineering
Universiti Teknikal Malaysia Melaka
DECLARATION
I hereby declare that this thesis entitled “Implementation of Cross Search (CS) Algorithm
for Motion Estimation using MATLAB” is the result of my own research
except as cited in the references.
Signature : ………………………………………….
Author's Name : RAUDZATUL ADAWIAH BINTI YUNOS
Date : 30 APRIL 2009
ACKNOWLEDGEMENTS
Alhamdulillah, praise be to God. I express my deepest gratitude to the Almighty
ALLAH, who gave me the strength and ability to complete this project and thesis. First of all, I
would like to thank my family, who have constantly supported me throughout my
studies. I would like to express my sincere appreciation to my project supervisor, Mr.
Redzuan bin Abd. Manap, for his support, advice and guidance in completing this
project. Finally, I would like to thank all my friends for their guidance and
cooperation in completing this project.
ABSTRACT
This thesis presents a study of techniques for achieving a high compression ratio in
video coding. One of these techniques, the Block Matching Algorithm (BMA) for
Motion Estimation, has been widely adopted in various coding standards. Conventionally,
this technique is implemented by exhaustively testing all the candidate blocks within
the search window. This type of implementation, called the Full Search (FS) Algorithm,
gives the optimum solution. However, it requires a substantial computational workload.
To overcome this drawback, many fast BMAs have been proposed and developed. These
algorithms exploit different search patterns and strategies in order to find the optimum
motion vector with a minimal number of search points. The objective of this project is
to study one of these fast BMAs, called the Cross Search (CS) Algorithm. CS takes less
time than FS because, by design, it searches only selected points around the reference
point instead of every point in the search window. The algorithm is implemented in
MATLAB, and its performance is compared against the FS algorithm and other fast BMAs
in terms of the average peak signal-to-noise ratio (PSNR) produced, the number of
search points required, computational complexity and elapsed processing time.
ABSTRAK
This project is a study of one video coding technique, known as the Block Matching
Algorithm (BMA). The study focuses on several main aspects, beginning with BMA in
general; the earliest technique used in BMA is the Full Search (FS) Algorithm. The FS
matching technique searches every coordinate within each search window of a video.
This process takes a long time to produce a result before the video can be matched,
because a large number of error computations must be performed. The objective of this
study is to examine the potential of the Cross Search (CS) Algorithm to replace the FS
technique. The complete working process of the CS algorithm has been studied. Unlike
FS, CS examines only a few search areas to identify the position of the minimum
matching error in each window. The CS search method has been implemented using
MATLAB, and its performance is compared with FS and the other algorithms listed in this
report in terms of peak signal-to-noise ratio (PSNR), average number of search points,
computational complexity and algorithm processing time.
CONTENTS
CHAPTER TITLE
PROJECT TITLE
DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ACRONYMS
1 INTRODUCTION
1.1 Project Background
1.2 Project Objectives
1.3 Problem Statement
1.4 Scope of Project
2 LITERATURE REVIEW
2.1 Video Compression and Coding Technique
2.1.1 Introduction on Video Compression
2.1.2 Coding Technique
2.1.3 Video
2.2 Motion Estimation
2.2.1 Identifies the True Motion
2.2.2 Removing Temporal Redundancy
2.3 Block Matching Algorithm
2.4 Searching Method
2.4.1 Full Search Algorithm
2.4.2 New Three Step Search (NTSS)
2.4.3 Diamond Search (DS)
2.4.4 Cross Diamond Search (CDS)
2.4.5 Four Step Search (FSS)
3 METHODOLOGY
3.1 Project Planning
3.1.2 Data Acquisition on Literature Review
3.1.3 Development and Implementation
3.1.4 Performance Analysis
3.1.5 Presentation and Seminar
3.1.6 Thesis Writing Submission
3.2 Project Flow Chart
4 CROSS SEARCH ALGORITHM (CS)
4.1 Introduction to Cross Search Algorithm
4.2 CS Steps and Method of Search
4.4 CS Flowchart
5 RESULTS AND DISCUSSIONS
5.1 Performance of CS for single frame sequence
5.1.1 Akiyo sequence for frame no. 1 to no. 2
5.1.2 Claire sequence for frame no. 1 to no. 2
5.1.3 Coastguard sequence for frame no. 1 to no. 2
5.1.4 Foreman sequence for frame no. 1 to no. 2
5.1.5 News sequence for frame no. 1 to no. 2
5.1.6 Salesman sequence for frame no. 1 to no. 2
5.1.7 Tennis sequence for frame no. 1 to no. 2
5.2 Average Search Points and PSNR for 1 frame sequence
5.3 Comparison of CS Against all Algorithms
5.3.1 Average Search Points for all Algorithms
5.3.2 Average PSNR for all Algorithms
5.3.3 Elapsed Time for all Algorithms
5.3.4 Search Points Speed
6 CONCLUSION
7 REFERENCES
8 APPENDICES
LIST OF TABLES
NO TITLE
5.1 Average PSNR and Search Points of CSA for 1 Frame
5.2 Average Search Points for 1st to 30th frame
5.3 Average PSNR for all Algorithms
5.4 Elapsed Time for 1-30 frames simulation (s)
5.5 Search Points Speed
LIST OF FIGURES
NO TITLE
2.1 Video Coding Layer
2.2 Predictive source coding with motion compensation
2.3 Macro Block
2.4 NTSS Flowchart
2.5 Steps of DS
2.6 DS Flowchart
2.7 CDS steps
2.8 CDS Flowchart
2.9 Search patterns of the 4SS
2.10 Two different search paths of 4SS
2.11 4SS Flowchart
3.1 Flow of the Project
4.1 Illustration 1 for CSA steps
4.2 Illustration 2 for CSA steps
4.3 Illustration 3 for CSA
4.4 CSA Flowchart
5.1 (a) Original Image (b) Predicted Image
5.2 (a) Original Image (b) Predicted Image
5.3 (a) Original Image (b) Predicted Image
5.4 (a) Original Image (b) Predicted Image
5.5 (a) Original Image (b) Predicted Image
5.6 (a) Original Image (b) Predicted Image
5.7 (a) Original Image (b) Predicted Image
5.8 Average Search Points for Akiyo (1-30)
5.9 Average Search Points for Claire (1-30)
5.10 Average Search Points for Coastguard (1-30)
5.11 Average Search Points for Foreman (1-30)
5.12 Average Search Points for News (1-30)
5.13 Average Search Points for Salesman (1-30)
5.14 Average Search Points for Tennis (1-30)
5.15 Average PSNR (dB) for Akiyo sequence
5.16 Average PSNR (dB) for Claire sequence
5.17 Average PSNR (dB) for Coastguard sequence
5.18 Average PSNR (dB) for Foreman sequence
5.19 Average PSNR (dB) for Salesman sequence
5.20 Average PSNR (dB) for Tennis sequence
LIST OF ACRONYMS
AVI – Audio Video Interleave
BDM – Block Distortion Measure
BMA – Block Matching Algorithm
CCB – Cross Centre Biased
CCITT – International Telegraph and Telephone Consultative Committee
CDS – Cross Diamond Search
CS – Cross Search
DCT – Discrete Cosine Transform
DS – Diamond Search
FS – Full Search
FSS – Four Step Search
GOP – Group of Pictures
IDCT – Inverse Discrete Cosine Transform
JPEG – Joint Photographic Experts Group
LDSP – Large Diamond Search Pattern
LSI – Large Scale Integration
LTMCP – Long-Term Memory Motion Compensated Prediction
MAC – Media Access Control
MAD – Mean Absolute Difference
MAE – Mean Absolute Error
MBD – Minimum Block Distortion
ME – Motion Estimation
MPEG – Moving Picture Experts Group
MSE – Mean Square Error
MV – Motion Vector
NTSS – New Three Step Search
PC – Personal Computer
PSNR – Peak Signal-to-Noise Ratio
SDSP – Small Diamond Search Pattern
VLC – VideoLAN Client
WMV – Windows Media Video
CHAPTER 1
INTRODUCTION
1.1 Background
Motion Estimation (ME) is an important part of any video compression system,
since it can achieve significant compression by exploiting the temporal redundancy
existing in a video sequence. Unfortunately, it is also the most computationally intensive
function of the entire encoding process. In fast search algorithms, the ME process
follows a special search pattern, such as a diamond or a cross pattern, that checks fewer
points. Smaller motion compensation block sizes can produce better ME results.
However, a smaller block size leads to increased complexity (more search operations
must be carried out) and an increase in the number of MVs that need to be transmitted.
Sending each MV requires bits, and the extra overhead for the vectors may
outweigh the benefit of reduced residual energy. An effective compromise is to adapt
the block size to the picture characteristics, for example choosing a large block size in
flat, homogeneous regions of a frame and a small block size around areas of
high detail and complex motion.
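To make the block-size trade-off concrete, the short Python sketch below (the project itself uses MATLAB) counts how many MVs a single frame needs when every block carries one MV. The CIF luminance resolution of 352 x 288 and the three block sizes are illustrative assumptions, not values prescribed by this project.

```python
def motion_vector_count(width, height, block_size):
    """Number of MVs for one frame when each block gets exactly one MV.

    Assumes the frame dimensions are exact multiples of block_size.
    """
    return (width // block_size) * (height // block_size)


# Assumed CIF luminance resolution, for illustration only.
for n in (16, 8, 4):
    print(f"{n}x{n} blocks: {motion_vector_count(352, 288, n)} MVs per frame")
```

Halving the block size quadruples the number of MVs (396, 1584 and 6336 here), which is why the MV overhead grows quickly for small blocks.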
1.2 Objectives
The main aim of this project is to implement the Cross Search (CS) Algorithm
as a way to overcome the heavy computational load of the Full Search (FS) Algorithm
while still achieving a high compression ratio in video coding. To achieve this main aim,
the objectives of this project are as follows:
1. To study how the Block Matching Algorithm (BMA), the FS Algorithm and the CS
Algorithm work by implementing them in MATLAB.
2. To understand and observe the differences between FS and CS in their search
process, time taken and the quality of the output produced for various types of video.
3. To understand the basic operation of other fast BMAs and compare their
performance with CS in different aspects.
4. To conclude and justify which algorithm performs best according to the
assessment criteria.
1.3 Problem Statement
A substantial amount of computational workload is required during the execution
of the Full Search algorithm; however, this drawback can be overcome by the many
types of fast BMAs that have been proposed and developed. Different search patterns and
strategies are exploited in these fast BMAs in order to find the optimum MV
with a minimal number of search points.
1.4 Scope
This project focuses on three main areas: a literature review on video coding,
BMAs and CS; the development and implementation of the CS algorithm on the
MATLAB platform; and the performance analysis of CS against the FS algorithm and
against other BMAs. The literature review on video coding, BMAs and CS is discussed
further in Chapter 2. Chapter 3 discusses the methodology of the project, including the
development and implementation of the CS algorithm using MATLAB. Chapter 4
describes the CS algorithm, Chapter 5 presents the performance analysis and results of
the implementation, and Chapter 6 states the conclusion and justification of the project.
CHAPTER 2
LITERATURE REVIEW
This chapter reviews the background of the project. Important aspects of the
project, such as the video sequences used and the details of the algorithms, are
described.
2.1 Video Compression and Coding Technique
This section covers the need for video compression, the coding technique and
the video formats selected for this project.
2.1.1 Introduction on Video Compression
A video consists of two elements: the image data and the video data itself, so
compressing a video means compressing both. Image and video data compression is a
process in which the amount of data used to represent images and video is reduced to
meet a bit rate requirement (below or at most equal to the maximum available bit rate),
while the quality of the reconstructed image or video still satisfies the requirement of
the application and the computational complexity involved remains affordable.
Image and video data compression has been found to be necessary in several
important applications, such as visual transmission and storage. This is because the
huge amount of data involved in these and other applications usually far exceeds the
capability of existing hardware, despite the rapid growth of technology in the related
industries.
Data represent the information carried, and the quantity of data can be measured
exactly. In the context of digital image and video, data are usually measured by the
number of binary units, or bits. The bit rate, also known as the coding rate, is an
important parameter in image and video compression and is frequently expressed in
bits per pixel (bpp). The term pixel is an abbreviation for picture element and is
sometimes referred to as pel. In information source coding, the bit rate is sometimes
expressed in bits per symbol.
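As a worked example of these units, the Python sketch below computes the uncompressed bit rate of a video sequence and the average coding rate in bpp at a given channel rate. It assumes CIF resolution (352 x 288), 8-bit 4:2:0 sampling (on average 12 bits per luminance pixel) and 30 frames per second; the 384 kbit/s target is a hypothetical value, not one used in this project.

```python
def raw_bit_rate(width, height, bits_per_pixel, fps):
    """Uncompressed bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


def coded_bpp(bit_rate, width, height, fps):
    """Average coding rate in bits per (luminance) pixel for a given bit rate."""
    return bit_rate / (width * height * fps)


# Assumptions: CIF, 8-bit 4:2:0 (12 bits per pixel on average), 30 frames/s.
raw = raw_bit_rate(352, 288, 12, 30)
target = 384_000  # hypothetical channel rate of 384 kbit/s

print(f"raw: {raw / 1e6:.2f} Mbit/s")                                # raw: 36.50 Mbit/s
print(f"compression ratio needed: {raw / target:.0f}:1")             # 95:1
print(f"coded rate: {coded_bpp(target, 352, 288, 30):.3f} bpp")      # 0.126 bpp
```

Even this modest channel rate requires removing about 99% of the raw data, which motivates the compression techniques discussed in this chapter.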
2.1.2 Coding Technique
The video coding layer consists of a hybrid of temporal and spatial prediction, in
conjunction with transform coding. Figure 2.1 shows a block diagram of the video
coding layer for a macroblock. In summary, the picture is split into blocks. The first
picture of a sequence or a random access point is typically “Intra” coded, i.e., without
using information other than that contained in the picture itself.
Figure 2.1 Video Coding Layer [1]
2.1.3 Video
Many video formats have been developed. Some commonly used formats are as
follows:
i. Audio Video Interleave (AVI) format. Videos stored in the AVI format
have the extension .avi.
ii. Windows Media Video (WMV) format. Videos stored in the WMV format have
the extension .wmv.
iii. Moving Picture Experts Group (MPEG) format. Videos stored in the MPEG format
have the extension .mpg or .mpeg.
iv. QuickTime format. Videos stored in this format have the extension
.mov.
v. RealVideo format. Videos stored in this format have the extension .rm or
.ram.
For this project, the videos chosen for implementation are in AVI format.
The standard Common Intermediate Format (CIF) video sequences used in this
project are Akiyo.avi, Claire.avi, Coastguard.avi, Foreman.avi, Salesman.avi and
Tennis.avi. All these videos are widely used as standard reference sequences in ME
research.
2.2 Motion Estimation
ME is a process to estimate the pels or pixels of the current frame from reference
frame(s). The temporal prediction technique used in video is based on ME. The basic
premise of ME is that in most cases, consecutive video frames will be similar except for
changes induced by objects moving within the frames.
Fast ME techniques use block matching, exploiting different search patterns and
search strategies to find the optimum MV for a particular block while reducing the
number of search points. Block matching efficiently removes the temporal redundancy
between successive frames.
Block-based ME is the most practical approach to obtaining motion-compensated
prediction frames. It divides frames into equally sized rectangular blocks and, for each
block in the current frame, finds the displacement of the best-matched block in the
previous frame within a search window; this displacement is the MV of the block.
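The exhaustive form of this search, the FS algorithm mentioned in the abstract, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's MATLAB code: frames are lists of rows of luminance values, the block size is n, the search range is ±p pixels, and MAD is used as the block distortion measure. Candidates that would fall outside the reference frame are skipped.

```python
def mad(cur, ref, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p] x [-p, p] that keeps the candidate
    block inside the reference frame. Returns (mv, cost, search_points)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost, points = (0, 0), float("inf"), 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = (0 <= bx + dx and bx + dx + n <= w
                      and 0 <= by + dy and by + dy + n <= h)
            if not inside:
                continue  # candidate block falls outside the frame
            points += 1
            cost = mad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, points


def intensity(x, y):
    # Quadratic test surface: no two distinct displacements give identical
    # blocks, so the true displacement is the unique zero-error match.
    return x * x + 3 * y * y + x * y


ref = [[intensity(x, y) for x in range(32)] for y in range(32)]
# The current frame samples the reference at an offset of (+3, -2).
cur = [[intensity(x + 3, y - 2) for x in range(32)] for y in range(32)]
mv, cost, points = full_search(cur, ref, bx=12, by=12, n=8, p=7)
print(mv, cost, points)  # (3, -2) 0.0 225
```

With a ±7 search range, FS evaluates up to (2p + 1)² = 225 candidates per block; this is the computational load that fast BMAs such as CS try to reduce.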
The benefits of long-term memory motion compensated prediction (LTMCP) [2]
have been emphasized in recent years. Consequently, these tools have been adopted by
several recent standards such as H.263+ and H.264/MPEG-4 AVC [3]. As the cost of
semiconductors continues to drop, a notably higher prediction gain can be achieved by
searching more reference frames in the memory buffer. Nevertheless, an obvious
drawback is that the complexity increases proportionally. Extra data are also needed to
describe the reference indices.
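The proportional growth in complexity can be checked with simple arithmetic. The sketch below counts FS search points per frame when every reference frame in the buffer is searched; CIF resolution, 16 x 16 blocks and a ±7 search range are assumptions for illustration, and clipping at the frame boundary is ignored.

```python
def fs_search_points(width, height, n, p, num_refs):
    """FS search points per frame: blocks x candidates per block x references."""
    blocks = (width // n) * (height // n)
    points_per_block = (2 * p + 1) ** 2
    return blocks * points_per_block * num_refs


for refs in (1, 2, 5):
    pts = fs_search_points(352, 288, 16, 7, refs)
    # Each search point compares n * n = 256 pixel pairs for 16 x 16 blocks.
    print(f"{refs} reference frame(s): {pts} points, {pts * 256} pixel comparisons")
```

One reference frame already needs 89,100 search points per frame, and five reference frames need exactly five times as many.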
In the early 1980s, some conventional fast algorithms were proposed, such as the
Three Step Search (TSS) and the 2D logarithmic search [4]. Among these algorithms, TSS
became the most popular for low bit-rate video applications, owing to its simplicity
and effectiveness. However, TSS uses a uniformly allocated search pattern in its first
step, which is not very efficient at catching the small motion that appears in stationary
or quasi-stationary blocks.
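TSS itself can be sketched in a few lines. The Python version below is a minimal illustration assuming the usual ±7 search range: it starts with a step size of 4, checks the centre and its eight neighbours, moves to the best point and halves the step until a step of 1 has been searched. The `cost` argument is a hypothetical callback that returns the block distortion (for example MAD) of a candidate displacement.

```python
def three_step_search(cost, p=7):
    """Three Step Search sketch.

    cost(dx, dy) returns the block distortion of candidate displacement (dx, dy).
    Returns (best_mv, number_of_distinct_points_checked).
    """
    step = (p + 1) // 2  # 4 for the usual search range p = 7
    cx, cy = 0, 0
    checked = {}  # cache, so revisited points are not counted twice

    def eval_point(x, y):
        if (x, y) not in checked:
            checked[(x, y)] = cost(x, y)
        return checked[(x, y)]

    while step >= 1:
        # Uniform 9-point pattern: the centre and its 8 neighbours at this step.
        candidates = [(cx + sx * step, cy + sy * step)
                      for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        cx, cy = min(candidates, key=lambda c: eval_point(*c))
        step //= 2
    return (cx, cy), len(checked)


def demo_cost(dx, dy):
    # Hypothetical unimodal distortion surface with its minimum at (5, -3).
    return (dx - 5) ** 2 + (dy + 3) ** 2


print(three_step_search(demo_cost))  # ((5, -3), 25)
```

For p = 7 the sketch checks 9 + 8 + 8 = 25 distinct points instead of the 225 tested by FS. Because the first step always samples the coarse ±4 grid, a block with small motion near the centre still pays for the widely spaced first-step points, which reflects the first-step inefficiency described above.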
To remedy this problem, several adaptive techniques have been suggested to
make the search more adaptable to motion scale and uncertainty. The uncertainty is
estimated by the difference in the block distortion measure (BDM) among the checked
points. A smaller difference indicates a larger uncertainty, and hence the search scope is
increased in the next step.
2.2.1 Identifies the True Motion
The first type of ME algorithm aims to accurately track the true motion of
objects or features in video sequences. Video sequences are generated by projecting the
3D real world onto a series of 2D images. When objects in the 3D real world move, the
brightness or pixel intensity of the 2D images changes correspondingly. The 2D motion
projected from the movement of a point in the 3D real world is referred to as the “true
motion” [5]. One of the many potential applications of true motion is computer
vision, where the goal is to identify an unknown environment through a moving camera.
2.2.2 Removing Temporal Redundancy
The second type of ME algorithm aims to remove temporal redundancy in
video compression. A natural way to exploit the redundancy between frames is to
predict the current frame t from a previous frame (t-Δt) or from a future frame (t+Δt).
Motion estimation and compensation are used to predict the frame t to be coded from
successive frames: motion estimation determines the motion between two image
frames, and motion compensation uses it to form the prediction. The motion is described
by a motion field of motion vectors. Consequently, the prediction error is transmitted
instead of the frame itself, as shown in Figure 2.2. Along with the prediction error, the
motion information is also transmitted so that the decoder can form the same
motion-compensated prediction. Block-based motion representation gives a very good
trade-off between motion overhead and prediction error, because it uses only one MV
per macroblock.
Figure 2.2 Predictive source coding with motion compensation [6]
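The prediction loop described above can be sketched as follows, assuming one MV per n x n block, 8-bit samples and frames stored as lists of rows. `motion_compensate` builds the predicted frame from the reference frame and the MVs, `residual` gives the prediction error that would be transmitted, and `psnr` measures prediction quality with the usual definition PSNR = 10 log10(255² / MSE). These function names are illustrative and are not taken from the project's MATLAB code.

```python
import math


def motion_compensate(ref, mvs, n):
    """Predicted frame: each n x n block at (bx, by) is copied from ref at its
    MV displacement. mvs maps (bx, by) -> (dx, dy); MVs must stay in range."""
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for (bx, by), (dx, dy) in mvs.items():
        for y in range(n):
            for x in range(n):
                pred[by + y][bx + x] = ref[by + dy + y][bx + dx + x]
    return pred


def residual(cur, pred):
    """Prediction error that is coded and transmitted instead of the frame."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]


def psnr(cur, pred, peak=255):
    """PSNR in dB between the original and predicted frames."""
    h, w = len(cur), len(cur[0])
    mse = sum((c - p) ** 2
              for rc, rp in zip(cur, pred) for c, p in zip(rc, rp)) / (h * w)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


ref = [[10 * y + x for x in range(8)] for y in range(8)]
# Hypothetical MVs for the four 4 x 4 blocks of an 8 x 8 frame.
mvs = {(0, 0): (0, 0), (4, 0): (-4, 0), (0, 4): (0, -4), (4, 4): (-4, -4)}
pred = motion_compensate(ref, mvs, 4)
res = residual(ref, pred)  # here the reference itself plays the current frame
print(pred[5][6])                                     # 12, copied from ref[1][2]
print(round(psnr([[0] * 4] * 4, [[1] * 4] * 4), 2))   # 48.13 (MSE = 1)
```

At the decoder, adding the residual back to the same motion-compensated prediction recovers the current frame (exactly, if the residual is not quantized), which is why only the MVs and the prediction error need to be sent.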
This project adopts the second type of ME, which removes temporal redundancy.
The BMA technique is implemented to detect and calculate the MVs, and the criteria of
BMA are also described in this chapter.