
IMPLEMENTATION OF CROSS SEARCH (CS) ALGORITHM FOR MOTION

ESTIMATION USING MATLAB

RAUDZATUL ADAWIAH BINTI YUNOS

This report is submitted in partial fulfillment of the requirements for the award of

Bachelor of Electronic Engineering (Telecommunication Electronics) With Honours

Faculty of Electronic and Computer Engineering

Universiti Teknikal Malaysia Melaka


DECLARATION

I hereby declare that this thesis entitled “Implementation of Cross Search (CS) Algorithm for Motion Estimation using MATLAB” is the result of my own research except as cited in the references.

Signature : ………………………………………….

Author's Name : RAUDZATUL ADAWIAH BINTI YUNOS

Date : 30 APRIL 2009


To my beloved mother and the loving memory of my father


ACKNOWLEDGEMENTS

Alhamdulillah, praise be to God. I offer my deepest gratitude to the Almighty ALLAH, who gave me the strength and ability to complete this project and thesis. First of all, I would like to thank my family, who have been constantly supportive throughout my studies. I would like to express my sincere appreciation to my project supervisor, Mr. Redzuan bin Abd. Manap, for his support, advice and guidance in completing this project. Finally, I would like to thank all my friends, who have given me a great deal of guidance and cooperation in completing this project.


ABSTRACT

This thesis presents a study of techniques for achieving a high compression ratio in video coding. One of these techniques, the Block Matching Algorithm (BMA) for Motion Estimation, has been widely adopted in various coding standards. It is conventionally implemented by exhaustively testing all the candidate blocks within the search window. This type of implementation, called the Full Search (FS) Algorithm, gives the optimum solution. However, it requires a substantial computational workload. To overcome this drawback, many fast BMAs have been proposed and developed. These algorithms exploit different search patterns and strategies in order to find the optimum motion vector with a minimal number of search points. The objective of this project is to study one of these fast BMAs, the Cross Search (CS) Algorithm. CS takes less time than FS because it evaluates only selected points in the search window around the reference point rather than every candidate position. The algorithm is implemented in MATLAB, and its performance is compared against the FS algorithm as well as other fast BMAs in terms of the average peak signal-to-noise ratio (PSNR) produced, the number of search points required, computational complexity and elapsed processing time.


ABSTRAK

This project is a study of one video coding technique, known as the Block Matching Algorithm (BMA). The study focuses on several main aspects, namely BMA in general and the earliest technique used in BMA, the Full Search (FS) Algorithm. The FS matching technique searches every coordinate within each window of a video. This process takes a long time to produce a result before the video can be matched, because a large number of error calculations must be performed. The objective of this study is to examine the potential of the Cross Search (CS) Algorithm to replace the FS technique. The entire process by which the CS algorithm works has been studied. Unlike FS, the CS method involves only a few search areas to identify the error position within each window. The CS search method was implemented in MATLAB, and the performance of the algorithm is compared with FS and with the other algorithms listed in this report in terms of peak signal-to-noise ratio (PSNR), average search points, computational complexity and algorithm processing time.


CONTENTS

CHAPTER TITLE

PROJECT TITLE
DECLARATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
ABSTRAK
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ACRONYMS

1 INTRODUCTION
1.1 Project Background
1.2 Objective Project
1.3 Problem Statement
1.4 Scope of Project

2 LITERATURE REVIEW
2.1 Video Compression and Coding Technique
2.1.1 Introduction on Video Compression
2.1.2 Coding Technique
2.1.3 Video
2.2 Motion Estimation
2.2.1 Identifies the True Motion
2.2.2 Removing Temporal Redundancy
2.3 Block Matching Algorithm
2.4 Searching Method
2.4.1 Full Search Algorithm
2.4.2 New Three Steps Search (NTSS)
2.4.3 Diamond Search (DS)
2.4.4 Cross Diamond Search (CDS)
2.4.5 Four Step Search (FSS)

3 METHODOLOGY
3.1 Project Planning
3.1.2 Data Acquisition on Literature Review
3.1.3 Development and Implementation
3.1.4 Performance Analysis
3.1.5 Presentation and Seminar
3.1.6 Thesis Writing Submission
3.2 Project Flow Chart

4 CROSS SEARCH ALGORITHM (CS)
4.1 Introduction to Cross Search Algorithm
4.2 CS Steps and Method of Search
4.4 CS Flowchart

5 RESULTS AND DISCUSSIONS
5.1 Performance of CS for single frame sequence
5.1.1 Akiyo sequence for frame no. 1 to no. 2
5.1.2 Claire sequence for frame no. 1 to no. 2
5.1.3 Coastguard sequence for frame no. 1 to no. 2
5.1.4 Foreman sequence for frame no. 1 to no. 2
5.1.5 News sequence for frame no. 1 to no. 2
5.1.6 Salesman sequence for frame no. 1 to no. 2
5.1.7 Tennis sequence for frame no. 1 to no. 2
5.2 Average Search Points and PSNR for 1 frame sequence
5.3 Comparison of CS Against all Algorithms
5.3.1 Average Search Points for all Algorithms
5.3.2 Average PSNR for all Algorithms
5.3.3 Elapse Time for all Algorithms
5.3.4 Search Points Speed

6 CONCLUSION

7 REFERENCES

8 APPENDICES


LIST OF TABLES

NO TITLE

5.1 Average PSNR and Search Points of CSA for 1 Frame
5.2 Average Search Points for 1st to 30th frame
5.3 Average PSNR for all Algorithms
5.4 Elapse Time for 1-30 frames simulation (s)
5.5 Search points Speed


LIST OF FIGURES

NO TITLE

2.1 Video Coding Layer
2.2 Predictive source coding with motion compensation
2.3 Macro Block
2.4 NTSS Flowchart
2.5 Steps of DS
2.6 DS Flowchart
2.7 CDS steps
2.8 CDS Flowchart
2.9 Search patterns of the 4SS
2.10 Two different search paths of 4SS
2.11 4SS Flowchart
3.1 Flow of the Project
4.1 Illustration 1 for CSA steps
4.2 Illustration 2 for CSA steps
4.3 Illustration 3 for CSA steps
4.4 CSA Flowchart
5.1 (a) Original Image (b) Predicted Image
5.2 (a) Original Image (b) Predicted Image
5.3 (a) Original Image (b) Predicted Image
5.4 (a) Original Image (b) Predicted Image
5.5 (a) Original Image (b) Predicted Image
5.6 (a) Original Image (b) Predicted Image
5.7 (a) Original Image (b) Predicted Image
5.8 Average Search Points for Akiyo (1-30)
5.9 Average Search Points for Claire (1-30)
5.10 Average Search Points for Coastguard (1-30)
5.11 Average Search Points for Foreman (1-30)
5.12 Average Search Points for News (1-30)
5.13 Average Search Points for Salesman (1-30)
5.14 Average Search Points for Tennis (1-30)
5.15 Average PSNR (dB) for Akiyo sequence
5.16 Average PSNR (dB) for Claire sequence
5.17 Average PSNR (dB) for Coastguard sequence
5.18 Average PSNR (dB) for Foreman sequence
5.19 Average PSNR (dB) for Salesman sequence
5.20 Average PSNR (dB) for Tennis sequence


LIST OF ACRONYMS

AVI – Audio Video Interleave
WMV – Windows Media Video
MPEG – Moving Picture Experts Group
BDM – Block Distortion Measure
BMA – Block Matching Algorithm
CCB – Cross Centre Biased
CCITT – International Telegraph & Telephone Consultative Committee
CDS – Cross Diamond Search
CS – Cross Search
DCT – Discrete Cosine Transform
DS – Diamond Search
FS – Full Search
FSS – Four Step Search
GOP – Group of Pictures
IDCT – Inverse Discrete Cosine Transform
JPEG – Joint Photographic Experts Group
LDSP – Large Diamond Search Pattern
LSI – Large Scale Integration
MAC – Media Access Control
MAD – Mean Absolute Difference
MAE – Mean Absolute Error
MBD – Minimum Block Distortion
ME – Motion Estimation
MSE – Mean Square Error
MV – Motion Vector
NTSS – New Three Step Search
PC – Personal Computer
PSNR – Peak Signal To Noise Ratio
SDSP – Small Diamond Search Pattern
VLC – Video LAN Client
LTMCP – Long-Term Memory Motion Compensated Prediction


CHAPTER 1

INTRODUCTION

1.1 Background

Motion Estimation (ME) is an important part of any video compression system, since it can achieve significant compression by exploiting the temporal redundancy that exists in a video sequence. Unfortunately, it is also the most computationally intensive function of the entire encoding process. In fast search algorithms, the ME process follows a special pattern that checks fewer points, such as a diamond pattern or a cross pattern. Smaller motion compensation block sizes can produce better ME results. However, a smaller block size leads to increased complexity (more search operations must be carried out) and an increase in the number of motion vectors (MVs) that need to be transmitted. Each MV requires bits to be sent, and the extra overhead for the vectors may outweigh the benefit of reduced residual energy. An effective compromise is to adapt the block size to the picture characteristics, for example choosing a large block size in flat, homogeneous regions of a frame and a small block size around areas of high detail and complex motion.
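The growth in MV overhead with smaller blocks can be illustrated with a short calculation. The sketch below is in Python rather than MATLAB, and it assumes a CIF luminance frame of 352 x 288 pixels with one MV per block; the function name motion_vector_count is hypothetical and chosen only for this illustration.

```python
import math


def motion_vector_count(width, height, block_size):
    """Number of blocks (one MV each) needed to tile a width x height frame."""
    blocks_across = math.ceil(width / block_size)
    blocks_down = math.ceil(height / block_size)
    return blocks_across * blocks_down


# Assumed example: CIF luminance frame, 352 x 288 pixels
for size in (16, 8, 4):
    print(size, motion_vector_count(352, 288, size))
# 16 396
# 8 1584
# 4 6336
```

Halving the block size quadruples the number of MVs per frame, which is the overhead the paragraph above weighs against the reduction in residual energy.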


1.2 Objectives

The main aim of this project is to implement the Cross Search (CS) Algorithm, which can overcome the problem faced when using the Full Search (FS) Algorithm to achieve a high compression ratio in video coding. To achieve this aim, the objectives of this project are as follows:

1. To study how the Block Matching Algorithm (BMA), the FS Algorithm and the CS Algorithm work when implemented in MATLAB.

2. To understand and observe the differences between FS and CS in their search process, time taken and the quality of the output produced for various types of video.

3. To understand the basic functions of the other fast BMAs and compare their performance with CS in different aspects.

4. To conclude and justify which algorithm performs best according to the assessment criteria.

1.3 Problem Statement

The Full Search algorithm requires a substantial computational workload during execution. This drawback can be overcome by the many fast BMAs that have been proposed and developed. These fast BMAs exploit different search patterns and strategies in order to find the optimum MV with a minimal number of search points.


1.4 Scope

This project focuses on three main areas: a literature review on video coding, BMAs and CS; the development and implementation of the CS algorithm on the MATLAB platform; and the performance analysis of CS against the FS algorithm and other BMAs. Several aspects need to be considered in each of these areas. The literature review on video coding, BMAs and CS is discussed further in Chapter 2. Chapter 3 discusses the methodology of the project, including the development and implementation of the CS algorithm using MATLAB. The performance analysis and results of the implementation are discussed in the results and discussion chapter, Chapter 5. Finally, the conclusion and justification of the project are stated in Chapter 6.


CHAPTER 2

LITERATURE REVIEW

In this chapter, the background of the project is reviewed. The important elements of this project, such as video and the details of the algorithms, are described further.

2.1 Video Compression and Coding Technique

This subchapter covers the need for video compression, the coding technique and some explanation of the selected videos.

2.1.1 Introduction on Video Compression

A video consists of image data and video data, and compressing a video means compressing both. Image and video data compression is a process in which the amount of data used to represent the image and video is reduced to meet a bit rate requirement, below or at most equal to the maximum available bit rate. Although the data are reduced, the quality of the reconstructed image or video must still satisfy the application, and the complexity of the computation involved must remain affordable for the application.


Image and video data compression has been found to be necessary in several important applications such as visual transmission and storage. This is because the huge amount of data involved in these and other applications usually far exceeds the capability of existing hardware, even though the technologies in the related industries continue to advance.

Data represent information, and the quantity of data can be measured. In the context of digital image and video, data are usually measured by the number of binary units, or bits. The bit rate, which is also known as the coding rate, is an important parameter in image and video compression and is frequently expressed in bits per pixel (bpp). The term pixel is an abbreviation of picture element and is sometimes referred to as pel. In information source coding, the bit rate is sometimes expressed in bits per symbol.
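As a small worked example of the bits per pixel measure, the Python sketch below divides the total number of coded bits by the number of pixels represented. The function name bits_per_pixel and the example sizes (a 352 x 288 frame coded in 12,672 bytes) are assumptions made purely for illustration.

```python
def bits_per_pixel(total_bits, width, height, frames=1):
    """Average number of coded bits spent on each pixel."""
    pixels = width * height * frames
    if pixels <= 0:
        raise ValueError("frame dimensions and frame count must be positive")
    return total_bits / pixels


# An uncompressed 8-bit greyscale frame spends 8 bits on every pixel.
print(bits_per_pixel(352 * 288 * 8, 352, 288))   # 8.0

# The same frame coded in 12,672 bytes (hypothetical figure).
print(bits_per_pixel(12672 * 8, 352, 288))       # 1.0
```

In this example the coded frame uses one eighth of the bits of the uncompressed frame, that is, a compression ratio of 8:1.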

2.1.2 Coding Technique

The video coding layer consists of a hybrid of temporal and spatial prediction, in

conjunction with transform coding. Figure 2.1 shows a block diagram of the video

coding layer for a macroblock. In summary, the picture is split into blocks. The first

picture of a sequence or a random access point is typically “Intra” coded, i.e., without

using information other than that contained in the picture itself.


Figure 2.1 Video Coding Layer [1]

2.1.3 Video

Many video formats have been developed. Some commonly used formats are as follows:

i. Audio Video Interleave (AVI) format. Videos stored in the AVI format have the extension .avi.

ii. Windows Media Video (WMV) format. Videos stored in the WMV format have the extension .wmv.

iii. Moving Picture Experts Group (MPEG) format. Videos stored in the MPEG format have the extension .mpg or .mpeg.

iv. QuickTime format. Videos stored in this format have the extension .mov.

v. RealVideo format. Videos stored in this format have the extension .rm or .ram.


For this project, the videos chosen for implementation are in AVI format. The standard Common Intermediate Format (CIF) video sequences used in this kind of project are Akiyo.avi, Claire.avi, Coastguard.avi, Foreman.avi, Salesman.avi and Tennis.avi. All these videos are used as standard reference videos in ME research.

2.2 Motion Estimation

ME is the process of estimating the pels, or pixels, of the current frame from one or more reference frames. The temporal prediction technique used in video coding is based on ME. The basic premise of ME is that, in most cases, consecutive video frames are similar except for changes induced by objects moving within the frames.

Block matching techniques exploit different search patterns and search strategies to find the optimum MV while reducing the number of search points. In this way, BMA efficiently removes the temporal redundancy between successive frames.

Block-based ME is the most practical approach to obtaining motion-compensated prediction frames. It divides each frame into equally sized rectangular blocks and, for each block in the current frame, finds the displacement of the best-matched block in the previous frame within a search window. This displacement is the MV of the block.
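The block matching idea described above can be sketched as an exhaustive search over every displacement in the window. The following is a minimal Python sketch, not the MATLAB implementation of this project. It assumes frames stored as lists of rows of integer pixel values, the Mean Absolute Difference (MAD) as the block distortion measure, and a search range of +/- p pixels; the names mad and full_search are hypothetical.

```python
def mad(cur, ref, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement within +/- p and return
    (best motion vector, best MAD, number of points checked)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost, checked = (0, 0), float("inf"), 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidate blocks that fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = mad(cur, ref, bx, by, dx, dy, n)
            checked += 1
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost, checked


# Synthetic 32 x 32 test frames (not realistic pixel values): every pixel of
# cur equals the ref pixel 7 columns to the right and 7 rows down.
ref = [[x + 32 * y for x in range(32)] for y in range(32)]
cur = [[(x + 7) + 32 * (y + 7) for x in range(32)] for y in range(32)]
print(full_search(cur, ref, bx=8, by=8, n=8, p=7))  # ((7, 7), 0.0, 225)
```

With p = 7 the search visits all 15 x 15 = 225 candidate positions for a single block, which is the workload that fast BMAs try to reduce.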

The benefits of long-term memory motion compensated prediction (LTMCP) [2] have been emphasized in recent years. Consequently, these tools have been adopted by several recent standards such as H.263+ and H.264/MPEG-4 AVC [3]. As the cost of semiconductors continues to drop, notably higher prediction gain can be achieved by searching more reference frames in the memory buffer. Nevertheless, an obvious drawback is that the complexity increases proportionally. Extra data are also needed to describe the reference indices.


In the early 1980s, some conventional fast algorithms were proposed, such as the Three Step Search (TSS) and the 2D logarithmic search [4]. Among these algorithms, TSS became the most popular for low bit-rate video applications, owing to its simplicity and effectiveness. However, TSS uses a uniformly allocated search pattern in its first step, which is not very efficient at catching the small motion that appears in stationary or quasi-stationary blocks.
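To make the uniformly allocated first step concrete, the following Python sketch outlines TSS in its commonly described form: nine points on a square grid with a step of 4 pixels, then a step of 2 and finally a step of 1, each time re-centred on the best point found so far. It is an illustration under stated assumptions, not the implementation used in this project; the MAD cost, the initial step of 4 (for a +/- 7 window) and the names mad and three_step_search are assumptions.

```python
def mad(cur, ref, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)


def three_step_search(cur, ref, bx, by, n, step=4):
    """Return (motion vector, MAD, number of points checked)."""
    height, width = len(ref), len(ref[0])
    cx, cy = 0, 0                                # current search centre
    best_cost = mad(cur, ref, bx, by, 0, 0, n)   # centre is checked once
    checked = 1
    while step >= 1:
        best_dx, best_dy = cx, cy
        # Eight neighbours of the centre, spaced uniformly at 'step' pixels.
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue
                dx, dy = cx + ox, cy + oy
                x0, y0 = bx + dx, by + dy
                if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                    continue                     # candidate outside the frame
                cost = mad(cur, ref, bx, by, dx, dy, n)
                checked += 1
                if cost < best_cost:
                    best_cost, best_dx, best_dy = cost, dx, dy
        cx, cy = best_dx, best_dy                # re-centre on the best point
        step //= 2                               # 4 -> 2 -> 1 -> stop
    return (cx, cy), best_cost, checked


# Synthetic 32 x 32 test frames (not realistic pixel values) with a
# displacement of 7 columns and 7 rows between cur and ref.
ref = [[x + 32 * y for x in range(32)] for y in range(32)]
cur = [[(x + 7) + 32 * (y + 7) for x in range(32)] for y in range(32)]
print(three_step_search(cur, ref, bx=8, by=8, n=8))  # ((7, 7), 0.0, 25)
```

On this synthetic example the search reaches the true displacement after 25 checked points. In general TSS is not guaranteed to find the same MV as an exhaustive search, and its first step places no points closer than 4 pixels to the centre, which is the weakness for small motion noted above.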

To remedy this problem, several adaptive techniques have been suggested to make the search more adaptable to motion scale and uncertainty. The uncertainty is estimated from the difference in block distortion measure among the checked points. A smaller difference indicates a larger uncertainty, and hence the search scope is increased in the next step.

2.2.1 Identifies the True Motion

The first type of ME algorithm aims to accurately track the true motion of objects or features in video sequences. Video sequences are generated by projecting a 3D real world onto a series of 2D images. When objects in the 3D real world move, the brightness, or pixel intensity, of the 2D images changes correspondingly. The 2D motion projected from the movement of a point in the 3D real world is referred to as the “true motion” [5]. One of the many potential applications of true motion is in computer vision, where the goal is to identify an unknown environment by means of a moving camera.

2.2.2 Removing Temporal Redundancy

The other type of ME algorithm aims to remove temporal redundancy in video compression. A natural way to exploit the redundancy between frames is to predict the current frame t from the frame (t-Δt) or from the frame (t+Δt). Motion estimation and compensation are used to predict the frame t to be coded from neighbouring frames. Motion compensation works by estimating the motion between two image frames. The motion is described by a motion field of motion vectors. Consequently, the prediction error is transmitted instead of the frame itself, as shown in Figure 2.2. Along with the prediction error, the motion information is also transmitted to the decoder so that it can compensate for the motion. Block-based motion representation offers a very good trade-off between motion overhead and prediction error. It uses one MV per macroblock.

Figure 2.2 Predictive source coding with motion compensation [6]
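The prediction and prediction error described in this section can be sketched in a few lines of Python. The sketch assumes frames stored as lists of rows, one MV per n x n block held in a dictionary keyed by the top-left corner of the block, and clamping of displaced coordinates at the frame border; these choices and the names motion_compensate, residual and psnr are assumptions for illustration. The PSNR follows the standard definition 10 log10(peak^2 / MSE).

```python
import math


def motion_compensate(ref, mvs, n):
    """Build the predicted frame by copying, for each n x n block, the
    reference block displaced by that block's motion vector (dx, dy)."""
    height, width = len(ref), len(ref[0])
    pred = [[0] * width for _ in range(height)]
    for by in range(0, height, n):
        for bx in range(0, width, n):
            dx, dy = mvs[(bx, by)]
            for y in range(by, min(by + n, height)):
                for x in range(bx, min(bx + n, width)):
                    # Clamp to the frame border (an assumption of this sketch).
                    sy = min(max(y + dy, 0), height - 1)
                    sx = min(max(x + dx, 0), width - 1)
                    pred[y][x] = ref[sy][sx]
    return pred


def residual(cur, pred):
    """Prediction error: the signal sent in place of the frame itself."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(cur, pred)]


def psnr(cur, pred, peak=255):
    """Peak signal-to-noise ratio in dB between a frame and its prediction."""
    squared = sum((c - p) ** 2
                  for crow, prow in zip(cur, pred)
                  for c, p in zip(crow, prow))
    mse = squared / (len(cur) * len(cur[0]))
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

A decoder that receives the MVs and the prediction error can rebuild the frame by adding the error to the same motion-compensated prediction, which is why both are transmitted.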

In this project, the preferred kind of ME is the second type, which removes temporal redundancy. The BMA technique is implemented for MV detection and calculation. The criteria of BMA are also described in this chapter.