
Real-Time Video and Image Processing for Object Tracking using DaVinci Processor

A dissertation submitted in partial fulfillment of

the requirements for the degree of

Master of Technology

by

Badri Narayan Patro (Roll No. 09307903)

Under the guidance of

Prof. V. Rajbabu

DEPARTMENT OF ELECTRICAL ENGINEERING

INDIAN INSTITUTE OF TECHNOLOGY–BOMBAY

July 15, 2012


This work is dedicated to my family and friends.

I am thankful for their motivation and support.


Abstract

A video surveillance system is primarily designed to track key objects, or people exhibiting suspicious behavior, as they move from one position to another, and to record this movement for possible future use. Object tracking is an essential part of surveillance systems. As part of this project, an algorithm for object tracking in video, based on image segmentation and blob detection and identification, was implemented on Texas Instruments' (TI's) TMS320DM6437 DaVinci multimedia processor. Using background subtraction, all objects present in the image can be detected irrespective of whether they are moving or not. With the help of image segmentation, the subtracted image is filtered and freed from salt-and-pepper noise. The segmented image is then processed to detect and identify the blobs present, which are the objects to be tracked. Object tracking is carried out by feature extraction and center-of-mass calculation in the feature space of the image segmentation results of successive frames. Consequently, this algorithm can be applied to multiple moving and still objects in the case of a fixed camera.

In this project we develop and demonstrate a framework for real-time implementation of image and video processing algorithms, such as object tracking and image inversion, using the DaVinci processor. More specifically, we track a single object and two objects present in the scene captured by a CCD camera that acts as the video input device, with the output shown on an LCD display. The tracking happens in real time, consuming 30 frames per second (fps), and is robust to background and illumination changes. The performance of single object tracking using background subtraction and blob detection was very efficient in speed and accuracy compared to a PC (Matlab) implementation of a similar algorithm. Execution times for the different blocks of single object tracking were estimated using the profiler, and the accuracy of the detection was verified using the debugger provided by TI Code Composer Studio (CCS). We demonstrate that the TMS320DM6437 processor provides at least a ten-times speed-up and is able to track a moving object in real time.


Contents

Abstract

List of Figures

List of Tables

List of Abbreviations

1 Introduction

2 DaVinci Digital Media Processor
2.1 DaVinci Processor and Family
2.1.1 DaVinci Family
2.1.2 DaVinci vs. OMAP
2.2 Introduction to TMS320DM6437
2.2.1 Main Components of TMS320DM6437 DVDP
2.2.2 DM6437
2.2.3 CPU Core: TMS320C64x+ DSP
2.2.4 Ethernet and PCI Interface
2.2.5 External Memory: On-Board Memory
2.2.6 Code Composer Studio
2.3 DaVinci Software Architecture
2.3.1 Basic working functionality of the DaVinci processor
2.3.2 Signal Processing Layer (SPL)
2.3.3 Input-Output Layer (IOL)
2.3.4 Application Layer (APL)
2.4 VPSS and EPSI APIs
2.4.1 VPSS
2.4.2 EPSI APIs
2.5 APIs and Codec Engine (CE)
2.5.1 xDM and xDAIS
2.5.2 VISA
2.5.3 CODEC ENGINE (CE)
2.6 Video Processing Tokens
2.6.1 Video standard
2.6.2 Video timing
2.6.3 Video Resolution
2.6.4 Video Sampling Format
2.6.5 Video IO Interface
2.6.6 Summary

3 Target Object Tracking
3.1 Introduction
3.1.1 Conventional approaches for target tracking
3.2 Image preprocessing
3.3 Background subtraction
3.4 Image segmentation
3.5 Blob detection and identification
3.5.1 Basic steps of blob detection
3.5.2 Blob detection and identification method
3.6 Feature extraction for blobs
3.7 Tracking: Centroid calculation

4 Video and Image Processing Algorithms on TMS320DM6437
4.1 Implementation of Single target tracking on TMS320DM6437
4.1.1 Debugging and profiling results
4.2 Implementation of multiple object tracking on DM6437
4.2.1 Debugging and profiling results
4.3 Implementation of object tracking algorithm in Matlab

5 Summary

6 APPENDIX
6.1 APPENDIX A: Real-Time Video Processing using Matlab Simulink interface with CCS Studio 3.3 on the DM6437 DVDP
6.1.1 Introduction
6.1.2 Hardware Requirements
6.1.3 Software Requirements
6.1.4 Configuration Parameters for C6000 Hardware
6.2 APPENDIX B: Edge Detection using Video and Image Library
6.3 APPENDIX C: Video Processing Tokens
6.3.1 Video standard (NTSC & PAL)
6.3.2 Video timing (Interlaced vs. Progressive)
6.3.3 Video Resolution (HD, ED, SD)
6.3.4 Video file format (YUV420, YCbCr)
6.3.5 Video IO Interface (Composite, Component, S-Video)
6.4 APPENDIX D: YUV to RGB Conversion
6.4.1 YUV format
6.4.2 8-Bit YUV Formats for Video
6.4.3 Color Space Conversion
6.5 APPENDIX E: Single object tracking on DM6437 code


List of Figures

1.1 Object tracking for visual surveillance system

2.1 TMS320DM6437 hardware block diagram
2.2 TMS320DM6437 hardware component diagram
2.3 DaVinci SW Architecture
2.4 Video Processing Subsystem
2.5 APIs of XDM
2.6 VISA Work Flow
2.7 CE Interface
2.8 CE Algorithm

3.1 Steps in an object tracking system
3.2 Image segmentation
3.3 Feature extraction

4.1 Single object tracking
4.2 EVM board setup
4.3 EVM board setup
4.4 Result of single object tracking algorithm
4.5 Result of single object tracking algorithm
4.6 Multiple object tracking
4.7 Result of multi object tracking algorithm
4.8 Result of multi object tracking algorithm
4.9 Debugging results of three-target tracking
4.10 Different steps for object tracking using segmentation and pattern matching
4.11 Flow chart of object tracking based on segmentation & pattern matching
4.12 Feature extraction
4.13 Result of object tracking algorithm
4.14 Results of tracking algorithm

6.1 DM6437 board
6.2 Open Simulink lib browser 1
6.3 Video capture
6.4 Add video display 1
6.5 Video capture conf
6.6 Video display conf
6.7 Video preview 1
6.8 Video image toolbox
6.9 Target selection
6.10 Video complement
6.11 Video Sobel edge detection 1
6.12 Simulink conf 1
6.13 Simulink conf 2
6.14 Simulink conf 3
6.15 Video and image library
6.16 4:4:4 YCbCr, 4:2:2 YCbCr and 4:2:0 YCbCr color sampling formats, respectively
6.17 YUY2 memory layout
6.18 UYVY memory layout
6.19 RGB2UYVY
6.20 YUV sampling
6.21 Picture aspect ratio
6.22 Pixel aspect ratio


List of Tables

4.1 Single object tracking profiler data
4.2 Multiple object tracking profiler data

6.1 RGB and YCbCr values for various colors using BT.601


List of Abbreviations

APL Application Layer

DVDP Digital Video Development Platform

DVSDK Digital Video Software Development Kit

EPSI Embedded Peripheral Software Interface

EVM Evaluation Module

GPP General Purpose Processor

HD High Definition

IOL Input-Output Layer

NTSC National Television System Committee

PAL Phase Alternating Line

SD Standard Definition

SPL Signal Processing Layer

VISA Video Image Speech Audio

VPBE Video Processing Back End

VPFE Video Processing Front End

VPSS Video Processing Sub System

xDAIS eXpressDSP Algorithm Interface Standard

xDM eXpressDSP Digital Media


Chapter 1

Introduction

Surveillance systems are used for monitoring, screening and tracking of activities in public places such as banks, in order to ensure security. Various aspects like screening objects and people, biometric identification, video surveillance, and maintaining a database of potential threats are used for monitoring activity. Moving object tracking in video has attracted a great deal of interest in computer vision [1]. For object recognition, navigation systems and surveillance systems, object tracking is a first step. Object tracking methods [13,16] may broadly be categorized as segmentation-based, template-based, probabilistic and pixel-wise. In segmentation-based tracking, or "blob detection", the basic idea is to detect points and/or regions in the image that are either brighter or darker than their surroundings. These methods are easy to implement and fast to compute, but may lack accuracy for some applications [21]. Template-based methods match the direct appearance of the object from frame to frame; they offer a great deal of accuracy but are computationally expensive [22]. Probabilistic methods use an intelligent searching strategy for tracking the target object, while pixel-based methods use similarity matching techniques for tracking the target object.

Most tracking algorithms are based on evaluating the difference between the current image and a previous image or a background image [2]. However, algorithms based on the difference of images have problems in the following cases: (1) still objects are included in the tracking task; (2) multiple moving objects are present in the same frame; (3) the camera is moving; (4) occlusion of objects occurs. These problems can be addressed by an object tracking algorithm [1] based on image segmentation and pattern matching. Here, however, we use a novel image segmentation approach in order to extract all objects in the input image.


In all these applications a fixed camera is used with respect to a static background (e.g., a stationary surveillance camera), and the common approach of background subtraction is used to obtain an initial estimate of moving objects. First, background modeling is performed to yield a reference model. This reference model is used in background subtraction, in which each video frame is compared against the reference model to determine possible variation. The variations between the current video frame and the reference frame, in terms of pixels, signify the existence of moving objects. These variations, which represent the foreground pixels, are further processed for object localization and tracking. Ideally, background subtraction should detect real moving objects with high accuracy, limiting false negatives (objects not detected) as much as possible. At the same time, it should extract as many pixels of the moving objects as possible, while avoiding shadows, static objects and noise.

Image segmentation is the process of identifying components of the image. Segmentation involves operations [3] such as thresholding, boundary detection and region growing. Thresholding is the process of reducing the grey levels in the image; many algorithms exist for thresholding [19,20]. Boundary detection finds edges in the image, and any differential operator can be used for boundary detection [1,2].
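As a minimal illustration of thresholding as grey-level reduction, the following sketch reduces an 8-bit grayscale image to two levels; the row-major buffer layout and the fixed threshold value are illustrative assumptions, not the exact parameters used in this work.

    #include <stdint.h>

    /* Minimal binary thresholding sketch: reduce an 8-bit grayscale image
     * to two grey levels (0 and 255). The row-major layout (width*height
     * bytes) and the fixed threshold th are illustrative assumptions. */
    void threshold_binary(const uint8_t *in, uint8_t *out,
                          int width, int height, uint8_t th)
    {
        int i;
        for (i = 0; i < width * height; i++) {
            out[i] = (in[i] > th) ? 255 : 0;
        }
    }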

Blobs are binary objects, i.e., connected points that are in the same state. Blobs can be differentiated on the basis of color, shape, area, perimeter, etc. An image with various shapes and colors has to undergo several processing steps before actual blob detection. Blob detection is a cornerstone of object detection and recognition. A critical aspect of blob detection [7] is the precision of the detected pixels at the blob's border. Usually a blob shows a decreasing intensity gradient at the border. In threshold-based detection this causes false positives and leads to imprecise results if the image contains too much noise. The number of identified pixels belonging to the blob determines the estimate of the blob's characteristics. A possible solution is parallel processing with detection procedures at different image resolutions [11], but this requires multiple copies of the image data transformed into the different resolutions. In addition, the results for the multiple copies have to be merged into one overall result. Other approaches use fixed parameters for the detection of blobs, for example restricting the application environment to a fixed background [18] in order to apply foreground-background segmentation. These methods are vulnerable to changing conditions, such as illumination or perspective changes.
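To make the blob detection step concrete, the sketch below labels 4-connected foreground blobs in a binary image using an explicit stack. This is a generic connected-component pass, not the exact detection and identification method developed later in this work; the buffer names and the choice of 4-connectivity are assumptions.

    #include <stdint.h>
    #include <stdlib.h>

    /* Label 4-connected foreground blobs in a binary image (0 = background,
     * nonzero = foreground). labels[] receives 1..N for blob pixels and 0 for
     * background; the function returns the number of blobs found, or -1 on
     * allocation failure. An explicit stack avoids recursion. */
    int label_blobs(const uint8_t *bin, int *labels, int width, int height)
    {
        int x, y, nblobs = 0;
        int *stack = malloc(sizeof(int) * width * height);
        if (stack == NULL) return -1;

        for (y = 0; y < height; y++)
            for (x = 0; x < width; x++)
                labels[y * width + x] = 0;

        for (y = 0; y < height; y++) {
            for (x = 0; x < width; x++) {
                int seed = y * width + x;
                int top = 0;
                if (bin[seed] == 0 || labels[seed] != 0)
                    continue;
                nblobs++;
                labels[seed] = nblobs;
                stack[top++] = seed;
                while (top > 0) {
                    int p = stack[--top];
                    int px = p % width, py = p / width;
                    int nbr[4], k;
                    nbr[0] = (px > 0)         ? p - 1     : -1;  /* left  */
                    nbr[1] = (px < width - 1) ? p + 1     : -1;  /* right */
                    nbr[2] = (py > 0)         ? p - width : -1;  /* up    */
                    nbr[3] = (py < height - 1)? p + width : -1;  /* down  */
                    for (k = 0; k < 4; k++) {
                        int q = nbr[k];
                        if (q >= 0 && bin[q] != 0 && labels[q] == 0) {
                            labels[q] = nblobs;
                            stack[top++] = q;
                        }
                    }
                }
            }
        }
        free(stack);
        return nblobs;
    }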

A precise computation of the blob's center point is highly dependent on the precision of the blob detection. Especially for blobs that are not perfectly circular or square, the detection and computation methods need to be exact. Precision is an important factor because with only a few false-positive pixels the computed center point gets shifted, which causes a large error in the computed position and orientation of the light-emitting device.

For feature extraction and tracking, a common method to compute the center point of a blob is the bounding box, which refers to the minimum and maximum positions in the XY-coordinate system. The inner-circle method creates a circle of maximum size that fits into the blob area without intersecting the area around it. Neither method solves the precision problem for blobs that are not perfectly circular or square. A very common method is center-of-mass, which relates the number of pixels to the coordinates of the pixels [10]. It computes the center of the blob from the pixel coordinates and weights each pixel by a related value, for example its brightness.
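A minimal sketch of this brightness-weighted center-of-mass computation is shown below, assuming a grayscale frame and a label image produced by a prior blob-labeling pass; the function and parameter names are illustrative.

    #include <stdint.h>

    /* Brightness-weighted center-of-mass of one labeled blob: each pixel's
     * coordinates are weighted by its intensity. gray[] is the grayscale
     * frame and labels[] the label image from a prior labeling pass (both
     * row-major, width*height). Returns 0 on success, -1 if the blob has
     * zero total weight. */
    int blob_center_of_mass(const uint8_t *gray, const int *labels,
                            int width, int height, int blob_id,
                            float *cx, float *cy)
    {
        long   sum_w = 0;
        double sum_x = 0.0, sum_y = 0.0;
        int x, y;
        for (y = 0; y < height; y++) {
            for (x = 0; x < width; x++) {
                if (labels[y * width + x] == blob_id) {
                    int w = gray[y * width + x];
                    sum_w += w;
                    sum_x += (double)w * x;
                    sum_y += (double)w * y;
                }
            }
        }
        if (sum_w == 0)
            return -1;                      /* empty or all-black blob */
        *cx = (float)(sum_x / sum_w);
        *cy = (float)(sum_y / sum_w);
        return 0;
    }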

It is possible to increase precision by using a higher resolution, but this is the point where GPP architectures reach their performance barrier. This is a major problem in computer vision: with more data to analyze, the maximum frame rate goes down and the system is no longer able to achieve real-time processing speed.

Figure 1.1: Object tracking for visual surveillance system

Video surveillance systems require high-resolution video, large bandwidth and high computational speed at low cost and power. DaVinci devices are suitable for surveillance applications as they provide ASIC-like low cost and power for complex processing, together with high-performance programmable DSPs. They also provide function accelerators in the video processing subsystem (VPSS) for common video processing tasks such as encoding, decoding and display. DaVinci devices can be used across all product lines, i.e., digital video recorders (DVR), digital video servers (DVS) and surveillance IP modules. Surveillance applications are constantly evolving, adding new features like analytics, image stabilization, image recognition, motion estimation and target tracking. The most remarkable improvements of the TMS320DM6437 over the TMS320DM642 are a new C64x+ core, more level-1 (L1) memory and a new DMA architecture [59]. Two DMA controllers are available within the DM6437 DSP: the IDMA can perform transfers to/from internal memory only, while the EDMA3 can perform transfers to/from all kinds of memory. Its main disadvantage is a smaller level-2 (L2) memory.

The TMS320DM6437, based on a TI multimedia digital signal processor (DSP), can be used for real-time object tracking in video [17][19], as it supports the highly efficient H.264 video coding standard for compression and wireless intelligent video monitoring (WiFi and WiMAX) [21]. The system can perform real-time processing of video data acquired from a CCD camera by using the Codec Engine framework to call the video processing algorithm library, which implements image encoding and video object tracking according to the user's algorithm and provides output on a display device. In this project, a video object tracking approach based on image segmentation [1], background subtraction, and blob detection and identification was implemented on the TMS320DM6437. A mixture of thresholding-based segmentation, centroid-based tracking and a novel blob detection and identification algorithm is introduced to improve video object tracking, providing fast and accurate tracking.


Chapter 2

DaVinci Digital Media Processor

2.1 DaVinci Processor and Family

DaVinci technology is a family of processors integrated with a software and hardware tools package, providing a flexible solution for a host of applications from cameras to phones to handheld devices to automotive gadgets. DaVinci technology combines the raw processing power and software needed to simplify and speed up the production of digital multimedia and video equipment.

DaVinci technology consists of:

• DaVinci Processors: scalable, programmable DSPs and DSP-based SoCs (systems on chip) built from DSP cores, accelerators, peripherals and ARM processors, optimized to match the performance, price and feature requirements of a spectrum of digital-video end equipment, e.g., the TMS320DM6437 and TMS320DM6467.

• DaVinci Software: interoperable, optimized video and audio standard codecs leveraging the DSP and integrated accelerators, with APIs within operating systems (Linux) for rapid software implementation, e.g., Codec Engine, DSP/BIOS, NDK, and audio and video codecs.

• DaVinci Development Tools/Kits: complete development kits along with reference designs, e.g., the DM6437 DVSDK, Code Composer Studio, Green Hills tools and virtual Linux.

DaVinci video processor solutions are tailored for digital audio, video, image and vision applications. The DaVinci platform includes a general purpose processor (GPP), video accelerators, an optional DSP, and related peripherals.


2.1.1 DaVinci Family

• Dual-core (ARM and DSP) models: The DaVinci processors with a dual-core (ARM+DSP) architecture are the DM6443, DM6446 and DM6467.

– DM6443: contains an ARM9 + TI C64x+ DSP + DaVinci video (decode) accelerators and networking, targeted at display applications.

– DM6446: contains an ARM9 + TI C64x+ DSP + DaVinci video (encode and decode) accelerators and networking, targeted at capture and display.

– DM6467: contains an ARM9 + TI C64x+ DSP + DaVinci video (encode and decode) accelerators and networking, targeted at high-definition capture and display.

• Only-DSP models: The DaVinci chips containing only a DSP are the DM643x and DM64x; both use the TI C64x+ DSP core.

• Only-ARM models: The processors containing only an ARM core are the DM335, DM355, DM357 and DM365.

– DM335 is a pin-compatible variant without the MJCP.

– DM355 contains an ARM9 + DaVinci video (encode and decode) implemented through an MPEG4/JPEG coprocessor (MJCP).

– DM357 is a pin-compatible DM6446 variant with the DSP replaced by a dedicated video coprocessor (HMJCP).

– DM365 is an enhanced DM355, adding a second, high-definition video coprocessor (HDVICP).

• The DaVinci family of processors now scales from multiple-core devices (e.g., DM644x) to single-core DSP devices (e.g., DM643x) to single-core ARM devices (e.g., DM355).

• These processors are available today (TMS320DM647, TMS320DM648, TMS320DM643x, TMS320DM6446, TMS320DM355, TMS320DM6467).


2.1.2 DaVinci vs. OMAP

With a DaVinci processor we get better DSP core performance, whereas with an OMAP processor we get better ARM core performance. The DaVinci processor is more suitable for DSP-centric designs; OMAP, on the other hand, has a much more powerful ARM core, so for general purpose processing (i.e., GPP/ARM) OMAP designs are much more suitable. For OMAP3 in particular the DSP will always be slower than the ARM, whereas for the DM6467 the ARM will always be slower than the DSP.

2.2 Introduction to TMS320DM6437

The TMS320DM6437 Digital Video Development Platform (DVDP) includes a high-performance, software-programmable digital media processor (DMP), which reduces time-to-market for the development of new multimedia applications. The DMP, being programmable, provides the important feature of flexibility, which enables easy development and debugging of multimedia applications. The kit is designed to support various video-over-IP applications such as set-top boxes, surveillance cameras and digital video recorders, with encoding and decoding flexibility for the various industry standards available in the market.

There are many advantages of using a DMP over an FPGA or ASIC. An ASIC has a few disadvantages, among which its higher cost and longer time-to-market are crucial in mass production and marketing. FPGAs, too, are considerably harder to program for certain applications. A DMP provides a good balance in terms of cost, flexibility, ease of programming and time-to-market compared to the other two. The new DaVinci-series processor included in this DVDP supports a new set of extended instructions that helps increase its overall performance.

2.2.1 Main Components of TMS320DM6437 DVDP

This platform supports both stand-alone and PCI-based evaluation and development of any application that uses TI's DaVinci processors. The TMS320DM6437 (DM6437) contains a single C64x+ core along with a video input macro (VPFE) and a video output macro (VPBE). Key features of this platform are:

1. TI’s DM6437 processor with operating frequency of 600 MHz


2. 10/100 Mbps Ethernet / PCI bus interface

3. 128 Mbytes of DDR2 SDRAM

4. 16 Mbyte non-volatile Flash memory + 64 Mbyte NAND Flash + 2 Mbyte SRAM

5. One video decoder (TVP5146M2) that supports composite or S-video

6. Configurable BIOS load option

7. Embedded JTAG Emulation Interface

8. Four video DAC outputs that support component, composite and RGB

9. AIC33 stereo codec

10. Four LEDs and a four-position DIP switch for user input/output testing.

Figure 2.1: TMS320DM6437 hardware block diagram

2.2.2 DM6437

The DM6437 is a high-performance processor from TI's DaVinci family that supports clock rates of 400, 500, 600, 660 and 700 MHz. The DSP core contains eight functional units, two general-purpose register files and two data paths. The eight functional units can execute eight instructions simultaneously; each functional unit is assigned a dedicated task such as multiplication, arithmetic, logical and branch operations, loading data from memory into registers, or storing data from registers into memory. The two general-purpose register files, namely A and B, contain 32 32-bit registers each, providing a total of 64 registers. These registers support data types that include packed 8-bit data, packed 16-bit data, 32-bit data, 40-bit data and 64-bit data values. For values exceeding 32 bits, a pair of registers is used to represent the 40-bit and 64-bit data values.

2.2.3 CPU Core: TMS320C64x+ DSP

The TMS320C64x+ DSPs are the highest-performance fixed-point DSP generation in the TMS320C6000 DSP platform. The CPU core consists of a Level 1 Program (L1P) cache, a Level 1 Data (L1D) cache and a Level 2 (L2) unified cache. The L1P cache has a size of 32 Kbytes and can be configured either as memory-mapped RAM or as a direct-mapped cache. The L1D cache has a size of 80 Kbytes, out of which 48 Kbytes is configured as memory-mapped RAM and 32 Kbytes can be configured either as memory-mapped RAM or as a 2-way set-associative cache. The L2 cache can be up to 256 Kbytes in size and is shared between program and data. The L2 memory can be configured as standalone SRAM or as a combination of cache and SRAM. The size of the L2 cache can be varied by changing the system configuration. These changes can be made in the GEL file (evmdm6437.gel) by changing the parameter CACHE_L2CFG. If the value of this parameter is set to 0, no L2 cache is configured and the whole memory is used as SRAM. By changing the value of CACHE_L2CFG to 1, 2, 3 or 7, one obtains an L2 cache size of 32 KB, 64 KB, 128 KB or 256 KB respectively [5].
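For illustration only, the CACHE_L2CFG parameter can be thought of as a value written into the C64x+ L2 configuration register. The fragment below is a hedged C sketch, not the contents of evmdm6437.gel; the register address (0x01840000, taken from the C64x+ megamodule memory map) is an assumption that should be verified against the device documentation.

    /* Illustrative sketch of selecting the L2 cache size. The mapping of
     * CACHE_L2CFG values follows the text above:
     *   0 -> no L2 cache (all SRAM), 1 -> 32 KB, 2 -> 64 KB,
     *   3 -> 128 KB, 7 -> 256 KB of L2 cache.
     * The register address below is an assumption (C64x+ L2CFG). */
    #define L2CFG (*(volatile unsigned int *)0x01840000)

    void set_l2_cache_128k(void)
    {
        L2CFG = 3;   /* CACHE_L2CFG = 3 : 128 KB of L2 cache */
    }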

The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one instruction every clock cycle. The .M functional units perform all multiply operations. The .S and .L units perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from memory to the register file and store results from the register file into memory.

2.2.4 Ethernet and PCI Interface

The Ethernet interface on DM6437 provides an interface between the board and the external

network. This interface supports both 10 Mbps and 100 Mbps network connections. The Pe-

ripheral Component Interconnect (PCI) provides an interface to connect DM6437 with other

PCI-compliant devices. This connection provides an easy way for movement of data from one device to another.

Figure 2.2: TMS320DM6437 hardware component diagram

2.2.5 External Memory: On-Board Memory

The TMS320DM6437 consists of 128 Mbytes of DDR2 SDRAM. The SDRAM is used for

storage of program, video or data. Also, the board contains 16 Mbytes NOR Flash, 2 Mbytes

SRAM and 64 Mbytes NAND Flash. NAND and NOR Flash are used mainly as boot-loaders,

while SRAM is mainly used for debugging application code.

2.2.6 Code Composer Studio

Code Composer Studio (CCS) provides an Integrated Development Environment (IDE) to in-

corporate the software tools used to develop applications targeted to Texas Instruments Digital

Signal Processors. CCS includes tools for code generation, such as a C compiler, an assembler,

and a linker. It has graphical capabilities and supports real-time debugging. It provides an easy-

to-use software tool to build and debug programs.


The C compiler compiles a C source program with extension .c to produce an assembly source file with extension .asm. The assembler assembles an .asm source file to produce a machine language object file with extension .obj. The linker combines object files and object libraries as input to produce an executable file with extension .out. This executable file represents a linked common object file format (COFF), popular in Unix-based systems and adopted by several makers of digital signal processors.

To create an application project, one can “add” the appropriate files to the project. Com-

piler/linker options can readily be specified. A number of debugging features are available,

including setting breakpoints and watching variables; viewing memory, registers, and mixed C

and assembly code; graphing results; and monitoring execution time. One can step through a

program in different ways (step into, over, or out).

Real-time analysis can be performed using real-time data exchange (RTDX). RTDX allows

for data exchange between the host PC and the target DVDP, as well as analysis in real time

without stopping the target. Key statistics and performance can be monitored in real time.

Through the Joint Test Action Group (JTAG) interface, communication with on-chip emulation support

occurs to control and monitor program execution. The DM6437 EVM board includes a JTAG

interface through the USB port.

CCS provides a single IDE to develop an application by offering following features:

• Programming DSP using C/C++

• Ready-to-use built-in functions for video and image processing

• Run-time debugging on the hardware

• Debugging an application using software breakpoints

2.3 DaVinci Software Architecture

The DaVinci software architecture consists of three layers: the Signal Processing Layer (SPL), the Input-Output Layer (IOL) and the Application Layer (APL). The Signal Processing Layer takes care of all the processing functions. Similarly, all the input and output functions are grouped into another layer, the Input-Output Layer (IOL). The third layer is the Application Layer (APL), which is the most important part in developing a new algorithm. Most of the time, however, we will develop components for either the SPL or the IOL.


2.3.1 Basic working functionality of the DaVinci processor

Consider a video capture driver as an example: it reads data from a video port or peripheral and starts filling a memory buffer. When this input buffer is full, an interrupt is generated by the IOL to the APL and a pointer to this full buffer is passed to the APL. The APL picks up this buffer pointer and in turn generates an interrupt to the SPL and passes the pointer. The SPL now processes the data in this input buffer and, when complete, generates an interrupt back to the APL and passes the pointer of the output buffer that it created. The APL passes this output buffer pointer to the IOL, commanding it to display the buffer or send it out on the network. Note that only pointers are passed while the buffers remain in place; the overhead of passing the pointers is negligible.

All three layers, along with the different APIs, drivers and components, are shown in Figure 2.3.

2.3.2 Signal Processing Layer (SPL)

The SPL consists of all the signal processing functions or algorithms that run on the device. For example, a video codec, such as MPEG4-SP or H.264, runs in this layer. These algorithms are wrapped with the eXpressDSP Digital Media (xDM) API. In between xDM and VISA (Video, Image, Speech, Audio) are the Codec Engine, Link and DSP/BIOS. Memory buffers, along with their pointers, provide input and output to the xDM functions. This decouples the SPL from all other layers. The Signal Processing Layer (SPL) presents the VISA APIs to all other layers. The main components of the SPL are xDM, xDAIS, the VISA APIs and the Codec Engine interface.

2.3.3 Input-Output Layer (IOL)

The Input-Output Layer (IOL) covers all the peripheral drivers and generates buffers for input or output data. Whenever a buffer is full or empty, an interrupt is generated to the APL. Typically, these buffers reside in shared memory, and only pointers are passed from the IOL to the APL and eventually to the SPL. The IOL is delivered as drivers integrated into an operating system such as Linux or WinCE. In the case of Linux, these drivers reside in the kernel space of the Linux OS. The Input-Output Layer (IOL) presents the OS-provided APIs as well as the EPSI APIs to all other layers. The IOL contains the Video Processing Subsystem (VPSS) device driver used for video capture and display; a USB driver to capture video to USB-based media; a UART serial port driver used for debugging through a console application; an Ethernet (EMAC) driver, needed when captured video is to be sent over the network; an I2C driver, used internally by the VPFE driver as its communication protocol; a Multichannel Audio Serial Port (McASP) driver for the audio processing system; and a Multichannel Buffered Serial Port (McBSP) driver for buffering of stream data.

Figure 2.3: DaVinci SW Architecture

2.3.4 Application Layer (APL)

The Application Layer interacts with the IOL and SPL. It makes calls to the IOL for data input and output, and to the SPL for processing. The Sample Application Thread (SAT) is a sample application component that shows how to call the EPSI and VISA APIs and how to interface with the SPL and IOL as built-in library functions. All other application components are left to the developer, who may develop them or leverage the vast open-source community software. These include, but are not limited to, graphical user interfaces (GUI), middleware, networking stacks, etc. The master thread is the highest-level thread, such as an audio or video thread, that handles the opening of I/O resources (through the EPSI API), the creation of processing algorithm instances (through the VISA API), as well as the freeing of these resources. Once the necessary resources for a given task are acquired, the master thread specifies an input source for the data (usually a driver or file), the processing to be performed on the input data (such as compression or decompression) and an output destination for the processed data (usually a driver or file).


The Network Developer's Kit (NDK) provides services such as an HTTP server, DHCP client/server, DNS server, etc. that reside in the application layer. Note that these services use the socket interface of the NDK, which resides in the I/O layer, so the NDK spans both layers.

2.4 VPSS and EPSI APIs

VPSS is a video processing driver present in the IOL. The Embedded Peripheral Software Interface (EPSI) spans both the APL and IOL layers; the APL makes calls to the IOL drivers and APIs for data input and output.

2.4.1 VPSS

The VPSS provides an input interface, the Video Processing Front End (VPFE), on the DM6437 for external imaging peripherals such as image sensors, video decoders and digital cameras in order to capture the image, and an output interface, the Video Processing Back End (VPBE), for display devices such as analog SDTV displays, digital LCD panels and HDTV video encoders in order to display the processed image. The block-level diagram of the VPSS (Video Processing Sub System), which mainly consists of the VPFE and VPBE subsystems, is shown in Figure 2.4.

Figure 2.4: Video Processing Sub system

The common buffer memory and direct memory access (DMA) controls ensure efficient use of the DDR2 memory controller burst bandwidth and of the other peripherals. The shared buffer memory logic provides the primary source/sink to all of the VPFE and VPBE modules. The VPSS uses the DDR2 bandwidth efficiently, given both its large bandwidth requirements and the real-time requirements of the VPSS modules.

1. Video Processing Front End (VPFE)

The VPFE block is comprised of a charge-coupled device (CCD) controller (CCDC), a preview engine image pipe (IPIPE), a hardware 3A statistics generator (H3A), a resizer and a histogram module.

The CCD controller is responsible for accepting raw, unprocessed image/video data from a sensor (CMOS or CCD). The preview engine image pipe (IPIPE) is responsible for transforming the raw image/video data from the sensor into YCbCr 4:2:2 data, which is easily handled for compression or display. Typically, the output of the preview engine is used both for video compression and for display on an external display device, such as an NTSC/PAL analog encoder or a digital LCD. The output of the preview engine or of DDR2 is the input to the resizer, which can resize frames to 720x480 pixels. The output of the resizer module is sent to the SDRAM/DDRAM, and the resizer is then free to accept further data from the preview engine pipe.

The H3A module is designed to support the control loops for auto focus (AF), auto white balance (AWB) and auto exposure (AE) by collecting metrics about the image/video data. The AF engine extracts and filters RGB data from the input image/video data and provides either the accumulation or the peaks of the data in a specified region, while the AE/AWB engine accumulates the values and checks for saturated values in a sub-sampling of the video data. The histogram module allows the luminance intensity distribution of the pixels of the image/frame to be represented.

2. Video Processing Back End (VPBE)

The VPBE is responsible for displaying the processed image on different display devices such as a TV, LCD or HDTV. The VPBE block is comprised of the on-screen display (OSD) and the video encoder (VENC) modules.

The OSD is a graphics accelerator which is responsible for resizing images to either NTSC or PAL format (640x480 to 720x576) on the output devices, and it combines display windows into a single display frame, which helps the VENC module to output the video data. The primary function of the OSD module is to gather and combine video data and display/bitmap data and then pass it to the video encoder (VENC) in YCbCr format. The VENC converts the display frame from the OSD into the correctly formatted, desired output signals in order to interface with different display devices.

The VENC takes the display frame from the on-screen display (OSD) and formats it into the desired output format and output signals (including data, clocks, sync, etc.) that are required to interface with display devices. The VENC consists of three primary sub-blocks: the analog video encoder, which generates the signals required to interface with NTSC/PAL systems and also includes the video DACs; the timing generator, responsible for generating the specific timing required for analog video output; and the digital LCD controller, which supports various LCD display formats and YUV outputs for interfacing with high-definition video encoders and/or DVI/HDMI interface devices.

2.4.2 EPSI APIs

Device driver APIs vary from OS to OS (Linux, DSP/BIOS, WinCE, etc.). For example, the device driver APIs for Linux are different from the device driver APIs for DSP/BIOS. Using device driver APIs directly creates portability issues when an application is migrated from one OS to another, for example from a DM6446 running Linux to a DM6437 running DSP/BIOS. EPSI is therefore a common interface across all OSs, with a separate glue layer that maps the EPSI APIs to device-driver-specific APIs. Each OS has a separate glue layer, called the EPSI to Driver Mapping (EDM), for each device. The DSP/BIOS EDM glue layer maps the EPSI APIs to the DSP/BIOS device driver APIs, as shown in Figure 2.3. The definition of the EPSI APIs does not mask or prevent the direct use of the device driver APIs.

The different EPSI APIs are DEV_open(), DEV_read(), DEV_write(), DEV_close(), DEV_control(), DEV_getBuffer() and DEV_returnBuffer().

1. VPFE_OPEN: This function initializes the device and returns a handle. The FVID handle is then used to configure the video input (composite, component, S-video), the video standard (NTSC, PAL, SECAM, AUTO) and the video file format (UYVY, YUYV, YUV420, etc.). These configurations represent the actual physical connection and the file format supported by the driver.

2. VPFE_GETBUFFER: The VPFE driver has a queue of buffers for capturing video frames. Whenever a buffer is filled, it is moved to the back of the queue. Instead of an FVID_dequeue() and FVID_queue() pair of API calls for checking whether a buffer is full or not, the FVID APIs provide another API, FVID_exchange(). This API removes a buffer from the VPFE buffer queue, takes a buffer from the application and adds that buffer to the VPFE buffer queue. The buffer dequeued from the VPFE buffer queue is returned to the application. This function therefore exchanges a processed buffer for a buffer that is ready for processing. Once the buffer is exchanged, it is available to the application for further processing. The buffer to be processed is accessed via the vpfeHdl->buf structure.

FVID_queue() returns an empty buffer to the driver to be filled (input driver) or passes a full buffer to the driver to be displayed (output driver). FVID_dequeue() acquires a full buffer from the driver (input driver) or acquires an empty buffer from the driver for the application to fill (output driver). If no buffers are available in the stream, this call can block until a buffer is made available by the driver. pBuf is passed by reference for FVID_dequeue() to modify with the address of the returned buffer.

3. VPFE_RETURNBUFFER: In Linux, VPFE_getBuffer() dequeues a buffer from the VPFE queue, and that buffer needs to be returned to the VPFE queue using the VPFE_returnBuffer() function. However, in DSP/BIOS, the buffer is dequeued and queued at once using the FVID_exchange() API inside the VPFE_getBuffer() IOL API. Hence, there is no need to return the buffer back to the VPFE queue.

4. VPFE_CLOSE: This function is used to uninitialize the VPFE device. The buffers allocated by the VPFE driver are freed using the FVID_free() API. The device is then uninitialized using the FVID_delete() API.
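Putting these EPSI calls together, a capture-and-process loop looks roughly as sketched below. The wrapper signatures (VPFE_open/getBuffer/returnBuffer/close) and the process_frame() routine are illustrative assumptions; only the call order follows the description above.

    /* Minimal capture-process loop sketch using the EPSI-style VPFE calls
     * described above. The signatures declared here are assumptions; in the
     * DSP/BIOS port, VPFE_getBuffer() internally performs FVID_exchange(),
     * so the explicit return is only required on Linux. */
    extern void *VPFE_open(void);                 /* init device, get handle */
    extern void *VPFE_getBuffer(void *hdl);       /* obtain a captured frame */
    extern void  VPFE_returnBuffer(void *hdl, void *buf);
    extern void  VPFE_close(void *hdl);
    extern void  process_frame(void *buf);        /* user algorithm          */

    void capture_loop(int nframes)
    {
        void *hdl = VPFE_open();
        int i;
        for (i = 0; i < nframes; i++) {
            void *buf = VPFE_getBuffer(hdl);      /* exchange for full buffer */
            process_frame(buf);                   /* e.g. tracking algorithm  */
            VPFE_returnBuffer(hdl, buf);          /* Linux only; not needed   */
                                                  /* under DSP/BIOS (see text)*/
        }
        VPFE_close(hdl);
    }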

2.5 APIs and Codec Engine (CE)

DaVinci processors have three sets of APIs: eXpressDSP Digital Media (xDM), Video Image Speech Audio (VISA) and the EPSI APIs, along with a transparent interface, the Codec Engine (CE). The Codec Engine is a piece of software, developed by Texas Instruments, that manages the system resources and translates VISA calls into xDM calls. EPSI was discussed in the previous section.

2.5.1 xDM and xDAIS

To create a basic eXpressDSP-compliant (xdc) algorithm, a standard algorithm interface is required. This standard interface is provided by xDAIS and xDM.

• xDAIS

An eXpressDSP Algorithm Interface Standard (xDAIS) [33] algorithm is a module that implements the abstract algorithm interface (IALG). The IALG API allows the user application to allocate memory for the algorithm and to share memory between algorithms. The algorithm has to provide these basic APIs, i.e., resource allocation/deallocation, initialization and start/stop APIs, in order to satisfy the xDAIS standard. These APIs are:

- algAlloc()
- algInit()
- algActivate()
- algDeactivate()
- algFree()

• xDM

The xDM standard defines a uniform set of APIs for multimedia compression algorithms (codecs), with the main intent of providing ease of replaceability and insulating the application from component-level changes. xDM components may run on either the DSP or the ARM processor.

xDAIS defines the base class, with API functions such as algAlloc(), algInit(), etc., while xDM is the child class inherited from this base class (xDAIS); it contains all the functions of the base class along with its own run-time process and control APIs, algProcess() and algControl(), as shown in Figure 2.5.
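This base-class/child-class relationship can be pictured as a table of function pointers. The following is a conceptual C sketch only; the struct names and the simplified argument types are illustrative, and the actual IALG and xDM headers shipped with the DVSDK define richer types and additional entries (e.g., algNumAlloc, algMoved).

    /* Conceptual sketch of the xDAIS/xDM inheritance described above.
     * The *_sketch types are illustrative; they are not the real headers. */
    typedef struct {
        /* xDAIS (IALG) base interface: resource and life-cycle management */
        int  (*algAlloc)(void *params, void *memTab);
        int  (*algInit)(void *handle, void *memTab, void *params);
        void (*algActivate)(void *handle);
        void (*algDeactivate)(void *handle);
        int  (*algFree)(void *handle, void *memTab);
    } IALG_Fxns_sketch;

    typedef struct {
        IALG_Fxns_sketch ialg;   /* inherited base-class functions           */
        /* xDM extension: run-time processing and control entry points       */
        int (*algProcess)(void *handle, void *inBufs, void *outBufs,
                          void *inArgs, void *outArgs);
        int (*algControl)(void *handle, int cmd, void *params, void *status);
    } IXDM_Fxns_sketch;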

2.5.2 VISA

CE presents a simple and consistent set of interfaces to the application developer called VISA,

which stands for Video, Imaging, Speech and Audio.

Let us take the example of a video encoder application: the Codec Engine software basically translates the create, control, process and delete VISA APIs into their respective xDM APIs, while managing the system resources and inter-processor communication. The create() API creates an instance of an xDM algorithm and allocates the resources required for the algorithm to run. The create() API queries the xDM algorithm for the resources that it needs and, based on the algorithm's requirements, allocates them with the help of the Codec Engine. The CE checks the resource pool and responds to the queries sent by the create API. Note that xDM-compliant functions cannot allocate resources directly; they can only request resources. The Codec Engine is always in control of the resources and manages them across the multiple functions running in the SPL, as shown in Figure 2.5.

Figure 2.5: APIs of XDM

Figure 2.6: VISA Work Flow

The control() API allows the APL to modify parameters that control the algorithm; for example, in an MPEG4 video codec we may want to change the bit-rate or resolution. The process() API transforms the input buffer into the output buffer, e.g., an encode or decode function. For example, an MPEG4 encoder would take the input buffer, encode it and create an encoded frame in an output buffer. The delete() API deletes the algorithm instance and reclaims the resources.

The process and control APIs of VISA are a direct reflection of the low-level process and control functions of the xDM algorithm. As a result, low-level control of the codecs is provided along with a high-level abstraction of the details. In Figure 2.4 we show the specific VISA and xDM APIs. The APL has to understand only these four APIs, and the signature of these APIs is held constant, which makes it easy to replace one codec with another.

VISA - Four SPL Functions: the complexities of the Signal Processing Layer (SPL) are abstracted into four functions: _create, _delete, _process, _control.

• Create: creates an instance of an algorithm; that is, it allocates the required memory and initializes the algorithm.

/* allocate and initialize video decoder on the engine */

dec = VIDDEC_create(ce, decoderName, &vdecParams);

• Process: invokes the algorithm; it calls the algorithm's processing function, passing descriptors for the input and output buffers.

/* decode the frame */

status = VIDDEC_process(dec, &encodedBufDesc, &outBufDesc,

&decInArgs, &decOutArgs);

• Control: used to change algorithm settings; algorithm developers can provide user-controllable parameters.

/* Set Dynamic Params for Decoder */

status = VIDDEC_control(dec, XDM_SETPARAMS, &decDynParams,

&decStatus);

• Delete: deletes an instance of an algorithm; the opposite of "create", this frees the memory set aside for a specific instance of an algorithm.

/* teardown the codecs */

VIDDEC_delete(dec);

In each case, the Codec Engine software translates these create, control, process and delete APIs into their respective xDM APIs, while managing the system resources and inter-processor communication.
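Combining the four calls shown above, a decoder thread follows the rough life cycle sketched below. The engine name ("decode") and codec name ("mpeg4dec") are placeholders that must match the application's Codec Engine configuration; buffer allocation, the remaining creation parameters and error handling are omitted, so this is a skeleton rather than a working implementation.

    #include <xdc/std.h>
    #include <ti/sdo/ce/CERuntime.h>
    #include <ti/sdo/ce/Engine.h>
    #include <ti/sdo/ce/video/viddec.h>

    /* Rough life-cycle sketch of the four VISA calls. Names in quotes are
     * placeholders; buffer descriptors and remaining parameters must be
     * filled in before this would actually run. */
    void decode_thread(void)
    {
        Engine_Handle        ce;
        VIDDEC_Handle        dec;
        VIDDEC_Params        params;
        VIDDEC_DynamicParams dynParams;
        VIDDEC_Status        status;
        VIDDEC_InArgs        inArgs;
        VIDDEC_OutArgs       outArgs;
        XDM_BufDesc          inBufs, outBufs;

        /* every xDM structure carries its own size field */
        params.size    = sizeof(params);
        dynParams.size = sizeof(dynParams);
        status.size    = sizeof(status);
        inArgs.size    = sizeof(inArgs);
        outArgs.size   = sizeof(outArgs);
        /* ... also fill the remaining creation/dynamic parameters here ...  */

        CERuntime_init();                                 /* once per app    */
        ce  = Engine_open("decode", NULL, NULL);          /* open the engine */
        dec = VIDDEC_create(ce, "mpeg4dec", &params);     /* _create         */

        VIDDEC_control(dec, XDM_SETPARAMS, &dynParams, &status); /* _control */

        /* ... fill inBufs/outBufs with input and output frame buffers ...   */
        VIDDEC_process(dec, &inBufs, &outBufs, &inArgs, &outArgs); /* _process */

        VIDDEC_delete(dec);                               /* _delete         */
        Engine_close(ce);
    }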


2.5.3 CODEC ENGINE (CE)

The Codec Engine is a set of APIs used to instantiate and run xDAIS algorithms. A VISA interface is provided as well for interacting with xDM-compliant xDAIS algorithms. The Codec Engine (CE) manages the system resources and functions: all the lower-level management and control functions are handled by the Codec Engine, which manages the xDM components and abstracts the application developer from the signal processing layer.

Users can instantiate and call xDAIS algorithms through the Codec Engine APIs; the Codec Engine also provides the VISA interface, which is compatible with xDM. The CE interface of an algorithm is the same regardless of whether the operating system is Linux, VxWorks, DSP/BIOS or WinCE, and regardless of the processor hardware (ARM or DSP). The Codec Engine APIs (CE APIs) are:

• Engine_open: Open a Codec Engine;

• Engine_close: Close a Codec Engine;

• Engine_getCpuLoad: Obtain the percentage of CPU usage;

• Engine_getLastError: Obtain the error code caused by the last operation;

• Engine_getUsedMem: Obtain the usage of memory.

Codec Creation (Instantiation)

Figure 2.7: CE Interface

As shown in Figure 2.7, when the application calls VIDENC_create() (a VISA interface), a request to create an instance is sent via the Codec Engine. The Codec Engine gathers the algorithm's resource requirements via the IALG (algorithm interface) and IDMA (DMA interface) interfaces. The Codec Engine then secures resources from the resource pool and grants them to the algorithm via the IALG and IDMA interfaces. Finally, the xDAIS APIs (algNumAlloc, algAlloc, algInit, algFree, algActivate, algDeactivate, algMoved) are used to create an instance of the codec.

Figure 2.8: CE Algorithm

The Codec Engine provides a robust, consistent interface for dynamically creating/deleting algorithms and accessing/controlling algorithm instances. This allows algorithms of the same class to be easily exchanged without any modification to the application code, and also allows the same application code to be used across a variety of platforms without modification, as shown in Figure 2.8.

2.6 Video Processing Tokens

There are five different video processing tokens: video standard, video timing, video resolution, video sampling format and video I/O interface.

2.6.1 Video standard

NTSC (National Television System Committee) and PAL (Phase Alternating Line) are the two video standards. The differences between the two are described in Appendix C.

2.6.2 Video timing

There are two types of video timing: interlaced and progressive. The difference between interlaced and progressive timing is presented in Appendix C.


2.6.3 Video Resolution

The different types of video resolution are standard definition (SD), enhanced definition (ED) and high definition (HD). The differences between these three resolutions are presented in Appendix C.

2.6.4 Video Sampling Format

The different sampling formats are the 4:4:4, 4:2:2 and 4:2:0 YUV sampling formats. Details of the sampling formats, color conversion, the 8-bit formats and the benefits of YUV over other representations are presented in Appendix C.
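Since the VPFE delivers YCbCr data, a minimal per-pixel BT.601-style YCbCr-to-RGB conversion sketch is shown below for reference. The full-range coefficients used here are an illustrative assumption; whether full-range or studio-range (16-235) scaling applies depends on the source (see Appendix D for the exact equations used in this work).

    #include <stdint.h>

    /* Minimal YCbCr (BT.601, full-range approximation) to RGB conversion
     * for one pixel. The coefficient variant is an assumption; Appendix D
     * gives the conversion used in this work. */
    static uint8_t clamp8(int v)
    {
        return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }

    void ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
                      uint8_t *r, uint8_t *g, uint8_t *b)
    {
        int c = y, d = cb - 128, e = cr - 128;
        *r = clamp8(c + (int)(1.402 * e));
        *g = clamp8(c - (int)(0.344 * d) - (int)(0.714 * e));
        *b = clamp8(c + (int)(1.772 * d));
    }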

2.6.5 Video IO Interface

There are three types of video I/O interface: composite, component and S-video. A basic explanation of composite, S-video and component interfaces is presented in Appendix C.

2.6.6 Summary

In this chapter we discussed the details of the DaVinci family and how it differs from the OMAP processors. We also provided an introduction to the software architecture, APIs and major subsystems of the DaVinci processor TMS320DM6437.


Chapter 3

Target Object Tracking

3.1 Introduction

Object tracking can be defined as the process of segmenting an object of interest from a video scene and keeping track of its motion, orientation, occlusion, etc. in order to extract useful information. Object tracking algorithms are applied in different applications such as automated video surveillance, robot vision, traffic monitoring, animation, etc.

3.1.1 Conventional approaches for target tracking

There are two major components of a visual tracking system [7]: target representation and localization, and filtering and data association. Target representation and localization is mostly a bottom-up process. These methods provide a variety of tools for identifying the moving object. Successfully locating and tracking the target object depends on the algorithm. Typically the computational complexity of these algorithms is low. The following are some common target representation and localization algorithms:

• Point tracking (Blob tracking): Segmentation of object interior (e.g. blob detection,

block-based correlation).

• Kernel-based tracking (Mean-shift tracking): An iterative localization procedure based

on the maximization of a similarity measure such as Bhattacharyya coefficient.

• Silhouette tracking (Contour tracking): Detection of the object boundary (e.g. active contours, the condensation algorithm, or the watershed algorithm).


Filtering and data association is mostly a top-down process, which involves incorporating prior information about the scene or object, dealing with object dynamics, and evaluating different hypotheses. The computational complexity of these algorithms is usually much higher. The following are some common filtering and data association algorithms:

• Kalman filter: An optimal recursive Bayesian filter for linear systems subject to Gaussian noise.

• Particle filter: Useful for sampling the underlying state-space distribution of non-linear

and non-Gaussian processes.

The major steps for object tracking are shown in Figure 3.1 and listed below.

Figure 3.1: Steps in an object tracking system

The steps in object tracking are:

1. Image preprocessing.

2. Background subtraction

3. Image segmentation.

4. Blob detection and identification.

5. Feature extraction.

6. Object tracking.


3.2 Image preprocessing

The image captured by a surveillance camera is affected by various sources of system noise, and the output data format may be uncompressed or compressed. Preprocessing of the image is therefore essential to remove this noise. Preprocessing of the image includes filtering and noise removal.

3.3 Background subtraction

Background subtraction[14,16] is a widely used approach for detecting moving objects in videos

from static cameras. The rationale in this approach is that of detecting the moving objects from

the difference between the current frame and a reference frame, often called the “background

image”, or “background model”. It is required that the background image must be a represen-

tation of the scene with no moving objects and must be kept regularly updated so as to adapt to

the varying luminance conditions and geometry settings.

The main motivation for background subtraction is to detect all the foreground objects in a frame sequence from a fixed camera. In order to detect the foreground objects, the difference between the current frame and an image of the scene's static background is compared with a threshold. The detection rule is expressed as:

|frame(i)− background(i)| > Th (3.1)

The background image varies due to many factors such as illumination changes (gradual or

sudden changes due to clouds in the background), changes due to camera oscillations, changes

due to high-frequencies background objects (such as tree branches, sea waves etc.).

The basic methods for background subtraction are

1. Frame difference

|frame(i)− frame(i− 1)| > Th (3.2)

Here the previous frame is used as the background estimate. This evidently works only under particular conditions of object speed and frame rate, and is very sensitive to the threshold Th.

2. Average or median


The background image is obtained as the average or the median of the previous n frames. This method is rather fast but needs a large amount of memory: the memory requirement is n * size(frame).

3. Background obtained as the running average

B(i+ 1) = α ∗ F (i) + (1− α) ∗B(i) (3.3)

where α, the learning rate, is typically 0.05. This method requires no additional frame memory beyond the background image itself.
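A minimal C sketch of this running-average scheme, combined with the threshold test of Eq. (3.1), is given below; the frame size, learning rate and threshold are illustrative assumptions rather than values fixed by the text.

#include <stdlib.h>

#define ROWS 480
#define COLS 720

/* Running-average background model (Eq. 3.3) followed by the foreground
 * test of Eq. 3.1.  bg[] holds the background estimate, frame[] the current
 * luma frame, and mask[] receives the resulting binary foreground mask. */
void update_and_subtract(const unsigned char *frame, float *bg,
                         unsigned char *mask)
{
    const float alpha = 0.05f;    /* learning rate (typical value given above) */
    const int   th    = 30;       /* assumed detection threshold Th            */
    int i;

    for (i = 0; i < ROWS * COLS; i++) {
        bg[i]   = alpha * frame[i] + (1.0f - alpha) * bg[i];          /* Eq. 3.3 */
        mask[i] = (abs((int)frame[i] - (int)bg[i]) > th) ? 255 : 0;   /* Eq. 3.1 */
    }
}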

There are various other approaches that aim to maximize speed, limit the memory requirements, or achieve the highest possible accuracy under all circumstances. These approaches include the running Gaussian average, temporal median filter, mixture of Gaussians, kernel density estimation (KDE), sequential KD approximation (SKDA), co-occurrence of image variations, and eigenbackgrounds. Since all the approaches focus on real-time performance, a lower bound on speed always exists. A short comparison of these approaches is given below.

• The median and running average give the fastest speed [14]. The mixture of Gaussians, KDE, eigenbackgrounds, SKDA and optimized mean-shift give intermediate speed, while standard mean-shift is the slowest.

• In terms of memory, the average, median, KDE [14] and mean-shift consume the most memory; the mixture of Gaussians, eigenbackgrounds and SKDA consume an intermediate amount; and the running average consumes very little memory.

• In terms of accuracy, the mixture of Gaussians and eigenbackgrounds provide good accuracy, while the simple methods such as the standard average, running average and median can provide acceptable accuracy in specific applications.

3.4 Image segmentation

Image segmentation [3] is essential and critical to image processing and pattern recognition. The basic approaches to image segmentation include thresholding, clustering, edge detection and region growing. Image segmentation is the process of dividing an image into regions that have the same characteristics and then extracting the regions of interest. It has applications in target tracking, automatic control, biomedical image analysis, agricultural engineering and other fields


of image processing. After segmentation, the union of any two adjacent regions is not homogeneous. If P( ) is a homogeneity predicate defined on groups of connected pixels [1], then segmentation is a partition of the set F into connected subsets or regions (S1, S2, ..., Sn) [3] such that

S1 ∪ S2 ∪ ... ∪ Sn = F, with Si ∩ Sj = ∅ for i ≠ j.

The uniformity predicate satisfies P(Si) = true for every region Si, and P(Si ∪ Sj) = false when i ≠ j and Si and Sj are neighbors.

The various steps in image segmentation are :

1. Thresholding [3]

Thresholding classifies the image histogram using one or several thresholds: pixels are classified according to the gray-scale class their values fall into. The process of thresholding involves choosing a gray-scale value to distinguish the different classes; this gray-scale value is called the "threshold". Threshold-based classification can be divided into global thresholding and local thresholding. Global thresholding obtains a single threshold from the entire image and applies it to the whole image, while local thresholding obtains thresholds for different regions and divides each region based on its own threshold.

In threshold segmentation, selecting the threshold is the key step. In traditional segmentation the threshold is determined from a one-dimensional histogram. However, a one-dimensional histogram only reflects the distribution of image gray levels and ignores the spatial correlation between image pixels, which may lead to segmentation errors and unsatisfactory results. Other image segmentation algorithms include region growing, edge detection, clustering, etc. Among these, thresholding and region growing are generally not used alone but as part of a processing chain. Their disadvantage is an inherent dependence on the selection of the seed region and on the order in which pixels and regions are examined; the segments produced by region splitting may also appear too square because of the splitting scheme.

2. Novel image segmentation approach: A novel image segmentation algorithm proposed in [6] uses a crisp fuzzifier, a smoothing filter and a median filter, in the order shown in Figure 3.2.

Figure 3.2: Image segmentation

In this approach, the crisp fuzzifier extracts the relevant gray-value information, and its output is then processed to eliminate isolated points and noise. Isolated points are removed with a binary smoothing filter and noise with a median filter. The processed image is then input to the object detection and identification [6],[8],[9],[10] algorithm, which searches for image blobs. Each blob is enclosed by a rectangle that is elastic in nature, i.e., it can stretch in the horizontal, vertical and downward directions until the whole blob is enclosed. The process is repeated for all blobs present in the image. The statistical features [1] of each blob, such as the approximate location of the center of gravity, the size of the rectangular enclosure, the actual size or pixel count of the blob, and the volume bounded by the membership function values, are then calculated. Tracking is carried out by calculating the center of mass [7] of the identified blobs. The image segmentation process is discussed next.

Image filtering algorithm: The image filtering algorithm is as follows:

(a) Read the pixel data of the input video frame and process it through a crisp fuzzifier. If a pixel value p lies in the range PL to PH (PL ≤ p ≤ PH), the pixel is assigned a membership function value of 1.0; otherwise it is set to zero. The process is repeated until all pixels are processed, where PL and PH are the lower and upper limits of the pixel values of a color.

(b) The image data obtained in step (a) is input to a binary smoothing filter, which is used to

i. remove isolated points;

ii. fill in small holes in otherwise dark areas; and

iii. fill in small notches in straight-edge segments.


(c) The image data obtained in step (b) is filtered using a median filter to remove noise in the blobs and to force points with very distinct intensities to be more like their neighbors. The median filter is a nonlinear digital filtering technique that replaces the center value of a window with the median of all the pixel values in that window. Its performance is not better than a Gaussian blur for high levels of noise, but it is particularly effective for speckle noise and salt-and-pepper (impulsive) noise.
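The fragment below sketches steps (a) and (c) of this filtering chain for a single-channel image; the image size, the limits PL and PH as parameters, and the 3x3 window are assumptions made for illustration, and the binary smoothing filter of step (b) is omitted for brevity.

#define ROWS 480
#define COLS 720

/* Step (a): crisp fuzzifier - pixels inside [PL, PH] get membership 1, others 0. */
void crisp_fuzzify(const unsigned char in[ROWS][COLS],
                   unsigned char out[ROWS][COLS],
                   unsigned char PL, unsigned char PH)
{
    int r, c;
    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++)
            out[r][c] = (in[r][c] >= PL && in[r][c] <= PH) ? 1 : 0;
}

/* Step (c): 3x3 median filter on the binary membership image.  For a binary
 * window the median is simply 1 if at least 5 of the 9 samples are 1. */
void median3x3_binary(const unsigned char in[ROWS][COLS],
                      unsigned char out[ROWS][COLS])
{
    int r, c, dr, dc, ones;
    for (r = 1; r < ROWS - 1; r++) {
        for (c = 1; c < COLS - 1; c++) {
            ones = 0;
            for (dr = -1; dr <= 1; dr++)
                for (dc = -1; dc <= 1; dc++)
                    ones += in[r + dr][c + dc];
            out[r][c] = (ones >= 5) ? 1 : 0;
        }
    }
}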

3.5 Blob detection and identification

For different kinds of blobs, different detection methods are needed. A blob detection algorithm needs to fulfill a number of requirements, listed below:

• Reliability / noise insensitivity: as a low-level vision method it must be robust against under- and over-segmentation caused by noise.

• Accuracy: many applications need highly accurate results at sub-pixel resolution.

• Scalability: the algorithm should be scalable, so that blobs of different sizes can also be detected.

• Speed: the algorithm should be applicable to real-time processing.

• Few and semantically meaningful parameters for initialization: the algorithm's parameters should be easy to understand for non-experts, and the results should change predictably when the parameters are changed.

• Another important aspect of a blob detection algorithm is the capability of extracting geometric and radiometric attributes to allow for a subsequent classification of blobs.

3.5.1 Basic steps of blob detection

Real-time blob detection is the first step of the tracking application. Blob detection can be divided into three major steps [12]:

1. Blob detection (segmentation).


2. Blob specification.

3. Blob trajectory.

Blob detection

The aim of blob detection is to determine the center point of the blobs in the current frame

encoded in XY-coordinates. In this project, a blob consists of white and light gray pixels while

the background pixels are black. The number of blobs in the video frames can vary, which

complicates the conditions for the detection approach. To simplify the problem, an upper bound

for the number of blobs to detect has been defined. Two simple constraints are sufficient to

decide if a pixel belongs to a blob:

1. Is the brightness value of the pixel above a specified threshold?

2. Is the pixel adjacent to pixels of a detected blob?

The threshold is represented as a natural number value. It can be configured by the user or

computed by averaging arbitrary attribute values of all pixels in the current frame. Usually the

color value or the brightness value is used for this estimation process. The averaging requires

the application of a frame buffer to allow multiple processing of the same frame or a continuous

adjustment of the threshold, while performing the blob detection with some initial threshold

value. To combine detected pixels into blobs, a test of pixel adjacency needs to be performed. One way is to check every single pixel in the frame for adjacency to pixels that have already been detected and assigned to a blob. For the adjacency test, it is common to distinguish between a four-pixel and an eight-pixel neighborhood, as sketched below.
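A compact sketch of these two membership tests is shown below; the label buffer, the choice of an eight-pixel neighbourhood and the data layout are assumptions made for illustration, not part of the implementation described later.

#define ROWS 480
#define COLS 720

/* Returns the label of a neighbouring blob if pixel (r,c) is brighter than the
 * threshold and touches an already-labelled pixel (8-neighbourhood); returns 0
 * otherwise.  labels[][] holds 0 for background and a positive id per blob. */
int blob_membership(const unsigned char img[ROWS][COLS],
                    const int labels[ROWS][COLS],
                    int r, int c, unsigned char threshold)
{
    int dr, dc;

    if (img[r][c] <= threshold)            /* constraint 1: brightness */
        return 0;

    for (dr = -1; dr <= 1; dr++) {         /* constraint 2: adjacency  */
        for (dc = -1; dc <= 1; dc++) {
            int rr = r + dr, cc = c + dc;
            if ((dr || dc) && rr >= 0 && rr < ROWS && cc >= 0 && cc < COLS
                && labels[rr][cc] > 0)
                return labels[rr][cc];
        }
    }
    return 0;
}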

Blob specification

The blob detection step generates a binary image in which white represents the foreground objects. The main goal of the blob specification step is to assign a label that identifies each white blob, and to track the blobs.

The parameters[12] taken into account during specification are:

• Multi-blob tracking: used because more than one blob needs to be tracked in our surveillance applications.

• Distance: the minimum distance between detected blobs. For good tracking performance the smallest possible value, "1", is used; if the distance between blobs is less than this value, only one blob will be detected.

• Area of the blob: the number of pixels included in the blob.

• Color: the color feature of each blob.

• Width and length of the blob.

Blob trajectory

After tracking, a trajectory consisting of the temporal sequence of the detected center points is generated for each blob. The trajectory of the blob provides the direction of motion and the size of the blob.

3.5.2 Blob detection and identification method

The various approaches developed for blob detection can be grouped into the following categories: scale-space analysis, matched filters or template matching, watershed detection, sub-pixel precise blob detection, and effective maxima line detection, each with its own advantages and disadvantages. A new method for blob detection and identification is presented below.

The program operates based on the following assumptions:

1. The blobs do not overlap.

2. The rectangular enclosure for each blob does not intersect from each other.

The algorithm is based on a scanning search technique. The search starts from (0,0) of the processed image frame and proceeds from left to right, i.e., from column 0 to the last column of the image data, and then from top to bottom, i.e., row by row up to the last row of the image. The image data is the output of the image segmentation step. The steps of the algorithm are as follows:

1. Read the image frame data p. If p is zero, keep reading until a nonzero value is found.

2. If a nonzero value is obtained, a blob has been detected and the following decisions are made:


(a) If it is the first blob, calculate the statistics of the blob, i.e., the centroid measured with respect to the point (0,0), the area of the rectangle, the actual area, and the volume. The actual area is the nonzero pixel count inside the rectangle, and the volume is formed by the membership function values of the blob.

(b) If it is not the first blob, the data point is checked to see whether or not it is part of an existing blob's rectangular enclosure (note that at most one rectangle can satisfy this condition due to assumption (2)). If it is part of an existing rectangle, the search pointer is moved to one position past the upper-right corner of that rectangle and the search continues until a new blob is detected. If it is not part of an existing rectangle, a new blob has been detected and is enclosed in a rectangle.

3. The process is repeated until all blobs are detected and enclosed in rectangles.

Each time the first nonzero point p of a blob is detected, the point is initially enclosed by a rectangle of size 2x2 (a 2x2 square). The point p is located at the upper-left corner of this rectangle, and it serves as the starting point for the left-dimension, right-dimension and bottom-dimension counts.

The search for expansion is done within the boundaries of the image frame. The search is

conducted to the left, to the right and to the bottom of the square, in that order.

1. For search-left, the data points immediately to the left of the rectangle are checked. If a nonzero pixel value is detected, the left-dimension of the rectangle is increased by one (left expansion). If not, the left-dimension stays the same.

2. For search-right, the data points immediately to the right of the rectangle are checked. If a nonzero pixel value is detected, the right-dimension of the rectangle is increased by one (right expansion). If not, the right-dimension stays the same.

3. For search-bottom, the data points immediately below the rectangle are checked. If a nonzero pixel value is detected, the bottom-dimension is increased by one. If not, the following searches are done: (a) a lower-left search and (b) a lower-right search. For the lower-left search, if a nonzero pixel is detected the bottom-dimension is increased by one, and if the left-dimension did not change in step (1) the left-dimension is also increased by one; otherwise the left-dimension remains the same. If the lower-left pixel is zero, the bottom-dimension remains unchanged. For the lower-right search, if a nonzero pixel


is detected, the bottom-dimension is increased by one, and if the right-dimension did not change in step (2) the right-dimension is also increased by one; otherwise the right-dimension remains the same. If the lower-right pixel is zero, the bottom-dimension remains unchanged.

4. The process is repeated until all the dimensions (left-dimension, right-dimension and bottom-dimension) remain unchanged, which indicates that the blob is fully enclosed in a rectangle. Note that the rectangular enclosure stretches to the left, to the right, or to the bottom as the search procedures discussed above are successively executed; the rectangle thus keeps expanding until it covers the whole blob, as sketched in the code below.
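The sketch below condenses the search-left / search-right / search-bottom expansion for a binary image; boundary handling and the lower-left / lower-right refinements are simplified, so it should be read as an outline of the procedure rather than the exact code used on the board.

#define ROWS 480
#define COLS 720

/* Grow a rectangle [top..bottom] x [left..right] around the seed pixel
 * (seed_r, seed_c) until no nonzero pixel touches its left, right or bottom
 * edge.  img[][] is the segmented (binary) image. */
void grow_blob_rect(const unsigned char img[ROWS][COLS],
                    int seed_r, int seed_c,
                    int *top, int *bottom, int *left, int *right)
{
    int changed = 1, r, c;

    *top = seed_r;  *bottom = seed_r + 1;     /* initial 2x2 enclosure */
    *left = seed_c; *right  = seed_c + 1;

    while (changed) {
        changed = 0;

        if (*left > 0)                        /* search-left   */
            for (r = *top; r <= *bottom; r++)
                if (img[r][*left - 1]) { (*left)--; changed = 1; break; }

        if (*right < COLS - 1)                /* search-right  */
            for (r = *top; r <= *bottom; r++)
                if (img[r][*right + 1]) { (*right)++; changed = 1; break; }

        if (*bottom < ROWS - 1)               /* search-bottom */
            for (c = *left; c <= *right; c++)
                if (img[*bottom + 1][c]) { (*bottom)++; changed = 1; break; }
    }
}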

3.6 Feature extraction for blobs

In this section we describe the features extracted for each identified blob. Figure 3.3 shows an example of a blob.

1. Blob area: By counting the number of pixels included in blob i of the t-th frame, we

calculate the area of the object ai (t).

Figure 3.3: Feature extraction

2. Blob width and height: We extract the positions of the pixels Pxmax (or Pxmin) which have the maximum (or minimum) x-component:

Pxmax = (Xmax,x, Xmax,y), (3.4a)

Pxmin = (Xmin,x, Xmin,y) (3.4b)

where Xmax,x, Xmax,y, Xmin,x and Xmin,y are the x- and y-coordinates of the rightmost and leftmost boundary points of segment i, respectively. In addition, we also extract

Pymax = (Ymax,x, Ymax,y), (3.5a)

Pymin = (Ymin,x, Ymin,y) (3.5b)

Then we calculate the width w and the height h of the object as follows:

wi(t) = Xmax,x − Xmin,x, (3.6a)

hi(t) = Ymax,y − Ymin,y (3.6b)

3. Position: We define the position of each object in the frame as follows:

xi(t) = (Xmax,x + Xmin,x)/2, (3.7a)

yi(t) = (Ymax,y + Ymin,y)/2 (3.7b)

4. Color: Using the image data at Pxmax, Pxmin, Pymax and Pymin, we define the color feature of each object for the Y (luminance) component as

Yi(t) = [Y(Pxmax) + Y(Pxmin) + Y(Pymax) + Y(Pymin)]/4, (3.8)

as well as by equivalent equations for the U and V components.
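The extracted quantities can be collected in a small structure; the sketch below computes the width, height, position and average-luma colour feature from the four extreme points, mirroring Eqs. (3.4)-(3.8). The type and field names are illustrative, and y_at() stands for whatever routine returns the Y value at a given pixel.

typedef struct { int x, y; } Point;

/* Illustrative container for the per-blob features of Eqs. (3.4)-(3.8). */
typedef struct {
    int area;            /* pixel count a_i(t)          */
    int width, height;   /* w_i(t), h_i(t)              */
    int cx, cy;          /* position (x_i(t), y_i(t))   */
    int luma;            /* colour feature Y_i(t)       */
} BlobFeatures;

static BlobFeatures compute_features(Point pxmax, Point pxmin,
                                     Point pymax, Point pymin,
                                     int area, int (*y_at)(Point))
{
    BlobFeatures f;
    f.area   = area;
    f.width  = pxmax.x - pxmin.x;                    /* Eq. 3.6a */
    f.height = pymax.y - pymin.y;                    /* Eq. 3.6b */
    f.cx     = (pxmax.x + pxmin.x) / 2;              /* Eq. 3.7a */
    f.cy     = (pymax.y + pymin.y) / 2;              /* Eq. 3.7b */
    f.luma   = (y_at(pxmax) + y_at(pxmin) +
                y_at(pymax) + y_at(pymin)) / 4;      /* Eq. 3.8  */
    return f;
}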

3.7 Tracking: Centroid calculation

For the computation of the blob's center point, different methods can be applied. The bounding-box method estimates the center point of the blob from its minimal and maximal X and Y coordinates. If a new pixel is adjacent to or within the range of the estimated coordinates of the blob, it is added to the blob and the min/max values are adjusted. For computing the center point, the bounding box does not offer enough information about the blob to extract very precise results, but the computation can be implemented very efficiently and does not cause significant performance issues.

BLOBs_X_center_position = minX_position + (maxX_position − minX_position)/2 (3.9a)


BLOBs_Y_center_position = minY_position + (maxY_position − minY_position)/2 (3.9b)

The division by two can be realized by a bit shift, and the remaining operations, addition and subtraction, are not computationally expensive either. However, the center coordinates are strongly affected by the pixels at the blob's border. With threshold-based detection the border pixels can flicker, and this effect becomes even stronger for blobs in motion: movement of the camera or of the object causes motion blur on the blob's shape, increasing the flickering of border pixels and hence of the computed center point. To reduce these flickering effects, another method for computing the blob center point is the center-of-mass method, in which all pixels of the detected blob are taken into account. The algorithm sums the coordinate values of the detected pixels and calculates an averaged center coordinate.

BLOBs_X_center_position = ∑(X_position_of_all_BLOB_pixels) / number_of_all_BLOB_pixels (3.10a)

BLOBs_Y_center_position = ∑(Y_position_of_all_BLOB_pixels) / number_of_all_BLOB_pixels (3.10b)

To obtain even higher precision, the brightness values of the pixels can additionally be applied as weights, which increases the precision of the blob's center point:

center_position = ∑(pixel_position · pixel_brightness) / ∑(pixel_brightness) (3.11)

Since the available video material is converted to grayscale, the border of the blobs can show significant flickering, depending on the threshold value and the color gradient. To avoid similar flickering in the computed center position of the blob, the proposed averaging is recommended. It can be realized by keeping a running sum of the values during the detection phase; the division by the number of pixels is done after all merging procedures for the blobs have been executed.
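A brief sketch of the centre-of-mass computation of Eqs. (3.10a)/(3.10b), with the optional brightness weighting of Eq. (3.11), is given below; the image size and threshold parameter are assumptions for illustration.

#define ROWS 480
#define COLS 720

/* Centre of mass of all pixels in img[][] that exceed th, optionally weighted
 * by brightness (Eq. 3.11).  Returns 0 if no blob pixel is found. */
int blob_center_of_mass(const unsigned char img[ROWS][COLS],
                        unsigned char th, int weighted,
                        int *cx, int *cy)
{
    long sum_x = 0, sum_y = 0, sum_w = 0;
    int r, c;

    for (r = 0; r < ROWS; r++) {
        for (c = 0; c < COLS; c++) {
            if (img[r][c] > th) {
                long w = weighted ? img[r][c] : 1;   /* brightness weight or 1 */
                sum_x += w * c;
                sum_y += w * r;
                sum_w += w;
            }
        }
    }
    if (sum_w == 0)
        return 0;
    *cx = (int)(sum_x / sum_w);
    *cy = (int)(sum_y / sum_w);
    return 1;
}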


Chapter 4

Video and Image Processing Algorithms

on TMS320DM6437

In this project we develop and demonstrate a framework for real-time object tracking using Texas Instruments' (TI's) TMS320DM6437 DaVinci multimedia processor. More specifically, we track a single object and two objects present in the scene captured by a camera that acts as the video input device. The tracking happens in real time, consuming 30 frames per second (fps), and is robust to background and illumination changes. The approach involves an off-line detection algorithm as a pre-processing step. As part of the real-time tracking, the proposed approach performs background subtraction, image segmentation, blob detection and identification, feature extraction, and center-of-mass calculation for target tracking. In addition, real-time implementations of basic image processing algorithms such as image inversion, edge detection using the Sobel operator, and image compression were carried out on the DaVinci digital media processor DM6437. The input was captured using a CCD camera and the output is displayed on an LCD display. Blob detection and identification was comparatively slow because of its high computational complexity, the limited speed of the processor, the coding style, and the processing required for multiple-blob detection, identification and centroid calculation. Execution times for the different blocks of single object tracking were estimated using the profiler, and the accuracy of the detection was verified using the debugger provided by TI Code Composer Studio (CCS).

There are two approaches to running video processing algorithms on the TMS320DM6437: a real-time approach using a CCD camera and a display, and a file-based approach in which a video file is read from the hard drive, processed (for example encoded or decoded, or passed through other video processing), and written back to the hard drive. In the second case there is no need to configure the capture and display channels or to check for PAL or NTSC. The first approach is used for real-time video preview and video encoding/decoding, whereas the second is used for video recording, video copying, or encoding/decoding video to or from a file.

4.1 Implementation of Single target tracking on TMS320DM6437

This section describes the steps to implement the single object tracking algorithm on a TMS320DM6437 hardware kit containing the DaVinci DSP processor. The tracking algorithm implemented in this project consists of steps such as capturing a frame, copying the frame, extracting the Y component, removing noise, dilation, processing of the image, and region-of-interest extraction. A given image consists of a luminance (Y) and a chrominance (C) component. The luminance component mainly represents the gray level of a pixel, and this information is essential in the further processing used by this algorithm. The chrominance component is not essential, as the algorithm is independent of that information. The Y component is extracted from the frame and stored in a two-dimensional array. After this, the Y components are processed for noise removal.

The objective of this part of the project is to track the movement of a single object using the TMS320DM6437 EVM board. Tracking a single object consists of several steps: capturing the input frame, copying the frame, frame subtraction, and tracking. The captured input frame buffer is stored into an array; the array is copied into another array that holds the foreground image, and this array is then processed for frame subtraction. In the frame subtraction function, the frame is first delayed by storing it into another array, the incoming frame is subtracted from the delayed frame, and the resulting frame is segmented by thresholding. After segmentation we proceed to feature extraction, where the centroid of the moving object is found; the moving object is then tracked through the movement of its centroid, as shown in Figure 4.1.

The working principle of the single target tracking module on the TMS320DM6437 is given below.

• Step 0: The objective is to track the movement of a single object using the TMS320DM6437.

• Step 1: Start CCS v3.3 using the desktop shortcut icon. Before starting CCS, make sure that the DM6437 kit is connected to the PC (via the USB emulation cable) or to an external emulator, that the input from the CCD camera is connected to the video input port of the DM6437 board via a composite-to-BNC connector, that the output display device is connected to one of the three output ports of the kit via a composite cable (the input/output connection can also be made using the S-video interface), and that all peripherals and the board are powered on.

Figure 4.1: Single object tracking

• Step 2: Load the video_preview.pjt project from the DVSDK examples directory.

• Step 3: In the video_preview(void) function, we first declare

FVID_Frame *frameBuffTable[FRAME_BUFF_CNT];

FVID_Frame *frameBuffPtr;

FVID_Frame is a structure which contains the interlaced (I) and progressive (P) frame buffers, the number of lines, bits per pixel, pitch, color format (YCbCr 4:2:2, RGB888, RGB565), etc. frameBuffTable is an array of pointers to FVID_Frame structures. Its size is 6: three entries for capture frames and three for display frames. The capture or display frame queue can be enlarged beyond three entries by increasing the value of the macro FRAME_BUFF_CNT. frameBuffPtr is a pointer to an FVID_Frame. These definitions are provided in "fvid.h".

• Step 4: GIO_Handle is a pointer to a global input/output object (GIO_Obj); it is used to declare pointers to the mini-driver channel objects (hGioVpfeCcdc, hGioVpbeVid0, hGioVpbeVenc).

GIO_Handle hGioVpfeCcdc;

GIO_Handle hGioVpbeVid0;

GIO_Handle hGioVpbeVenc;

PSP stands for Platform Support Package. These parameter structures are used to set the video display (VPBEOsdConfigParams) and capture (VPFECcdcConfigParams) driver configuration parameters to their defaults, and to set the TVP5146 video decoder configuration (VPFE_TVP5146_ConfigParams).

PSP_VPFE_TVP5146_ConfigParams

tvp5146Params = VID_PARAMS_TVP5146_DEFAULT;

PSP_VPFECcdcConfigParams

vpfeCcdcConfigParams = VID_PARAMS_CCDC_DEFAULT_D1;

• Step 5: This step creates the video input channel (CCDC) and the video output channel (Vid0 or VENC). FVID_create() allocates and initializes a GIO_Obj structure; it returns a non-NULL GIO_Handle on success and NULL on failure.

/* create video input channel */

hGioVpfeCcdc = FVID_create("/VPFE0",IOM_INOUT,NULL,

&vpfeChannelParams,NULL);

/* create video output channel, plane 0 */

hGioVpbeVid0 = FVID_create("/VPBE0",IOM_INOUT,NULL,

&vpbeChannelParams,NULL);

• Step 6: FVID_allocBuffer() is used by the application to allocate a frame buffer using the driver's memory allocation routines.


/* allocate some frame buffers */

result = FVID_allocBuffer(hGioVpfeCcdc, &frameBuffTable[i]);

• Step 7: In this block of code the video capture and video display channels are primed with a queue of size 3; the queue size can be increased by increasing the value of the macro FRAME_BUFF_CNT.

/* prime up the video capture channel */

FVID_queue(hGioVpfeCcdc, frameBuffTable[0]);

FVID_queue(hGioVpfeCcdc, frameBuffTable[1]);

FVID_queue(hGioVpfeCcdc, frameBuffTable[2]);

/* prime up the video display channel */

FVID_queue(hGioVpbeVid0, frameBuffTable[3]);

FVID_queue(hGioVpbeVid0, frameBuffTable[4]);

FVID_queue(hGioVpbeVid0, frameBuffTable[5]);

• Step 8: The code for single object tracking is written between these two function calls.

FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);

/* write your code between these two calls */

FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

FVID_exchange():Exchange one driver-managed buffer for another driver-managed buffer.

This operation is similar to an FVID_free()/alloc() pair but has less overhead since it in-

volves only one call into the driver.

FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);

hGioVpfeCcdc is the GIO handle of the capture channel, and &frameBuffPtr is the address of the frame buffer pointer.

• Step 9: Example of single object tracking

FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);

extract_uyvy ((frameBuffPtr->frame.frameBufferPtr));

copy_frame();


frame_subtract();

tracking();

write_uyvy ((frameBuffPtr->frame.frameBufferPtr));

/* display the video frame */

FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

1. Function A: extract_uyvy((frameBuffPtr->frame.frameBufferPtr)) captures the input frame. In this function the captured input frame is stored in arrays. The captured frame is a standard-definition NTSC color image in the YUV 4:2:2 sampling format defined in fvid.h, so we declare an array of size 480x720 for Y and arrays of size 480x360 for U and V. The extraction logic is given in the function:

I_u1[r][c] = * (((unsigned char * )currentFrame)

+ r*720*2+4*c+ 0);

I_y1[r][2*c] = * (((unsigned char * )currentFrame)

+ r*720*2+4*c+ 1);

I_v1[r][c] = * (((unsigned char * )currentFrame)

+ r*720*2+4*c+ 2);

I_y1[r][2*c+1] = * (((unsigned char * )currentFrame)

+ r*720*2+4*c+ 3);

The parameter (frameBuffPtr->frame.frameBufferPtr) can be explained as follows: frame is a union which contains the Y/C frame buffers (iFrm, pFrm), the raw frame buffers (riFrm, rpFrm) and the raw frame buffer pointer (frameBufferPtr). frame.frameBufferPtr accesses the member pointer frameBufferPtr of the union frame, and frameBuffPtr->frame.frameBufferPtr accesses that union member through frameBuffPtr, a pointer to an FVID_Frame structure.

2. Function B: copy_frame() copies the original frame into another set of arrays (I_y, I_u, I_v) that hold the foreground image; these copied arrays are later written into the display frame.

I_u[r][c] = I_u1[r][c];


I_y[r][2*c] = I_y1[r][2*c] ;

I_v[r][c] = I_v1[r][c] ;

I_y[r][2*c+1] = I_y1[r][2*c+1] ;

3. Function C: frame_subtract() consists of three parts: frame delay, frame subtraction and image segmentation.

Delay frame: the frame is delayed by storing it into another set of arrays in the same loop:

I_u2[r][c] = I_u1[r][c];

I_y2[r][2*c] = I_y1[r][2*c] ;

I_v2[r][c] = I_v1[r][c] ;

I_y2[r][2*c+1] = I_y1[r][2*c+1] ;

Subtract frame: the delayed frame is subtracted from the incoming frame and the result is stored into another set of arrays:

I_u3[r][c] = I_u1[r][c] - I_u2[r][c] ;

I_y3[r][2*c] = I_y1[r][2*c] - I_y2[r][2*c];

I_v3[r][c] =I_v1[r][c] -I_v2[r][c] ;

I_y3[r][2*c+1] = I_y1[r][2*c+1]- I_y2[r][2*c+1];

Image segmentation: the resulting difference frame is segmented by thresholding. In the YUV 4:2:2 format, for the luma component the "black" level corresponds to the value 16 and the "white" level to the value 235.

if((I_u3[m][n]<45 || I_u3[m][n]>200) && (I_y3[m][2*n]<45

||I_y3[m][2*n]>200) && (I_v3[m][n] <45 || I_v3[m][n]>200) &&

(I_y3[m][2*n+1] <45 || I_y3[m][2*n+1]>200)) {

I_u3[m][n] = 128 ;

I_y3[m][2*n] = 16 ;

I_v3[m][n] = 128 ;

I_y3[m][2*n+1] = 16;}

else{ I_u3[m][n] = 128 ;

I_y3[m][2*n] = 235;


I_v3[m][n] = 128 ;

I_y3[m][2*n+1] = 235;}

4. Function D: tracking() comprises three parts: feature extraction, tracking and drawing a rectangle.

Feature extraction: here we extract the feature of the moving object, namely the position of its centroid (centroid_x, centroid_y):

{

cent_x= cent_x + m ;

cent_y= cent_y + n ;

cent_z= cent_z + 1 ;

}

centroid_x= (cent_x/cent_z);

centroid_y= (cent_y/cent_z);

Tracking the object: the movement of the object is represented by the movement of its calculated centroid.

Creating the rectangle: once the position (x, y) of the centroid is found, a rectangle is drawn around the centroid of the object:

for(p =centroid_x-10 ; p < centroid_x+10; p++){

for(q = centroid_y-10; q < centroid_y+10; q++){

if(p== centroid_x-10 || p==centroid_x+9 || q ==

centroid_y-10 || q==centroid_y+9){

I_u[p][q] = 255;

I_y[p][2*q] = 255;

I_v[p][q] = 255;

I_y[p][2*q+1] = 255; }

else {

I_u[p][q] = I_u[p][q];

I_y[p][2*q] = I_y[p][2*q];

I_v[p][q] = I_v[p][q];

I_y[p][2*q+1] = I_y[p][2*q+1];}}}


5. Function E: write_uyvy((frameBuffPtr->frame.frameBufferPtr)) copies the arrays into the current output frame that is going to be displayed. The displayed frame is a standard-definition NTSC color image in the YUV 4:2:2 sampling format defined in fvid.h. The display logic is given in the function:

* (((unsigned char * )currentFrame) + r*720*2+4*c+ 0)

= I_u[r][c] ;

* (((unsigned char * )currentFrame) + r*720*2+4*c+ 1)

= I_y[r][2*c] ;

* (((unsigned char * )currentFrame) + r*720*2+4*c+ 2)

= I_v[r][c] ;

* (((unsigned char * )currentFrame) + r*720*2+4*c+ 3)

= I_y[r][2*c+1];

• Step 10: FVID_exchange for Display frame

FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

hGioVpbeVid0 is the GIO handle of the display output channel, and &frameBuffPtr is the address of the frame buffer pointer.

4.1.1 Debugging and profiling results

Profiling results

Profiling is used to measure code performance and to ensure efficient use of the DSP target's resources during debug and development sessions. Profiling was applied to the different functions of single object tracking, and the time taken to execute each function was measured from its inclusive and exclusive cycle counts, its access count and the processor clock frequency.
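For reference, the reported execution time follows from the cycle counts as cycles divided by the CPU clock frequency. The helper below sketches this conversion; the 600 MHz figure is only an assumption and should be replaced by the clock actually configured on the board.

/* Convert a profiler cycle count to seconds (assumed 600 MHz core clock). */
#define CPU_HZ 600000000.0

static double cycles_to_seconds(unsigned long long cycles)
{
    return (double)cycles / CPU_HZ;
}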

Object tracking setup

1. Verify that board jumper JP1 is set to the correct display format, either NTSC or PAL, according to the DM6437 DVDP getting started guide.


Table 4.1: Single object tracking profiler data

Function name Access count Incl Cycle Excl cycle Incl time taken(s)

write_uyvy_fun 7007 23402017 22402007 0.000005

copy_frame_fun 1 17944199 17944199 0.02563

frame_subtract_fun 1 38848159 38848159 0.05550

tracking_fun 1 157644657 94935656 0.22521

read_JP1_fun 1 632311 1804 0.00090

extract_uyvy_fun 9007 33402017 33402017 0.000005

main_fun 1 78070 300 0.00011

video_preview_fun 1 180255249 46289 0.25751

centriod_loop 8594 156441001 94241711 0.00026

Rectangle_loop 24 151250796 91165143 0.009003

frame_buffer_int_loop 6 5542 5542 0.000001

allocate_frame_buffers_loop 6 29750 11315 0.000007

while_loop_vid_capture_disp_all_fun 1 5655 6 0.000008

2. Verify that the board jumpers and switches are set according to the getting started guide so that the boot mode is EMIF boot.

3. Connect a composite video cable from an NTSC video camera input device to the EVM

board’s Video in RCA jack J5 as shown in Figure 4.2.a.

4. Connect a composite video cable from a video display to the EVM board’s DAC D Video

Out RCA jack J4 as shown in Figure 4.2.b.

5. Use the on-board USB emulator cable to connect the EVM's USB connector to a PC, as shown in Figure 4.3.a. The USB connection enables debugging via Code Composer Studio.

6. Plug in the video camera and the LCD video display.

7. Connect the provided +5V power supply to an AC power source. Connect the provided

+5V power supply to the EVM board’s power connector as shown in Figure 4.3.b.


Figure 4.2: Evm board setup: (a)Video input connection,and (b)Video output connection


Figure 4.3: EVM board setup : (a)USB on board emulator ,and (b)Power connection

Debugging results

Debugging results of single object tracking are shown in Figures 4.4 and 4.5. Figure 4.4.a shows the DVDP EVM6437 board, and Figure 4.4.b shows the complete setup for object tracking, where the CCD camera input is fed to the EVM board through a composite connector. The output is shown on an LCD display connected to the output port of the EVM board with a composite cable. The ball to be tracked is shown in Figure 4.4.c.

Figure 4.5 shows the debugging results for single ball tracking. The result of background subtraction without any filtering is shown in Figure 4.5.a, the result of background subtraction after filtering is shown in Figure 4.5.b, and Figure 4.5.c shows the debugging output in Code Composer Studio.


Figure 4.4: Debugging results : (a)TMS320DM6437 EVM board, (b)Target tracking setup with DM6437 board, and (c)Original ball for

tracking


Figure 4.5: Debugging results : (a)Results of background subtraction with out filtering, (b)Background subtraction with filtering, and

(c)Debugging output for I_y1_u1_v1

4.2 Implementation of multiple object tracking on DM6437

The objective of this part of the project is to track the movement of two objects using the TMS320DM6437 EVM board. Tracking multiple objects consists of several steps: capturing the input frame, copying the frame, frame subtraction, blob detection and identification, feature extraction, and tracking. The captured input frame buffer is stored into an array; the array is copied into another array that holds the foreground image, and this array is then processed for frame subtraction. In the frame subtraction function, the frame is first delayed by storing it into another array, the incoming frame is subtracted from the delayed frame, and the resulting frame is segmented by thresholding. The blobs present in the frame are then detected and identified, their features are extracted, and the blobs are tracked by centroid calculation.

Figure 4.6: Multiple object tracking

• Step 1: The code for multiple object tracking is written between these two function calls.

FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);

/* write your code between these two calls */


FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

FVID_exchange():Exchange one driver-managed buffer for another driver-managed buffer.

This operation is similar to an FVID_free()/alloc() pair but has less overhead since it in-

volves only one call into the driver.

FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);

hGioVpfeCcdc is the GIO handle of the capture channel, and &frameBuffPtr is the address of the frame buffer pointer.

• Step 2: Example of Multiple object tracking

FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);

extract_uyvy ((frameBuffPtr->frame.frameBufferPtr));

copy_frame();

frame_subtract();

write_uyvy ((frameBuffPtr->frame.frameBufferPtr));

FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

1. Function A: extract_uyvy((frameBuffPtr->frame.frameBufferPtr)) captures the input frame. In this function the captured input frame is stored in arrays. The captured frame is a standard-definition NTSC color image in the YUV 4:2:2 sampling format defined in fvid.h, so we declare an array of size 480x720 for Y and arrays of size 480x360 for U and V. The extraction logic is given in the function:

I_u1[r][c] = * (((unsigned char * )currentFrame)

+ r*720*2+4*c+ 0);

I_y1[r][2*c] = * (((unsigned char * )currentFrame)

+ r*720*2+4*c+ 1);

I_v1[r][c] = * (((unsigned char * )currentFrame)

+ r*720*2+4*c+ 2);

I_y1[r][2*c+1] = * (((unsigned char * )currentFrame)

+ r*720*2+4*c+ 3);


The parameter (frameBuffPtr->frame.frameBufferPtr) can be explained as follows: frame is a union which contains the Y/C frame buffers (iFrm, pFrm), the raw frame buffers (riFrm, rpFrm) and the raw frame buffer pointer (frameBufferPtr). frame.frameBufferPtr accesses the member pointer frameBufferPtr of the union frame, and frameBuffPtr->frame.frameBufferPtr accesses that union member through frameBuffPtr, a pointer to an FVID_Frame structure.

2. Function B: copy_frame() copies the original frame into another set of arrays (I_y, I_u, I_v) that hold the foreground image; these copied arrays are later written into the display frame.

I_u[r][c] = I_u1[r][c];

I_y[r][2*c] = I_y1[r][2*c] ;

I_v[r][c] = I_v1[r][c] ;

I_y[r][2*c+1] = I_y1[r][2*c+1] ;

3. Function C: frame_subtract() consists of several parts: frame delay, frame subtraction, image segmentation, blob detection and identification, feature extraction, and tracking.

Delay frame: the frame is delayed by storing it into another set of arrays in the same loop:

I_u2[r][c] = I_u1[r][c];

I_y2[r][2*c] = I_y1[r][2*c] ;

I_v2[r][c] = I_v1[r][c] ;

I_y2[r][2*c+1] = I_y1[r][2*c+1] ;

Subtract frame: the delayed frame is subtracted from the incoming frame and the result is stored into another set of arrays:

I_u5[r][c] = I_u1[r][c] - I_u2[r][c] ;


I_y5[r][2*c] = I_y1[r][2*c] - I_y2[r][2*c];

I_v5[r][c] = I_v1[r][c] - I_v2[r][c] ;

I_y5[r][2*c+1] = I_y1[r][2*c+1]- I_y2[r][2*c+1];

Image segmentation: the resulting difference frame is segmented by thresholding. In the YUV 4:2:2 format, for the luma component the "black" level corresponds to the value 16 and the "white" level to the value 235.

if((I_u5[m][n]<45 || I_u5[m][n]>200) && (I_y5[m][2*n]<45

|| I_y5[m][2*n]>200) && (I_v5[m][n] <45 || I_v5[m][n]>200)

&& (I_y5[m][2*n+1] <45 || I_y5[m][2*n+1]>200)) {

I_u5[m][n] = 128 ;

I_y5[m][2*n] = 16 ;

I_v5[m][n] = 128 ;

I_y5[m][2*n+1] = 16;}

else { I_u5[m][n] = 128 ;

I_y5[m][2*n] = 235;

I_v5[m][n] = 128 ;

I_y5[m][2*n+1] = 235; }

Blob detection and identification:

(a) Read the image frame data p. If p is zero, keep reading until a nonzero value is found.

(b) Each time the first nonzero point p of a blob is detected, the point is initially enclosed by a rectangle of size 2x2 (a 2x2 square). The rectangle is then expanded by searching the values along its boundary; if a nonzero value is found, the height (rhigh) or width (chigh) of the rectangle is increased:

if (chigh != LAST_COL && rlow < LAST_ROW) {     /* expand along horizontal */
    for (rtemp = rlow; rtemp <= rhigh; rtemp++) {
        if (I_y3[rtemp][chigh+1] > 16) {
            chigh = chigh + 1;
            flag = 1;
            break;
        }
    }
}

if (rhigh != LAST_ROW && clow < LAST_COL) {     /* expand along vertical   */
    for (ctemp = clow; ctemp <= chigh; ctemp++) {
        if (I_y3[rhigh+1][ctemp] > 16) {
            rhigh = rhigh + 1;
            flag = 1;
            break;
        }
    }
}

Feature extraction: for each detected blob, calculate its statistics, i.e., the centroid measured with respect to the point (0,0), the area of the rectangle, the actual area (count) and the volume. The actual area is the nonzero pixel count inside the rectangle, and the volume is formed by the membership function values of the blob. The following computes the rectangle area:

count=0;

LL=rhigh-rlow+1;

LH=chigh-clow+1;

t_area=LL*LH;

The following computes the actual pixel count (actual area) of the blob:

for (ix = rlow;ix<=rhigh;ix++)

for (jx = clow;jx<=chigh;jx++)

if(I_y3[ix][jx]>16)

count=count +1;

We also find the maximum length and width of the blob, and store the blob dimensions (rlow, clow, rhigh, chigh) into an array:

if (count > (t_area/2) && (t_area > 100)) {   /* select only significant blobs */
    iblob = iblob + 1;
    arr[k]   = rlow;
    arr[k+1] = clow;
    arr[k+2] = rhigh;
    arr[k+3] = chigh;
    k = k + 4;
}


4. Function D: tracking() comprises three parts: feature extraction, tracking and drawing a rectangle.

Feature extraction: for each blob we extract the position of its centroid (centroid_x, centroid_y):

for (a = 1; a <= 4*iblob; a = a + 4) {
    for (m = arr[a]; m <= arr[a+2]; m++) {
        for (n = arr[a+1]; n <= arr[a+3]; n++) {
            if (I_y3[m][n] < 45 || I_y3[m][n] > 200) {
                I_y3[m][n] = 16;              /* mark as background      */
            }
            cent_x = cent_x + m;              /* accumulate coordinates  */
            cent_y = cent_y + n;
            cent_z = cent_z + 1;
        }
    }
    centroid_x[a] = (cent_x / cent_z);
    centroid_y[a] = (cent_y / cent_z);
}

Tracking: the movement of each object is represented by the movement of its calculated centroid.

Creating the rectangle: once the position (x, y) of each centroid is found, a rectangle is drawn around it:

for (l = 1; l <= 4*iblob; l = l + 4) {
    for (p = centroid_x[l]-10; p < centroid_x[l]+10; p++) {
        for (q = centroid_y[l]-10; q < centroid_y[l]+10; q++) {
            if (p == centroid_x[l]-10 || p == centroid_x[l]+9 ||
                q == centroid_y[l]-10 || q == centroid_y[l]+9) {
                I_y[p][q] = 235;            /* draw the rectangle border */
            } else {
                I_y[p][q] = I_y[p][q];      /* leave interior unchanged  */
            }
        }
    }
}

5. Function E: write_uyvy((frameBuffPtr->frame.frameBufferPtr)) copies the output foreground arrays into the display frame. The display frame is a standard-definition NTSC color image in the YUV 4:2:2 sampling format defined in fvid.h. The logic for writing the frame to the display is given in the function:


* (((unsigned char * )currentFrame) + r*720*2+4*c+ 0)

= I_u[r][c] ;

* (((unsigned char * )currentFrame) + r*720*2+4*c+ 1)

= I_y[r][2*c] ;

* (((unsigned char * )currentFrame) + r*720*2+4*c+ 2)

= I_v[r][c] ;

* (((unsigned char * )currentFrame) + r*720*2+4*c+ 3)
= I_y[r][2*c+1];

• Step 3: FVID_exchange for Display frame

FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

hGioVpbeVid0 is the GIO handle of the display output channel, and &frameBuffPtr is the address of the frame buffer pointer.

4.2.1 Debugging and profiling results

Profiling results

Profiling is used to measure code performance and to ensure efficient use of the DSP target's resources during debug and development sessions. Profiling was applied to the different functions of multiple object tracking, and the time taken to execute each function was measured from its inclusive and exclusive cycle counts, its access count and the processor clock frequency.

Debugging results

Debugging results of multiple object tracking are shown in Figures 4.7, 4.8 and 4.9. Two-ball tracking is shown in Figure 4.7.a, Figure 4.7.b shows two-object tracking, and the result of background subtraction with filtering is shown in Figure 4.7.c. Figure 4.8 shows the debugging results of two-object tracking: Figure 4.8.a shows the tracking results, Figure 4.8.b shows the result of background subtraction of two objects with filtering, and Figure 4.8.c shows the debugging output in Code Composer Studio. The result of three-object tracking is shown in Figure 4.9.


Table 4.2: Multiple object tracking profiler data

Function name Access count Incl Cycle Excl cycle Incl time taken(s)

write_uyvy_fun 7007 23402017 22402007 0.000005

copy_frame_fun 1 17944199 17944199 0.02563

frame_subtract_fun 1 988491609 98848159 0.141212

read_JP1_fun 1 632311 1804 0.00090

extract_uyvy_fun 9007 33402017 33402017 0.000005

main_fun 1 78070 300 0.00011

video_preview_fun 1 3802552121 46289 1.114650

centriod_loop 200 656441001 94241711 0.061174

Rectangle_loop 24 651240712 91165143 0.038765

frame_buffer_int_loop 6 5542 5542 0.000001

allocate_frame_buffers_loop 6 29750 11315 0.000007

while_loop_vid_capture_disp_all_fun 1 5655 6 0.000008


Figure 4.7: Debugging results : (a)Two ball tracking, (b)Two object tracking, and (c)Results of background subtraction with filtering


Figure 4.8: Debugging results : (a)Two object targeting results, (b)Results of background subtraction of 2 object with filtering, and

(c)Debugging result of 2 object tracking


Figure 4.9: Debugging results of Three target tracking


4.3 Implementation of the object tracking algorithm in Matlab

In the proposed object tracking algorithm, a number of features are extracted for all segmented objects, and then pattern matching with the objects of the previous frame is carried out. A high-level flow chart of the proposed algorithm is shown in Figure 4.11. This algorithm is implemented in Matlab. The detailed processing consists of the steps shown in Figure 4.10.

Figure 4.10: Different steps for object tracking using segmentation and pattern matching

• Step 1: With the image segmentation algorithm, we extract all objects in the input image.

• Step 2: Then we extract the coordinates of the four object pixels indicated in the figure. Pxmax and Pxmin have the maximum and minimum x-component, while Pymax and Pymin have the maximum and minimum y-component, respectively.

• Step 3: Next we calculate the characteristic features of each segmented object to be tracked, namely the object position (x, y), object size (width, height), color information (R, G, B) and object area. The object position (x, y), width w and height h are calculated according to the equations below:

w = Xmax,x − Xmin,x, h = Ymax,y − Ymin,y,

x = (Xmax,x + Xmin,x)/2, y = (Ymax,y + Ymin,y)/2

The object area is determined by counting the number of its constituent pixels. As object color information, the average RGB data of the four pixels Pxmax, Pxmin, Pymax and Pymin is used. The feature extraction calculation is illustrated in Figure 4.12.a.


Figure 4.11: Flow chart of object tracking based on segmentation and pattern matching.

• Step 4: A minimum-distance search in the feature space is performed between each object in the current frame and all objects in the preceding frame. Each object in the current frame is then identified with the object in the preceding frame that has the minimum distance, in other words the most similar object.

• Step 5: Next, we calculate the motion vector (mx(t−1), my(t−1)) from the difference in position between the object in the current frame and the matching object in the preceding frame. By adding the motion vector (mx(t−1), my(t−1)) to the current position (x(t−1), y(t−1)) of the object, we obtain an estimate of the object's position (x'(t), y'(t)) in the next frame. After a start-up phase, from the third frame onwards this estimated position is used instead of the extracted position (x(t−1), y(t−1)) for pattern matching (see the matching sketch after this list).


Figure 4.12: Different templates: (a) feature extraction, and (b) estimation of positions in the next frame

• Step 6: By carrying out this matching procedure with all segments obtained for the cur-

rent frame, we can identify all objects one by one and can maintain tracking of all objects

between frames.
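Although the reference implementation of this matching is in Matlab, the minimum-distance search of steps 4 and 5 can be outlined as in the C sketch below, using the Manhattan (city-block) distance over the feature vector; the feature count, array sizes and equal feature weighting are illustrative assumptions.

#include <limits.h>

#define MAX_OBJS 32
#define NFEAT    8    /* e.g. x, y, w, h, R, G, B, area */

/* For each object in the current frame, find the index of the most similar
 * object in the previous frame using the Manhattan distance in feature space. */
void match_objects(const int cur[][NFEAT],  int n_cur,
                   const int prev[][NFEAT], int n_prev,
                   int match[MAX_OBJS])
{
    int i, j, k;

    for (i = 0; i < n_cur; i++) {
        int  best   = -1;
        long best_d = LONG_MAX;
        for (j = 0; j < n_prev; j++) {
            long d = 0;
            for (k = 0; k < NFEAT; k++) {
                long diff = cur[i][k] - prev[j][k];
                d += (diff < 0) ? -diff : diff;      /* |feature difference| */
            }
            if (d < best_d) { best_d = d; best = j; }
        }
        match[i] = best;    /* index of minimum-distance object, or -1 */
    }
}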

Results: A video frame is captured and image thresholding is carried out according to the algorithm presented in paper [1]. A region growing technique is applied to obtain the different regions in the image. Initially the object is marked as a point and is then allowed to grow to form a segment; this segment is searched for in the next frame to track the target. Figure 4.13.(a) shows the Y component of the raw video frame, Figure 4.13.(b) shows the image after thresholding, and Figure 4.13.(c) shows the portion selected for tracking. The process of region growing is shown in Figure 4.14.(a), the image segment is shown in Figure 4.14.(b), and tracking of the segment is shown in Figure 4.14.(c).


Figure 4.13: Different templates: (a) Y component image, (b) thresholding, and (c) selected region marked as a point


Figure 4.14: Different templates : (a) Region growing, (b) segmented region, and (c) track the segment

Summary: Results for single object tracking and multiple object tracking were verified on the DaVinci processor and in Matlab using suitable algorithms. A video frame is captured, background subtraction is performed, and image segmentation is then carried out by thresholding according to the algorithm presented in paper [1]. The segmented image is processed for blob detection and identification, followed by feature extraction, and tracking is carried out according to the algorithm presented in paper [1]. In the Matlab implementation, a point-based image segmentation was used and pattern matching was carried out using the Manhattan distance to track human beings in a video.


Chapter 5

Summary

In this dissertation, the various components of target tracking have been discussed in detail. A real-time object tracking algorithm was implemented on the TMS320DM6437 with input from a CCD camera and output to an LCD display. Profiling of the different functions of single object tracking and multiple object tracking was carried out with the help of Code Composer Studio. The DaVinci processor helps to reuse APIs, to use the video and image processing library, and to optimize coding efficiency. For comparison, an object tracking algorithm based on image segmentation and pattern matching was implemented in Matlab; this algorithm was simulated in Matlab and the simulation results were verified for segmentation and tracking of a person.

This dissertation has primarily focused on the implementation of video and image processing algorithms on the DaVinci processor, with the aim of implementing the object tracking algorithm on the TMS320DM6437. For the object tracking model, background subtraction, image segmentation, blob detection and identification, and tracking were implemented. After successful implementation of these algorithms, tracking is carried out with the help of center-of-mass calculation and its movement. In the case of multiple moving objects, inaccuracy was observed in the simple algorithm; to avoid this problem, blob detection and identification was introduced. A GMM implementation on the processor was very slow and memory-consuming, so a novel approach to blob detection and identification was adopted instead. In this algorithm, background subtraction is performed first, then image segmentation, followed by blob detection and identification, feature extraction, and finally tracking using center-of-mass calculation. Profiling of the different functions of single object tracking was carried out with the help of Code Composer Studio; based on the profiling results, the tracking implemented on the processor is much faster than the Matlab implementation.


The proposed approach can be improved by using a better background subtraction model and a stronger image segmentation algorithm such as region growing or template matching. Further improvement can be obtained by using a Kalman filter or a particle filter for the tracking stage.


Chapter 6

APPENDIX

6.1 APPENDIX A: Real-Time Video Processing using the Matlab Simulink Interface with CCS Studio 3.3 on the DM6437 DVDP

6.1.1 Introduction

This appendix describes real-time video processing examples (image inversion, edge detection and median filtering) built with the Matlab Simulink interface to CCS Studio 3.3 on the DM6437 DVDP.

6.1.2 Hardware Requirements

• Texas Instruments DM6437 Digital Video Development Platform (DVDP)

• PC

• Video camera

• LCD Display

6.1.3 Software Requirements:

Mathworks Products

• MATLAB R2008a

• Simulink


• Image and video Processing Toolbox

• Signal Processing Toolbox

• Signal Processing Blockset for Simulink

• Real Time Workshop (w/o Embedded Coder)

• Link for Code Composer Studio

• Embedded Target for TI C6000 DSP.

Texas Instruments Products

- Code Composer Studio(CCS) v3.3

Hardware Setup

1. Connect the EVM6437 eval board to the PC using the USB cable.

2. Connect power to the board.

3. Don’t press any buttons on the board.

4. Ensure that all of the software products are installed.

Start by creating a new model in Simulink. The procedure for capturing and displaying video using the DM6437 is shown in Figure 5.28.

Figure 6.1: DM6437 board

1. Open the Simulink library browser as shown in figure 5.29

2. In the new window, add the "Video Capture" and "Video Display" from the "DM6437

EVM Board Support" group of the "Target Support Package TC6" Blockset as shown in

Figure 5.30.


Figure 6.2: Open simulink lib browser 1

Figure 6.3: Video capture

3. Double-click the “Video Capture” block and change the Sample Time (and the Video

capture mode only if you are using the components in the PAL/NTSC mode) as shown in

Figure 5.32.

4. Double-click the "Video Display" block and change the video mode only if you are using the components in PAL mode, as shown in the corresponding figure.

5. Save the model as "Video_preview.mdl", as shown in figure 5.34.

6. Add the "Complement" block from the "Sources" group of the Video and Image Processing Blockset, as shown in figure 5.35.

7. Connect the blocks as shown in Figure 5.34.

8. Select the target from the C6000 library, that is, the EVM6437, as shown in figure 5.36.

9. Other Simulink (.mdl) files, "Video_complement.mdl" and "Video_edge_sobal.mdl", are shown in figures 5.37 and 5.38.

10. Generate code and create the project: double-click the "Generate code & ..." block.


Figure 6.4: Add video display 1

Figure 6.5: Video capture conf

11. Build the project. Double-click the “Build Project” block.

12. Load the project. Double-click the “Load Project” block.

13. Run the target. Double-click the “Run” block.

6.1.4 Configuration Parameters for C6000 Hardware

1. Launch Matlab

2. At the Matlab command line, type simulink to launch Simulink.

3. Create a new model in Simulink.

4. To open the Configuration Parameters, select Simulation > Configuration Parameters, as shown in figure 5.41.

5. In the Select tree, choose the Real-Time Workshop category.


Figure 6.6: Video display conf

Figure 6.7: Video preview 1

6. For Target Selection, choose the file ti_c6000.tlc. Real-Time Workshop will automatically change the Make command and Template makefile selections, as shown in figure 5.39.

7. Choose the Optimization category in the Select tree. For Simulation and code generation, unselect Block reduction optimization and Implement logic signals, as shown in figure 5.40.

8. Choose the TI C6000 target selection category. Set Code generation target type to DM6437 DVDP.

9. Choose the TI C6000 compiler category. Set Symbolic debugging.

10. In the Select tree, choose the Debug category and select Verbose build.

11. In the Select tree, choose the Solver category. Ensure that Solver is set to Fixed-step / discrete.

Figure 6.8: Video image toolbox

Figure 6.9: Target selection


Figure 6.10: Video complement

Figure 6.11: Video Sobel edge detection 1

Figure 6.12: Simulink conf 1


Figure 6.13: Simulink conf 2

Figure 6.14: Simulink conf 3


6.2 APPENDIX B: Edge Detection using the Video and Image Library

The Texas Instruments C64x+ IMGLIB is an optimized image/video processing function library for C programmers using TMS320C64x+ devices. It includes many C-callable, assembly-optimized, general-purpose image/video processing routines. These routines are used in real-time applications where optimal execution speed is critical. Using these routines assures execution speeds considerably faster than equivalent code written in standard ANSI C. In addition, by providing ready-to-use DSP functions, TI IMGLIB can significantly shorten image/video processing application development time.

In Code Composer Studio, IMGLIB can be added by selecting Add Files to Project from the Project menu and choosing imglib2.l64P from the list of libraries under the c64plus folder of the imglib_v2xx installation. Also ensure that the project links against the correct run-time support library (rts64plus.lib). Alternatively, the two libraries can be included by adding the following lines to the linker command file: -lrts64plus.lib -limglib2.l64P. The include directory contains the header files that must be included in the C code when an IMGLIB2 function is called from C code, and should be added to the "include path" in the CCS build options. The Image and Video Processing Library (IMGLIB) provides 70 building-block kernels that can be used for image and video processing applications. IMGLIB includes:

• Compression and Decompression : DCT, motion estimation, quantization, wavelet Pro-

cessing

• Image Analysis: Boundary and perimeter estimation, morphological operations, edge

detection, image histogram, image thresholding

• Image Filtering & Format Conversion: image convolution, image Correlation, median

filtering, color space conversion

VLIB is a TI software library of more than 40 kernels that accelerates video analytics development and increases performance by up to 10 times. These kernels provide the ability to perform:

• Background Modeling & Subtraction

• Object Feature Extraction


• Tracking & Recognition

• Low-level Pixel Processing

Edge detection using the video and image library is illustrated in Figure 6.15.

Figure 6.15: Video and image library

Step 1: Open the video preview project, video_preview.pjt.
Step 2: Add the following two header files for the Sobel and median filter functions.

#include <C:\dvsdk_1_01_00_15\include\IMG_sobel_3x3_8.h>
#include <C:\dvsdk_1_01_00_15\include\IMG_median_3x3_8.h>

Step 3: Add the following calls with these parameters. frameBuffPtr refers to a structure containing "frame"; the frame buffer pointer is accessed as frameBuffPtr->frame.frameBufferPtr, and 480 and 1440 are the height and the width in bytes per line of the UYVY frame.

FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);

IMG_sobel_3x3_8((frameBuffPtr->frame.frameBufferPtr),

(frameBuffPtr->frame.frameBufferPtr),480,1440);

IMG_median_3x3_8((frameBuffPtr->frame.frameBufferPtr),

8,(frameBuffPtr->frame.frameBufferPtr));

FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

6.3 APPENDIX C: Video Processing Tokens

6.3.1 Video standard ( NTSC & PAL)

NTSC (National Television System Committee) video consists of 29.97 interlaced frames per second. Each frame consists of a total of 525 scan lines, of which 486 make up the visible raster. The remainder (the vertical blanking interval) are used for synchronization and vertical retrace.


PAL (Phase Alternating Line) is an analogue television colour encoding system used in broadcast television systems in many countries. The analogue standard further describes frame rate, image resolution and audio modulation; for the 625-line / 50 field (25 frame) per second television standard, see 576i. The term PAL is often used informally to refer to a 625-line/50 Hz (576i) television system, and to differentiate it from a 525-line/60 Hz (480i) NTSC system.

PAL specifies 786 pixels per line, 625 lines per screen, 25 frames per second, and a mains power of 220 volts. PAL delivers 625 lines at 50 half-frames per second, while NTSC delivers 525 lines of resolution at 60 half-frames per second.

Difference between NTSC and PAL

NTSC is the video standard used in North America and most of South America. In NTSC, 30 frames are transmitted each second, and each frame is made up of 525 individual scan lines. PAL is the predominant video standard used elsewhere. In PAL, 25 frames are transmitted each second, and each frame is made up of 625 individual scan lines.

720 x 576 = 414,720 pixels for 4:3 aspect ratio PAL.

720 x 480 = 345,600 pixels for 4:3 aspect ratio NTSC.

Frame and Field Rates

In video, PAL is higher in resolution (576 visible lines) than NTSC (480 visible lines), but NTSC updates the on-screen image more frequently than PAL (30 times per second versus 25). NTSC video is therefore lower in resolution than PAL video, but because the screen updates more frequently, motion is rendered more smoothly and there is less visible jerkiness. When video source material is transferred to DVD, it is usually transferred in the format it was created in, PAL or NTSC, and the resulting image has either higher temporal resolution (more frames per second, NTSC) or higher spatial resolution (more lines per image, PAL).

Movies: Movies the world over are shown at a frame rate of 24 frames per second; that is, 24 images are projected onto the cinema screen every second. On the issue of resolution, PAL DVDs have a compelling advantage over NTSC DVDs: 576 pixels of vertical resolution versus 480, a 20 percent increase. Increased resolution translates into a better looking image.

6.3.2 Video timing(Interlaced Vs Progressive)

Progressive displays "paint" the lines of an image consecutively, one after another; interlaced displays "paint" first one half of the image (the odd lines), then the other half (the even lines). On a CRT, each image is displayed starting at the top-left corner of the display and moving to the right edge; scanning then moves down one line and repeats from left to right, until the entire screen is refreshed.

Interlacing was used to reduce the amount of information sent for each image: by transferring the odd-numbered lines followed by the even-numbered lines, the amount of information sent per image was halved.

A progressive display has no limit on line-to-line changes, so it is capable of providing a higher (vertical) resolution image without flicker. LCD, plasma and computer displays, for example, are progressive.

6.3.3 Video Resolution(HD, ED, SD)

Video resolution is one of those "fuzzy" things in life. It is common to see video resolutions of 720 x 480 or 1920 x 1080. However, those are just the number of horizontal samples and vertical scan lines and do not necessarily convey the amount of useful information. For example, an analog video signal can be sampled at 13.5 MHz to generate 720 samples per line; sampling the same signal at 27 MHz would generate 1440 samples per line. Only the number of samples per line has changed, not the resolution of the content. Therefore, video is usually measured using "lines of resolution": in essence, how many distinct black and white vertical lines can be seen across the display. This number is then normalized to a 1:1 display aspect ratio (dividing by 3/4 for a 4:3 display, or by 9/16 for a 16:9 display). The aspect ratio is the ratio of picture width to height, for example 4:3 or 16:9.

Standard Definition (SD) is usually defined as having 480 or 576 interlaced active scan lines, commonly called "480i" and "576i" respectively. For a fixed-pixel (non-CRT) consumer display with a 4:3 aspect ratio, this translates into an active resolution of 720 x 480i or 720 x 576i. For a 16:9 aspect ratio, this translates into an active resolution of 960 x 480i or 960 x 576i.


Enhanced Definition (ED) video is usually defined as having 480 or 576 progressive active scan lines, commonly called "480p" and "576p" respectively. The difference between SD and ED is that SD is interlaced while ED is progressive.

High Definition (HD) is usually defined as having 720 progressive (720p) or 1080 interlaced (1080i) active scan lines. CRT-based HDTVs with a 4:3 aspect ratio and LCD/plasma 16:9 displays use resolutions such as 1024 x 1024p, 1280 x 768p and 1024 x 768p. For a fixed-pixel (non-CRT) consumer display with a 16:9 aspect ratio, this translates into an active resolution of 1280 x 720p or 1920 x 1080i, respectively.

6.3.4 Video file format(YUV420, YCbCr)

The three most popular color model families are RGB (used in computer graphics); YIQ, YUV and YCbCr (used in video systems); and CMYK (used in color printing). The YUV color space is used by the PAL, NTSC and SECAM (Sequentiel Couleur Avec Mémoire, or Sequential Color with Memory) composite color video standards. The black-and-white system used only luma (Y) information; color information (U and V) was added in such a way that a black-and-white receiver would still display a normal black-and-white picture. For digital RGB values with a range of 0 to 255, Y has a range of 0 to 255, U a range of 0 to +/-112, and V a range of 0 to +/-157.

YCbCr, or its close relatives Y'UV, YUV, Y'CbCr and Y'PbPr, are designed to be efficient at encoding RGB values so they consume less space while retaining the full perceptual value. YCbCr is a scaled and offset version of the YUV color space. Y is defined to have a nominal 8-bit range of 16-235; Cb and Cr are defined to have a nominal range of 16-240. There are several YCbCr sampling formats, such as 4:4:4, 4:2:2, 4:1:1 and 4:2:0, which are described below and shown in figure 3.1.

• 4:4:4 YCbCr sampling Format: Each sample has a Y, a Cb and a Cr value. Each sample

is typically 8 bits (consumer applications) or 10 bits (pro-video applications) per compo-

nent. Each sample therefore requires 24 bits (or 30 bits for pro-video applications).

• 4:2:2 YCbCr Format: For every two horizontal Y samples, there is one Cb and one Cr sample. Each sample is typically 8 bits (consumer applications) or 10 bits (pro-video applications) per component. Each sample therefore requires 16 bits (or 20 bits for pro-video applications), usually in a packed format. To display 4:2:2 YCbCr data, it is first converted to 4:4:4 YCbCr data, using interpolation to generate the missing Cb and Cr samples.


• 4:2:0 YCbCr Format:Rather than the horizontal-only 2:1 reduction of Cb and Cr used

by 4:2:2, 4:2:0 YCbCr implements a 2:1 reduction of Cb and Cr in both the vertical and

horizontal directions. It is commonly used for video compression.

Figure 6.16: 4:4:4, 4:2:2 and 4:2:0 YCbCr color sampling formats, respectively
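As a back-of-the-envelope illustration of how these sampling ratios affect storage (assuming 8 bits per sample; the helper names below are purely illustrative), the frame-buffer size for a width x height frame works out as follows. For a 720 x 480 frame this gives 1,036,800, 691,200 and 518,400 bytes respectively.

static unsigned long bytes_yuv444(unsigned long w, unsigned long h)
{
    return w * h * 3UL;            /* one Y, Cb and Cr sample per pixel      */
}

static unsigned long bytes_yuv422(unsigned long w, unsigned long h)
{
    return w * h * 2UL;            /* one Cb/Cr pair shared by two pixels    */
}

static unsigned long bytes_yuv420(unsigned long w, unsigned long h)
{
    return (w * h * 3UL) / 2UL;    /* chroma subsampled 2:1 both directions  */
}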

YUV12 (YCbCr 4:1:1) is also used in some consumer video and DV video compression applications. The RGB-to-YCbCr conversion equations differ for SDTV and HDTV. Gamma-corrected RGB is notated as R'G'B'. Other color formats include YIQ, YDbDr, YPbPr, xvYCC, HSV and HSI.

6.3.5 Video IO Interface(Composite, component, S-Video)

Composite-Video

Composite video is the format of an analog television (picture only) signal before it is combined with a sound signal and modulated onto an RF carrier. In contrast to component video (YPbPr), it contains all required video information, including color, in a single line-level signal. Like component video, composite video cables do not carry audio and are often paired with audio cables.

A video stream is composed of a Y signal for luminance, the black-and-white values, and a C signal for chrominance, the color. The Y signal provides brightness and contrast, allowing for deep rich blacks and bright whites. The quality of this signal is especially evident


in low-lit scenes, where a degraded signal translates to "faded" blacks and muted whites, making it difficult to differentiate scenery or action. The color signal (RGB, for red, green and blue) carries the information needed to create changing hues. Composite video is so named because the Y/C signals are combined and channeled through a single wire, to be separated by a "comb filter" inside the television set. That is, the color video signal is a linear combination of the luminance of the picture and a modulated subcarrier that carries the chrominance, a combination of hue and saturation.

S-Video

The RCA phono connector or BNC connector (pro-video market) carries a composite NTSC or PAL video signal, made by adding the intensity (Y) and color (C) video signals together. The television then has to separate these Y and C signals in order to display the picture. The problem is that the Y/C separation at the decoder side is never perfect. Many video components now support a 4-pin "S1" S-Video connector. This connector keeps the intensity (Y) and color (C) video signals separate, eliminating the Y/C separation process in the TV. As a result, the picture is sharper and has less noise.

Separate Video, more commonly known as S-Video and Y/C, is often referred to by JVC (who introduced the DIN connector) as both an S-VHS connector and as Super Video. It is an analog video transmission scheme in which video information is encoded on two channels: luma (luminance, intensity, gray, "Y") and chroma (color, "C").

More recently, S-Video has been superseded by component video, which isolates not only

the Y signal on its own cable, but the red and blue signals as well, while green values are inferred

from reading the other data streams. Component video requires three cables plus audio cables,

for a total of five cables. The latest enhancement in audiovisual interfaces is High-Definition

Multimedia Interface (HDMI), a true digital interface that combines video and audio into a

single cable while preserving perfect integrity. This all-digital standard is the most desirable

interface currently available.


6.4 APPENDIX D: YUV to RGB Conversion

6.4.1 YUV format

Digital video is often encoded in a YUV format. An RGB color is encoded using three values:

red, green, and blue. Although RGB is a common way to represent colors, other coordinate

systems are possible. The term YUV refers to a family of color spaces, all of which encode

brightness information separately from color information. Like RGB, YUV uses three values

to represent any color. These values are termed Y’, U, and V. (In fact, this use of the term

"YUV" is technically inaccurate. In computer video, the term YUV almost always refers to one

particular color space named Y’CbCr, discussed in chapter 2. However, YUV is often used as a

general term for any color space that works along the same principles as Y’CbCr.)

The Y’ component, also called luma, represents the brightness value of the color. The

prime symbol (’) is used to differentiate luma from a closely related value, luminance, which

is designated Y. Luminance is derived from linear RGB values, whereas luma is derived from

non-linear (gamma-corrected) RGB values. Luminance is a closer measure of true brightness

but luma is more practical to use for technical reasons. The prime symbol is frequently omitted,

but YUV color spaces always use luma, not luminance.

Y' = 0.299R + 0.587G + 0.114B    (6.1)

This formula reflects the fact that the human eye is more sensitive to certain wavelengths of

light than others, which affects the perceived brightness of a color. Blue light appears dimmest,

green appears brightest, and red is somewhere in between. This formula also reflects the phys-

ical characteristics of the phosphors used in early televisions. A newer formula, taking into

account modern television technology, is used for high-definition television:

Y' = 0.2125R + 0.7154G + 0.0721B    (6.2a)

The luma equation for standard-definition television is defined in a specification named ITU-R

BT.601. For high-definition television, the relevant specification is ITU-R BT.709. The U and

V components, also called chroma values or color difference values, are derived by subtracting

the Y value from the red and blue components of the original RGB color:

U = B - Y'    (6.2b)

V = R - Y'    (6.2c)


Benefits of YUV

Analog television uses YUV partly for historical reasons. Analog color television signals were

designed to be backward compatible with black-and-white televisions. The color television

signal carries the chroma information (U and V) superimposed onto the luma signal. Black-

and-white televisions ignore the chroma and display the combined signal as a grayscale image.

(The signal is designed so that the chroma does not significantly interfere with the luma signal.)

Color televisions can extract the chroma and convert the signal back to RGB.

YUV has another advantage that is more relevant. The human eye is less sensitive to

changes in hue than changes in brightness. As a result, an image can have less chroma informa-

tion than luma information without sacrificing the perceived quality of the image. For example,

it is common to sample the chroma values at half the horizontal resolution of the luma samples.

In other words, for every two luma samples in a row of pixels, there is one U sample and one

V sample. Assuming that 8 bits are used to encode each value, a total of 4 bytes are needed for

every two pixels (two Y', one U, and one V), for an average of 16 bits per pixel, one third less than the equivalent 24-bit RGB encoding.

YUV is not inherently any more compact than RGB. Unless the chroma is downsampled,

a YUV pixel is the same size as an RGB pixel. Also, the conversion from RGB to YUV is not

lossy. If there is no downsampling, a YUV pixel can be converted back to RGB with no loss

of information. Downsampling makes a YUV image smaller and also loses some of the color

information. If performed correctly, however, the loss is not perceptually significant.

YUV in Computer Video

The formulas listed previously for YUV are not the exact conversions used in digital video. Digital video generally uses a form of YUV called Y'CbCr. Essentially, Y'CbCr works by scaling the YUV components to the nominal ranges noted earlier (Y': 16-235; Cb, Cr: 16-240), assuming 8 bits of precision for the Y'CbCr components. Here is the exact derivation of Y'CbCr, using the BT.601 definition of luma.

Start with RGB values in the range [0...1]; in other words, pure black is 0 and pure white is 1. Importantly, these are non-linear (gamma-corrected) RGB values. Calculate the luma: for BT.601, Y' = 0.299R + 0.587G + 0.114B, as described earlier. Calculate the intermediate chroma difference values (B - Y') and (R - Y'). These values have a range of +/-0.886 for (B - Y') and +/-0.701 for (R - Y'). Scale the chroma difference values as follows:

Pb = (0.5 / (1 - 0.114)) * (B - Y')    (6.3a)

Pr = (0.5 / (1 - 0.299)) * (R - Y')    (6.3b)

These scaling factors are designed to give both values the same numerical range, +/-0.5. Together, they define a YUV color space named Y'PbPr, which is used in analog component video. Scale the Y'PbPr values to get the final Y'CbCr values:

Y' = 16 + 219 * Y'    (6.4a)

Cb = 128 + 224 * Pb    (6.4b)

Cr = 128 + 224 * Pr    (6.4c)

The following table shows RGB and YCbCr values for various colors, again using the BT.601 definition of luma.

Table 6.1: RGB and YCbCr values for various colors using BT.601

Color R G B Y’ Cb Cr

Black 0 0 0 16 128 128

Red 255 0 0 81 90 240

Green 0 255 0 145 54 34

Blue 0 0 255 41 240 110

Cyan 0 255 255 170 166 16

Magenta 255 0 255 106 202 222

Yellow 255 255 0 210 16 146

White 255 255 255 235 128 128

As this table shows, Cb and Cr do not correspond to intuitive ideas about color. For

example, pure white and pure black both contain neutral levels of Cb and Cr (128). The highest

and lowest values for Cb are blue and yellow, respectively. For Cr, the highest and lowest values


are red and cyan. Note that, for the purposes of this appendix, the term U is equivalent to Cb and the term V is equivalent to Cr.
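The derivation above can be checked numerically. The following is a minimal sketch (the function name and the use of floating point are illustrative only) that applies equations (6.1), (6.3) and (6.4) to a gamma-corrected RGB triple and reproduces the entries of Table 6.1; for example, pure red maps to (81, 90, 240) and pure white to (235, 128, 128).

#include <math.h>

/* Illustrative helper: gamma-corrected 8-bit RGB to BT.601 Y'CbCr
   via the Y'PbPr intermediate of equations (6.3a)-(6.3b). */
static void rgb_to_ycbcr_bt601(unsigned char R8, unsigned char G8, unsigned char B8,
                               unsigned char *Y, unsigned char *Cb, unsigned char *Cr)
{
    double R = R8 / 255.0, G = G8 / 255.0, B = B8 / 255.0;   /* RGB in [0..1]   */
    double Yp = 0.299 * R + 0.587 * G + 0.114 * B;           /* luma, eq. (6.1) */
    double Pb = (0.5 / (1.0 - 0.114)) * (B - Yp);            /* eq. (6.3a)      */
    double Pr = (0.5 / (1.0 - 0.299)) * (R - Yp);            /* eq. (6.3b)      */
    *Y  = (unsigned char)floor(16.0  + 219.0 * Yp + 0.5);    /* eq. (6.4a)      */
    *Cb = (unsigned char)floor(128.0 + 224.0 * Pb + 0.5);    /* eq. (6.4b)      */
    *Cr = (unsigned char)floor(128.0 + 224.0 * Pr + 0.5);    /* eq. (6.4c)      */
}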

6.4.2 8-Bit YUV Formats for Video

These formats use 8 bits per pixel location to encode the Y channel, and 8 bits per sample to encode each U or V chroma sample. However, most YUV formats use fewer than 24 bits per pixel on average, because they contain fewer samples of U and V than of Y. (YUV formats with 10-bit or higher Y channels are not covered here.) Chroma channels can have a lower sampling rate than the luma channel without any dramatic loss of perceptual quality. The "A:B:C" notation is used to describe how often U and V are sampled relative to Y:

• 4:4:4 formats, 32 bits per pixel: no downsampling of the chroma channels. AYUV is a 4:4:4 format in which each pixel is encoded as four consecutive bytes.

• 4:2:2 formats, 16 bits per pixel: 2:1 horizontal downsampling, with no vertical downsampling. Every scan line contains four Y samples for every two U or V samples.

• 4:2:0 formats, 16 or 12 bits per pixel: 2:1 horizontal downsampling, with 2:1 vertical downsampling. IMC1 and IMC3 are 16-bit-per-pixel 4:2:0 formats, while IMC2, IMC4, YV12 and NV12 are 12-bit-per-pixel 4:2:0 formats.

• 4:1:1 formats, 12 bits per pixel: 4:1 horizontal downsampling, with no vertical downsampling. Every scan line contains four Y samples for each U and V sample. 4:1:1 sampling is less common than the other formats and is not discussed in detail here.

The following diagrams show how chroma is sampled for each of the downsampling rates. Luma samples are represented by a cross, and chroma samples by a circle.

Two 4:2:2 formats are recommended, with the following FOURCC codes: YUY2 and UYVY. Both are packed formats, where each macropixel is two pixels encoded as four consecutive bytes. This results in horizontal downsampling of the chroma by a factor of two. In YUY2


format, the data can be treated as an array of unsigned char values, where the first byte contains

the first Y sample, the second byte contains the first U (Cb) sample, the third byte contains the

second Y sample, and the fourth byte contains the first V (Cr) sample, as shown in the following

diagram.

Figure 6.17: YUY2 memory layout

If the image is addressed as an array of little-endian WORD values, the first WORD contains the first Y sample in the least significant bits (LSBs) and the first U (Cb) sample in the most significant bits (MSBs). The second WORD contains the second Y sample in the LSBs and the first V (Cr) sample in the MSBs.

YUY2 is the preferred 4:2:2 pixel format for Video Acceleration (DirectX VA). It is ex-

pected to be an intermediate-term requirement for VA accelerators supporting 4:2:2 video.

UYVY format is the same as the YUY2 format except that the byte order is reversed; that is, the chroma and luma bytes are flipped (Figure 6.18). If the image is addressed as an array of two little-endian WORD values, the first WORD contains U in the LSBs and Y0 in the MSBs, and the second WORD contains V in the LSBs and Y1 in the MSBs.

Figure 6.18: UYVY memory layout

Figure 6.19: RGB2UYVY
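A small sketch of the packed 4:2:2 layout described above is given below; it mirrors the UYVY (U0, Y0, V0, Y1) byte order that the extract_uyvy() routine in Appendix E also assumes. The structure and function names are illustrative.

typedef struct { unsigned char y, u, v; } YuvPixel;

/* Unpack one 4-byte UYVY macropixel into two pixels that share chroma. */
static void unpack_uyvy_macropixel(const unsigned char *mp, YuvPixel *p0, YuvPixel *p1)
{
    p0->u = p1->u = mp[0];   /* shared Cb (U) */
    p0->y = mp[1];           /* Y0            */
    p0->v = p1->v = mp[2];   /* shared Cr (V) */
    p1->y = mp[3];           /* Y1            */
}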


Surface Definitions

This section describes the 8-bit YUV formats that are recommended for video rendering. These

fall into several categories:

Figure 6.20: YUV sampling

First, you should be aware of the following concepts :

• Surface origin: For the YUV formats described in this article, the origin (0,0) is always

the top left corner of the surface.

• Stride: The stride of a surface, sometimes called the pitch, is the width of the surface in

bytes. Given a surface origin at the top left corner, the stride is always positive.

• Alignment: The alignment of a surface is at the discretion of the graphics display driver.

Packed format versus planar format: YUV formats are divided into packed formats and planar formats. In a packed format, the Y, U and V components are stored in a single array; pixels are organized into groups of macropixels whose layout depends on the format. In a planar format, the Y, U and V components are stored as three separate planes.

Picture Aspect Ratio

Picture aspect ratio defines the shape of the displayed video image. Picture aspect ratio is

notated X:Y, where X:Y is the ratio of picture width to picture height. Most video standards use

either 4:3 or 16:9 picture aspect ratio. The 16:9 aspect ratio is commonly called widescreen.

Picture aspect ratio is also called display aspect ratio (DAR).


Figure 6.21: Picture aspect ratio

Pixel Aspect Ratio

Pixel aspect ratio (PAR) measures the shape of a pixel. When a digital image is captured, the

image is sampled both vertically and horizontally, resulting in a rectangular array of quantized

samples, called pixels or pels. Pixel aspect ratio also applies to the display device. The physical

shape of the display device and the physical pixel resolution (across and down) determine the

PAR of the display device. Computer monitors generally use square pixels. If the image PAR

and the display PAR do not match, the image must be scaled in one dimension, either vertically

or horizontally, in order to display correctly. The following formula relates PAR, display aspect

ratio (DAR), and image size in pixels:

DAR = (image_width_in_pixels / image_height_in_pixels) * PAR    (6.5)

Here is a real-world example: NTSC-M analog video contains 480 scan lines in the active

image area. ITU-R Rec. BT.601 specifies a horizontal sampling rate of 704 visible pixels per

line, yielding a digital image with 704 x 480 pixels. The intended picture aspect ratio is 4:3,

yielding a PAR of 10:11.

• DAR: 4:3

• Width in pixels: 704

• Height in pixels: 480

• PAR: 10/11


where 4/3 = (704/480) x (10/11)

To display this image correctly on a display device with square pixels, you must scale

either the width by 10/11 or the height by 11/10.

Figure 6.22: Pixel aspect ratio

6.4.3 Color Space Conversion

Conversion from one Y’CbCr space to another requires the following steps.

1. Inverse quantization: Convert the Y’CbCr representation to a Y’PbPr representation, us-

ing the source nominal range.

2. Upsampling: Convert the sampled chroma values to 4:4:4 by interpolating chroma values.

3. YUV to RGB conversion: Convert from Y’PbPr to non-linear R’G’B’, using the source

transfer matrix.

4. Inverse transfer function: Convert non-linear R’G’B’ to linear RGB, using the inverse of

the transfer function.

5. RGB color space conversion: Use the color primaries to convert from the source RGB

space to the target RGB space.

6. Transfer function: Convert linear RGB to non-linear R’G’B, using the target transfer

function.


7. RGB to YUV conversion: Convert R’G’B’ to Y’PbPr, using the target transfer matrix.

8. Downsampling: Convert 4:4:4 to 4:2:2, 4:2:0, or 4:1:1 by filtering the chroma values.

9. Quantization: Convert Y’PbPr to Y’CbCr, using the target nominal range.

Steps 1 to 4 occur in the source color space, and steps 6 to 9 occur in the target color space.

In the actual implementation, intermediate steps can be approximated and adjacent steps can be

combined. There is generally a trade-off between accuracy and computational cost.

For example, converting from BT.601 to BT.709 requires the following stages:

1. Inverse quantization: Y’CbCr(601) to Y’PbPr(601)

2. Upsampling: Y’PbPr(601)

3. YUV to RGB: Y’PbPr(601) to R’G’B’(601)

4. Inverse transfer function: R’G’B’(601) to RGB(601)

5. RGB color space conversion: RGB(601) to RGB(709)

6. Transfer function: RGB(709) to R’G’B’(709)

7. RGB to YUV: R’G’B’(709) to Y’PbPr(709)

8. Downsampling: Y’PbPr(709)

9. Quantization: Y’PbPr(709) to Y’CbCr(709)

Converting RGB888 to YUV 4:4:4

In the case of computer RGB input and 8-bit BT.601 YUV output, we believe that the formulas

given in the previous section can be reasonably approximated by the following:

Y = ((66*R + 129*G + 25*B + 128) >> 8) + 16    (6.6a)

U = ((-38*R - 74*G + 112*B + 128) >> 8) + 128    (6.6b)

V = ((112*R - 94*G - 18*B + 128) >> 8) + 128    (6.6c)

These formulas produce 8-bit results using coefficients that require no more than 8 bits of

(unsigned) precision. Intermediate results will require up to 16 bits of precision.
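A minimal C sketch of equations (6.6a)-(6.6c) is given below (the function name is illustrative); for computer RGB inputs in the range 0..255 the results already fall inside the nominal 8-bit YCbCr ranges, so no clipping is required.

/* Computer RGB (0..255) to 8-bit BT.601 YCbCr, integer arithmetic only. */
static void rgb888_to_yuv444(int R, int G, int B,
                             unsigned char *Y, unsigned char *U, unsigned char *V)
{
    *Y = (unsigned char)((( 66 * R + 129 * G +  25 * B + 128) >> 8) +  16);  /* eq. (6.6a) */
    *U = (unsigned char)(((-38 * R -  74 * G + 112 * B + 128) >> 8) + 128);  /* eq. (6.6b) */
    *V = (unsigned char)(((112 * R -  94 * G -  18 * B + 128) >> 8) + 128);  /* eq. (6.6c) */
}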


Converting 8-bit YUV to RGB888

From the original RGB-to-YUV formulas, one can derive the following relationships for BT.601.

Y = round(0.256788*R + 0.504129*G + 0.097906*B) + 16    (6.7a)

U = round(-0.148223*R - 0.290993*G + 0.439216*B) + 128    (6.7b)

V = round(0.439216*R - 0.367788*G - 0.071427*B) + 128    (6.7c)

Therefore, given C, D and E after subtracting the constants (C = Y - 16, D = U - 128, E = V - 128), the formulas to convert YUV to RGB can be derived as follows:

R = clip(round(1.164383*C + 1.596027*E))    (6.8a)

G = clip(round(1.164383*C - 0.391762*D - 0.812968*E))    (6.8b)

B = clip(round(1.164383*C + 2.017232*D))    (6.8c)

where clip() denotes clipping to a range of [0..255]. We believe these formulas can be

reasonably approximated by the following:

R = clip((298*C + 409*E + 128) >> 8)    (6.9a)

G = clip((298*C - 100*D - 208*E + 128) >> 8)    (6.9b)

B = clip((298*C + 516*D + 128) >> 8)    (6.9c)

These formulas use some coefficients that require more than 8 bits of precision to produce

each 8-bit result, and intermediate results will require more than 16 bits of precision.
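A corresponding sketch of equations (6.9a)-(6.9c), with the clip() helper spelled out, is given below; again the function names are illustrative.

/* Clamp an intermediate result to the valid 8-bit range [0..255]. */
static unsigned char clip_u8(int x)
{
    return (unsigned char)(x < 0 ? 0 : (x > 255 ? 255 : x));
}

/* 8-bit BT.601 YCbCr back to computer RGB (0..255). */
static void yuv_to_rgb888(unsigned char Y, unsigned char U, unsigned char V,
                          unsigned char *R, unsigned char *G, unsigned char *B)
{
    int C = Y - 16, D = U - 128, E = V - 128;
    *R = clip_u8((298 * C           + 409 * E + 128) >> 8);   /* eq. (6.9a) */
    *G = clip_u8((298 * C - 100 * D - 208 * E + 128) >> 8);   /* eq. (6.9b) */
    *B = clip_u8((298 * C + 516 * D + 128) >> 8);             /* eq. (6.9c) */
}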

To convert 4:2:0 or 4:2:2 YUV to RGB, we recommend converting the YUV data to 4:4:4

YUV, and then converting from 4:4:4 YUV to RGB. The sections that follow present some

methods for converting 4:2:0 and 4:2:2 formats to 4:4:4.

Conversion between RGB and 4:4:4 YUV

We first describe conversion between RGB and 4:4:4 YUV. To convert 4:2:0 or 4:2:2 YUV to

RGB, we recommend converting the YUV data to 4:4:4 YUV, and then converting from 4:4:4

YUV to RGB. The AYUV format, which is a 4:4:4 format, uses 8 bits each for the Y, U, and V

samples. YUV can also be defined using more than 8 bits per sample for some applications.


Two dominant YUV conversions from RGB have been defined for digital video. Both are

based on the specification known as ITU-R Recommendation BT.709. The first conversion is

the older YUV form defined for 50-Hz use in BT.709. It is the same as the relation specified

in ITU-R Recommendation BT.601, also known by its older name, CCIR 601. It should be

considered the preferred YUV format for standard-definition TV resolution (720 x 576) and

lower-resolution video. It is characterized by the values of two constants Kr and Kb:

Converting 4:2:0 YUV to 4:2:2 YUV

Converting 4:2:0 YUV to 4:2:2 YUV requires vertical upconversion by a factor of two. This

section describes an example method for performing the upconversion. The method assumes

that the video pictures are progressive scan.

Note The 4:2:0 to 4:2:2 interlaced scan conversion process presents atypical problems and

is difficult to implement. This article does not address the issue of converting interlaced scan

from 4:2:0 to 4:2:2.

Converting 4:2:2 YUV to 4:4:4 YUV

Converting 4:2:2 YUV to 4:4:4 YUV requires horizontal upconversion by a factor of two. The method described previously for vertical upconversion can also be applied to horizontal upconversion. For MPEG-2 and ITU-R BT.601 video, this method will produce samples with the correct phase alignment.

Converting 4:2:0 YUV to 4:4:4 YUV

To convert 4:2:0 YUV to 4:4:4 YUV, simply combine the two methods described previously: convert the 4:2:0 image to 4:2:2, and then convert the 4:2:2 image to 4:4:4. The order of the two upconversion steps can also be switched, as the order of operation does not materially affect the visual quality of the result.
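As a minimal sketch of one upconversion step (using simple sample replication rather than the filtered interpolation discussed above; names are illustrative), a 2x horizontal chroma upsampling for 4:2:2 to 4:4:4 can be written as follows; the vertical step for 4:2:0 to 4:2:2 applies the same idea along columns.

/* Double one row of chroma samples by replication: in_len inputs -> 2*in_len outputs. */
static void upsample_chroma_2x(const unsigned char *in, unsigned char *out, int in_len)
{
    int i;
    for (i = 0; i < in_len; i++) {
        out[2 * i]     = in[i];   /* co-sited sample      */
        out[2 * i + 1] = in[i];   /* replicated neighbour */
    }
}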

Summary: Demo code for video capture, display, encode and decode, video copy and video preview was implemented on the DM6437 board. Image inversion and edge detection using the Sobel operator were also implemented on the DaVinci board.

6.5 APPENDIX E: Single object tracking on DM6437 CODE

/* * ======== video_preview.c ========* */

/* runtime include files */


#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <stdarg.h>

/* BIOS include files */

#include <std.h>

#include <gio.h>

#include <tsk.h>

#include <trc.h>

/* PSP include files */

#include <psp_i2c.h>

#include <psp_vpfe.h>

#include <psp_vpbe.h>

#include <fvid.h>

#include <psp_tvp5146_extVidDecoder.h>

/* CSL include files */

#include <soc.h>

#include <cslr_sysctl.h>

/* BSL include files */

#include <evmdm6437.h>

#include <evmdm6437_dip.h>

/* Video Params Defaults */

#include <vid_params_default.h>

/* This example supports either PAL or NTSC depending

on position of JP1 */

#define STANDARD_PAL 0

#define STANDARD_NTSC 1

#define FRAME_BUFF_CNT 6

static int read_JP1(void);

static CSL_SysctlRegsOvly sysModuleRegs =

(CSL_SysctlRegsOvly )CSL_SYS_0_REGS;

//*******************************************************


// USER DEFINED FUNCTIONS

//*******************************************************

void extract_uyvy (void * currentFrame);

void write_uyvy (void * currentFrame);

void tracking();

void copy_frame();

void frame_substract();

//*******************************************************

// VARIABLE ARRAYS

//*******************************************************

unsigned char I_y[480][720];

unsigned char I_u[480][360];

unsigned char I_v[480][360];

unsigned char I_y1[480][720];

unsigned char I_u1[480][360];

unsigned char I_v1[480][360];

unsigned char I_y2[480][720];

unsigned char I_u2[480][360];

unsigned char I_v2[480][360];

unsigned char I_y3[480][720];

unsigned char I_u3[480][360];

unsigned char I_v3[480][360];

unsigned char I_y4[480][720];

unsigned char I_u4[480][360];

unsigned char I_v4[480][360];

////////////////////////

/* * ======== main ========* */

void main() {

printf("Video Preview Application\n");

fflush(stdout);

/* Initialize BSL library to read jumper switches: */

EVMDM6437_DIP_init();


/* VPSS PinMuxing */

/* CI10SEL - No CI[1:0] */

/* CI32SEL - No CI[3:2] */

/* CI54SEL - No CI[5:4] */

/* CI76SEL - No CI[7:6] */

/* CFLDSEL - No C_FIELD */

/* CWENSEL - No C_WEN */

/* HDVSEL - CCDC HD and VD enabled */

/* CCDCSEL - CCDC PCLK, YI[7:0] enabled */

/* AEAW - EMIFA full address mode */

/* VPBECKEN - VPBECLK enabled */

/* RGBSEL - No digital outputs */

/* CS3SEL - LCD_OE/EM_CS3 disabled */

/* CS4SEL - CS4/VSYNC enabled */

/* CS5SEL - CS5/HSYNC enabled */

/* VENCSEL - VCLK,YOUT[7:0],COUT[7:0] enabled */

/* AEM - 8bEMIF + 8bCCDC + 8 to 16bVENC */

sysModuleRegs -> PINMUX0 &= (0x005482A3u);

sysModuleRegs -> PINMUX0 |= (0x005482A3u);

/* PCIEN = 0: PINMUX1 - Bit 0 */

sysModuleRegs -> PINMUX1 &= (0xFFFFFFFEu);

sysModuleRegs -> VPSSCLKCTL = (0x18u);

return;}

/* * ======== video_preview ========* */

void video_preview(void) {

FVID_Frame *frameBuffTable[FRAME_BUFF_CNT];

FVID_Frame *frameBuffPtr;

GIO_Handle hGioVpfeCcdc;

GIO_Handle hGioVpbeVid0;

GIO_Handle hGioVpbeVenc;

int status = 0;

int result;


int i;

int standard;

int width;

int height;

/* Set video display/capture driver params to defaults */

PSP_VPFE_TVP5146_ConfigParams tvp5146Params =

VID_PARAMS_TVP5146_DEFAULT;

PSP_VPFECcdcConfigParams vpfeCcdcConfigParams =

VID_PARAMS_CCDC_DEFAULT_D1;

PSP_VPBEOsdConfigParams vpbeOsdConfigParams =

VID_PARAMS_OSD_DEFAULT_D1;

PSP_VPBEVencConfigParams vpbeVencConfigParams;

standard = read_JP1();

/* Update display/capture params based on video standard (PAL/NTSC) */

if (standard == STANDARD_PAL) {

width = 720;

height = 576;

vpbeVencConfigParams.displayStandard =

PSP_VPBE_DISPLAY_PAL_INTERLACED_COMPOSITE;}

else {

width = 720;

height = 480;

vpbeVencConfigParams.displayStandard =

PSP_VPBE_DISPLAY_NTSC_INTERLACED_COMPOSITE; }

vpfeCcdcConfigParams.height = vpbeOsdConfigParams.height = height;

vpfeCcdcConfigParams.width = vpbeOsdConfigParams.width = width;

vpfeCcdcConfigParams.pitch = vpbeOsdConfigParams.pitch = width * 2;

/* init the frame buffer table */

for (i=0; i<FRAME_BUFF_CNT; i++) {

frameBuffTable[i] = NULL; }

/* create video input channel */

if (status == 0) {


PSP_VPFEChannelParams vpfeChannelParams;

vpfeChannelParams.id = PSP_VPFE_CCDC;

vpfeChannelParams.params =

(PSP_VPFECcdcConfigParams*)&vpfeCcdcConfigParams;

hGioVpfeCcdc = FVID_create

("/VPFE0",IOM_INOUT,NULL,&vpfeChannelParams,NULL);

status = (hGioVpfeCcdc == NULL ? -1 : 0); }

/* create video output channel, plane 0 */

if (status == 0) {

PSP_VPBEChannelParams vpbeChannelParams;

vpbeChannelParams.id = PSP_VPBE_VIDEO_0;

vpbeChannelParams.params =

(PSP_VPBEOsdConfigParams*)&vpbeOsdConfigParams;

hGioVpbeVid0 = FVID_create

("/VPBE0",IOM_INOUT,NULL,&vpbeChannelParams,NULL);

status = (hGioVpbeVid0 == NULL ? -1 : 0); }

/* create video output channel, venc */

if (status == 0) {

PSP_VPBEChannelParams vpbeChannelParams;

vpbeChannelParams.id = PSP_VPBE_VENC;

vpbeChannelParams.params =

(PSP_VPBEVencConfigParams *)&vpbeVencConfigParams;

hGioVpbeVenc = FVID_create

("/VPBE0",IOM_INOUT,NULL,&vpbeChannelParams,NULL);

status = (hGioVpbeVenc == NULL ? -1 : 0); }

/* configure the TVP5146 video decoder */

if (status == 0) {

result = FVID_control(hGioVpfeCcdc,

VPFE_ExtVD_BASE+PSP_VPSS_EXT_VIDEO_DECODER_CONFIG, &tvp5146Params);

status = (result == IOM_COMPLETED ? 0 : -1); }

/* allocate some frame buffers */

if (status == 0) {


for (i=0; i<FRAME_BUFF_CNT && status == 0; i++) {

result = FVID_allocBuffer(hGioVpfeCcdc, &frameBuffTable[i]);

status = (result == IOM_COMPLETED &&

frameBuffTable[i] != NULL ? 0 : -1); } }

/* prime up the video capture channel */

if (status == 0) {

FVID_queue(hGioVpfeCcdc, frameBuffTable[0]);

FVID_queue(hGioVpfeCcdc, frameBuffTable[1]);

FVID_queue(hGioVpfeCcdc, frameBuffTable[2]);

}

/* prime up the video display channel */

if (status == 0) {

FVID_queue(hGioVpbeVid0, frameBuffTable[3]);

FVID_queue(hGioVpbeVid0, frameBuffTable[4]);

FVID_queue(hGioVpbeVid0, frameBuffTable[5]); }

/* grab first buffer from input queue */

if (status == 0) {

FVID_dequeue(hGioVpfeCcdc, &frameBuffPtr); }

/* loop forever performing video capture and display */

while ( status == 0 ) {

/* grab a fresh video input frame */

FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);

extract_uyvy ((frameBuffPtr->frame.frameBufferPtr));

copy_frame();

frame_substract();

tracking();

write_uyvy ((frameBuffPtr->frame.frameBufferPtr));

/* display the video frame */

FVID_exchange(hGioVpbeVid0, &frameBuffPtr);}}

/* * ======== read_JP1 ========

* Read the PAL/NTSC jumper.

*


* Retry, as I2C sometimes fails: */

static int read_JP1(void)

{ int jp1 = -1;

while (jp1 == -1) {

jp1 = EVMDM6437_DIP_get(JP1_JUMPER);

TSK_sleep(1); }

return(jp1);}

void extract_uyvy(void * currentFrame)

{ int r, c;

for(r = 0; r < 480; r++) {

for(c = 0; c < 360; c++){

I_u1[r][c] = * (((unsigned char * )currentFrame)

+ r*720*2+4*c+ 0);

I_y1[r][2*c] = * (((unsigned char * )currentFrame)

+ r*720*2+4*c+ 1);

I_v1[r][c] = * (((unsigned char * )currentFrame)

+ r*720*2+4*c+ 2);

I_y1[r][2*c+1] = * (((unsigned char * )currentFrame)

+ r*720*2+4*c+ 3);

} } }

void write_uyvy (void * currentFrame)

{ int r, c;

for(r = 0; r < 480; r++) {

for(c = 0; c < 360; c++){

* (((unsigned char * )currentFrame) + r*720*2+4*c+ 0)=

I_u[r][c] ;

* (((unsigned char * )currentFrame) + r*720*2+4*c+ 1)=

I_y[r][2*c] ;

* (((unsigned char * )currentFrame) + r*720*2+4*c+ 2)=

I_v[r][c] ;

* (((unsigned char * )currentFrame) + r*720*2+4*c+ 3)=

I_y[r][2*c+1];


} } }

void copy_frame()

{ int r, c;

for(r = 0; r < 480; r++) {

for(c = 0; c < 360; c++) {

I_u[r][c] = I_u1[r][c];

I_y[r][2*c] = I_y1[r][2*c] ;

I_v[r][c] = I_v1[r][c] ;

I_y[r][2*c+1] = I_y1[r][2*c+1] ;

} } }

void frame_substract()

{int r, c;

for(r = 0; r < 480; r++) {

for(c = 0; c < 360; c++){

I_u3[r][c]= I_u1[r][c] -I_u2[r][c] ;

I_y3[r][2*c] = I_y1[r][2*c] -I_y2[r][2*c];

I_v3[r][c] = I_v1[r][c] -I_v2[r][c] ;

I_y3[r][2*c+1]=I_y1[r][2*c+1]-I_y2[r][2*c+1];

} }

for(r = 0; r < 480; r++) {

for(c = 0; c < 360; c++) {

I_u2[r][c] = I_u1[r][c];

I_y2[r][2*c] = I_y1[r][2*c] ;

I_v2[r][c] = I_v1[r][c] ;

I_y2[r][2*c+1] = I_y1[r][2*c+1] ;

} } }

void tracking()

{ int r, c,m,n,p,q;

int cent_x,cent_y,cent_z;

int centroid_x,centroid_y;

int dim_x,dim_y;

cent_x=0;


cent_y=0;

cent_z=0;

for(m = 0; m < 480; m++) {

for(n = 0; n < 360; n++) {

if((I_u3[m][n]<45 || I_u3[m][n]>200) & (I_y3[m][2*n]<45

|| I_y3[m][2*n]>200) & (I_v3[m][n] <45 || I_v3[m][n]>200)

& (I_y3[m][2*n+1] <45 || I_y3[m][2*n+1]>200)){

I_u3[m][n] = 128 ;

I_y3[m][2*n] = 16 ;

I_v3[m][n] = 128 ;

I_y3[m][2*n+1] = 16; }

else {

cent_x= cent_x + m ;

cent_y= cent_y + n ;

cent_z= cent_z + 1 ;

}
} }
/* Compute the centroid only after the whole frame has been scanned, and skip
   drawing the marker box when no foreground pixels were detected (this also
   avoids a divide-by-zero on an empty difference image). */
if (cent_z == 0) return;
centroid_x = cent_x/cent_z;
centroid_y = cent_y/cent_z;

for(p =centroid_x-10 ; p < centroid_x+10; p++) {

for(q = centroid_y-10; q < centroid_y+10; q++) {

if(p== centroid_x-10 || p==centroid_x+9 || q ==

centroid_y-10 || q==centroid_y+9) {

I_u[p][q] = 255;

I_y[p][2*q] = 255;

I_v[p][q] = 255;

I_y[p][2*q+1] = 255; }

else {I_u[p][q] = I_u[p][q];

I_y[p][2*q] = I_y[p][2*q];

I_v[p][q] = I_v[p][q];

I_y[p][2*q+1] = I_y[p][2*q+1]; }}}

}


References

[1] Morimoto. T., Kiriyama. O., Harada. Y., Adachi. H., Koide. T., and Mattausch. H. J.,

‘‘Object tracking in video images based on image segmentation and pattern matching”, Proc.

of IEEE Int. Symp. on Cir. and Syst., 2005, pp. 3215-3218.

[2] Yamaoka. K., Morimoto. T., Adachi. H., Koide. T., and Mattausch. H. J., ‘‘Image segmenta-

tion and pattern matching based FPGA/ASIC implementation architecture of real-time object

tracking”, Asia and south pacific conference on design automation, 2006, pp. 176-181.

[3] Qiaowei. L., Shuangyuan. Y., and Senxing. Z., ‘‘Image segmentation and major approaches,

” IEEE International Conference on Computer Science and Automation Engineering, 2011,

pp. 465-468.

[4] Patra. D., Santosh. K. K., and Chakraborty. D., ‘‘Object tracking in video images using

hybrid segmentation method and pattern matching”, Annual IEEE India Conference, 2009,

pp. 1-4.

[5] Watve. A. K., ‘‘Object tracking in video scenes”, M. Tech. seminar, IIT Kharagpur, India,

2010.

[6] Uy. D. L.,‘‘An algorithm for image clusters detection and identification based on color for

an autonomous mobile robot", Research report submitted to Hampton University, Virginia, 1994.

[7] Bochem. A., Herpers. R., and Kent. K. B., "Acceleration of Blob Detection within Images in Hardware", Research report, University of New Brunswick, 2009, World Wide Web,

http://www.cs.unb.ca/tech-reports/documents/TR_10_205.pdf.

[8] Kaspers, A.,‘‘Blob Detection”, Research report, Image Sciences Institute, UMC Utrecht,

May 5, 2011.


[9] Gupta. M., "Cell Identification by Blob Detection", International Journal of Advances in Electronics Engineering, vol. 2, Issue 1, 2012.

[10] Hinz. S., ‘‘Fast and subpixel precise blob detection and attribution”, IEEE International

Conference on Image Processing, 2005, vol.3, pp. 457-60.

[11] Francois. A. R., ‘‘ Real-time multi-resolution blob track-ing”, Technical Report IRIS-04-

423, Institute for Robotics and Intelligent Systems, University of South-ern California, July

2004.

[12] Mancas. M., ‘‘ Augmented Virtual Studio”, Tech. rep. 4. 2008. pp. 1-3.

[13] Dharamadhat. T., Thanasoontornlerk. K., and Kanongchaiyos. P., , ‘‘Tracking object in

video pictures based on background subtraction and image matching”, IEEE International

Conference on Robotics and Biomimetics, 2008, pp. 1255-1260.

[14] Piccardi. M., ‘‘Background subtraction techniques: a review”, IEEE International Confer-

ence on Systems, Man and Cybernetics, 2004, vol.4, pp. 3099- 3104.

[15] Andrews. A., ’’Targeting multiple objects in real time”, B.E thesis, University of Calgary,

Canada, October, 1999.

[16] Saravanakumar. S., Vadivel. A., and Saneem. A. C. G., ‘‘Multiple human object tracking

using background subtraction and shadow removal techniques”, International Conference on

Signal and Image Processing, 2010, pp. 79-84.

[17] ZuWhan. K., ‘‘Real time object tracking based on dynamic feature grouping with back-

ground subtraction”, IEEE Conference on Computer Vision and Pattern Recognition, 2008,

pp. 1-8.

[18] Isard. M., and MacCormick. J., ‘‘BraMBLe: a Bayesian multiple-blob tracker”, Eighth

IEEE International Conference on Computer Vision, 2001, vol.2, pp. 34-41.

[19] Gonzales. R. C., and Woods. R. E., ‘‘Digital Image Processing-Second Edition”, Prentice

Hall, 2002.

[20] Haralick. R. M., and Shapiro. L. G., ‘‘Computer and Robot Vision”, volume I, Addison-

Wesley, 1992, pp. 28-48.


[21] Castagno. R., Ebrahimi. T., and Kunt. M.,‘‘Video Segmentation Based on Multiple Fea-

tures for Interactive Multimedia Applications”, IEEE Transactions on Circuits and Systems

for Video Technology, vol. 8, pp. 562-571, September 1998.

[22] Kenako. T., and Hori. O.,‘‘Feature selection for reliable tracking using template match-

ing”, Proc. IEEE Intl. Conference on Computer Vision and Pattern Recognition, 2003, vol.

1, pp. 796-802.

[23] Bochem. A., Herpers. R., and Kent. K. B., ,‘‘Hardware Acceleration of BLOB Detection

for Image Processing”, Third International Conference on Advances in Circuits, Electronics

and Micro-Electronics, 2010, pp. 28-33.

[24] Mostafa. A., Mehdi. A., Mohammad. H., and Ahmad. A., ‘‘Object Tracking in Video

Sequence Using Background Modeling”, Proc. IEEE Workshop on Application of Computer

Vision, 2011, pp. 967-974.

[25] Babu. R. V., and Makur A., ‘‘Object-based Surveillance Video Compression using Fore-

ground Motion Compensation” Int. Conf. on Control, Automation, Robotics and Vision,

2006, pp. 1-6.

[26] Comaniciu. D., Ramesh. V., and Meer. P., ‘‘Real-time tracking of non-rigid objects using

mean shift, ” Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2000,

vol.2, pp. 142-149.

[27] Foresti. G. L., ‘‘A real-time system for video surveillance of unattended outdoor environ-

ments”, IEEE Trans. Circuits and Systems for Vid. Tech., vol. 8, no. 6, pp. 697-704, 1998.

[28] Elbadri. M., Peterkin. R., Groza. V., Ionescu. D., and Saddik. El. A., ‘‘Hardware support

of JPEG, ” Canadian Conf. on Electrical and Computer Engineering, 2005, pp. 812-815.

[29] Deng. M., Guan. Q., and Xu. S., ‘‘Intelligent video target tracking system based on DSP,

” Int. Conf. on Computational Problem-Solving, 2010, pp. 366-369.

[30] Liping. K., Zhefeng. Z., Gang. X., ‘‘The Hardware Design of Dual-Mode Wireless Video

Surveillance System Based on DM6437, ” Second Inte. Conf. on Networks Security Wireless

Communications and Trusted Computing, 2010, pp. 546-549.


[31] Pescador. F., Maturana. G., Garrido. M. J., Juarez. E., and Sanz. C., ‘‘An H.264 video

decoder based on a DM6437 DSP, ” Digest of Technical Papers International Conference on

Consumer Electronics, 2009, pp. 1-2.

[32] Wang. Q., Guan. Q. , Xu. S., and Tan. F., ‘‘A network intelligent video analysis system

based on multimedia DSP, ” Int. Conf. on Communications, Circuits and Systems, 2010, pp.

363-367.

[33] Kim. C., and Hwang. J. N., ‘‘Object-based video abstraction for video surveillance sys-

tem, ” IEEE Trans. circuits and Systems for Video Technology y, vol. 12, no. 12, pp. 1128-

1138, 2002.

[34] Nishi. T., and Fujiyoshi. H., ‘‘Object-based video coding using pixel state analysis,” IEEE

Intl. Conference on Pattern Recognition, 2004.

[35] William. K. P., ‘‘ Digital Image Processing (second edition)”, John Wiley & Sons, New

York, 1991.

[36] Wallace. G. K., ‘‘The JPEG still picture compression standard, ” IEEE Transactions on

Consumer Electronics, vol.38, no.1, pp. xviii-xxxiv, Feb 1992.

[37] Seol. S. W., ‘‘An automatic detection and tracking system of moving objects using double

differential based motion estimation”, Proc. of Int. Tech. Conf. Circ./Syst., Comput. and

Comms., 2003, pp. 260-263.

[38] Dwivedi. V., ‘‘Jpeg Image Compression and Decompression with Modeling of DCT Co-

efficients on the Texas Instrument Video Processing Board TMS320DM6437”, Master of

science, California State University, Sacramento, Summer 2010.

[39] Kapadia. P.,‘‘Car License Plate Recognition Using Template Matching Algorithm”, Mas-

ter Project Report, California State University, Sacramento,Fall 2010.

[40] Gohil. N., ‘‘Car License Plate Detection”, Masters Project Report, California State Uni-

versity, Sacramento, Fall 2010.

[41] Texas Instruments Inc., ‘‘TMS320DM6437 DVDP Getting Started Guide”, Texas, July

2007.


[42] Texas Instrument Inc., ‘‘TMS320DM6437 Digital Media Processor”, Texas, pp. 211-234,

June 2008.

[43] Texas Instruments Inc., ‘‘TMS320C64x+ DSP Cache User’s Guide”, Literature Number:

SPRU862A, Table 1-6, pp. 23, October 2006.

[44] Texas Instrument Inc., ‘‘TMS320DM643x DMP Peripherals Overview Reference Guide”,

pp. 15-17, June 2007.

[45] Texas Instrument Inc., ‘‘TMS320C6000 Programmers Guide”, Texas, pp. 37-84, March

2000.

[46] Xilinx Inc., ‘‘The Xilinx LogiCORE IP RGB to YCrCb Color-Space Converter”, pp. 1-5,

July 2010.

[47] Texas Instruments Inc., ‘‘How to Use the VPBE and VPFE Driver on TMS320DM643x”.

Dallas, Texas, November 2007.

[48] Texas Instrument Inc., ‘‘TMS320C64X+ DSP Cache ”, User Guide, pp. 14-26, February

2009.

[49] Texas Instruments technical Reference, ‘‘TMS320DM6437 Evaluation Module”, Spec-

trum Digital , 2006.

[50] Keith. Jack., ‘‘Video Demystified: A Handbook for the Digital Engineer”, 4th Edition,

Llh Technology Pub,1995.

[51] Pawate. B. I., "Developing Embedded Software using DaVinci & OMAP Technology", Morgan & Claypool, 2009.

[52] Bovik. Al., ‘‘Handbook of Image & Video Processing”, Academic Press Series, Depart-

ment of Electrical and Computer Engineering, UTA Texas, 1999.

[53] Stephens. L. B., Student Thesis on “Image Compression Algorithms”, California State

University, Sacramento, August 1996

[54] Berkeley Design Technology, Inc.,‘‘The Evolution of DSP Processors”, World Wide Web,

http://www.bdti.com/articles/evolution.pdf, Nov. 2006.


[55] Berkeley Design Technology, Inc., ‘‘Choosing a Processor: Benchmark and Beyond”,

World Wide Web,http://www.bdti.com/articles/20060301_TIDC_Choosing.pdf, Nov. 2006.

[56] University of Rochester, ‘‘DSP Architectures: Past, Present and Future”, World Wide

Web, http://www.ece.rochester.edu/research/wcng/papers/CAN_r1.pdf, Nov. 2006.

[57] Steven. W. Smith.,‘‘ The Scientist and Engineer’s Guide to Digital Signal Processing”,

Second Edition, California Technical Publishing, 1999.

[58] Texas Instruments Inc., ‘‘TMS320DM642 Technical Overview”, Dallas, Texas, Septem-

ber 2002.


Acknowledgments

I express my sincere thanks and deep sense of gratitude to my supervisor Prof. V. Rajbabu for his invaluable guidance, inspiration, unremitting support, encouragement and stimulating suggestions during the preparation of this report. His persistence and inspiration during the "ups and downs" in research, and his clarity and focus during the uncertainties, have been very helpful to me. Without his continuous encouragement and motivation, the present work would not have seen the light of day.

I thankfully acknowledge all EI lab members and TI-DSP lab members at IIT Bombay who have directly or indirectly helped me throughout my stay at IIT. I would also like to thank the department staff, central library staff and computer facility staff for their assistance.

I would like to express my sincere thanks to Mr. Ajay Nandoriya and Mr. K.S Nataraj for

their help and support during the project work.

My family members are, of course, a source of faith and moral strength. I acknowledge the shower of blessings and love of my parents, Mr. Rajiba Lochana Patro and Mrs. Uma Rani Patro, as well as Godaborish Patro and Madhu Sundan Patro, for their unrelenting moral support in difficult times. I wish to express my deep gratitude towards all of my friends and colleagues for their constant moral support, which made my stay at the institute pleasant. I have enjoyed every moment that I spent with all of you.

And finally I am thankful to God in whom I trust.

Date: Badri Narayan Patro