object tracking from unmanned aerial vehicle...

OBJECT TRACKING FROM UNMANNED AERIAL VEHICLE USING

KANADE-LUCAS-TOMASI TRACKER AND SPEEDED UP ROBUST

FEATURES

FALAH JABAR RAHIM

A project report submitted in partial fulfilment of the

requirements for the award of the degree of Master of

Engineering (Electrical-Electronics & Telecommunications)

Faculty of Electrical Engineering

Universiti Teknologi Malaysia

JANUARY 2015

iii

This thesis is dedicated to my parents.

For their endless love, support, encouragement.

iv

ACKNOWLEDGEMENT

My first thanks is for my supervisor, Dr. USMAN ULLAH SHEIKH. This

thesis would not have been possible without his valuable feedback. Most

importantly, I want to thank to my family for endless support throughout my whole

studies.

v

ABSTRACT

Object detection and tracking are important and challenging tasks in many

computer vision applications such as surveillance, vehicle navigation, and

autonomous robot navigation. Video surveillance in a dynamic environment,

especially for humans and vehicles, is one of the current challenging research topics

in computer vision. In this thesis a tracking and location method for Unmanned

Aerial VehicleVision System called UAV tracker is proposed. In this method, the

target is extracted from the Region of Interested (ROI) automatically by Speeded Up

Robust Features (SURF); then the Kanade–Lucas–Tomasi tracker is used to get the

target’s position in the sequence images. The proposed framework learns about target

appearance by updating the object module in each frame, which can further improve

the robustness of tracker as well as feature extraction and matching process.

Extensive experimental results are provided by comparing proposed algorithm with

(15) related approaches on (15) challenging sequences, which demonstrate the robust

tracking achieved by proposed tracker. Experimental results show that the proposed

method deals with translation, rotation, partial occlusion, deformation, pose, scale

changes, similar appearance and illumination change successfully.

vi

ABSTRAK

Pengesanan penjejakan objek merupakan salah satu perkara penting dan

mencabar dalam bidang penglihatan komputer terutamanya untuk aplikasi seperti

pemantauan, navigasi kenderaan dan navigasi robot berautonomi. Pemantauan video

dalam persekitaran dinamik terutamanya untuk manusia dan kenderaan merupakan

salah satu topik hangat dalam bidang penglihatan komputer. Thesis ini

mencadangkan kaedah penjejakan dan lokasi untuk sistem penglihatan kenderaan

udara tanpa pemandu (UAV) yang juga dikenali sebagai penjejak UAV. Dalam

kaedah yang dicadang, sasaran akan diekstrak dari kawasan berkepentingan (ROI)

secara automatik menggunakan Cirian Teguh Yang Dipercepatkan (SURF) dan

kemudian dengan penjejak Kanade-Lucas-Tomasi digunakan untuk menentukan

lokasi sasaran dalam urutan imej. Rangka kerja yang dicadangkan belajar

penampilan sasaran dengan mengemaskini modul objek dalam setiap kerangka yang

dapat meningkatkan keteguhan penjejak dan juga pengekstrakan cirian dan proses

pemadanan. Keputusan experimental yang menyeluruh terhadap 15 urutan yang

mencabar berserta 15 kaedah sedia ada diberikan. Keputusan experimental

menunjukkan bahawa kaedah yang dicadangkan dapat menangani penterjemahan,

putaran, penutupan separa, perubahan pada bentuk, posisi, skala, penampilan yang

serupa dan pencahayaan dengan berjaya.

vii

TABLE OF CONTENTS

CHAPTER TITLE PAGE

DECLARATION ii

DEDICATION iii

ACKNOWLEDGEMENT iv

ABSTRACT v

ABSTRAK vi

TABLE OF CONTENTS vii

LIST OF TABLES x

LIST OF FIGURES xi

LIST OF ABBERIVATIONS xiii

LIST OF SYMBOLS xv

LIST OF APPENDIXES xvii

1 INTRODUCTION 1

1.1 Introduction 1

1.2 Problem statement 2

1.3 Objectives 2

1.4 Project Scope 3

1.5 Thesis Outline 3

2 LITERATURE REVIEW 4

2.1 Main challenges in video tracking 4

2.2 Literature survey 6

2.2.1 Particle Filter 6

2.2.2 Kalman Filter 6

viii

2.2.3 Kernel based tracking 7

2.2.4 Mean Shift 7

2.2.5 Frame by frame tracking 8

2.2.6 Patches based tracking 9

3 BACKGROUND THEORY 11

3.1 Target Detection 11

3.1.1 Object Representation 12

3.1.2 Feature detection 14

3.1.3 SIFT 15

3.1.4 SURF 21

3.2 Target Tracking 25

3.2.1 Gradient-based Trackers 26

3.2.2 Mean-Shift (MS) Tracker 28

4 METHODOLOGY 31

4.1 System Overview 31

4.2 Initialization 33

4.3 Run-time 33

4.4 Object Detection 34

4.4.1 Feature Extraction and Matching

(SURF)

34

4.4.2 SURF features parameters 35

4.5 Object tracker 37

4.5.1 Forward-Backward error 37

4.6 Random Sample Consensus ( RANSAC) 39

4.6.1 The RANSAC algorithm 40

4.6.2 Results from RANSAC 43

4.7 Update initial frame 44

4.8 Combine detector and tracker 44

5 EXPERIMENTAL RESULTS 45

ix

5.1 Setup 45

5.1.1 Code library 46

5.2 Evaluation Methodology 46

5.3 Dataset 47

5.4 Quantitative analysis 49

5.4.1 Comparison 1 using dataset1 50

5.4.2 Comparison 2 using dataset2 52

5.5 Qualitative analysis 54

5.5.1 Strengths 54

5.5.2 Weaknesses 59

6 CONCLUSION 62

6.1 Research Summary 62

6.2 Research Finding 63

6.3 Recommendation for future work 63

REFERENCES 64

Appendix 70

x

LIST OF TABLES

TABLE NO. TITLE

PAGE

4.1 Basic affine transforms relations 42

5.1 Sequences in dataset1 48

5.2 Sequences in dataset2 49

5.3 Evaluated tracking algorithms 50

A1 Code library folders 70

xi

LIST OF FIGURES

FIGURE NO. TITLE PAGE

2.1 Main challenges in video tracking 5

3.1 Object representation (a)to (i) 14

3.2 DoG method to detect keypoints 16

3.3 The stage of keypoins selection(SIFT) 18

3.4 Keypoint descriptor 20

3.5 Object detection in clutter background by

using(SIFT)

21

3.6 Haar wavelet responses to calculate SURF

descriptor

23

3.7 Detect object of interest to be found in clutter

background by using(SURF)

24

4.1 Architecture of object tracking system 32

4.2 Forward -backward error measurement 38

4.3 Basic affine transform 41

4.4 Result from RANSAC algorithm (a)to (b) 43

5.1 Overlapping between two boxes 47

5.2 Tracking sequences for evaluation 48

5.3 The first frame of each sequence where the object

Defined by yellow box

49

5.4 Overall performance score for each tracker on

dataset1

51

5.5 Overall performance score for each tracker on

dataset2

52

5.6 Tracking results 53

5.7 Tracking under scale change 54

xii

5.8 Illumination changes 55

5.9 Appearance changes 56

5.10 Partial occlusion 57

5.11 Target similarity 58

5.12 Example of out-of-plane rotation 59

5.13 Failure in tracking when full occlusion occurs 60

5.14 Effect of updating he initial frame 61

xiii

LIST OF ABBREVIATIONS

AUC - Area under curve

BRISK - Binary Robust Invariant Scalable Keypoints

BSBT - Beyond Semi-Supervised Tracking

CPF - Convolution particle filtering

DoG - Difference of Gaussians

DFT - Distribution Field Tracking

EKF - Extended Kalman filter

EMD - Earth Mover’s Distance

FRAG - Fragments

KF - Kalman filter

KMS - Kernel Mean-shift

KLT - Kanade-Lucas-Tomasi

LoG - Laplacian of Gaussians

LOT - Locally Orderless Tracking

MHT - Multiple Hypothesis Trackers

MS - Mean-Shift Tracker

MIL - Multiple Instance Learning

RANSAC - Random Sample Consensus

ROI - Region of interest

SIFT - Scale-Invariant Feature Transform

SURF - Speeded Up Robust Features

xiv

SAD - Sum of absolute differences

SHT - Single Hypothesis Trackers

SSD - Sum of squared differences

SemiT - Semi-supervised Tracker

UAV - Unmanned Aerial Vehicle

TLD - Tracking Learning Detection

xv

LIST OF SYMBOLS

𝜎 - Scale parameter

𝐺 - Gaussian kernel

𝐼 - Input image

𝐿 - Smoothed image

𝐷 - Stands for DoG

𝐻 - Hessian matrix

𝐷𝑥𝑥 - Partial derivative in x-axis

𝐷𝑥𝑦 - Partial derivative in y-axis

v - Descriptor vector

W - Window size

𝑥𝑡 - Object state

𝓌 - Pixel location

𝑏𝑇 - Template transpose

𝐼𝑡 - Template

𝐼𝑘 - Original image

�̂� - Model histogram

�̂� - Candidate histogram

𝑁𝑏 - Number of pixels

�̂�𝑢 - Target model

�̂�𝑢 - Likelihood value

𝜌 - Bhattacharyya coefficient

xvi

∩ - Intersection

∪ - Union

S - Intersection area

𝑡0 - Overlapping threshold

xvii

LIST OF APPENDIX

APPENDIX TITLE PAGE

A Code library 70

1

CHAPTER 1

INTRODUCTION

1. 1 Introduction

Video surveillance is an active research topic in computer vision that focuses

on to detection, recognition and tracking objects over a sequence of images and it

also makes an attempt to understand and describe object behaviour by replacing the

aging old traditional method of monitoring cameras by human operators. Object

detection and tracking are important and challenging tasks in many computer vision

applications such as surveillance, vehicle navigation and autonomous robot

navigation. Object detection involves locating objects in the frame of a video

sequence. Every tracking method requires an object detection mechanism either in

every frame or when the object first appears in the video. Object tracking is the

process of locating an object or multiple objects over time using a camera. The high

availability of computers, high quality video and Unmanned Aerial Vehicles (UAV)

to perform challenging surveillance, monitoring tasks, has generated a great deal of

interest in object tracking algorithms.

In order to achieve the goal of autonomous UAV operation, algorithms for

automatic target detection and tracking are deployed on aerial platforms.

Nevertheless, processing of UAV imagery is challenging due to several factors such

as rapid camera motion and low-resolution images captured from high altitude

vehicles making it difficult to discriminate similar targets. Additional challenges

2

stem from the fact that targets may undergo significant appearance changes and they

can also go in and out of camera field of view. Since the technology has matured and

the general hardware is sufficient, so it is necessary to start conducting research in

the area and develop a video tracking solution that can be applicable to surveillance

missions carried out by an UAV.

1. 2 Problem Statement

This project aims to propose an object-tracking algorithm, which can detect

and track moving or stationary object in video streams, which is captured from a

moving or stationary camera. This algorithm will address some of the tracking

challenges such as; pose change, partial occlusion, appearance and viewpoint

changes, background clutter, scale changes and Illumination changes.

1. 3 Objectives

(i) Given a bounding box defining the object of interest in a single frame, the

goal is to automatically determine the object’s bounding box in every frame

that follows.

(ii) To design object detection and tracking which overcome the following

challenges :

Illumination changes.

Similar targets.

Clutter background.

Scale change.

Out-of- plane rotation.

3

1. 4 Project Scope

The aim of this project is to propose a tracking algorithm which can track

single moving or stationary object in video streams, which are captured from moving

or stationary camera.

1. 5 Thesis Outline

The structure of this thesis is as follows. In chapter 2 reviewed previous

works related to object tracking. Chapter 3 describes the methodology of the

proposed object tracking method. Chapter 4 goes to explain object tracking method

and, implementing the algorithms with necessary parameters. In Chapter 5 the

experimental results and performance of the proposed algorithm is provided. In 6, the

work is concluded and direction for future work is proposed.

64

REFERENCES

Avidan, S (2007). Ensemble tracking. Pattern Analysis and Machine Intelligence,

IEEE Transactions on, 29(2), 261-271.

Adam, A., Rivlin, E., and Shimshoni, I. (2006, June). Robust fragments-based

tracking using the integral histogram. In Computer Vision and Pattern

Recognition, 2006 IEEE Computer Society Conference on (Vol. 1, pp. 798-

805). IEEE.

Bradski, G. R. (1998). Computer vision face tracking for use in a perceptual user

interface

Babenko, B., Yang, M. H., and Belongie, S. (2009, June). Visual tracking

Online multiple instance learning. In Computer Vision and Pattern

Recognition, 2009. CVPR 2009. IEEE Conference on (pp. 983-990). IEEE.

Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L. (2008). Speeded-up robust

features (SURF). Computer vision and image understanding, 110(3), 346-359.

Babenko, B., Yang, M. H., and Belongie, S. (2011). Robust object tracking with

online multiple instance learning. Pattern Analysis and Machine Intelligence,


Comaniciu, D., Ramesh, V., and Meer, P. (2003). Kernel-based object tracking.

Pattern Analysis and Machine Intelligence, IEEE Transactions on, 25(5), 564-

577.

Collins, R. T., Liu, Y., and Leordeanu, M. (2005). Online selection of discriminative

tracking features. Pattern Analysis and Machine Intelligence, IEEE

Transactions on, 27(10), 1631-1643.

65

Chapelle, O., B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning.

The MIT Press, Sept. 2006. 3, 28, 29

Cheng, P., Zhou, G., and Zheng, Z. (2009). Detecting and counting vehicles from

small low-cost UAV images. In ASPRS 2009 Annual Conference, Baltimore

(Vol. 3, pp. 9-13).

Comaniciu, D., Ramesh V., and Meer P.(2000) Detection and Tracking of Point

Features Computer Vision and Pattern Recognition, Proceedings. IEEE

Conference, Vol. 2, pages 142-149, 2000

Collins, R., Zhou, X., and Teh, S. K. (2005, January). An open source tracking

testbed and evaluation web site. In IEEE International Workshop on

Performance Evaluation of Tracking and Surveillance (pp. 17-24).

Collins, R. T. (2003, June). Mean-shift blob tracking through scale space. In

Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE

Computer Society Conference on (Vol. 2, pp. II-234). IEEE.

Del Moral, P. (1996). Non-linear filtering: interacting particle resolution. Markov

Processes and related fields, 2(4), 555-581.

Fischler, M. A., and Bolles, R. C. (1981). Random sample consensus: a paradigm for

model fitting with applications to image analysis and automated cartography.

Communications of the ACM, 24(6), 381-395.

Grabner, H., Leistner, C., and Bischof, H. (2008). Semi-supervised on-line boosting

for robust tracking. In Computer Vision–ECCV 2008 (pp. 234-247). Springer

Berlin Heidelberg.

Grabner, M., Grabner, H., and Bischof, H. (2007, June). Learning features for

tracking. In Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE

Conference on (pp. 1-8). IEEE.

66

Grabner, H., Grabner, M., and Bischof, H. (2006, September). Real-Time Tracking

via On-line Boosting. In BMVC (Vol. 1, No. 5, p. 6).

Haykin, S. S. (Ed.). (2001). Kalman filtering and neural networks (pp. 123-174).

New York: Wiley.

Hare, S., Saffari, A., and Torr, P. H. (2011, November). Struck: Structured output

Shahed Nejhum, S. M., Ho, J., and Yang, M. H. (2008, June). Visual tracking with

histograms and articulating blocks. In Computer Vision and Pattern

Recognition, 2008. CVPR 2008. IEEE Conference on (pp. 1-8). IEEE.

Hua, G., and Wu, Y. (2006, June). Measurement integration under inconsistency for

robust tracking. In Computer Vision and Pattern Recognition, 2006 IEEE

Computer Society Conference on (Vol. 1, pp. 650-657). IEEE.

Harris, C., and Stephens, M. (1988, August). A combined corner and edge detector.

In Alvey vision conference (Vol. 15, p. 50).

Henriques, J. F., Caseiro, R., Martins, P., and Batista, J. (2012). Exploiting the

Circulant structure of tracking-by-detection with kernels. In Computer Vision–

ECCV 2012 (pp. 702-715). Springer Berlin Heidelberg.

Hare, S., Saffari, A., and Torr, P. H. (2012, June). Efficient online structured output

learning for keypoint-based object tracking. In Computer Vision and Pattern

Recognition (CVPR), 2012 IEEE Conference on (pp. 1894-1901). IEEE.

Isard, M., and Blake, A. (1998). Condensation—conditional density propagation for

visual tracking. International journal of computer vision, 29(1), 5-28.

Javed, O., Ali, S., and Shah, M. (2005, June). Online detection and classification of

moving objects using progressively improving detectors. In Computer Vision

and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society

Conference on (Vol. 1, pp. 696-701). IEEE.

67

Kalal, Z., Mikolajczyk, K., and Matas, J. (2012). Tracking-learning-detection.

Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34(7), 1409-

1422.

Kalal, Z., Matas, J., and Mikolajczyk, K. (2010, June). Pn learning: Bootstrapping

binary classifiers by structural constraints. In Computer Vision and Pattern


Kalal, Z., Mikolajczyk, K., and Matas, J. (2010, August). Forward-backward error:

Automatic detection of tracking failures. In Pattern Recognition (ICPR), 2010

20th International Conference on (pp. 2756-2759). IEEE.

Lucas, B. D., and Kanade, T. (1981, August). An iterative image registration

technique with an application to stereo vision. In IJCAI (Vol. 81, pp. 674-679).

Lucas, B. D., and Kanade, T. (1981, August). An iterative image registration

technique with an application to stereo vision. In IJCAI (Vol. 81, pp. 674-679).

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints.

International journal of computer vision, 60(2), 91-110.

Leutenegger, S., Chli, M., and Siegwart, R. Y. (2011, November). BRISK: Binary

robust invariant scalable keypoints. In Computer Vision (ICCV), 2011 IEEE

International Conference on (pp. 2548-2555). IEEE.

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints.

International journal of computer vision, 60(2), 91-110.

Maggio, E., and Cavallaro, A. (2011). Video tracking: theory and practice.

John Wiley & Sons.

Nebehay, Georg and Pflugfelder, Roman. (2014). Consensus-based Matching and

Tracking of Keypoints for Object Tracking. IEEE.

68

Oron, S., Bar-Hillel, A., Levi, D., and Avidan, S. (2012, June). Locally orderless

tracking. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE

Conference on (pp. 1940-1947). IEEE.

Ozuysal, M., Calonder, M., Lepetit, V., and Fua, P. (2010). Fast keypoint

recognition using random ferns. Pattern Analysis and Machine Intelligence,


Özuysal, M., Lepetit, V., Fleuret, F., and Fua, P. (2006). Feature harvesting for

tracking-by-detection. In Computer Vision–ECCV 2006 (pp. 592-605).

Springer Berlin Heidelberg.

Pérez, P., Hue, C., Vermaak, J., and Gangnet, M. (2002). Color-based probabilistic

tracking. In Computer vision—ECCV 2002 (pp. 661-675). Springer Berlin

Heidelberg.

Rosten, E., and Drummond, T. (2005, October). Fusing points and lines for high

performance tracking. In Computer Vision, 2005. ICCV 2005. Tenth IEEE

International Conference on (Vol. 2, pp. 1508-1515). IEEE.

Ross, D. A., Lim, J., Lin, R. S., and Yang, M. H. (2008). Incremental learning for

robust visual tracking. International Journal of Computer Vision, 77(1-3), 125-

141.

Stalder, S., Grabner, H., and Van Gool, L. (2009, September). Beyond semi-

supervised tracking: Tracking should be as simple as detection, but not simpler

than recognition. In Computer Vision Workshops (ICCV Workshops), 2009

IEEE 12th International Conference on (pp. 1409-1416). IEEE.

Santner, J., Leistner, C., Saffari, A., Pock, T., and Bischof, H. (2010, June). Prost:

Parallel robust online simple tracking. In Computer Vision and Pattern


Stauffer, C., and Grimson, W. E. L. (2000). Learning patterns of activity using real-

time tracking. Pattern Analysis and Machine Intelligence, IEEE Transactions

69

on, 22(8), 747-757.

Sevilla, L., and Learned-Miller, E. (2012, June). Distribution fields for tracking

Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 1910-

1917). IEEE.

Tomasi, C., and Kanade, T. (1991). Detection and tracking of point features.

Pittsburgh: School of Computer Science, Carnegie Mellon Univ.

Viola, P., and Jones, M. (2001). Rapid object detection using a boosted cascade of

simple features. In Computer Vision and Pattern Recognition, 2001. CVPR

2001. Proceedings of the 2001 IEEE Computer Society Conference on (Vol. 1,

pp. I-511). IEEE.

Wu, Y., Lim, J., and Yang, M. H. (2013, June). Online object tracking:

A benchmark. In Computer Vision and Pattern Recognition (CVPR), 2013

IEEE Conference on (pp. 2411-2418). IEEE.

Xiao, J., and Shah, M. (2005). Motion layer extraction in the presence of occlusion

using graph cuts. Pattern Analysis and Machine Intelligence, IEEE

Transactions on, 27(10), 1644-1659.

Yilmaz, A., Javed, O., and Shah, M. (2006). Object tracking: A survey.

Acm computing surveys (CSUR), 38(4), 13.

Zhang, L., and van der Maaten, L. (2013, June). Structure preserving object tracking.

In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference

on (pp. 1838-1845). IEEE.

object tracking from unmanned aerial vehicle...

Documents