object tracking from unmanned aerial vehicle...
TRANSCRIPT
OBJECT TRACKING FROM UNMANNED AERIAL VEHICLE USING
KANADE-LUCAS-TOMASI TRACKER AND SPEEDED UP ROBUST
FEATURES
FALAH JABAR RAHIM
A project report submitted in partial fulfilment of the
requirements for the award of the degree of Master of
Engineering (Electrical-Electronics & Telecommunications)
Faculty of Electrical Engineering
Universiti Teknologi Malaysia
JANUARY 2015
iii
This thesis is dedicated to my parents.
For their endless love, support, encouragement.
iv
ACKNOWLEDGEMENT
My first thanks is for my supervisor, Dr. USMAN ULLAH SHEIKH. This
thesis would not have been possible without his valuable feedback. Most
importantly, I want to thank to my family for endless support throughout my whole
studies.
v
ABSTRACT
Object detection and tracking are important and challenging tasks in many
computer vision applications such as surveillance, vehicle navigation, and
autonomous robot navigation. Video surveillance in a dynamic environment,
especially for humans and vehicles, is one of the current challenging research topics
in computer vision. In this thesis a tracking and location method for Unmanned
Aerial VehicleVision System called UAV tracker is proposed. In this method, the
target is extracted from the Region of Interested (ROI) automatically by Speeded Up
Robust Features (SURF); then the Kanade–Lucas–Tomasi tracker is used to get the
target’s position in the sequence images. The proposed framework learns about target
appearance by updating the object module in each frame, which can further improve
the robustness of tracker as well as feature extraction and matching process.
Extensive experimental results are provided by comparing proposed algorithm with
(15) related approaches on (15) challenging sequences, which demonstrate the robust
tracking achieved by proposed tracker. Experimental results show that the proposed
method deals with translation, rotation, partial occlusion, deformation, pose, scale
changes, similar appearance and illumination change successfully.
vi
ABSTRAK
Pengesanan penjejakan objek merupakan salah satu perkara penting dan
mencabar dalam bidang penglihatan komputer terutamanya untuk aplikasi seperti
pemantauan, navigasi kenderaan dan navigasi robot berautonomi. Pemantauan video
dalam persekitaran dinamik terutamanya untuk manusia dan kenderaan merupakan
salah satu topik hangat dalam bidang penglihatan komputer. Thesis ini
mencadangkan kaedah penjejakan dan lokasi untuk sistem penglihatan kenderaan
udara tanpa pemandu (UAV) yang juga dikenali sebagai penjejak UAV. Dalam
kaedah yang dicadang, sasaran akan diekstrak dari kawasan berkepentingan (ROI)
secara automatik menggunakan Cirian Teguh Yang Dipercepatkan (SURF) dan
kemudian dengan penjejak Kanade-Lucas-Tomasi digunakan untuk menentukan
lokasi sasaran dalam urutan imej. Rangka kerja yang dicadangkan belajar
penampilan sasaran dengan mengemaskini modul objek dalam setiap kerangka yang
dapat meningkatkan keteguhan penjejak dan juga pengekstrakan cirian dan proses
pemadanan. Keputusan experimental yang menyeluruh terhadap 15 urutan yang
mencabar berserta 15 kaedah sedia ada diberikan. Keputusan experimental
menunjukkan bahawa kaedah yang dicadangkan dapat menangani penterjemahan,
putaran, penutupan separa, perubahan pada bentuk, posisi, skala, penampilan yang
serupa dan pencahayaan dengan berjaya.
vii
TABLE OF CONTENTS
CHAPTER TITLE PAGE
DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENT iv
ABSTRACT v
ABSTRAK vi
TABLE OF CONTENTS vii
LIST OF TABLES x
LIST OF FIGURES xi
LIST OF ABBERIVATIONS xiii
LIST OF SYMBOLS xv
LIST OF APPENDIXES xvii
1 INTRODUCTION 1
1.1 Introduction 1
1.2 Problem statement 2
1.3 Objectives 2
1.4 Project Scope 3
1.5 Thesis Outline 3
2 LITERATURE REVIEW 4
2.1 Main challenges in video tracking 4
2.2 Literature survey 6
2.2.1 Particle Filter 6
2.2.2 Kalman Filter 6
viii
2.2.3 Kernel based tracking 7
2.2.4 Mean Shift 7
2.2.5 Frame by frame tracking 8
2.2.6 Patches based tracking 9
3 BACKGROUND THEORY 11
3.1 Target Detection 11
3.1.1 Object Representation 12
3.1.2 Feature detection 14
3.1.3 SIFT 15
3.1.4 SURF 21
3.2 Target Tracking 25
3.2.1 Gradient-based Trackers 26
3.2.2 Mean-Shift (MS) Tracker 28
4 METHODOLOGY 31
4.1 System Overview 31
4.2 Initialization 33
4.3 Run-time 33
4.4 Object Detection 34
4.4.1 Feature Extraction and Matching
(SURF)
34
4.4.2 SURF features parameters 35
4.5 Object tracker 37
4.5.1 Forward-Backward error 37
4.6 Random Sample Consensus ( RANSAC) 39
4.6.1 The RANSAC algorithm 40
4.6.2 Results from RANSAC 43
4.7 Update initial frame 44
4.8 Combine detector and tracker 44
5 EXPERIMENTAL RESULTS 45
ix
5.1 Setup 45
5.1.1 Code library 46
5.2 Evaluation Methodology 46
5.3 Dataset 47
5.4 Quantitative analysis 49
5.4.1 Comparison 1 using dataset1 50
5.4.2 Comparison 2 using dataset2 52
5.5 Qualitative analysis 54
5.5.1 Strengths 54
5.5.2 Weaknesses 59
6 CONCLUSION 62
6.1 Research Summary 62
6.2 Research Finding 63
6.3 Recommendation for future work 63
REFERENCES 64
Appendix 70
x
LIST OF TABLES
TABLE NO. TITLE
PAGE
4.1 Basic affine transforms relations 42
5.1 Sequences in dataset1 48
5.2 Sequences in dataset2 49
5.3 Evaluated tracking algorithms 50
A1 Code library folders 70
xi
LIST OF FIGURES
FIGURE NO. TITLE PAGE
2.1 Main challenges in video tracking 5
3.1 Object representation (a)to (i) 14
3.2 DoG method to detect keypoints 16
3.3 The stage of keypoins selection(SIFT) 18
3.4 Keypoint descriptor 20
3.5 Object detection in clutter background by
using(SIFT)
21
3.6 Haar wavelet responses to calculate SURF
descriptor
23
3.7 Detect object of interest to be found in clutter
background by using(SURF)
24
4.1 Architecture of object tracking system 32
4.2 Forward -backward error measurement 38
4.3 Basic affine transform 41
4.4 Result from RANSAC algorithm (a)to (b) 43
5.1 Overlapping between two boxes 47
5.2 Tracking sequences for evaluation 48
5.3 The first frame of each sequence where the object
Defined by yellow box
49
5.4 Overall performance score for each tracker on
dataset1
51
5.5 Overall performance score for each tracker on
dataset2
52
5.6 Tracking results 53
5.7 Tracking under scale change 54
xii
5.8 Illumination changes 55
5.9 Appearance changes 56
5.10 Partial occlusion 57
5.11 Target similarity 58
5.12 Example of out-of-plane rotation 59
5.13 Failure in tracking when full occlusion occurs 60
5.14 Effect of updating he initial frame 61
xiii
LIST OF ABBREVIATIONS
AUC - Area under curve
BRISK - Binary Robust Invariant Scalable Keypoints
BSBT - Beyond Semi-Supervised Tracking
CPF - Convolution particle filtering
DoG - Difference of Gaussians
DFT - Distribution Field Tracking
EKF - Extended Kalman filter
EMD - Earth Mover’s Distance
FRAG - Fragments
KF - Kalman filter
KMS - Kernel Mean-shift
KLT - Kanade-Lucas-Tomasi
LoG - Laplacian of Gaussians
LOT - Locally Orderless Tracking
MHT - Multiple Hypothesis Trackers
MS - Mean-Shift Tracker
MIL - Multiple Instance Learning
RANSAC - Random Sample Consensus
ROI - Region of interest
SIFT - Scale-Invariant Feature Transform
SURF - Speeded Up Robust Features
xiv
SAD - Sum of absolute differences
SHT - Single Hypothesis Trackers
SSD - Sum of squared differences
SemiT - Semi-supervised Tracker
UAV - Unmanned Aerial Vehicle
TLD - Tracking Learning Detection
xv
LIST OF SYMBOLS
𝜎 - Scale parameter
𝐺 - Gaussian kernel
𝐼 - Input image
𝐿 - Smoothed image
𝐷 - Stands for DoG
𝐻 - Hessian matrix
𝐷𝑥𝑥 - Partial derivative in x-axis
𝐷𝑥𝑦 - Partial derivative in y-axis
v - Descriptor vector
W - Window size
𝑥𝑡 - Object state
𝓌 - Pixel location
𝑏𝑇 - Template transpose
𝐼𝑡 - Template
𝐼𝑘 - Original image
�̂� - Model histogram
�̂� - Candidate histogram
𝑁𝑏 - Number of pixels
�̂�𝑢 - Target model
�̂�𝑢 - Likelihood value
𝜌 - Bhattacharyya coefficient
xvi
∩ - Intersection
∪ - Union
S - Intersection area
𝑡0 - Overlapping threshold
xvii
LIST OF APPENDIX
APPENDIX TITLE PAGE
A Code library 70
1
CHAPTER 1
INTRODUCTION
1. 1 Introduction
Video surveillance is an active research topic in computer vision that focuses
on to detection, recognition and tracking objects over a sequence of images and it
also makes an attempt to understand and describe object behaviour by replacing the
aging old traditional method of monitoring cameras by human operators. Object
detection and tracking are important and challenging tasks in many computer vision
applications such as surveillance, vehicle navigation and autonomous robot
navigation. Object detection involves locating objects in the frame of a video
sequence. Every tracking method requires an object detection mechanism either in
every frame or when the object first appears in the video. Object tracking is the
process of locating an object or multiple objects over time using a camera. The high
availability of computers, high quality video and Unmanned Aerial Vehicles (UAV)
to perform challenging surveillance, monitoring tasks, has generated a great deal of
interest in object tracking algorithms.
In order to achieve the goal of autonomous UAV operation, algorithms for
automatic target detection and tracking are deployed on aerial platforms.
Nevertheless, processing of UAV imagery is challenging due to several factors such
as rapid camera motion and low-resolution images captured from high altitude
vehicles making it difficult to discriminate similar targets. Additional challenges
2
stem from the fact that targets may undergo significant appearance changes and they
can also go in and out of camera field of view. Since the technology has matured and
the general hardware is sufficient, so it is necessary to start conducting research in
the area and develop a video tracking solution that can be applicable to surveillance
missions carried out by an UAV.
1. 2 Problem Statement
This project aims to propose an object-tracking algorithm, which can detect
and track moving or stationary object in video streams, which is captured from a
moving or stationary camera. This algorithm will address some of the tracking
challenges such as; pose change, partial occlusion, appearance and viewpoint
changes, background clutter, scale changes and Illumination changes.
1. 3 Objectives
(i) Given a bounding box defining the object of interest in a single frame, the
goal is to automatically determine the object’s bounding box in every frame
that follows.
(ii) To design object detection and tracking which overcome the following
challenges :
Illumination changes.
Similar targets.
Clutter background.
Scale change.
Out-of- plane rotation.
3
1. 4 Project Scope
The aim of this project is to propose a tracking algorithm which can track
single moving or stationary object in video streams, which are captured from moving
or stationary camera.
1. 5 Thesis Outline
The structure of this thesis is as follows. In chapter 2 reviewed previous
works related to object tracking. Chapter 3 describes the methodology of the
proposed object tracking method. Chapter 4 goes to explain object tracking method
and, implementing the algorithms with necessary parameters. In Chapter 5 the
experimental results and performance of the proposed algorithm is provided. In 6, the
work is concluded and direction for future work is proposed.
64
REFERENCES
Avidan, S (2007). Ensemble tracking. Pattern Analysis and Machine Intelligence,
IEEE Transactions on, 29(2), 261-271.
Adam, A., Rivlin, E., and Shimshoni, I. (2006, June). Robust fragments-based
tracking using the integral histogram. In Computer Vision and Pattern
Recognition, 2006 IEEE Computer Society Conference on (Vol. 1, pp. 798-
805). IEEE.
Bradski, G. R. (1998). Computer vision face tracking for use in a perceptual user
interface
Babenko, B., Yang, M. H., and Belongie, S. (2009, June). Visual tracking
Online multiple instance learning. In Computer Vision and Pattern
Recognition, 2009. CVPR 2009. IEEE Conference on (pp. 983-990). IEEE.
Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L. (2008). Speeded-up robust
features (SURF). Computer vision and image understanding, 110(3), 346-359.
Babenko, B., Yang, M. H., and Belongie, S. (2011). Robust object tracking with
online multiple instance learning. Pattern Analysis and Machine Intelligence,
IEEE Transactions on, 33(8), 1619-1632.
Comaniciu, D., Ramesh, V., and Meer, P. (2003). Kernel-based object tracking.
Pattern Analysis and Machine Intelligence, IEEE Transactions on, 25(5), 564-
577.
Collins, R. T., Liu, Y., and Leordeanu, M. (2005). Online selection of discriminative
tracking features. Pattern Analysis and Machine Intelligence, IEEE
Transactions on, 27(10), 1631-1643.
65
Chapelle, O., B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning.
The MIT Press, Sept. 2006. 3, 28, 29
Cheng, P., Zhou, G., and Zheng, Z. (2009). Detecting and counting vehicles from
small low-cost UAV images. In ASPRS 2009 Annual Conference, Baltimore
(Vol. 3, pp. 9-13).
Comaniciu, D., Ramesh V., and Meer P.(2000) Detection and Tracking of Point
Features Computer Vision and Pattern Recognition, Proceedings. IEEE
Conference, Vol. 2, pages 142-149, 2000
Collins, R., Zhou, X., and Teh, S. K. (2005, January). An open source tracking
testbed and evaluation web site. In IEEE International Workshop on
Performance Evaluation of Tracking and Surveillance (pp. 17-24).
Collins, R. T. (2003, June). Mean-shift blob tracking through scale space. In
Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE
Computer Society Conference on (Vol. 2, pp. II-234). IEEE.
Del Moral, P. (1996). Non-linear filtering: interacting particle resolution. Markov
Processes and related fields, 2(4), 555-581.
Fischler, M. A., and Bolles, R. C. (1981). Random sample consensus: a paradigm for
model fitting with applications to image analysis and automated cartography.
Communications of the ACM, 24(6), 381-395.
Grabner, H., Leistner, C., and Bischof, H. (2008). Semi-supervised on-line boosting
for robust tracking. In Computer Vision–ECCV 2008 (pp. 234-247). Springer
Berlin Heidelberg.
Grabner, M., Grabner, H., and Bischof, H. (2007, June). Learning features for
tracking. In Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE
Conference on (pp. 1-8). IEEE.
66
Grabner, H., Grabner, M., and Bischof, H. (2006, September). Real-Time Tracking
via On-line Boosting. In BMVC (Vol. 1, No. 5, p. 6).
Haykin, S. S. (Ed.). (2001). Kalman filtering and neural networks (pp. 123-174).
New York: Wiley.
Hare, S., Saffari, A., and Torr, P. H. (2011, November). Struck: Structured output
Shahed Nejhum, S. M., Ho, J., and Yang, M. H. (2008, June). Visual tracking with
histograms and articulating blocks. In Computer Vision and Pattern
Recognition, 2008. CVPR 2008. IEEE Conference on (pp. 1-8). IEEE.
Hua, G., and Wu, Y. (2006, June). Measurement integration under inconsistency for
robust tracking. In Computer Vision and Pattern Recognition, 2006 IEEE
Computer Society Conference on (Vol. 1, pp. 650-657). IEEE.
Harris, C., and Stephens, M. (1988, August). A combined corner and edge detector.
In Alvey vision conference (Vol. 15, p. 50).
Henriques, J. F., Caseiro, R., Martins, P., and Batista, J. (2012). Exploiting the
Circulant structure of tracking-by-detection with kernels. In Computer Vision–
ECCV 2012 (pp. 702-715). Springer Berlin Heidelberg.
Hare, S., Saffari, A., and Torr, P. H. (2012, June). Efficient online structured output
learning for keypoint-based object tracking. In Computer Vision and Pattern
Recognition (CVPR), 2012 IEEE Conference on (pp. 1894-1901). IEEE.
Isard, M., and Blake, A. (1998). Condensation—conditional density propagation for
visual tracking. International journal of computer vision, 29(1), 5-28.
Javed, O., Ali, S., and Shah, M. (2005, June). Online detection and classification of
moving objects using progressively improving detectors. In Computer Vision
and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society
Conference on (Vol. 1, pp. 696-701). IEEE.
67
Kalal, Z., Mikolajczyk, K., and Matas, J. (2012). Tracking-learning-detection.
Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34(7), 1409-
1422.
Kalal, Z., Matas, J., and Mikolajczyk, K. (2010, June). Pn learning: Bootstrapping
binary classifiers by structural constraints. In Computer Vision and Pattern
Recognition (CVPR), 2010 IEEE Conference on (pp. 49-56). IEEE.
Kalal, Z., Mikolajczyk, K., and Matas, J. (2010, August). Forward-backward error:
Automatic detection of tracking failures. In Pattern Recognition (ICPR), 2010
20th International Conference on (pp. 2756-2759). IEEE.
Lucas, B. D., and Kanade, T. (1981, August). An iterative image registration
technique with an application to stereo vision. In IJCAI (Vol. 81, pp. 674-679).
Lucas, B. D., and Kanade, T. (1981, August). An iterative image registration
technique with an application to stereo vision. In IJCAI (Vol. 81, pp. 674-679).
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints.
International journal of computer vision, 60(2), 91-110.
Leutenegger, S., Chli, M., and Siegwart, R. Y. (2011, November). BRISK: Binary
robust invariant scalable keypoints. In Computer Vision (ICCV), 2011 IEEE
International Conference on (pp. 2548-2555). IEEE.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints.
International journal of computer vision, 60(2), 91-110.
Maggio, E., and Cavallaro, A. (2011). Video tracking: theory and practice.
John Wiley & Sons.
Nebehay, Georg and Pflugfelder, Roman. (2014). Consensus-based Matching and
Tracking of Keypoints for Object Tracking. IEEE.
68
Oron, S., Bar-Hillel, A., Levi, D., and Avidan, S. (2012, June). Locally orderless
tracking. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE
Conference on (pp. 1940-1947). IEEE.
Ozuysal, M., Calonder, M., Lepetit, V., and Fua, P. (2010). Fast keypoint
recognition using random ferns. Pattern Analysis and Machine Intelligence,
IEEE Transactions on, 32(3), 448-461.
Özuysal, M., Lepetit, V., Fleuret, F., and Fua, P. (2006). Feature harvesting for
tracking-by-detection. In Computer Vision–ECCV 2006 (pp. 592-605).
Springer Berlin Heidelberg.
Pérez, P., Hue, C., Vermaak, J., and Gangnet, M. (2002). Color-based probabilistic
tracking. In Computer vision—ECCV 2002 (pp. 661-675). Springer Berlin
Heidelberg.
Rosten, E., and Drummond, T. (2005, October). Fusing points and lines for high
performance tracking. In Computer Vision, 2005. ICCV 2005. Tenth IEEE
International Conference on (Vol. 2, pp. 1508-1515). IEEE.
Ross, D. A., Lim, J., Lin, R. S., and Yang, M. H. (2008). Incremental learning for
robust visual tracking. International Journal of Computer Vision, 77(1-3), 125-
141.
Stalder, S., Grabner, H., and Van Gool, L. (2009, September). Beyond semi-
supervised tracking: Tracking should be as simple as detection, but not simpler
than recognition. In Computer Vision Workshops (ICCV Workshops), 2009
IEEE 12th International Conference on (pp. 1409-1416). IEEE.
Santner, J., Leistner, C., Saffari, A., Pock, T., and Bischof, H. (2010, June). Prost:
Parallel robust online simple tracking. In Computer Vision and Pattern
Recognition (CVPR), 2010 IEEE Conference on (pp. 723-730). IEEE.
Stauffer, C., and Grimson, W. E. L. (2000). Learning patterns of activity using real-
time tracking. Pattern Analysis and Machine Intelligence, IEEE Transactions
69
on, 22(8), 747-757.
Sevilla, L., and Learned-Miller, E. (2012, June). Distribution fields for tracking
Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 1910-
1917). IEEE.
Tomasi, C., and Kanade, T. (1991). Detection and tracking of point features.
Pittsburgh: School of Computer Science, Carnegie Mellon Univ.
Viola, P., and Jones, M. (2001). Rapid object detection using a boosted cascade of
simple features. In Computer Vision and Pattern Recognition, 2001. CVPR
2001. Proceedings of the 2001 IEEE Computer Society Conference on (Vol. 1,
pp. I-511). IEEE.
Wu, Y., Lim, J., and Yang, M. H. (2013, June). Online object tracking:
A benchmark. In Computer Vision and Pattern Recognition (CVPR), 2013
IEEE Conference on (pp. 2411-2418). IEEE.
Xiao, J., and Shah, M. (2005). Motion layer extraction in the presence of occlusion
using graph cuts. Pattern Analysis and Machine Intelligence, IEEE
Transactions on, 27(10), 1644-1659.
Yilmaz, A., Javed, O., and Shah, M. (2006). Object tracking: A survey.
Acm computing surveys (CSUR), 38(4), 13.
Zhang, L., and van der Maaten, L. (2013, June). Structure preserving object tracking.
In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference
on (pp. 1838-1845). IEEE.