Tracking-Learning-Detection

Zdenek Kalal, Krystian Mikolajczyk, and Jiri Matas

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 34, NO. 7, JULY 2012

Outline

1. INTRODUCTION

2. RELATED WORK

3. TRACKING-LEARNING-DETECTION

4. P-N LEARNING

5. IMPLEMENTATION OF TLD

6. QUANTITATIVE EVALUATION

7. CONCLUSIONS

8. LIMITATIONS AND FUTURE WORK

Goal

Our goal is to automatically determine the object’s bounding box in every frame, or to indicate that the object is not visible

The video stream is to be processed at frame rate, and the process should run indefinitely long. We refer to this task as long-term tracking

Problems

When the object reappears in the camera’s field of view, the object may change its appearance

Trackers accumulate error during runtime (drift) and typically fail if the object disappears from the camera view

Detectors require an offline training stage and therefore cannot be applied to unknown objects

A successful long-term tracker should handle scale and illumination changes, background clutter, partial occlusions, and operate in real time

2. RELATED WORK

2.1 Object Tracking

2.2 Object Detection

2.3 Machine Learning

2.4 Most Related Approaches

2.1 Object Tracking

Object tracking is the task of estimating the object’s motion

Various representations of the object are used in practice:

Points

Articulated models

Contours

Optical flow

2.1 Object Tracking (cont.)

We focus on methods that represent the object by a geometric shape and estimate its motion between consecutive frames (frame-to-frame tracking)

Template tracking is the most straightforward approach in that case. The object is described by a target template (an image patch, a color histogram):

Static: the target template does not change (offline)

Adaptive: the target template is extracted from the previous frame (online)

Combined static and adaptive: recognize the “reliable” parts of the template

2.1 Object Tracking (cont.)

Templates have limited modeling capabilities, as they represent only a single appearance of the object. To model more appearance variations, generative models have been proposed

Generative trackers model only the appearance of the object and often fail in cluttered backgrounds

Recent trackers therefore also model the environment in which the object moves

2.2 Object Detection

Object detection is the task of localization of objects in an input image

Methods are typically based on the local image features or a sliding window

The feature-based approaches usually follow the pipeline of:

1) feature detection

2) feature recognition

3) model fitting

2.2 Object Detection (cont.)

Sliding window-based approaches scan the input image with a window of various sizes

For a QVGA frame (240 × 320), roughly 50,000 patches are evaluated in every frame

To achieve real-time performance, sliding window-based detectors adopted the so-called cascaded architecture

A classifier is separated into a number of stages, each of which enables early rejection of background patches

2.2 Object Detection (cont.)

Training such detectors typically requires a large number of training examples and intensive computation in the training stage to accurately represent the decision boundary between the object and the background

An alternative approach is to model the object as a collection of templates, so that learning involves just adding one more template

2.3 Machine Learning

Object detectors are traditionally trained assuming that all training examples are labeled (supervised learning)

However, we wish to train a detector from a single labeled example and a video stream

Semi-supervised learning exploits both labeled and unlabeled data

Algorithms relying on similar assumptions include:

Expectation-Maximization (EM): a generic method

Self-learning

Co-training

2.4 Most Related Approaches

Many approaches combine tracking, learning, and detection in some sense. For example:

An offline-trained detector is used to validate the trajectory output by a tracker; if the trajectory is not validated, an exhaustive image search is performed to find the target

The detector is integrated within a particle filtering framework (tracking)

In contrast to our method, these methods rely on an offline trained detector that does not change its properties during runtime

2.4 Most Related Approaches (cont.)

Adaptive discriminative trackers realize tracking with an online-learned detector that discriminates the target from its background

In these trackers, a single process represents both tracking and detection

In contrast, by keeping tracking and detection separate, our approach does not have to compromise on either the tracking or the detection capabilities of its components

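TLD framework

Tracker: estimates the object’s motion between consecutive frames; it is likely to fail if the object moves out of the camera view

Detector: treats every frame as independent and scans it to localize the appearances learned so far; it makes two types of errors, false positives and false negatives

Learning: observes the tracker and the detector, estimates the detector’s errors, and generates training examples to avoid these errors in the future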

4. P-N LEARNING

4.1 What is P-N Learning

4.2 Formalization

4.3 Stability

4.4 Experiments with Simulated Experts

4.5 Design of Real Experts

4.1 What is P-N Learning

The goal of the learning component is to improve the performance of an object detector by online processing of a video stream

The detector errors can be identified by two types of “experts”:

P-expert: identifies only false negatives

N-expert: identifies only false positives

Both of the experts make errors themselves

However, their independence enables mutual compensation of their errors

4.2 Formalization

$x$: an example from a feature space $X$ (a set of examples $X$ is called an unlabeled set)

$y$: a label from a space of labels $Y = \{-1, 1\}$ (a set of labels)

$L = \{(x, y)\}$: a labeled set

Input: a labeled set $L_l$ and an unlabeled set $X_u$, where $l \ll u$

The task of P-N learning is to learn a classifier $f : X \to Y$ from the labeled set $L_l$ and *bootstrap its performance by the unlabeled set $X_u$

*Supervised Bootstrap

Consider first that the labels of the set $X_u$ are known

Misclassified examples are recognized and added to the training set with correct labels

P-N learning uses the same idea, with the difference that the labels of the set $X_u$ are unknown

The labels are not given but rather estimated using the P-N experts

Classifier $f$ is a function from a family $F$ parameterized by $\Theta$

The family $F$ is considered fixed during training, so training corresponds to estimating the parameters $\Theta$

4.2 Formalization (cont.)

P-N learning consists of four blocks:

1) A classifier to be learned

2) Training set : a collection of labeled training examples

3) Supervised training : a method that trains a classifier from a training set

4) P-N experts : functions that generate positive and negative training examples during learning

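4.2 Formalization (cont.)

The labeled set $L$ initializes the parameters $\Theta^0$; subsequent iterations produce $\Theta^1, \dots, \Theta^{k-1}$

In iteration $k$, the classifier trained in the previous iteration labels every unlabeled example: $y_u^k = f(x_u \mid \Theta^{k-1})$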

4.2 Formalization (cont.)

The crucial element of P-N learning is the estimation of the classifier errors

The key idea is to separate the estimation of false positives from the estimation of false negatives

P-expert: analyzes examples classified as negative, estimates false negatives, and adds them to the training set with a positive label. In iteration $k$ it outputs $n^+(k)$ positive examples [increases generality]

N-expert: analyzes examples classified as positive, estimates false positives, and adds them to the training set with a negative label. In iteration $k$ it outputs $n^-(k)$ negative examples [increases discriminability]
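The four blocks (classifier, training set, supervised training, P-N experts) can be tied together in a short loop. The following is a minimal sketch, not the authors' implementation: the 1D threshold classifier, the oracle-like experts, and all function names are hypothetical and chosen only to make the bootstrapping cycle visible.

```python
from typing import List, Tuple

Labeled = Tuple[float, int]  # (example, label), label is +1 or -1


def pn_learning(labeled, unlabeled, train, p_expert, n_expert, iterations=3):
    """Iterative bootstrapping: classify, let the experts relabel, retrain."""
    training_set: List[Labeled] = list(labeled)
    classify = train(training_set)  # corresponds to the initial parameters
    for _ in range(iterations):
        predictions = [(x, classify(x)) for x in unlabeled]
        classified_neg = [x for x, y in predictions if y == -1]
        classified_pos = [x for x, y in predictions if y == +1]
        # P-expert: estimated false negatives, added with a positive label.
        training_set += [(x, +1) for x in p_expert(classified_neg)]
        # N-expert: estimated false positives, added with a negative label.
        training_set += [(x, -1) for x in n_expert(classified_pos)]
        classify = train(training_set)
    return classify


# Toy setup (hypothetical): 1D examples, true rule "x > 0 is positive".
def train_threshold(training_set):
    max_neg = max(x for x, y in training_set if y == -1)
    min_pos = min(x for x, y in training_set if y == +1)
    threshold = (max_neg + min_pos) / 2.0
    return lambda x: +1 if x > threshold else -1


def perfect_p_expert(classified_neg):
    return [x for x in classified_neg if x > 0]


def perfect_n_expert(classified_pos):
    return [x for x in classified_pos if x <= 0]


seed = [(-1.0, -1), (9.0, +1)]
initial = train_threshold(seed)
learned = pn_learning(seed, [-3.0, 1.0, 2.0, 3.0, 6.0],
                      train_threshold, perfect_p_expert, perfect_n_expert)
```

In this toy run the initial classifier misses the positive example 2.0, while the bootstrapped one recovers it. Real experts make errors themselves, which is what the stability analysis addresses.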

4.3 Stability

This section analyzes the impact of P-N learning on the classifier performance

Let us assume that the labels of $X_u$ are known. This allows us to measure both the classifier errors and the errors of the P-N experts

Classifier: false positives $\alpha(k)$, false negatives $\beta(k)$

P-expert: correct $n_c^+(k)$, false $n_f^+(k)$, with $n^+(k) = n_c^+(k) + n_f^+(k)$

N-expert: correct $n_c^-(k)$, false $n_f^-(k)$, with $n^-(k) = n_c^-(k) + n_f^-(k)$

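4.3 Stability (cont.)

$\alpha(k+1) = \alpha(k) - n_c^-(k) + n_f^+(k)$

$\beta(k+1) = \beta(k) - n_c^+(k) + n_f^-(k)$

False positives $\alpha(k)$ decrease if $n_c^-(k) > n_f^+(k)$

False negatives $\beta(k)$ decrease if $n_c^+(k) > n_f^-(k)$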

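4.3 Stability (cont.)

Quality measures:

P-precision: $P^+ = n_c^+ / (n_c^+ + n_f^+)$

P-recall: $R^+ = n_c^+ / \beta$

N-precision: $P^- = n_c^- / (n_c^- + n_f^-)$

N-recall: $R^- = n_c^- / \alpha$

$\alpha(k+1) = \alpha(k) - n_c^-(k) + n_f^+(k)$

$\beta(k+1) = \beta(k) - n_c^+(k) + n_f^-(k)$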

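4.3 Stability (cont.)

Substituting the quality measures into the recursion gives

$\alpha(k+1) = (1 - R^-)\,\alpha(k) + \frac{1 - P^+}{P^+} R^+ \beta(k)$

$\beta(k+1) = \frac{1 - P^-}{P^-} R^- \alpha(k) + (1 - R^+)\,\beta(k)$

With the state vector $\vec{x}(k) = [\alpha(k)\;\; \beta(k)]^T$ this becomes $\vec{x}(k+1) = M\,\vec{x}(k)$, where

$M = \begin{bmatrix} 1 - R^- & \frac{1 - P^+}{P^+} R^+ \\ \frac{1 - P^-}{P^-} R^- & 1 - R^+ \end{bmatrix}$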

4.3 Stability (cont.)

Based on the well-founded theory of dynamical systems, if both eigenvalues $\lambda_1, \lambda_2$ of the transition matrix $M$ are smaller than one, the state vector $\vec{x}$ converges to zero (error-canceling)
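The eigenvalue condition can be checked numerically for a given set of expert quality measures. The sketch below assumes the transition matrix has rows $[1 - R^-,\ (1 - P^+)/P^+ \cdot R^+]$ and $[(1 - P^-)/P^- \cdot R^-,\ 1 - R^+]$, which follows from substituting the quality measures into the error recursion; the function names are hypothetical.

```python
import math


def transition_matrix(p_plus, r_plus, p_minus, r_minus):
    """2x2 matrix M such that [alpha, beta](k+1) = M [alpha, beta](k)."""
    return [
        [1.0 - r_minus, (1.0 - p_plus) / p_plus * r_plus],
        [(1.0 - p_minus) / p_minus * r_minus, 1.0 - r_plus],
    ]


def eigenvalues_2x2(m):
    """Eigenvalues from trace and determinant.

    For precisions and recalls in (0, 1] the off-diagonal entries are
    non-negative, so the discriminant is non-negative and both
    eigenvalues are real; max() only guards against rounding.
    """
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    return (trace + root) / 2.0, (trace - root) / 2.0


def is_error_canceling(p_plus, r_plus, p_minus, r_minus):
    m = transition_matrix(p_plus, r_plus, p_minus, r_minus)
    return all(abs(lam) < 1.0 for lam in eigenvalues_2x2(m))


lam1, lam2 = eigenvalues_2x2(transition_matrix(0.8, 0.8, 0.8, 0.8))
```

With all four measures equal to 0.8 the eigenvalues come out near 0.4 and 0, so the errors shrink; with all four equal to 0.4 the larger eigenvalue exceeds one and the check fails.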

4.3 Stability (cont.)

In the above analysis, the quality measures were assumed to be constant; in practice, they may vary

In practice, it is also not possible to identify all the errors of the classifier

Therefore, the training does not converge to an error-free classifier, but may stabilize at a certain level

The performance increases in those iterations where the eigenvalues of $M$ are smaller than one

4.4 Experiments with Simulated Experts

In this experiment, a classifier is trained on a real sequence using simulated P-N experts

Our goal is to analyze the learning performance as a function of the expert’s quality measures

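To reduce the 4D space of parameters $(P^+, R^+, P^-, R^-)$, they are set to $P^+ = R^+ = P^- = R^- = 1 - \epsilon$, where $\epsilon$ represents the error of the expert

The transition matrix then becomes $M = \epsilon \mathbb{1}$, where $\mathbb{1}$ is a $2 \times 2$ matrix with all elements equal to 1, so its eigenvalues are $\lambda_1 = 0$, $\lambda_2 = 2\epsilon$

Therefore, P-N learning should improve the performance if $\epsilon < 0.5$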
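The threshold at 0.5 can be illustrated by iterating the error recursion directly. This is a small sketch assuming the simplified setting above, where every entry of the transition matrix equals ε, so both error counts are updated to ε(α + β); the starting error counts are made up for illustration.

```python
def simulate_errors(epsilon, alpha0, beta0, iterations):
    """Iterate [alpha, beta](k+1) = epsilon * ones(2, 2) @ [alpha, beta](k)."""
    alpha, beta = float(alpha0), float(beta0)
    history = [(alpha, beta)]
    for _ in range(iterations):
        total = alpha + beta
        alpha, beta = epsilon * total, epsilon * total
        history.append((alpha, beta))
    return history


shrinking = simulate_errors(0.25, 100, 100, 2)  # 2 * epsilon = 0.5
constant = simulate_errors(0.50, 100, 100, 2)   # 2 * epsilon = 1.0
growing = simulate_errors(0.75, 100, 100, 2)    # 2 * epsilon = 1.5
```

Each iteration multiplies the errors by 2ε, so they halve for ε = 0.25, stay constant for ε = 0.5, and grow for ε = 0.75.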

4.5 Design of Real Experts

This section applies P-N learning to train an object detector from a labeled frame and a video stream

[Figure labels: bounding box; trajectory → structure]

4.5 Design of Real Experts (cont.)

The key idea of the P-N experts is to exploit the structure in the data to identify the detector errors

The P-expert exploits the temporal structure in the video and assumes that the object moves along a trajectory (estimated by the frame-to-frame tracker)

The N-expert exploits the spatial structure in the video and assumes that the object can appear at a single location only (the location produced by the tracker)

5. IMPLEMENTATION OF TLD

5.1 Prerequisites

5.2 Object Model

5.3 Object Detector

5.4 Tracker

5.5 Integrator

5.6 Learning Component

5.1 Prerequisites

At any time instance, the object is represented by its state. The state is either a bounding box or a flag indicating that the object is not visible

A sequence of object states defines a trajectory of the object. The trajectory is fragmented, as the object may not be visible

The bounding box has a fixed aspect ratio (given by the initial bounding box) and is parameterized by its location and scale

Spatial similarity of two bounding boxes is measured using overlap, defined as the ratio between their intersection and union
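Overlap as the ratio of intersection to union can be computed in a few lines. This is a minimal sketch; the `(x, y, width, height)` box layout and the function name are assumptions made for the example.

```python
def overlap(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


half_shifted = overlap((0, 0, 10, 10), (5, 0, 10, 10))  # 50 / 150
```

Identical boxes give an overlap of 1, disjoint boxes give 0, and two equal boxes shifted by half their width give one third.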

5.1 Prerequisites (cont.)

The patch $p$ is sampled from an image within the object bounding box and is then resampled to a normalized resolution (15 × 15 pixels) regardless of the aspect ratio

Similarity between two patches $p_i$, $p_j$ is defined as

$S(p_i, p_j) = 0.5\,(\mathrm{NCC}(p_i, p_j) + 1)$, where NCC is the Normalized Correlation Coefficient
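The patch similarity maps the correlation coefficient from [-1, 1] to [0, 1]. The sketch below assumes the zero-mean form of NCC and represents a patch as a flat list of gray values; returning 0 for a constant patch is an assumption of this example, not something the slides specify.

```python
import math


def ncc(p, q):
    """Zero-mean normalized correlation coefficient of two equal-size patches."""
    n = len(p)
    mean_p = sum(p) / n
    mean_q = sum(q) / n
    dp = [a - mean_p for a in p]
    dq = [b - mean_q for b in q]
    denom = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dq))
    if denom == 0.0:
        return 0.0  # assumption: a constant patch correlates with nothing
    return sum(a * b for a, b in zip(dp, dq)) / denom


def similarity(p, q):
    """S(p, q) = 0.5 * (NCC(p, q) + 1), a value between 0 and 1."""
    return 0.5 * (ncc(p, q) + 1.0)


patch = [1.0, 2.0, 3.0, 4.0]
inverted = [4.0, 3.0, 2.0, 1.0]
```

A patch compared with itself scores 1, and a patch compared with its contrast-inverted copy scores 0.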

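5.2 Object Model

The object model $M$ is a data structure that represents the object and its surroundings observed so far

$M = \{p_1^+, p_2^+, \dots, p_m^+, p_1^-, p_2^-, \dots, p_n^-\}$, where $p^+$ and $p^-$ are object and background patches; positive patches are ordered by the time they were added

Given an arbitrary patch $p$ and object model $M$, we define several similarity measures:

1) Similarity with the positive nearest neighbor: $S^+(p, M) = \max_{p_i^+ \in M} S(p, p_i^+)$

2) Similarity with the negative nearest neighbor: $S^-(p, M) = \max_{p_i^- \in M} S(p, p_i^-)$

3) Similarity with the positive nearest neighbor considering the 50 percent earliest positive patches: $S^+_{50\%}(p, M) = \max_{p_i^+ \in M,\; i < m/2} S(p, p_i^+)$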

5.2 Object Model (cont.)

4) Relative similarity: $S^r = S^+ / (S^+ + S^-)$. It ranges from 0 to 1; higher values mean more confidence that the patch depicts the object

5) Conservative similarity: $S^c = S^+_{50\%} / (S^+_{50\%} + S^-)$. It ranges from 0 to 1; higher values mean more confidence that the patch resembles the appearance observed in the first 50 percent of the positive patches
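The five measures can be computed from any pairwise patch similarity. The sketch below assumes the definitions from the TLD paper, $S^r = S^+ / (S^+ + S^-)$ and $S^c = S^+_{50\%} / (S^+_{50\%} + S^-)$, and takes the pairwise similarity as a parameter; the scalar "patches", the toy similarity, and the rounding of "earliest 50 percent" for odd counts are assumptions of this example.

```python
def model_similarities(patch, positives, negatives, sim):
    """Return (S+, S-, S+50%, relative, conservative) for one patch.

    `positives` must be ordered by insertion time; `sim(a, b)` is any
    pairwise similarity in [0, 1].
    """
    def nearest(templates):
        return max((sim(patch, t) for t in templates), default=0.0)

    def ratio(a, b):
        return a / (a + b) if (a + b) > 0 else 0.0

    # Assumption: with an odd count, the earlier half is rounded up.
    earliest = positives[: (len(positives) + 1) // 2]
    s_pos = nearest(positives)
    s_neg = nearest(negatives)
    s_pos_50 = nearest(earliest)
    return s_pos, s_neg, s_pos_50, ratio(s_pos, s_neg), ratio(s_pos_50, s_neg)


# Toy data (hypothetical): "patches" are scalars, similarity is 1 - distance.
def toy_sim(a, b):
    return max(0.0, 1.0 - abs(a - b))


s_pos, s_neg, s_pos_50, relative, conservative = model_similarities(
    0.9, positives=[0.2, 0.9], negatives=[0.5], sim=toy_sim)
```

Here the patch matches the most recent positive template exactly, so the relative similarity is high, while the conservative similarity stays low because the earliest template looks different.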

*Nearest neighbor classifier

A patch $p$ is classified as positive if $S^r(p, M) > \theta_{NN}$; otherwise the patch is classified as negative

A classification margin is defined as $S^r(p, M) - \theta_{NN}$

The parameter $\theta_{NN}$ enables tuning the NN classifier toward either recall or precision

5.2 Object Model (cont.)

Model update

A patch is added to the collection only if its label estimated by the *NN classifier differs from the label given by the P-N experts

This leads to a significant reduction of accepted patches, at the cost of a coarser representation of the decision boundary

We therefore improve this strategy by also adding patches for which the classification margin is smaller than $\lambda$ ($\lambda = 0.1$)

The parameter $\lambda$ is a compromise between the accuracy of the representation and the speed at which the object model grows
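The update rule can be written as a single predicate on the relative similarity of a patch. This is a sketch under stated assumptions: the threshold value 0.6 is the one quoted for the NN classifier elsewhere in these slides, λ = 0.1 comes from the slide above, and comparing the absolute value of the margin with λ is an interpretation made for this example.

```python
def nn_label(relative_similarity, theta_nn=0.6):
    """NN classifier: +1 (object) if the relative similarity exceeds theta."""
    return 1 if relative_similarity > theta_nn else -1


def should_add_to_model(relative_similarity, expert_label,
                        theta_nn=0.6, margin_lambda=0.1):
    """Accept a patch if the NN label disagrees with the P-N experts,
    or if the patch lies close to the decision boundary."""
    margin = relative_similarity - theta_nn
    disagrees = nn_label(relative_similarity, theta_nn) != expert_label
    # Assumption: "margin smaller than lambda" is read as |margin| < lambda.
    return disagrees or abs(margin) < margin_lambda
```

A confidently and correctly classified patch (relative similarity 0.9, expert label +1) is skipped, while a borderline one (0.65) or a misclassified one (0.3 with expert label +1) is added.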

5.3 Object Detector

The detector scans the input image with a scanning window and, for each patch, decides about the presence or absence of the object

Scanning-window grid

We generate all possible scales and shifts of the initial bounding box

As the number of bounding boxes to be evaluated is large, the classification of every single patch has to be very efficient

Directly evaluating the NN classifier on every patch is problematic, as it involves evaluating the relative similarity (a search for two nearest neighbors)
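A scanning-window grid of this kind can be enumerated as follows. The slide does not give the grid parameters, so the values below (scale step 1.2, shifts of 10 percent of the box size, minimum box size 20 pixels, 21 scale levels) are assumptions recalled from the TLD paper and its reference implementation; the function name is hypothetical.

```python
def scanning_grid(img_w, img_h, box_w, box_h, scale_step=1.2,
                  shift=0.1, min_size=20, scale_levels=range(-10, 11)):
    """Enumerate (x, y, w, h) boxes over all scales and shifts of a box."""
    boxes = []
    for level in scale_levels:
        scale = scale_step ** level
        w = round(box_w * scale)
        h = round(box_h * scale)
        if min(w, h) < min_size or w > img_w or h > img_h:
            continue  # too small to be useful or larger than the image
        step_x = max(1, round(shift * w))
        step_y = max(1, round(shift * h))
        for y in range(0, img_h - h + 1, step_y):
            for x in range(0, img_w - w + 1, step_x):
                boxes.append((x, y, w, h))
    return boxes


grid = scanning_grid(img_w=320, img_h=240, box_w=40, box_h=30)
```

The number of boxes depends on the initial bounding box; every box is then passed through the detector, which is why per-patch classification has to be cheap.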

5.3 Object Detector (cont.)

We structure the classifier into three stages:

1) patch variance

2) ensemble classifier

3) nearest neighbor

5.3.1 Patch Variance

This stage rejects all patches whose gray-value variance is smaller than 50 percent of the variance of the patch that was selected for tracking

The gray-value variance of a patch $p$ can be expressed as $E(p^2) - E^2(p)$

The expected values $E(p)$ and $E(p^2)$ can be measured in constant time using integral images

This stage typically rejects more than 50 percent of nonobject patches
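The constant-time variance test can be sketched with two integral images, one over the gray values and one over their squares. The function names and the nested-list image layout are assumptions of this example; a real implementation would use an optimized library routine.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x, y, w, h):
    """Sum over the box with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def patch_variance(ii, ii_sq, x, y, w, h):
    """Variance as E(p^2) - E(p)^2, computed in constant time per patch."""
    n = w * h
    mean = box_sum(ii, x, y, w, h) / n
    mean_sq = box_sum(ii_sq, x, y, w, h) / n
    return mean_sq - mean * mean


def passes_variance_stage(variance, initial_variance):
    """Reject patches with less than 50 percent of the initial variance."""
    return variance >= 0.5 * initial_variance


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(image)
ii_sq = integral_image([[v * v for v in row] for row in image])
var_2x2 = patch_variance(ii, ii_sq, x=1, y=1, w=2, h=2)  # patch 5, 6, 8, 9
```

Both integral images are built once per frame, after which each of the scanned patches costs only eight lookups.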

5.3.2 Ensemble Classifier

IDEA: Do not learn a single classifier, but learn a set of classifiers

Combine the predictions of multiple classifiers

5.3.2 Ensemble Classifier (cont.)

The ensemble consists of $n$ base classifiers

Each base classifier $i$ performs a number of *pixel comparisons on the patch, resulting in a binary code $x$, which indexes an array of *posteriors $P_i(y|x)$, where $y \in \{0, 1\}$

The posteriors of the individual base classifiers are averaged

The ensemble classifies the patch as the object if the average posterior is larger than 50 percent

*Pixel Comparisons

Pixel comparisons are generated offline at random and stay fixed in runtime

1) The image is convolved with a Gaussian kernel with standard deviation of 3 pixels to increase the robustness to shift and image noise

2) Each comparison returns 0 or 1 and these measurements are concatenated into x

*Pixel Comparisons (cont.)

Generating pixel comparisons

The independence of the classifiers is in our case enforced by generating different pixel comparisons for each base classifier

We discretize the space of pixel locations and generate all possible horizontal and vertical pixel comparisons

We then permute the comparisons and split them among the base classifiers

As a result, every classifier is guaranteed to be based on a different set of features and cover the entire patch

This is in contrast to standard approaches, where every pixel comparison is generated independent of other pixel comparisons

*Posterior Probabilities

Every base classifier $i$ maintains a distribution of posterior probabilities

Each base classifier uses $d$ comparisons, which give $2^d$ possible codes that index the posterior probability $P_i(y|x) = \#p / (\#p + \#n)$, where $\#p$ and $\#n$ are the numbers of positive and negative patches that were assigned the same binary code

Initialization and update

Initially, all base posterior probabilities are set to zero

During runtime, if the classification is incorrect, the corresponding $\#p$ and $\#n$ are updated, which consequently updates $P_i(y|x)$
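The base classifiers and their update can be sketched as follows. This is a simplified illustration rather than the authors' code: it assumes the posterior is the fraction #p / (#p + #n) of positive patches seen with a given code, skips the Gaussian smoothing, and uses tiny 2 × 2 patches with hand-picked comparisons; all class and function names are hypothetical.

```python
class BaseClassifier:
    """One base classifier: d pixel comparisons index 2**d posteriors."""

    def __init__(self, comparisons):
        self.comparisons = comparisons  # list of ((x1, y1), (x2, y2))
        size = 2 ** len(comparisons)
        self.num_pos = [0] * size  # #p per binary code
        self.num_neg = [0] * size  # #n per binary code

    def code(self, patch):
        value = 0
        for (x1, y1), (x2, y2) in self.comparisons:
            bit = 1 if patch[y1][x1] > patch[y2][x2] else 0
            value = (value << 1) | bit
        return value

    def posterior(self, patch):
        i = self.code(patch)
        total = self.num_pos[i] + self.num_neg[i]
        return self.num_pos[i] / total if total else 0.0  # starts at zero

    def update(self, patch, is_object):
        i = self.code(patch)
        if is_object:
            self.num_pos[i] += 1
        else:
            self.num_neg[i] += 1


def classify(ensemble, patch):
    """Object if the average posterior is larger than 50 percent."""
    average = sum(c.posterior(patch) for c in ensemble) / len(ensemble)
    return average > 0.5


def train(ensemble, patch, is_object):
    """Update the counts only when the ensemble misclassifies the patch."""
    if classify(ensemble, patch) != is_object:
        for c in ensemble:
            c.update(patch, is_object)


ensemble = [BaseClassifier([((0, 0), (1, 0))]),
            BaseClassifier([((0, 1), (1, 1))])]
bright_left = [[9, 1], [9, 1]]
bright_right = [[1, 9], [1, 9]]
before = classify(ensemble, bright_left)
train(ensemble, bright_left, True)
train(ensemble, bright_right, False)
```

Before training every posterior is zero, so the ensemble votes for the negative class. After one misclassified positive example, the codes produced by that patch carry a posterior of 1.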

5.3.3 Nearest Neighbor Classifier

After the first two stages, we are typically left with several bounding boxes that are not decided yet

We can therefore use the online model and classify each patch using an NN classifier

A patch is classified as the object if $S^r(p, M) > \theta_{NN}$, with $\theta_{NN} = 0.6$ (similar performance is achieved in the range 0.5 to 0.7)

When the number of templates in the NN classifier exceeds some threshold (given by memory), we use random forgetting of templates (the number of templates stabilizes around several hundred)

5.4 Tracker

The tracking component of TLD is based on the Median-Flow tracker extended with failure detection

The tracker estimates displacements of a number of points within the object’s bounding box, and votes with 50 percent of the most reliable displacements for the motion of the bounding box using median

Failure detection

Median-Flow tracker assumes visibility of the object and therefore inevitably fails if the object gets fully occluded or moves out of the camera view

5.4 Tracker (cont.)

Let $d_i$ denote the displacement of a single point of the Median-Flow tracker and $d_m$ be the median displacement

A failure of the tracker is declared if $\mathrm{median}_i\,|d_i - d_m| > 10$ pixels

If a failure is detected, the tracker does not return any bounding box

This strategy is able to reliably identify failures caused by fast motion or fast occlusion
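The failure test reduces to a median of residuals. The sketch below assumes that displacements are 2D vectors, that the median displacement is taken per coordinate, and that the residual is the Euclidean distance to it; these choices and the function name are assumptions of this example.

```python
import math
from statistics import median


def median_displacement_or_failure(displacements, threshold=10.0):
    """Return the median displacement (dx, dy), or None on tracker failure."""
    dx = median(d[0] for d in displacements)
    dy = median(d[1] for d in displacements)
    residuals = [math.hypot(d[0] - dx, d[1] - dy) for d in displacements]
    if median(residuals) > threshold:
        return None  # failure declared: no bounding box is returned
    return dx, dy


consistent = [(5, 0), (5, 1), (6, 0), (5, 0), (4, 0)]
scattered = [(-40, 0), (0, 35), (40, 0), (0, -35), (30, 30)]
```

When the points move coherently the residuals are small and the median displacement is returned. When the point motions disagree, as happens under fast motion or occlusion, the median residual exceeds 10 pixels and the function reports a failure.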

5.5 Integrator

The integrator combines the bounding box of the tracker and the bounding boxes of the detector into a single bounding box

If neither the tracker nor the detector outputs a bounding box, the object is declared not visible

Otherwise, the integrator outputs the maximally confident bounding box, measured using the conservative similarity Sc

While the detector localizes already known templates, the tracker localizes potentially new templates and thus can bring new data for the detector
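
The integrator rule can be sketched as a simple arg-max over the candidate boxes. This is a minimal sketch, assuming the conservative similarity Sc is available as a callable that scores a bounding box; the names `integrate` and `conservative_similarity` are hypothetical.

```python
def integrate(tracker_box, detector_boxes, conservative_similarity):
    """Return the most confident box, or None if the object is not visible.

    tracker_box: a box or None (the tracker declared a failure)
    detector_boxes: list of boxes, possibly empty
    conservative_similarity: callable mapping a box to its Sc score (assumed given)
    """
    candidates = list(detector_boxes)
    if tracker_box is not None:
        candidates.append(tracker_box)

    # Neither component produced a box: the object is declared not visible.
    if not candidates:
        return None

    # Otherwise output the maximally confident box according to Sc.
    return max(candidates, key=conservative_similarity)
```

In a real system the score would be computed from the image patch under each box; here it is injected so that the selection logic stays independent of the object model.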

5.6 Learning Component

The task of the learning component is to initialize the object detector in the first frame and to update the detector at runtime using the P-expert and the N-expert (growing and pruning)

5.6.1 Initialization

In the first frame, the learning component trains the initial detector using labeled examples generated as follows:

The positive training examples are synthesized from the initial bounding box

Select the 10 bounding boxes on the scanning grid that are closest to the initial bounding box

For each of these boxes, generate 20 warped versions by geometric transformations (shift 1%, scale change 1%, in-plane rotation ±10°) and add Gaussian noise (σ = 5) to them

The result is 10 × 20 = 200 synthetic positive patches
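
The synthesis step can be illustrated by sampling one set of warp parameters per synthetic patch. This is a sketch under assumptions not stated on the slide: the parameters are drawn uniformly within the given limits, the shift is a fraction of the box size in either direction, and noisy pixel values are clipped to [0, 255]. The actual image warping is omitted, and all names are hypothetical.

```python
import random


def sample_warp_parameters(n_boxes=10, n_warps=20, max_shift=0.01,
                           max_scale_change=0.01, max_rotation_deg=10.0, seed=0):
    """Return one parameter set per synthetic positive patch."""
    rng = random.Random(seed)
    plan = []
    for box_index in range(n_boxes):
        for _ in range(n_warps):
            plan.append({
                "box_index": box_index,
                # Shift as a fraction of the box width and height.
                "shift_x": rng.uniform(-max_shift, max_shift),
                "shift_y": rng.uniform(-max_shift, max_shift),
                # Multiplicative scale factor around 1.0.
                "scale": 1.0 + rng.uniform(-max_scale_change, max_scale_change),
                # In-plane rotation in degrees.
                "rotation_deg": rng.uniform(-max_rotation_deg, max_rotation_deg),
            })
    return plan


def add_gaussian_noise(pixels, sigma=5.0, seed=0):
    """Add zero-mean Gaussian noise to a flat list of pixel values."""
    rng = random.Random(seed)
    # Clipping to the 8-bit range is an assumption of this sketch.
    return [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]
```

With the defaults, `len(sample_warp_parameters())` is 200, matching the 10 boxes times 20 warps used at initialization; other counts and rotation limits can be passed as arguments.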

Negative patches are collected from the surroundings of the initial bounding box; no synthetic negative examples are generated

After the initialization, the object detector is ready for runtime, where it is updated by the pair of P-N experts

5.6.2 P-Expert

The P-expert can exploit the fact that the object moves along a trajectory and add positive examples extracted from that trajectory

However, in the TLD system, the object trajectory is generated by a combination of the tracker, the detector, and the integrator

This combined process traces a discontinuous trajectory, which is not correct all the time

The challenge for the P-expert is to identify the reliable parts of the trajectory and use them to generate positive training examples

To identify the reliable parts of the trajectory, the P-expert relies on the object model M

Consider an object model represented as colored points in a feature space: positive examples are red dots connected by a directed curve suggesting their order, and negative examples are black dots

Using the conservative similarity Sc, we define the core of the object model as the subset of the feature space where Sc is larger than a threshold

The P-expert identifies the reliable parts of the trajectory as follows:

The trajectory becomes reliable as soon as it enters the core and remains reliable until it is reinitialized or the tracker identifies its own failure
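
The reliability rule behaves like a two-state machine, which the sketch below makes explicit. It assumes that each frame provides the conservative similarity Sc of the current location together with two flags (tracker failure, reinitialization), and that a failure or reinitialization takes precedence over entering the core within the same frame. The name `reliability_flags` and the example threshold are hypothetical.

```python
def reliability_flags(frames, core_threshold):
    """Return one reliability decision per frame.

    frames: iterable of (sc, tracker_failed, reinitialized) tuples
    core_threshold: Sc value above which a location lies in the core
    """
    reliable = False
    flags = []
    for sc, tracker_failed, reinitialized in frames:
        if tracker_failed or reinitialized:
            # The trajectory is broken: reliability is lost.
            reliable = False
        elif sc > core_threshold:
            # The trajectory enters the core: it becomes reliable.
            reliable = True
        # Otherwise the previous state is kept, so the trajectory can leave
        # the core and still be considered reliable.
        flags.append(reliable)
    return flags
```

Note that a frame with low Sc does not by itself end a reliable stretch; this is what lets the P-expert collect new appearances that the current model does not yet recognize.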

In every frame, the P-expert outputs a decision about the reliability of the current location

If the current location is reliable, the P-expert generates a set of positive examples that update the object model and the ensemble classifier:

Select the 10 bounding boxes on the scanning grid that are closest to the current bounding box

For each of these boxes, generate 10 warped versions by geometric transformations (shift 1%, scale change 1%, in-plane rotation ±5°) and add Gaussian noise (σ = 5) to them

The result is 10 × 10 = 100 synthetic positive examples

5.6.3 N-Expert

The key assumption of the N-expert is that the object can occupy at most one location in the image

If the object location is known, the surroundings of the location are labeled as negative

That is, all patches that are far from the current bounding box (overlap < 0.2) are labeled as negative
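
The N-expert labeling can be sketched with an overlap function and a filter. The sketch assumes that overlap is the ratio of intersection to union of two boxes and that boxes are given as (x1, y1, x2, y2) corner coordinates; the function names are hypothetical.

```python
def overlap(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def n_expert_negatives(current_box, candidate_boxes, max_overlap=0.2):
    # Every candidate far from the known object location is labeled negative.
    return [box for box in candidate_boxes
            if overlap(current_box, box) < max_overlap]
```

For a current box `(0, 0, 10, 10)`, a candidate shifted by 5 pixels has an overlap of 1/3 and is left unlabeled, while a candidate shifted by 9 pixels has an overlap of about 0.05 and is labeled negative.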

6 QUANTITATIVE EVALUATION

6.1 Comparison 1: CoGD

6.2 Comparison 2: Prost

6.3 TLD Data Set

6.4 Improvement of the Object Detector

6.5 Comparison 3: TLD Data Set

6.1 Comparison 1: CoGD

TLD was compared with the results reported in [33], which reports on the performance of five trackers:

Iterative Visual Tracking (IVT)

Online Discriminative Features (ODF)

Ensemble Tracking (ET)

Multiple Instance Learning (MIL)

Cotrained Generative-Discriminative tracking (CoGD)

[33] Online Tracking and Reacquisition Using Co-trained Generative and Discriminative Trackers

The performance was assessed using the number of successfully tracked frames, i.e., the number of frames where the overlap with the ground truth bounding box is larger than 50 percent

Frames where the object was occluded were not counted

CoGD runs at 2 frames per second and requires several frames (typically six) for initialization

In contrast, TLD requires just a single frame and runs at 20 frames per second

6.2 Comparison 2: Prost

TLD was compared with the results reported in [67], which reports on the performance of five algorithms:

Online Boosting (OB)

Online Random Forest (ORF)

Fragment-based Tracker (FT)

MIL

Prost

[67] PROST: Parallel Robust Online Simple Tracking

The performance was reported using two measures:

1) Recall: the number of true positives divided by the length of the sequence (a true positive is counted if the overlap with the ground truth is > 50%)

2) Average Localization Error: the average distance between the centers of the predicted and ground truth bounding boxes

TLD estimates the scale of an object. However, the algorithms compared in this experiment perform tracking at a single scale only. In order to make a fair comparison, the scale estimation was not used
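
The Average Localization Error defined above can be computed as shown in this sketch. It assumes Euclidean distance in pixels, boxes given as (x1, y1, x2, y2), and that only frames with both a prediction and a ground truth box are passed in; the function names are hypothetical.

```python
from math import hypot


def center(box):
    """Center (cx, cy) of a box given as (x1, y1, x2, y2)."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def average_localization_error(predicted_boxes, ground_truth_boxes):
    """Mean Euclidean distance between predicted and ground truth box centers."""
    distances = []
    for pred, truth in zip(predicted_boxes, ground_truth_boxes):
        (px, py), (tx, ty) = center(pred), center(truth)
        distances.append(hypot(px - tx, py - ty))
    return sum(distances) / len(distances) if distances else 0.0
```

Because only the centers enter the measure, it is insensitive to errors in the estimated box size, which fits a comparison in which scale estimation is disabled.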

6.3 TLD Data Set

We consider the sequences used in the two comparisons above as saturated and therefore introduce a new, more challenging data set

We started from the six sequences used in Experiment 6.1 (CoGD) and collected four additional sequences

The new sequences are long and contain all the challenges typical for long-term tracking

The object was annotated as “not visible” when more than 50 percent of it was occluded or it had undergone more than 90 degrees of out-of-plane rotation

The performance is evaluated using:

Precision (P): the number of true positives divided by the number of all responses

Recall (R): the number of true positives divided by the number of object occurrences that should have been detected

F-measure (F): F = 2PR/(P+R)

A detection was considered correct if its overlap with the ground truth bounding box was larger than 50 percent
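
The three measures can be computed from per-frame records as in the sketch below. It assumes each frame is summarized by a hypothetical tuple (has_response, visible, overlap), where overlap is the overlap between the reported box and the ground truth and is only meaningful when both exist; returning 0.0 for empty denominators is a choice made for this sketch.

```python
def precision_recall_f(frames, min_overlap=0.5):
    """Return (P, R, F) for a list of (has_response, visible, overlap) tuples."""
    # True positive: a response on a visible object with sufficient overlap.
    true_positives = sum(1 for has_response, visible, ov in frames
                         if has_response and visible and ov > min_overlap)
    # All responses, correct or not.
    responses = sum(1 for has_response, _, _ in frames if has_response)
    # Object occurrences that should have been detected.
    occurrences = sum(1 for _, visible, _ in frames if visible)

    precision = true_positives / responses if responses else 0.0
    recall = true_positives / occurrences if occurrences else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

A response on a frame where the object is annotated as not visible counts against precision, while a missing response on a visible frame counts against recall, so the F-measure penalizes both over-eager and over-cautious trackers.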

6.4 Improvement of the Object Detector

This experiment quantitatively evaluates the learning component of the TLD system on the TLD data set

We compare the Initial Detector (trained in the first frame) and the Final Detector (obtained after one pass through the training)

We measure the quality of the P-N experts (P+, R+, P-, R-), that is, the precision and recall of the P-expert and of the N-expert, in every iteration of the learning and report the average score

The target of this sequence is an animal which performs out-of-plane motion

6.5 Comparison 3: TLD Data Set

This experiment evaluates the proposed system on the TLD data set and compares it to five trackers:

1) Online Boosting (OB)

2) Semi-Supervised Online Boosting (SB)

3) Beyond Semi-Supervised Tracking (BS)

4) MIL

5) CoGD

Binaries for trackers 1)-3) are available on the Internet

Trackers 4) and 5) were kindly evaluated directly by their authors

We allowed the initialization to be selected by the authors

For instance, when tracking a motorbike racer, some algorithms might perform better when tracking only a part of the racer

500 frame any positive example added to the model had been augmented with a mirrored version.

7 CONCLUSIONS

In this paper, we designed a new framework that decomposes the long-term tracking task into three components: tracking, learning, and detection

We have demonstrated that an object detector can be trained from a single example and an unlabeled video stream using the following strategy:

1) evaluate the detector

2) estimate its errors by a pair of experts

3) update the classifier

Each expert focuses on identifying a particular type of classifier error and is allowed to make errors itself

The stability of the learning is achieved by designing experts that mutually compensate for their errors

A real-time implementation of the framework has been described in detail

An extensive set of experiments was performed

The superiority of our approach with respect to the closest competitors was clearly demonstrated

The code of the algorithm, as well as the TLD data set, has been made available online (http://cmp.felk.cvut.cz/tld)

8 LIMITATIONS AND FUTURE WORK

TLD does not perform well in the case of full out-of-plane rotation

The Median-Flow tracker drifts away from the target and can be reinitialized only if the object reappears with an appearance seen/learned before

The current implementation of TLD trains only the detector while the tracker stays fixed, so the tracker always makes the same errors

Future work: train the tracking component

TLD currently tracks a single object

Future work: multitarget tracking

The current version does not perform well for articulated objects such as pedestrians

Future work: in restricted scenarios, e.g., with a static camera, an interesting extension of TLD would be to include background subtraction in order to improve the tracking capabilities

Thanks for listening!
