online multi-person tracking using variance magnitude of image colors and solving short minimum...


Upload: pourya-jafarzadeh

Post on 15-Apr-2017


Online Multi-Person Tracking Using Variance Magnitude of Image Colors and Solving Short Minimum Clique Problem

Pourya Jafarzadeh 1, Bijan Shoushatrain 2

1- MSc Student, University of Isfahan, Iran, Ahwaz

2- Assistant Professor, University of Isfahan, Iran, Isfahan

Corresponding Author’s E-mail: [email protected]

Abstract

Multi-object tracking (MOT) is an essential but challenging task in many computer vision applications. Numerous studies have addressed this topic by first detecting objects independently in each frame (object detection) and then linking the detected objects into trajectories (data association). In complex scenes, MOT remains difficult because of long-term occlusion by clutter or other objects, similar appearances of different objects, crowded scenes, and so on. In this paper, data association is formulated as a Short Minimum Clique Problem (SMCP). Three consecutive frames form three clusters, and each clique between these clusters is a tracklet (partial trajectory) of a person. A fast and simple method is proposed for creating cliques by pruning the extra edges between clusters. The edge weights are based on color histogram similarity and on the similarity of the eigenvalues of the bounding boxes of people. Occlusion is handled by a fast and reliable method that saves the color histograms of occluded people. The proposed algorithm is evaluated on three challenging sequences, TUD-Crossing, TUD-Stadtmitte and PETS 2009, and compared to state-of-the-art methods, with promising results.

Keywords: Multi-object tracking, Clique, Short Minimum Clique Problem, SMCP

1. INTRODUCTION

One of the most important tasks in many computer vision applications is multi-object tracking (MOT). It is used in a wide range of video analysis scenarios, such as motion and scene analysis, video indexing, activity recognition, video surveillance and traffic monitoring, among which traffic video surveillance motivates most investigations of multi-object tracking. MOT estimates the states of multiple objects while preserving their identities under appearance and motion variations over time. In complex scenes, MOT remains difficult because of problems such as long-term occlusion by clutter or other objects, ID switching and crowded scenes.

“Multi-object tracking (MOT) aims to estimate object trajectories according to the identities in image

sequences. Recently, thanks to the advances of object detectors [1], [2], numerous tracking-by-detection

approaches have been developed for MOT. In this type of approaches, target objects are detected first and

tracking algorithms estimate their trajectories using detection results, this part is called data association.

Tracking-by-detection methods can be broadly categorized into online and offline (batch or semi-batch)

tracking methods. Offline MOT methods generally utilize detection results from past and future frames.

Tracklets are first generated by linking individual detections in a number of frames, and then iteratively

associated to construct long trajectories of objects in the entire sequence, or in a time-sliding window with a

temporal delay (e.g.,[3], [4]). On the other hand, online MOT algorithms estimate object trajectories using

only detections from the current as well as past frames (e.g. [5]–[7]), and online MOT algorithms are more

applicable to real-time applications such as advanced driving assistant systems and robot navigation” [8].

Data association techniques fall into two groups: temporally local and temporally global. Bipartite matching is the most popular temporally local approach, and the Hungarian algorithm solves it exactly. Temporally global data association is better able to deal with these challenges, and its popularity has recently increased. In global data association, optimization is performed over a batch of frames instead of just two or a few consecutive frames [3], [7], [9]–[11].

The authors of [10], [11] proposed global trackers that find minimum/maximum cliques in graphs, where each clique corresponds to the tracklet of a person in the frames. In [11], cliques are found by optimizing motion and appearance. The Generalized Minimum Clique Problem (GMCP) has also been used in the fields of biology and telecommunication [12]. In [10], binary integer programming is used to find cliques by considering all possible connections in the graph.

Because it uses only the current and past frames, the proposed method is an online method. Tracking is based on solving a minimum clique problem: three consecutive frames are treated as three clusters, and cliques are considered the tracklets of a person in those frames. By finding and analyzing cliques, occluded and visible (unoccluded) objects are determined. Moreover, the new approach is based on the similarity of eigenvalues, along with the color histograms, of the bounding boxes of corresponding people in the three frames. The proposed tracker eliminates the extra edges of the graph, which have no effect on the essential comparisons in the clique problem.

The rest of the paper is organized as follows. Section 2 introduces our tracking method based on the Short Minimum Clique Problem (SMCP). Section 3 presents the experiments and compares the proposed tracker with state-of-the-art methods. Finally, Section 4 concludes the paper.

2. Problem Formulation

Our proposed tracker is based on three-partite matching. Each frame is considered a cluster, and each detected person is a node of that cluster. A node has no connection to other nodes of the same cluster, but it has a number of connections to nodes in the other clusters (see figure 1(b)). Suppose there are K people in three consecutive frames. The goal is to find a subgraph that forms cliques in which the sum of the edge weights is minimized and exactly K nodes are selected from each cluster. Thus, K cliques are obtained, each with three nodes. Each clique represents the tracklet of one of the K people in the video (see figure 1(a)).

To give a more formal definition, the input to the SMCP is a graph $G = (V, E, W)$, where $V$, $E$ and $W$ are the nodes, edges and their corresponding weights, respectively. $V$ is divided into a set of disjoint clusters, where $v_i^j$ denotes the $i$th node in the $j$th cluster. The goal is to pick a set of K cliques, selecting exactly K nodes from each cluster, that minimizes the total score. Minimum cliques are then selected according to the edge weights of the graph, as explained in the following sections.

2.1 Finding Tracklets Using SMCP

The proposed method operates on three consecutive frames at each step in order to find corresponding people in those frames. The performance of the proposed method is comparable with that of the aforementioned global methods.

Consider three consecutive frames $f_t$, $f_{t+1}$ and $f_{t+2}$. By comparing the detected pedestrians in those frames, a graph is created in which the edges represent all the connections between the corresponding bounding boxes. After the minimum cliques in the graph are found, the corresponding pedestrians in the three frames are obtained, and the tracker continues with the next three frames, where the first frame of the new sequence is the last frame of the previous one.

The graphs of some scenes may become large, with a huge number of edges. For instance, in the TUD-Crossing dataset there are 6 pedestrians in some frames, so each cluster has 6 nodes. The full graph of the three frames then has 108 edges (three pairs of clusters with 6 × 6 edges each), and the problem space becomes huge. Hence, an efficient method for pruning the edges is applied when building the tracker’s graph. Edges are created according to the distance between pedestrians in different frames, measured as the Euclidean distance (equation 1) between two bounding boxes. A predefined distance is set as a threshold, and the tracker considers only edges between pedestrians (nodes) whose Euclidean distance is less than this threshold. As a result, the graph with 108 edges is reduced to a graph with approximately 30 edges, about 27% of the original (see figure 1). The problem space is thus reduced substantially, and the tracker finds the minimum cliques in the new graph using the edge weights.

$$d(b_1, b_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \qquad (1)$$
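The pruning step can be illustrated with a minimal Python sketch. It assumes each detection is reduced to one (x, y) reference point of its bounding box and uses the 40-pixel threshold reported in Section 3.1; the function name `build_sparse_edges` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations

def build_sparse_edges(clusters, threshold=40.0):
    """clusters: one list of (x, y) box positions per frame.
    Returns edges as ((cluster_a, node_a), (cluster_b, node_b)) pairs."""
    edges = []
    # Nodes are only ever connected across different clusters (frames).
    for ca, cb in combinations(range(len(clusters)), 2):
        for ia, (xa, ya) in enumerate(clusters[ca]):
            for ib, (xb, yb) in enumerate(clusters[cb]):
                # Keep the edge only if the Euclidean distance is below the threshold.
                if math.hypot(xa - xb, ya - yb) < threshold:
                    edges.append(((ca, ia), (cb, ib)))
    return edges

# Toy data (hypothetical): six well-separated pedestrians moving 5 px per frame.
frames = [[(100.0 * k + 5.0 * f, 50.0) for k in range(6)] for f in range(3)]
full = build_sparse_edges(frames, threshold=float("inf"))
sparse = build_sparse_edges(frames, threshold=40.0)
print(len(full), len(sparse))
```

With an infinite threshold the sketch reproduces the full three-cluster graph of 108 edges; the pruned edge count depends entirely on how close the pedestrians are in the toy data.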

Figure 1. (a) The full graph with 108 edges. (b) The sparse graph created by taking the distance into account, with two cliques shown as sample solutions. The cliques are marked by the bold edges (blue and orange). The sparse graph has 30 edges, about 27% of the edges in (a).

2.2 Calculation of Edges’ Weights

The intersection of the color histograms of the bounding boxes and the similarity of eigenvalues are employed for computing the edge weights. For the appearance representation of a node, the color histogram of [13] is utilized. Formula (2) computes the intersection of the color histograms of two bounding boxes, in which $v_i^j(p)$ denotes the $p$th part of the $i$th node of the $j$th frame:

$$H(v_i^j, v_k^l) = \sum_{p} k\big(\mathrm{hist}(v_i^j(p)),\ \mathrm{hist}(v_k^l(p))\big) \qquad (2)$$

where $k$ represents the histogram intersection kernel. The root mean square error (RMSE) kernel is utilized in the proposed method.
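A minimal sketch of a part-wise histogram comparison with an RMSE kernel is given below. It makes several assumptions that the paper does not state: the bounding box is split into eight horizontal strips, a plain per-channel histogram from NumPy stands in for the deformation-invariant histogram of [13], and the bin count is arbitrary. All function names are hypothetical.

```python
import numpy as np

def part_histograms(box, n_parts=8, bins=16):
    """box: H x W x 3 uint8 crop. Returns one normalized color histogram per part."""
    hists = []
    # Assumption: the eight parts are horizontal strips of the bounding box.
    for part in np.array_split(box, n_parts, axis=0):
        h = np.concatenate([
            np.histogram(part[..., c], bins=bins, range=(0, 256))[0]
            for c in range(3)
        ]).astype(float)
        hists.append(h / max(h.sum(), 1.0))
    return hists

def rmse(a, b):
    """Root mean square error between two equally sized vectors."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def histogram_distance(box_a, box_b):
    """Sum of per-part RMSE values; a lower value means more similar boxes."""
    return sum(rmse(ha, hb)
               for ha, hb in zip(part_histograms(box_a), part_histograms(box_b)))

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(64, 32, 3), dtype=np.uint8)  # synthetic crop
b = np.full((64, 32, 3), 200, dtype=np.uint8)               # uniform crop
print(histogram_distance(a, a), histogram_distance(a, b) > 0)
```

Because the kernel is an error measure, identical boxes score zero and larger values indicate less similar appearance.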

The largest eigenvalue of a covariance matrix is the variance magnitude in the direction of the largest spread of the data, and that direction is given by the corresponding eigenvector [14]. Accordingly, the covariance matrix of each part of a bounding box is computed, followed by its eigenvalues, which are sorted in descending order. The sorted eigenvalues of each part are then compared with the eigenvalues of the corresponding part in the other bounding box. Formula (3) is used to compare the eigenvalues of two bounding boxes, where $v_i^j(p)$ denotes the $p$th part of the $i$th node of the $j$th frame:

$$ES(v_i^j, v_k^l) = \sum_{p} K\big(\mathrm{EigsofCov}(v_i^j(p)),\ \mathrm{EigsofCov}(v_k^l(p))\big) \qquad (3)$$

where $\mathrm{EigsofCov}(v_i^j(p))$ is the eigenvalues of the covariance matrix of $v_i^j(p)$, and $K$ represents the kernel for comparing eigenvalues. The root mean square error (RMSE) kernel is utilized in the proposed method.
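The eigenvalue comparison can be sketched as follows. The paper does not say which variables the covariance is taken over, so this sketch assumes the 3 × 3 covariance of the color channels of each part and eight horizontal strips per box; both choices, and the function names, are illustrative assumptions.

```python
import numpy as np

def sorted_cov_eigenvalues(part):
    """part: h x w x 3 crop. Eigenvalues of its 3x3 color covariance, descending."""
    pixels = part.reshape(-1, 3).astype(float)
    cov = np.cov(pixels, rowvar=False)           # columns are the color channels
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

def eigen_distance(box_a, box_b, n_parts=8):
    """Sum over parts of the RMSE between sorted eigenvalues; lower is more similar."""
    total = 0.0
    for pa, pb in zip(np.array_split(box_a, n_parts, axis=0),
                      np.array_split(box_b, n_parts, axis=0)):
        diff = sorted_cov_eigenvalues(pa) - sorted_cov_eigenvalues(pb)
        total += float(np.sqrt(np.mean(diff ** 2)))
    return total

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(64, 32, 3), dtype=np.uint8)  # synthetic crop
b = np.full((64, 32, 3), 200, dtype=np.uint8)               # uniform crop, zero variance
print(eigen_distance(a, a), eigen_distance(a, b) > 0)
```

`numpy.linalg.eigvalsh` is used because a covariance matrix is symmetric, which keeps the eigenvalues real.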

In the algorithm, the three frames are compared with each other, i.e. $f_t$ with $f_{t+1}$, $f_{t+1}$ with $f_{t+2}$, and $f_t$ with $f_{t+2}$. If the Euclidean distance between two nodes is less than a predefined threshold, an edge is created between them. Let $w(v_i^j, v_k^l)$ denote the weight of the edge between the $i$th node in the $j$th cluster (frame) and the $k$th node in the $l$th cluster (frame); similarly, $e(v_i^j, v_k^l)$ is the edge between the nodes $v_i^j$ and $v_k^l$. Now assume that node $v_1^1$ in $f_t$ has two connections, with nodes $v_1^2$ and $v_2^2$ in $f_{t+1}$. The algorithm then computes $w(v_1^1, v_1^2)$ and $w(v_1^1, v_2^2)$. First, the intersections of histograms between $v_1^1$ and the two nodes are computed, and the two comparison values are sorted from most to least similar (a lower RMSE indicates more similar color histograms). Second, the similarity of eigenvalues between the same bounding boxes is computed, and the comparison values are again sorted from most to least similar (a lower RMSE indicates more similar eigenvalues). Suppose that $v_1^2$ is located at the top of both sorted lists, i.e. it is ranked first for both the intersection of histograms and the similarity of eigenvalues, while $v_2^2$ is ranked second in both lists. After that, $w(v_1^1, v_1^2)$ and $w(v_1^1, v_2^2)$ are computed by formula (4):

(4)

In this case, $w(v_1^1, v_1^2) = 1$ and $w(v_1^1, v_2^2) = 2$. So the weight of the edge between $v_1^1$ and $v_1^2$ is 1 and the weight of the edge between $v_1^1$ and $v_2^2$ is 2. Thus, the node $v_1^1$ is more similar to the node $v_1^2$ than to the node $v_2^2$.
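The rank-based weighting in this example can be sketched in Python. Formula (4) is not recoverable from the text, so the combination rule below (a plain average of the histogram rank and the eigenvalue rank) is only an assumption chosen because it reproduces the weights 1 and 2 of the worked example; the candidate names and RMSE values are hypothetical.

```python
def ranks(scores):
    """1-based rank of each candidate; the lowest RMSE gets rank 1."""
    order = sorted(scores, key=scores.get)
    return {name: i + 1 for i, name in enumerate(order)}

def rank_weights(hist_rmse, eig_rmse):
    """Combine the two rankings into one edge weight per candidate node."""
    r_hist, r_eig = ranks(hist_rmse), ranks(eig_rmse)
    # Assumed stand-in for formula (4): the mean of the two ranks.
    return {name: (r_hist[name] + r_eig[name]) / 2 for name in hist_rmse}

# Candidate "a" is the most similar node in both comparisons, "b" the second.
weights = rank_weights({"a": 0.10, "b": 0.25}, {"a": 3.0, "b": 7.5})
print(weights)  # {'a': 1.0, 'b': 2.0}
```

Using ranks instead of raw RMSE values puts the two cues on the same scale regardless of their units.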

The proposed tracker creates all possible edges in the graph subject to the distance threshold. All possible cliques across the three clusters are found by a simple iterative algorithm, and the clique with the minimum total edge weight is selected. This minimum clique is the tracklet of a person in the three frames (two cliques are illustrated in figure 1). When the minimum clique is found, all its nodes and the edges connected to them are removed, which reduces the problem space further. The next minimum clique is then found, and the process continues until all cliques are found. The algorithm then analyzes the remaining nodes in the three frames for occlusion handling, as described in Section 2.3. Finally, the tracker selects the next three frames, where the first frame of the next sequence is the last frame of the previous sequence.
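The selection loop described above can be sketched as a brute-force enumeration followed by greedy picking. The data layout (a dictionary keyed by unordered node pairs) and the function name `greedy_min_cliques` are assumptions for illustration; the paper only describes the procedure as a simple iterative algorithm.

```python
from itertools import product

def greedy_min_cliques(weights, sizes):
    """weights: dict mapping frozenset({(c1, i), (c2, j)}) -> edge weight (sparse graph).
    sizes: number of nodes in each of the three clusters.
    Returns a list of (total_weight, (node0, node1, node2)) in selection order."""
    def w(a, b):
        return weights.get(frozenset((a, b)))

    # Enumerate every triple with one node per cluster and keep the complete ones.
    cliques = []
    for i, j, k in product(range(sizes[0]), range(sizes[1]), range(sizes[2])):
        n0, n1, n2 = (0, i), (1, j), (2, k)
        ws = [w(n0, n1), w(n1, n2), w(n0, n2)]
        if None not in ws:  # all three edges must survive the pruning
            cliques.append((sum(ws), (n0, n1, n2)))

    # Visiting cliques in increasing weight and skipping used nodes is equivalent
    # to repeatedly taking the minimum clique and removing its nodes and edges.
    selected, used = [], set()
    for total, nodes in sorted(cliques):
        if used.isdisjoint(nodes):
            selected.append((total, nodes))
            used.update(nodes)
    return selected

# Toy graph (hypothetical): two people, weight 1 for correct pairs, 2 otherwise.
weights = {}
for ca, cb in [(0, 1), (1, 2), (0, 2)]:
    for i in range(2):
        for j in range(2):
            weights[frozenset(((ca, i), (cb, j)))] = 1 if i == j else 2
tracklets = greedy_min_cliques(weights, (2, 2, 2))
print(tracklets)
```

Nodes left out of every selected clique are the ones that the occlusion handling step inspects afterwards.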

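In the proposed method, each clique consists of three nodes, each of which has two edges. The sum of the selected outgoing edges of one node entering another cluster is therefore less than or equal to one (constraint 5), so a clique does not include more than one node from each cluster [10].

$$\sum_{k} x(v_i^j, v_k^l) \le 1 \quad \forall i,\ \forall j \ne l \qquad (5)$$

Here $x(v_i^j, v_k^l) \in \{0, 1\}$ indicates whether the edge between the $i$th node of cluster $j$ and the $k$th node of cluster $l$ is selected.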

2.3 Occlusion Handling Model

Occlusion handling is one of the most essential issues in tracking. For example, a person may go behind another person or an object in the $t$th frame and reappear in the $(t+n)$th frame. A precise tracker should handle such occlusions by recognizing the person in frame $t+n$ as the same person as in frame $t$. The proposed algorithm performs occlusion handling by analyzing the remaining nodes in the graph (the nodes that do not belong to any clique).

As mentioned in Section 2.1, our tracker deals with three frames at a time. In the simplest situation, the same $K$ people are present in all three frames. As stated before, the algorithm selects minimum cliques one after another, so in this case the tracker finds $K$ cliques in the graph, which are the tracklets of the people in the frames. In another situation, a person may be occluded in the third frame. The tracker then discovers $K-1$ cliques and a single edge between the first and second frames (see figure 2(a)). If the person has not exited the scene, the average of the histograms of the two nodes (bounding boxes) of that edge is saved in a buffer, along with the person's location in the second frame and the second frame's number. In yet another situation, the person is present in the first frame and occluded in the second and third frames (figure 2(b)). The tracker finds $K-1$ cliques among the frames, and a single node remains in the first frame. The algorithm saves the histograms of the eight partitions of this node, the coordinates of its bounding box and the frame number in the buffer. In general, after the full cliques are found, the remaining nodes in the first and second clusters determine which nodes are stored in the buffer.

In a sample situation, there are $K$, $K$ and $K+1$ people in frames $f_t$, $f_{t+1}$ and $f_{t+2}$, respectively, so there are $K$ cliques and a single node in the third frame. In another situation, there are $K$, $K+1$ and $K+1$ people in frames $f_t$, $f_{t+1}$ and $f_{t+2}$, respectively, and there exist $K$ cliques and an edge between frames $f_{t+1}$ and $f_{t+2}$. These conditions show that one person has been added to the scene in the third frame, or in the second and third frames. By checking the person's coordinates, the tracker decides whether this is a new arrival in the scene. A new arrival is assigned a new ID and identified as a new person; otherwise, the person is identified as someone released from occlusion. Therefore, the single node in the 3rd frame, or the node of the edge between the 2nd and 3rd frames, is compared with the buffer. As mentioned above, the histograms of the occluded people are stored in the buffer along with their coordinates and frame numbers, so the histograms of the bounding boxes of new people are compared with those of the people stored in the buffer. The stored person most similar to the new person is released from the buffer. For example, suppose a person is occluded in frame $f_s$ and identified again by the tracker in frame $f_e$. The coordinates of the person in frame $f_e$ are available, and the coordinates in frame $f_s$ are obtained from the buffer. The person's location in the frames between $f_s$ and $f_e$ (i.e. the frames in which the person was occluded) must now be determined. Formulas (6-9) can be used to estimate the location of the person in these occluded frames. As in [10], [11], a constant velocity model is employed: formulas (6-9) assume that people move with constant velocity in the sequences. The tracker computes the rate of spatial movement in the $x$ direction as $v_x$ and the rate of spatial movement in the $y$ direction as $v_y$, and these two rates are used to locate the person in the occluded frames. Figures 4, 5 and 6 show occlusion handling examples of the proposed method.
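A constant velocity model of this kind can be sketched as a linear interpolation between the last position before the occlusion and the first position after it. The formulas (6-9) themselves are not legible in the text, so the code below is one common way to realize such a model rather than the paper's exact equations; the function name and the sample numbers are hypothetical.

```python
def interpolate_occluded(start, end):
    """start, end: (frame, x, y) just before and just after the occlusion.
    Returns {frame: (x, y)} for the frames strictly in between."""
    (fs, xs, ys), (fe, xe, ye) = start, end
    span = fe - fs
    if span <= 1:
        return {}  # no occluded frames between the two observations
    vx = (xe - xs) / span  # rate of movement along x, in pixels per frame
    vy = (ye - ys) / span  # rate of movement along y, in pixels per frame
    return {f: (xs + vx * (f - fs), ys + vy * (f - fs))
            for f in range(fs + 1, fe)}

# Person stored at frame 10 and found again at frame 14.
filled = interpolate_occluded((10, 100.0, 50.0), (14, 120.0, 42.0))
print(filled)  # {11: (105.0, 48.0), 12: (110.0, 46.0), 13: (115.0, 44.0)}
```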

Figure 2. (a) The man (blue bounding box) is occluded in frame t+2. (b) The man (blue bounding box) is occluded in frames t+1 and t+2.

2.4 Structure of the Buffer

The histograms of the bounding boxes of occluded people are stored in a buffer, and when a person reappears in the scene, the histograms of their bounding box are compared with those stored in the buffer. This section describes the structure of the buffer. For each person, the buffer stores a table holding the histograms of the eight parts of the person’s bounding box, computed by the method of [13], together with the coordinates of the upper-left corner of the bounding box and the frame number.
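One possible in-memory layout for such a buffer is sketched below, together with the release step that picks the stored person most similar to a newly appeared one. The field names, the per-part bin count and the use of summed per-part RMSE as the similarity measure are assumptions made for illustration.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class BufferEntry:
    person_id: int
    part_hists: np.ndarray  # shape (8, n_bins): one histogram per bounding-box part
    top_left: tuple         # (x, y) of the upper-left corner of the bounding box
    frame: int              # frame number stored with the entry

def release_best_match(buffer, query_hists):
    """Remove and return the stored entry closest to query_hists (lowest RMSE sum)."""
    if not buffer:
        return None

    def dist(entry):
        per_part = np.sqrt(np.mean((entry.part_hists - query_hists) ** 2, axis=1))
        return float(per_part.sum())

    best = min(range(len(buffer)), key=lambda i: dist(buffer[i]))
    return buffer.pop(best)

rng = np.random.default_rng(1)
h1, h2 = rng.random((8, 16)), rng.random((8, 16))  # synthetic histograms
buffer = [BufferEntry(1, h1, (10, 20), 5), BufferEntry(2, h2, (200, 40), 7)]
match = release_best_match(buffer, h2 + 0.01)       # query close to person 2
print(match.person_id, len(buffer))
```

The stored coordinates and frame number are what a constant velocity model would later need to fill in the occluded frames.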

The steps of the proposed tracker are as follows:

Algorithm 1

Input: the whole sequence of T frames

Output: the tracks of the people in the sequence

1- Input 3 frames.

2- Create the sparse graph together with the weights of the edges.

3- Find all the possible cliques in the graph.

4- Select the minimum clique.

5- Remove the nodes and edges of the minimum clique from the graph, and remove all the edges that are connected to the nodes of the minimum clique.

6- If there is another clique, go to step 4; otherwise go to step 7.

7- Analyze the nodes and edges in the 1st and 2nd frames for storing in the buffer as occluded people.

8- Analyze the nodes and edges in the 2nd and 3rd frames for occlusion handling.

9- Update the buffer by releasing identified occluded people and storing new occluded people.

10- If frames remain, advance to the next three frames (starting at the last frame of the current three) and go to step 1; otherwise exit.
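The way the three-frame window advances through a sequence can be sketched with a small generator. It assumes zero-based frame indices and that a trailing window with fewer than three frames is skipped; the paper does not specify how such a remainder is treated, and the function name is hypothetical.

```python
def frame_triples(num_frames):
    """Yield (t, t+1, t+2) windows; each window starts at the last frame of the previous one."""
    t = 0
    while t + 2 < num_frames:
        yield (t, t + 1, t + 2)
        t += 2  # the last frame of this window becomes the first of the next

print(list(frame_triples(7)))  # [(0, 1, 2), (2, 3, 4), (4, 5, 6)]
```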

3. Tracking Evaluation

In this section, we present the experimental evaluation of the proposed tracking algorithm and a comparison against state-of-the-art methods. We carry out experiments on three publicly available sequences that provide a wide range of significant challenges: TUD-Crossing, TUD-Stadtmitte and sequence S2L1 from the PETS2009 benchmark. We compare our method with state-of-the-art trackers, taking the numbers from the authors’ papers.

The trackers are as follows: GMMCP [10], GMCP [11], WMWIS [15], MAT [16], DCPF [17],

OMTUC [18], OMPTH [19], DLP [20], MTEM [21], GORMT [22], KSO [9], GAC [23], MOTMM [24].

3.1 Implementation

We used the deformable part-based model [1] to obtain the detection hypotheses in each frame. The coefficient for computing the edge weights was chosen by a sensitivity analysis on the TUD-Stadtmitte dataset, in which the people in each two consecutive frames were compared using different values of the coefficient; the best value is the one at the maximum of the curve in figure 3. The distance threshold for edge creation is set to 40, because most of the time a pedestrian does not move more than 40 pixels between two consecutive frames.

3.2 Evaluation Metrics

The standard CLEAR MOT metrics [25] are used for evaluation. MOTA combines false positives, false negatives (missed people) and ID switches into a single accuracy score. MOTP is defined as the average distance between the ground-truth and estimated target locations; it shows the ability of the tracker to estimate the precise location of an object, regardless of its accuracy at recognizing object configurations, keeping consistent trajectories, and so forth. Therefore, MOTA has been widely accepted in the literature as the main measure of the performance of tracking methods.
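For reference, MOTA is commonly computed as one minus the ratio of all errors to the number of ground-truth objects, with the counts summed over all frames. The short sketch below shows this standard definition; the counts in the example are made up and do not come from the experiments.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over the sequence."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

# Hypothetical counts: 30 misses, 15 false positives, 5 ID switches, 1000 GT boxes.
print(round(100 * mota(30, 15, 5, 1000), 1))  # 95.0
```

Because the error counts can exceed the number of ground-truth objects, MOTA can become negative for a very poor tracker.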

Figure 3. Sensitivity analysis on the TUD-Stadtmitte dataset. The recall is computed for different values of the edge-weight coefficient, and the best value is taken at the maximum of the curve.

TUD Data set

TUD-Crossing and TUD-Stadtmitte are two sequences in this data set with a low camera angle and frequent occlusions. Crossing and Stadtmitte include 201 and 179 frames, respectively.

PETS2009-S2L1, View one

The sequence consists of 795 frames. It contains occlusions, and people pass close to one another many times. Because the sequence is long, it is very useful for testing methods. Like many state-of-the-art methods, the proposed method has few ID switches on this sequence.

Table 1. Tracking results for the TUD-Crossing sequence.

Method        MOTA%    MOTP%    Prec.    Rec.     IDSW
GMMCP         91.9     70       -        -        2
WMWIS         85.9     73       89.2     98.8     2
GMCP          91.63    75.6     98.6     92.83    0
DCPF          84.3     71.0     85.1     98.6     2
MAT           90.6     76.9     -        -        8
OMPTH         71.3     67.5     -        -        11
OMTUC         84.3     71.0     -        -        2
Our method    92.31    73.6     94.11    94.64    2

Table 2. Tracking result for TUD-Stadtmitte sequence.

Method        MOTA%    MOTP%    Prec.    Rec.     IDSW
GMMCP         82.4     73.9     -        -        0
DLP           79.3     73.9     -        -        4
GMCP          77.7     63.4     95.6     85.4     0
MTEM          60.5     65.8     -        -        7
MAT           75.4     70       -        -        3
OMPTH         75.0     59.8     -        -        2
GORMT         68.6     64.0     -        -        -
Our method    80.83    73       100      70       0

Table 3. Tracking results for the PETS2009-S2L1 sequence.

Method        MOTA%    MOTP%    Prec.    Rec.     IDSW
GMCP          90.3     69.02    93.64    96.45    8
KSO           80.0     58.00    81.00    60.00    28
GAC           81.46    58.38    90.66    90.81    19
MTEM          82.84    73.93    96.28    85.13    15
MOTMM         84.77    68.74    92.40    94.03    10
MAT           92.8     74.3     -        -        8
OMPTH         91.0     66.1     -        -        10
GORMT         88.3     75.7     -        -        -
Our method    96.57    85.33    97.7     98.7     6

4. Conclusion

In this work, we formulate multi-person tracking as a Short Minimum Clique Problem, which is solved on sparse graphs by comparing the eigenvalues of covariance matrices, in addition to color histograms, to find corresponding people. We show that by using three frames at each step together with a fast occlusion handler, satisfactory results can be achieved. In our experiments, the tracker was compared with state-of-the-art methods on three challenging sequences. The experimental results demonstrate that our method is effective and efficient.

Figure 4. Frames 61, 68, 75, 83, 90 and 103. The woman with ID=7, marked with a yellow bounding box, is occluded for 43 frames. The occlusion handling method preserves her ID. Moreover, the other pedestrians are tracked correctly with their IDs from frame 61 to frame 103.

Figure 5. Frames 87, 95, 105, 109, 117 and 128. The woman with ID=10, marked with a yellow bounding box, is occluded many times, but her ID is preserved by the proposed occlusion handling method.

Figure 6. Frames 292 to 297. The person with ID=10, indicated by a black arrow, is occluded behind the person with ID=9. The occlusion handling method preserves his ID. Moreover, the other pedestrians are tracked correctly with their IDs from frame 292 to frame 297.

REFERENCES

[1] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object Detection with Discriminatively Trained Part Based Models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1627–1645, 2010.

[2] Y. Amit and P. Felzenszwalb, “Object Detection,” Comput. Vis. A Ref. Guid., Springer US, pp. 537–542, 2014.

[3] H. Pirsiavash, D. Ramanan, and C. C. Fowlkes, “Globally-Optimal Greedy Algorithms for Tracking a Variable Number of Objects,” 2011.

[4] J. Xing, H. Ai, and S. Lao, “Multi-object tracking through occlusions by local tracklets filtering and global tracklets association with detection responses,” 2009 IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work. (CVPR Work. 2009), pp. 1200–1207, 2009.

[5] A. Dehghan and M. Shah, “Binary Quadratic Programing for Online Tracking of Hundreds of People in Extremely Crowded Scenes,” vol. 14, no. 8, pp. 1–14, 2016.

[6] T. Wu, Y. Lu, and S.-C. Zhu, “Online Object Tracking, Learning and Parsing with And-Or Graphs,” pp. 1–14, 2015.

[7] Z. Wu, J. Zhang, and M. Betke, “Online Motion Agreement Tracking,” Proceedings Br. Mach. Vis. Conf. 2013, pp. 63.1–63.11, 2013.

[8] J. H. Yoon and C. Lee, “Online Multi-Object Tracking via Structural Constraint Event Aggregation,” IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2016.

[9] J. Berclaz, F. Fleuret, E. Türetken, and P. Fua, “Multiple object tracking using k-shortest paths optimization,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 9, pp. 1806–1819, 2011.

[10] A. Dehghan, “GMMCP Tracker: Globally Optimal Generalized Maximum Multi Clique Problem for Multiple Object Tracking,” Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR 2015), 2015.

[11] A. Roshan Zamir, A. Dehghan, and M. Shah, “GMCP-tracker: Global multi-object tracking using generalized minimum clique graphs,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 7573 LNCS, no. PART 2, pp. 343–356, 2012.

[12] C. Feremans, M. Labbé, and G. Laporte, “Generalized network design problems,” Eur. J. Oper. Res., vol. 148, no. 1, pp. 1–13, 2003.

[13] J. Domke and Y. Aloimonos, “Deformation and Viewpoint Invariant Color Histograms,” Proceedings Br. Mach. Vis. Conf. 2006, pp. 53.1–53.10, 2006.

[14] V. Spruyt, “A geometric interpretation of the covariance matrix,” 2014. [Online]. Available: http://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/.

[15] W. Brendel, M. Amer, and S. Todorovic, “Multiobject tracking as maximum weight independent set,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1273–1280, 2011.

[16] Z. Wu, J. Zhang, and M. Betke, “Online Motion Agreement Tracking,” Proceedings Br. Mach. Vis. Conf. 2013, pp. 63.1–63.11, 2013.

[17] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Van Gool, “Robust Tracking-by-Detection using a Detector Confidence Particle Filter,” IEEE Int. Conf. Comput. Vis., 2009.

[18] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Van Gool, “Online multiperson tracking-by-detection from a single, uncalibrated camera,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 9, pp. 1820–1833, 2011.

[19] J. Zhang, L. Lo Presti, and S. Sclaroff, “Online Multi-Person Tracking by Tracker Hierarchy,” pp. 379–385, 2012.

[20] K. C. A. Kumar and C. De Vleeschouwer, “Discriminative label propagation for multi-object tracking with sporadic appearance features,” Proc. IEEE Int. Conf. Comput. Vis., pp. 2000–2007, 2013.

[21] A. Andriyenko and K. Schindler, “Multi-target tracking by continuous energy minimization,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1265–1272, 2011.

[22] A. Andriyenko, S. Roth, and K. Schindler, “An analytical formulation of global occlusion reasoning for multi-target tracking,” Proc. IEEE Int. Conf. Comput. Vis., pp. 1839–1846, 2011.

[23] H. Ben Shitrit, J. Berclaz, F. Fleuret, and P. Fua, “Tracking multiple people under global appearance constraints,” Proc. IEEE Int. Conf. Comput. Vis., pp. 137–144, 2011.

[24] J. F. Henriques, R. Caseiro, and J. Batista, “Globally optimal solution to multi-object tracking with merged measurements,” Proc. IEEE Int. Conf. Comput. Vis., pp. 2470–2477, 2011.

[25] R. Kasturi, D. Goldgof, V. Korzhova, J. Zhang, et al., “Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 319–336, 2009.