[ieee 2006 ieee international conference on video and signal based surveillance - sydney, australia...

Learning a New Statistical Shape Prior Model for Object Detection by Geodesic Active Contours

Wen Fang, Kap Luk Chan
School of Electrical and Electronic Engineering
Nanyang Technological University
{FA0001EN, EKLCHAN}@ntu.edu.sg

Abstract

A new statistical shape prior model is proposed in this paper and incorporated into geodesic active contours for robust object detection. Object shapes that undergo nonlinear deformations are assumed to lie in a low-dimensional feature subspace and to form clusters after a nonlinear mapping. They are approximated by a probabilistic density model that captures the structure of the data distribution. The resulting probability is treated as a shape energy term and incorporated into the geodesic active contour equation to constrain the curve evolution. Because this shape prior model is based on a more sophisticated statistical learning of the training data distribution, it is more robust in the presence of occlusions and cluttered backgrounds. Experiments on partially occluded human shapes in surveillance video demonstrate its promising detection performance for the intended tasks.

1. Introduction

In recent years, there has been substantial research on level set based methods for object detection and tracking in video surveillance systems [11], [16], [3]. Geodesic active contours (GAC) [1], which adopt the level set implementation [10] for curve evolution, inherit many advantages of the level set framework. Since the GAC method can provide a tight and reliable object boundary, it is suitable for human behavior analysis and recognition applications. However, as it depends only on the image gradient to constrain the curve evolution, it cannot handle the background noise and object occlusions that are often encountered during object detection and tracking. The level set curves are “deceived” by nearby spurious edges and converge to the wrong location.

Various approaches have been reported to solve this problem. A common one is to learn the object shape from a set of training samples and then use it as global prior knowledge to guide the movement of the evolving curve within a cluster of similar shapes. Thus, if the object is partially occluded, the curve can still recover the missing edges by referring to the information in the prior model. Chen [2] proposed to use the average of the training samples as a prior model. Zhang [17] used a single shape as the shape model template, selected from a family of shapes by measuring an error rate. Because these shape models rely on simple shape prototypes to represent the object characteristics, they cannot account for large shape variations.

Other work builds the prior model by statistical learning. Cremers [5] estimated the training data distribution using a Gaussian function and then used it as a shape prior model to constrain the curve. Rousson [12] also modeled the training shape distribution with a Gaussian, and the curve is moved to a “preferred” shape by seeking the maximum likelihood of the density. However, as the shape priors in these works are built directly over training data generated from a particular shape representation (e.g., the level set function), they suffer from the curse of dimensionality. The data are sparsely distributed in a high-dimensional space, making it difficult to capture their real structure.

Hence, dimensionality reduction techniques have been introduced into the prior model construction. Leventon [9] applied principal component analysis (PCA) to the data before performing density estimation; the data are fitted with a Gaussian in the reduced subspace. A similar technique was used in [15], where the shape prior is constructed as the linear weighted sum of k principal eigenshapes plus a mean shape. However, object shapes very often undergo nonlinear deformations, as in the case of a human walking under a surveillance camera, whose shape may change nonlinearly as the viewing angle changes. In that situation, a linear projection cannot reveal the true intrinsic structure of the data distribution. Although Cremers [4] proposed to use a kernel method in their model, they performed a kernel mapping into a high-dimensional space and then modeled the data distribution with a single Gaussian function, without considering that there may be several shape clusters for which a single Gaussian approximation is inappropriate.

Proceedings of the IEEE International Conference on Video and Signal Based Surveillance (AVSS'06), 0-7695-2688-8/06 $20.00 © 2006

Therefore, in this paper, a novel statistical shape prior model is proposed for object detection by the geodesic active contour algorithm. The prior model is constructed from a more sophisticated pattern recognition principle. The shape states in a submanifold are assumed to form distinct clusters corresponding to the different shapes seen under different views and human poses. Thus, the level set representations (signed distance maps) of the object shapes with local variations are also assumed to lie in a low-dimensional feature space and to form distinct clusters after a nonlinear mapping. Kernel principal component analysis (kPCA) is employed, which first performs a nonlinear mapping of the training data to remove their nonlinear statistical properties and then performs a dimensionality reduction step. A Gaussian Mixture Model (GMM) is then used to estimate the distribution of the multiple clusters corresponding to different object shapes in the subspace. The resulting probability density function of the shape prior is regarded as a shape strength and is incorporated into the geodesic active contour equation to constrain the curve evolution. This shape prior model is based on a more accurate statistical learning of the training data distribution and is more robust in the presence of occlusions and cluttered backgrounds.

The rest of this paper is organized as follows. Section 2 presents the rationale of geodesic active contours. Section 3 describes the construction of our shape prior model. The combination of the shape prior knowledge with the geodesic active contour method is shown in Section 4. Experimental results and conclusions are given in Sections 5 and 6.

2. Geodesic Active Contours

The geodesic active contour method is based on the relationship between active contours and the computation of geodesics, i.e., minimal-distance curves, in a Riemannian space. Let $C(q): [0, 1] \to \mathbb{R}^2$ be a parametrized curve and let $I: [0, a] \times [0, b] \to \mathbb{R}^+$ denote a given image. The classical snake approach [8] defines an energy function $E$ over a curve $C$ as the sum of an internal and an external energy, and evolves the curve to minimize $E$:

$$E(C) = \beta \int_0^1 |C''(q)|^2 \, dq - \lambda \int_0^1 |\nabla I(C(q))| \, dq \tag{1}$$

where $\beta$ controls the smoothness of the contour and $\lambda$ is responsible for attracting the contour towards the object boundaries. Caselles [1] reformulates the optimization problem of Eqn. (1) as the computation of a geodesic curve in a Riemannian space,

$$\min \int_0^1 g(|\nabla I(C(q))|) \, |C'(q)| \, dq \tag{2}$$

Figure 1. Comparison of the degree of shape overlap before and after alignment: (a) before alignment, (b) after alignment.

Here $g$ is a stopping function that carries the information about the object boundary. It usually takes the form $g = 1/(1 + |\nabla \hat{I}|^p)$, where $\hat{I}$ is a smoothed version of $I$ and $p = 1$ or $2$. Computing the Euler-Lagrange equation of Eqn. (2) to find the minimum gives

$$\frac{\partial C(t)}{\partial t} = g(I)\,\kappa\,\vec{N} - (\nabla g \cdot \vec{N})\,\vec{N} \tag{3}$$

where $\kappa$ is the Euclidean curvature and $\vec{N}$ is the unit inward normal to the curve. Define $\Psi: [0, a] \times [0, b] \to \mathbb{R}$ to be an implicit representation of the curve $C$. This representation is parameter free, intrinsic and topology free. Embedding the evolution of $C$ in that of $\Psi$ and adding an “area constraint” term $c\,g(I)\,|\nabla \Psi|$ to Eqn. (3), we obtain

$$\frac{\partial \Psi}{\partial t} = g\,(c + \kappa)\,|\nabla \Psi| + \nabla \Psi \cdot \nabla g \tag{4}$$

where $c$ is an image-dependent balloon force that pushes the contour inwards (or outwards). In this level set framework, the surface $\Psi$ evolves at every point perpendicular to the level sets as a function of the curvature at that point and the image gradient [9].
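As an illustration of Eqn. (4), the following is a minimal sketch of one explicit time step of the level set update, assuming NumPy and central finite differences. The function names `stopping_function`, `curvature` and `gac_step`, the step size `dt`, and the small constant `eps` are hypothetical choices made for this example and are not taken from the paper. The image is assumed to be smoothed by the caller, and the sign conventions for $c$ and $\kappa$ depend on whether $\Psi$ is negative inside or outside the curve.

```python
import numpy as np

def stopping_function(image, p=2):
    """g = 1 / (1 + |grad I|^p); the image is assumed to be pre-smoothed."""
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows (y), columns (x)
    return 1.0 / (1.0 + np.hypot(gx, gy) ** p)

def curvature(psi, eps=1e-8):
    """Curvature of the level sets of psi: div(grad psi / |grad psi|)."""
    py, px = np.gradient(psi)
    norm = np.sqrt(px**2 + py**2) + eps        # eps avoids division by zero
    nyy, _ = np.gradient(py / norm)            # d/dy of the y-component
    _, nxx = np.gradient(px / norm)            # d/dx of the x-component
    return nxx + nyy

def gac_step(psi, g, c=1.0, dt=0.1):
    """One explicit Euler step of Eqn. (4) (illustrative, not the authors' code)."""
    py, px = np.gradient(psi)
    gy, gx = np.gradient(g)
    grad_norm = np.sqrt(px**2 + py**2)
    dpsi = g * (c + curvature(psi)) * grad_norm + (px * gx + py * gy)
    return psi + dt * dpsi
```

In practice such explicit schemes usually need small time steps and periodic re-initialization of $\Psi$ to a signed distance function; neither is shown here.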

3. Constructing the Shape Prior Model

Before constructing the shape prior model, the collected training shapes need to be aligned. Once they share the same orientation and size, further statistical learning can be carried out.

3.1. Shape Alignment

Since the training shapes extracted from different images may differ in translation, rotation and scale, they need to be aligned to the same orientation and size. We adopt Euclidean similarity transformations [14] to adjust the shapes. A Euclidean similarity transformation of a point $x$ into a point $x'$ is defined as

$$x' = \alpha A x + T \tag{5}$$

where $\alpha$ is an isotropic scaling factor, $A$ is a rotation matrix, and $T$ is a translation vector.
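Eqn. (5) can be sketched for 2D contour points as follows, assuming NumPy. The function name `similarity_transform` and the parametrization of the rotation matrix by an angle `theta` are illustrative assumptions; how the parameters are estimated during alignment is not shown.

```python
import numpy as np

def similarity_transform(points, alpha, theta, t):
    """Apply x' = alpha * A x + T to an (m, 2) array of points.

    A is the 2D rotation by theta radians (hypothetical parametrization).
    """
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # Row vectors: (A x)^T = x^T A^T
    return alpha * points @ A.T + np.asarray(t, dtype=float)
```

For example, scaling the point (1, 0) by 2, rotating it by 90 degrees and translating by (1, 1) maps it to (1, 3).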

Fig. 1 compares the degree of shape overlap before and after alignment. After alignment, the shapes roughly share the same center, have approximately equal size, and are oriented in the same direction, so the object characteristics are much clearer. Note that this alignment method also needs to be used to adjust the shape prior model to the orientation and size of the current evolving curve during object detection.

3.2. Creating the training data

The level set representations (signed distance functions) of the shapes are used as training data. Each shape is embedded as the zero level set of a higher dimensional surface $\Psi_i$. The points on the surface encode the distance to the nearest point on the shape, with negative distances assigned to the inside and positive distances to the outside. To convert these distance maps into feature vectors, they are rearranged into column vectors that form a matrix $M$

$$M = \{\Psi_1, \Psi_2, \cdots, \Psi_n\} \tag{6}$$

$M$ is an $N \times n$ matrix, where $N$ is the dimension of a training vector and $n$ is the number of training samples. Figure 2 shows some examples of the training data.
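A minimal sketch of how such a training matrix could be assembled from binary silhouettes is given below, assuming NumPy. The helpers `signed_distance` and `training_matrix` are hypothetical names, and the brute-force distance computation is a pixel-level approximation that is only practical for small masks containing both inside and outside pixels; a dedicated distance transform would normally be used instead.

```python
import numpy as np

def signed_distance(mask):
    """Brute-force signed distance map of a small boolean mask.

    Negative inside the shape and positive outside, as in the text.
    Distances are measured between pixel centres (an approximation).
    """
    mask = np.asarray(mask, dtype=bool)
    flat = mask.ravel()
    coords = np.indices(mask.shape).reshape(2, -1).T.astype(float)  # (P, 2)
    inside, outside = coords[flat], coords[~flat]
    # Distance from every pixel to the nearest inside / outside pixel
    d_in = np.sqrt(((coords[:, None] - inside[None]) ** 2).sum(-1)).min(1)
    d_out = np.sqrt(((coords[:, None] - outside[None]) ** 2).sum(-1)).min(1)
    return np.where(flat, -d_out, d_in).reshape(mask.shape)

def training_matrix(masks):
    """Stack flattened signed distance maps as columns: shape (N, n)."""
    return np.column_stack([signed_distance(m).ravel() for m in masks])
```

Each column of the result corresponds to one $\Psi_i$ in Eqn. (6).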

3.3. KPCA

kPCA makes use of the “kernel trick”, which expresses dot products in feature space in terms of kernel functions in input space and thereby performs an implicit nonlinear mapping of the data [13]. In the high-dimensional (possibly infinite-dimensional) feature space, the mapped data have a simpler distribution and are more likely to be linearly separable.

Given a set of examples $x_i \in \mathbb{R}^N$, $i = 1, \cdots, n$, define a nonlinear map $\Phi: x \in \mathbb{R}^N \to X \in F$. Here the data are assumed to be centered so that $\sum_{i=1}^{n} \Phi(x_i) = 0$. The covariance matrix in $F$ is

$$C = \frac{1}{n} \sum_{j=1}^{n} \Phi(x_j)\Phi(x_j)^\top \tag{7}$$

The eigendecomposition of $C$ solves $\lambda V = C V$. Note that the solutions $V$ lie in the span of $\Phi(x_1), \cdots, \Phi(x_n)$, so the following equations

$$\lambda\,(\Phi(x_i) \cdot V) = (\Phi(x_i) \cdot C V) \quad \forall i = 1, \cdots, n \tag{8}$$

and

$$V = \sum_{i=1}^{n} \beta_i \Phi(x_i) \tag{9}$$

can be generated, where $\beta_i$ are the coefficients.

Figure 2. Examples of training data. Each image is the level set representation of a corresponding shape.

Define an $n \times n$ matrix $K$ by $K_{ij} := (\Phi(x_i) \cdot \Phi(x_j))$; this dot product matrix can be computed with the kernel function $k(x, y) = (\Phi(x) \cdot \Phi(y))$. Substituting Eqns. (7) and (9) into Eqn. (8) converts the eigenvalue problem to

$$n \lambda \beta = K \beta \tag{10}$$

To extract the principal components, the projections onto the eigenvectors $V^l$ need to be computed. For our case, let $\Psi$ be a test datum; then

$$(kPC)_l(\Psi) = (V^l \cdot \Phi(\Psi)) = \sum_{i=1}^{n} \beta_i^l \, k(x_i, \Psi) \tag{11}$$

Fig. 3 shows the result of projecting our training data onto the feature subspace using kPCA.
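The kPCA steps above can be sketched as follows, assuming NumPy and a Gaussian RBF kernel. The names `rbf_kernel`, `fit_kpca` and `project` are hypothetical. Two details are assumptions of this sketch rather than statements from the paper: the kernel matrix is centered explicitly in feature space (the text simply assumes centered data), and the coefficient vectors are scaled so that each eigenvector $V^l$ has unit norm.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all rows of X and Y."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_kpca(X, sigma, n_components):
    """X holds one training vector per row (the transpose of M)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one    # centre in feature space
    evals, evecs = np.linalg.eigh(Kc)             # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:n_components]  # keep the largest ones
    evals, evecs = evals[idx], evecs[:, idx]
    beta = evecs / np.sqrt(evals)                 # unit-norm V^l (assumes evals > 0)
    return {"X": X, "sigma": sigma, "K": K, "beta": beta}

def project(model, x):
    """Projection of a test vector x as in Eqn. (11), with matching centring."""
    K = model["K"]
    k = rbf_kernel(x[None, :], model["X"], model["sigma"])[0]
    kc = k - K.mean(axis=0) - k.mean() + K.mean()
    return kc @ model["beta"]
```

Projecting every training vector with `project` gives the low-dimensional points on which a density model can then be fitted.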

3.4. Density estimation in subspace

In the reduced subspace, the data distribution is estimated by a Gaussian Mixture Model (GMM) [6]. Using a GMM has two advantages. First, we can build a probability density function over several clusters corresponding to different object shapes; thus, with only one shape prior model, we can recover the original shapes of multiple objects. Second, the GMM we use is an unsupervised learning method that automatically estimates the proper number of Gaussian components. This is especially desirable when the object shapes are extracted from video sequences, since the data are clustered according to their shape properties and we do not need to know the number of clusters in advance.

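In [6], the desired number of Gaussian components is determined by minimizing a message length $L$,

$$L(\theta, Y) = \frac{N}{2} \sum_{m=1}^{k} \log\left(\frac{n \alpha_m}{12}\right) + \frac{k}{2} \log\frac{n}{12} + \frac{k(N + 1)}{2} - \log p(Y|\theta) \tag{12}$$

and the parameters $\Theta = \{\theta_1, \cdots, \theta_k\}$ of the GMM are obtained by

$$\hat{\theta} = \arg\min_{\theta} L(\theta, Y) \tag{13}$$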
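The criterion in Eqn. (12) can be evaluated with a few lines of Python. In this sketch the function name `message_length` is hypothetical, and $N$ is read as the number of free parameters per mixture component, which is an assumption since Eqn. (12) is quoted from [6] without defining $N$. The log-likelihood is assumed to be supplied by whatever procedure fitted the mixture.

```python
import math

def message_length(alphas, n_samples, n_params, log_likelihood):
    """Evaluate Eqn. (12) for a fitted mixture (illustrative sketch).

    alphas         : mixing probabilities alpha_1..alpha_k (all > 0)
    n_samples      : number of data points n
    n_params       : N, assumed here to be the parameters per component
    log_likelihood : log p(Y | theta) of the fitted mixture
    """
    k = len(alphas)
    term1 = 0.5 * n_params * sum(math.log(n_samples * a / 12.0) for a in alphas)
    term2 = 0.5 * k * math.log(n_samples / 12.0)
    term3 = 0.5 * k * (n_params + 1)
    return term1 + term2 + term3 - log_likelihood
```

Comparing this value across candidate mixtures and keeping the smallest corresponds to the minimization in Eqn. (13).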

Thus, the final probability density function is

$$P(y) = \sum_{m=1}^{k} \alpha_m \, p(y|\theta_m) \tag{14}$$

with $k$ being the number of components, $\alpha_1, \cdots, \alpha_k$ the mixing probabilities, and each $p(y|\theta_m)$ a multivariate Gaussian distribution

$$p(y|\theta_m) = \frac{(2\pi)^{-d/2}}{\sqrt{|\Sigma_m|}} \exp\left[-\frac{1}{2}(y - \mu_m)^\top \Sigma_m^{-1} (y - \mu_m)\right] \tag{15}$$

Fig. 4 illustrates the result of fitting the GMM to our training data in the feature subspace. Each cluster corresponds to a shape state.
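Eqns. (14) and (15) translate directly into code. The following sketch assumes NumPy; the function names `gaussian_pdf` and `gmm_pdf` are illustrative, and the mixture parameters are assumed to have been estimated beforehand.

```python
import numpy as np

def gaussian_pdf(y, mu, cov):
    """Multivariate Gaussian density of Eqn. (15)."""
    d = mu.shape[0]
    diff = y - mu
    quad = diff @ np.linalg.solve(cov, diff)     # (y - mu)^T Sigma^{-1} (y - mu)
    norm = (2.0 * np.pi) ** (-d / 2.0) / np.sqrt(np.linalg.det(cov))
    return norm * np.exp(-0.5 * quad)

def gmm_pdf(y, alphas, mus, covs):
    """Mixture density of Eqn. (14): sum_m alpha_m p(y | theta_m)."""
    return sum(a * gaussian_pdf(y, m, c) for a, m, c in zip(alphas, mus, covs))
```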

4. Combine shape prior model with geodesic active contour framework

For the current evolving curve with level set representation $\Psi$, the probability density function is

$$P(\tilde{\Psi}) = \sum_{m=1}^{k} \alpha_m \, p(\tilde{\Psi}|\theta_m) \tag{16}$$

where $\tilde{\Psi}$ is the projected vector of $\Psi$ after kPCA. Our task is to pull the geodesic active contour towards a preferred shape such that its probability $P$ is maximized. For computational convenience, this problem is converted to minimizing the negative log of the probability, which is defined to be a shape energy term

$$E_{shape} = -\log P(\tilde{\Psi}) \tag{17}$$

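In order to minimize Eqn. (17), we follow the gradient descent direction with respect to the level set function $\Psi$. Writing $\tilde{\Psi}$ for the projected vector of $\Psi$ after kPCA, the chain rule gives

$$\frac{\Delta \Psi}{\Delta t} = -\frac{\partial E_{shape}}{\partial \Psi} = -\frac{\partial E_{shape}}{\partial \tilde{\Psi}} \cdot \frac{\partial \tilde{\Psi}}{\partial \Psi} \tag{18}$$

The first factor on the right side of Eqn. (18) is

$$\frac{\partial E_{shape}}{\partial \tilde{\Psi}} = -\frac{\partial \log P(\tilde{\Psi})}{\partial \tilde{\Psi}} = -\frac{1}{P(\tilde{\Psi})} \cdot \frac{\partial P(\tilde{\Psi})}{\partial \tilde{\Psi}} = \frac{1}{P(\tilde{\Psi})} \sum_{m=1}^{k} \alpha_m \, p(\tilde{\Psi}|\theta_m) \cdot \Sigma_m^{-1} (\tilde{\Psi} - \mu_m) \tag{19}$$

Figure 3. Training data in feature subspace after kPCA. Different colors correspond to different shape states.

Figure 4. Fitting GMM to our training data in subspace.

To calculate the second factor on the right side of Eqn. (18), we note that $\tilde{\Psi}$ is the projected vector of $\Psi$ after kPCA, $\tilde{\Psi} = \sum_{i=1}^{n} \beta_i^l \, k(\Psi_i, \Psi)$. The kernel function $k(x, y)$ we use is the Gaussian radial basis function (RBF)

$$k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right) \tag{20}$$

so

$$\frac{\partial \tilde{\Psi}}{\partial \Psi} = -\sum_{i=1}^{n} \beta_i^l \, k(\Psi_i, \Psi) \cdot \left(\frac{\Psi - \Psi_i}{\sigma^2}\right) \tag{21}$$

Finally, Eqns. (19) and (21) are substituted into Eqn. (18) to give the final expression

$$\frac{\Delta \Psi}{\Delta t} = \frac{1}{P(\tilde{\Psi})} \cdot \left\{ \sum_{m=1}^{k} \alpha_m \, p(\tilde{\Psi}|\theta_m) \cdot \Sigma_m^{-1} (\tilde{\Psi} - \mu_m) \right\} \cdot \left\{ \sum_{i=1}^{n} \beta_i^l \, k(\Psi_i, \Psi) \, \frac{(\Psi - \Psi_i)}{\sigma^2} \right\} \tag{22}$$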
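The chain rule behind Eqn. (22) can be checked numerically. The sketch below, assuming NumPy, evaluates the shape energy of Eqn. (17) and the descent direction of Eqn. (22), so that the result can be compared with a finite-difference gradient of the energy. All function names (`rbf`, `project`, `gaussian`, `shape_energy`, `shape_force`) are hypothetical, the projection uses the uncentered kernel sum of Eqn. (11), and the product of the two braces is read as a sum over the component index $l$, which is an interpretation of the notation.

```python
import numpy as np

def rbf(train, psi, sigma):
    """k(Psi_i, Psi) of Eqn. (20) for every training row Psi_i."""
    return np.exp(-((train - psi) ** 2).sum(axis=1) / (2.0 * sigma**2))

def project(train, beta, psi, sigma):
    """Projected vector: component l is sum_i beta[i, l] * k(Psi_i, Psi)."""
    return rbf(train, psi, sigma) @ beta

def gaussian(y, mu, cov):
    """Multivariate Gaussian density of Eqn. (15)."""
    diff = y - mu
    quad = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** y.size * np.linalg.det(cov))

def shape_energy(train, beta, psi, sigma, alphas, mus, covs):
    """E_shape = -log P(projected psi), Eqns. (16) and (17)."""
    y = project(train, beta, psi, sigma)
    return -np.log(sum(a * gaussian(y, m, c) for a, m, c in zip(alphas, mus, covs)))

def shape_force(train, beta, psi, sigma, alphas, mus, covs):
    """Descent direction -dE_shape/dPsi, the right-hand side of Eqn. (22)."""
    k = rbf(train, psi, sigma)
    y = k @ beta
    comps = [a * gaussian(y, m, c) for a, m, c in zip(alphas, mus, covs)]
    P = sum(comps)
    # First brace: (1/P) * sum_m alpha_m p_m Sigma_m^{-1} (y - mu_m)
    dE_dy = sum(p * np.linalg.solve(c, y - m) for p, m, c in zip(comps, mus, covs)) / P
    # Second brace: sum_i beta[i, l] k_i (Psi - Psi_i) / sigma^2, one row per l
    second = (beta * k[:, None]).T @ (psi - train) / sigma**2
    return dE_dy @ second
```

A central-difference approximation of the gradient of `shape_energy` with respect to `psi` should agree with the negative of `shape_force` up to discretization error.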

The first term on the right side of Eqn. (22) acts as an adaptive increment at each iteration step. If the testing shape is far from the preferred shape style, the probability P will be small, which results in a large increment step of the shape deformation. On the other hand, if the testing shape is similar to the shape prior, the probability P will be large and the shape deformation will change slowly until it finally converges. The second term in Eqn. (22) is derived by differentiating the shape energy with respect to the projected vector. During the energy minimization process, this term evolves the curve in the direction of the maximum probability position and thus controls the convergence of the curve. The last term, inherited from Eqn. (18), explicitly links the information in the training data with the current test shape. It is the main force that recovers the characteristics of object shapes.

We now return to the second term in Eqn. (22). As mentioned above, it evolves the curve towards the maximum probability position. However, since P is the density of a mixture of Gaussian components, it has local maxima. What we want is a shape that matches the properties of the training data globally rather than locally. We therefore replace the second term with

$$\left[\max_{m} \, \alpha_m \, p(\tilde{\Psi}|\theta_m)\right] \cdot \Sigma_m^{-1} (\tilde{\Psi} - \mu_m) \tag{23}$$

where $\tilde{\Psi}$ is the projected vector of $\Psi$ after kPCA, and $\Sigma_m$ and $\mu_m$ belong to the maximizing component.

Eqn. (23) can be considered a simple Bayesian classifier that “classifies” the testing shape into one of the clusters in the subspace according to the posterior probability. In this way, we avoid the local extrema of the function and obtain a reliable result.
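The classification step can be sketched as follows, assuming NumPy and reading the maximum in Eqn. (23) as taken over the component index m, which is an interpretation based on the classifier description above. The function name `map_component` is hypothetical.

```python
import numpy as np

def map_component(y, alphas, mus, covs):
    """Pick the component maximising alpha_m * p(y | theta_m).

    Returns the winning index m and the replacement term of Eqn. (23),
    evaluated with that component's mean and covariance.
    """
    scores = []
    for a, mu, cov in zip(alphas, mus, covs):
        diff = y - mu
        quad = diff @ np.linalg.solve(cov, diff)
        p = np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** y.size * np.linalg.det(cov))
        scores.append(a * p)
    m = int(np.argmax(scores))
    term = scores[m] * np.linalg.solve(covs[m], y - mus[m])
    return m, term
```

Because only the winning component contributes, the resulting term always points away from a single cluster mean, so descending along it moves the projected shape towards that mean rather than towards a blend of several clusters.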

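Rewriting Eqn. (22) in difference form gives

$$\Psi^{n+1} = \Psi^{n} + \Delta t_1 \cdot A \tag{24}$$

where the superscript $n$ is the iteration index (not the number of training samples that appears as the upper limit of the sum) and $A$ represents

$$A = \frac{1}{P(\tilde{\Psi}^{n})} \cdot \left\{ \left[\max_{m} \, \alpha_m \, p(\tilde{\Psi}^{n}|\theta_m)\right] \Sigma_m^{-1} (\tilde{\Psi}^{n} - \mu_m) \right\} \cdot \left\{ \sum_{i=1}^{n} \beta_i^l \, k(\Psi_i, \Psi^{n}) \, \frac{(\Psi^{n} - \Psi_i)}{\sigma^2} \right\} \tag{25}$$

with $\tilde{\Psi}^{n}$ the projected vector of $\Psi^{n}$ after kPCA. Rewriting the geodesic active contour equation, Eqn. (4), in difference form as well,

$$\Psi^{n+1} = \Psi^{n} + \Delta t_2 \cdot \left( g\,(c + \kappa)\,|\nabla \Psi^{n}| + \nabla \Psi^{n} \cdot \nabla g \right) \tag{26}$$

Combining Eqns. (24) and (26), we obtain the final expression of our curve evolution equation

$$\Psi^{n+1} = \Psi^{n} + \Delta t_2 \cdot \left( g\,(c + \kappa)\,|\nabla \Psi^{n}| + \nabla \Psi^{n} \cdot \nabla g \right) + \Delta t_1 \cdot A \tag{27}$$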

The image gradient information in Eqn. (26) provides an edge-driven component to Eqn. (27), while the shape energy term in Eqn. (24) provides a knowledge-driven component. Thus, the curve can not only capture local deformations based on edge information but also preserve global shape consistency under the prior model.

Figure 5. Energy minimization process of our proposed method. The two traces in the figure represent the convergence process of two human shape states. They start from an obscured shape and finally converge to the center of each shape cluster.

5. Experimental results

The proposed shape prior model framework is tested on real video sequences with artificially created occlusions to verify its effectiveness. The videos are obtained from the CAVIAR database [7]. Five human shape states, such as walking, standing and raising arms while walking, are selected as training shapes to build the shape prior model. The number of principal components k in kPCA is chosen to be 5.

Fig. 5 shows examples of the energy minimization process for two shapes. As discussed in Section 4, we force the curve to move towards the mean of each shape cluster by replacing the second term in Eqn. (22) with a “classification” term in order to avoid the local extrema of the function. Fig. 5 shows that each shape (the two traces) moves towards its corresponding shape cluster and finally converges to the center.

We partially occlude the human shape with a horizontal bar, as shown in Fig. 6 column (a), so that the original shapes are destroyed. The traditional geodesic active contour method, which evolves its curve based only on edge information, therefore cannot find the correct human contour and converges to the wrong position, as shown in Fig. 6 column (b). By incorporating our shape prior model into the GAC algorithm and evolving the curve under the influence of both edge information and shape prior knowledge, the original human contours are successfully recovered, as shown in Fig. 6 column (c).

6. Conclusions

A novel shape prior model based on statistical learning is proposed in this paper for robust object detection by geodesic active contours. The level set representations of object shapes under varying deformations are collected as training data. They are assumed to lie in a low-dimensional subspace and to form distinct clusters after a nonlinear mapping (kPCA). A Gaussian mixture model (GMM) is then employed to estimate the data distribution in the subspace. The resulting probability density function is treated as an energy term and incorporated into the geodesic active contour equation to constrain the curve evolution to a globally consistent shape. Experimental results demonstrate that the proposed shape prior model can successfully handle object occlusions in video sequences and shows good detection performance.

References

[1] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. Int. J. Comput. Vision, 22(1):61–79, 1997.

[2] Y. Chen, S. Thiruvenkadam, H. D. Tagare, F. Huang, D. Wilson, and E. A. Geiser. On the incorporation of shape priors into geometric active contours. IEEE Workshop on Variational and Level Set Methods in Computer Vision, pages 145–152, July 2001.

[3] D. Cremers. Dynamical statistical shape priors for level set based tracking. IEEE Trans. Pattern Anal. Machine Intell., 2006, to appear.

[4] D. Cremers, T. Kohlberger, and C. Schnorr. Shape statistics in kernel space for variational image segmentation. Pattern Recognition, 36(9):1929–1943, Sept. 2003.

[5] D. Cremers, C. Schnorr, and J. Weickert. Diffusion-snakes: combining statistical shape knowledge and image information in a variational framework. IEEE Workshop on VLSM, pages 137–144, July 2001.

[6] M. A. T. Figueiredo and A. K. Jain. Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Machine Intell., 24(3):381–396, March 2002.

[7] http://homepages.inf.ed.ac.uk/rbf/CAVIAR/.

[8] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: active contour models. Int. J. Comput. Vision, 1:321–331, 1988.

[9] M. Leventon, E. Grimson, and O. Faugeras. Statistical shape influence in geodesic active contours. IEEE Conf. on Comp. Vis. and Patt. Recog., 1:316–323, June 2000.

[10] R. Malladi, J. A. Sethian, and B. C. Vemuri. Shape modeling with front propagation: a level set approach. IEEE Trans. Pattern Anal. Machine Intell., 17(2):158–175, Feb. 1995.

[11] N. Paragios and R. Deriche. Geodesic active contours and level sets for the detection and tracking of moving objects. IEEE Trans. Pattern Anal. Machine Intell., 22(3):266–280, March 2000.

Figure 6. Detection results of the proposed method on different human shape states. Column (a) shows the partially occluded human shapes, column (b) the detection results of traditional geodesic active contours, and column (c) the detection results of our proposed method.

[12] M. Rousson and N. Paragios. Shape priors for level set representation. Proc. of the Europ. Conf. on Comp. Vis., pages 78–92, May 2002.

[13] B. Scholkopf, A. Smola, and K. R. Muller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998. Technical Report No. 44, 1996.

[14] M. B. Stegmann and D. D. Gomez. A brief introduction to statistical shape analysis, Mar 2002.

[15] A. Tsai, A. Yezzi, W. Wells, C. Tempany, D. Tucker, A. Fan, W. E. Grimson, and A. Willsky. Model-based curve evolution technique for image segmentation. IEEE Conf. on Comp. Vis. and Patt. Recog., 1:463–468, 2001.

[16] A. Yilmaz, X. Li, and M. Shah. Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans. Pattern Anal. Machine Intell., 26(11):1531–1536, Nov. 2004.

[17] T. Zhang and D. Freedman. Tracking objects using density matching and shape priors. Int’l Conf. on Comp. Vision, 2:1056–1062, Oct. 2003.

