
Pattern Recognition 40 (2007) 2563–2573, www.elsevier.com/locate/pr

Human gait recognition by the fusion of motion and static spatio-temporal templates

Toby H.W. Lam, Raymond S.T. Lee∗, David Zhang
Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hung Hom, Hong Kong

Received 5 December 2005; received in revised form 24 June 2006; accepted 16 November 2006

Abstract

In this paper, we propose a gait recognition algorithm that fuses motion and static spatio-temporal templates of sequences of silhouette images: the motion silhouette contour templates (MSCTs) and static silhouette templates (SSTs). MSCTs and SSTs capture the motion and static characteristics of gait and are computed directly from the silhouette sequence. The performance of the proposed algorithm is evaluated experimentally on the SOTON data set and the USF data set, and compared with other research works on these two data sets. Experimental results show that the proposed templates are efficient for human identification in indoor and outdoor environments. The proposed algorithm has a recognition rate of around 85% on the SOTON data set and around 80% in the intrinsic difference group (probes A–C) of the USF data set.
© 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Gait recognition; Motion silhouette contour templates; Static silhouette templates; Biometrics

1. Introduction

Biometrics has received substantial attention from researchers. Biometrics is a method of recognizing a human according to physiological or behavioral characteristics. Gait, the manner of walking, differs from traditional biometrics. Early medical studies showed that individual gaits are unique, varying from person to person, and are difficult to disguise [1]. In addition, it has been shown that gaits are so characteristic that we recognize friends by their gait [2] and that a gait can even reveal an individual's sex [3]. Unlike other biometrics such as fingerprints and palm-prints, gait recognition requires no contact with a capture device, as a gait can be captured at a distance as a low-resolution image sequence.

∗ Corresponding author. Tel.: +852 27667298. E-mail address: [email protected] (R.S.T. Lee).

0031-3203/$30.00 © 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2006.11.014

Gait recognition is basically divided into two types: model-based and model-free recognition [4]. In model-based recognition, researchers use information gathered from the human body, especially from the joints, to construct a model for recognition. In general, the model-based approach is view- and scale-invariant. Gathering this gait information requires high-quality gait sequences; thus, some model-based recognition systems require multiple cameras to collect the information. One of the classical model-based gait recognition experiments was undertaken by Johansson [5], who attached light bulbs to a human and then used the movement of the light bulbs to capture the subject's motion. In Ref. [6], the body contours are computed in each frame of the walking sequence; a stick model is then created from the body contours for recognition. Johnson and Bobick proposed a multi-view gait recognition algorithm which used static body parameters for recognition [7]. These static body parameters are the height of the silhouette, the distance between the head and pelvis, the distance between the left and right foot, and the maximum distance between the pelvis and the feet. Lee and Grimson [8] proposed a similar approach, using the silhouette images to compute seven feature vectors, such as the aspect ratio and centroid of the silhouette, for recognition. In addition, they also proposed gait features based on spectral components [8]. Wagg and Nixon [9]


presented an automated model-based method for gait extraction based on the mean shape and motion information of gait.

At present, most gait recognition research uses model-free (or holistic) recognition, which uses motion information directly without the need for model reconstruction. Model-free approaches usually use sequences of binary silhouettes, extracting the silhouettes of moving objects from a video using segmentation techniques such as background subtraction. Techniques for moving object recognition include Murase and Sakai's [10] parametric eigenspace representation. The eigenspace technique was originally used in face recognition [11], but Murase and Sakai [10] applied it to gait recognition and lip reading, projecting the extracted silhouette images onto the eigenspace using principal component analysis (PCA). The sequence of movement forms a trajectory in the eigenspace, a parametric eigenspace representation. The input image sequence is preprocessed to form a sequence of binary silhouettes, and this binary sequence is projected to form a trajectory in the eigenspace; the smallest distance between the input trajectory and a reference trajectory gives the best match. Huang et al. [12] applied a similar technique using linear discriminant analysis (LDA), also known as canonical analysis, which discriminates better between different classes. Three different types of temporal templates were proposed, all generated by the computation of optical flow. Canonical analysis allows the temporal template sequence to be projected to form a manifold in the subspace.

Foster et al. [13] presented an area-based metric called gait masks. They masked each silhouette in an image sequence and then measured the unmasked area; differences in this information form a time-varying signal which can be used as a signature in automatic gait recognition. Hayfron-Acquah et al. [14] proposed a gait recognition method based on a generalized symmetry operator which exploits the symmetry of human motion, applying the symmetry operator to the edge of the silhouette image to generate a symmetry map. A gait signature is then generated by applying a fast Fourier transform (FFT) to the mean of the symmetry map. BenAbdelkader et al. [15] proposed a similar idea, a gait representation called the image self-similarity plot; these plots are then projected to an eigenspace using PCA.

Wang and Tan [16] proposed a new transformation method for reducing the dimensionality of the input feature space by unwrapping a 2D silhouette image and transforming it into a 1D distance signal. The sequences of silhouette images are thus transformed into time-varying distance signals, to which the eigenspace transformation is applied. Liu and Sarkar [17] proposed another representation for gait recognition called the averaged silhouette, in which the silhouette sequence is transformed into a single image representation for recognition; the Euclidean distance is adopted as the similarity measure between these representations.

In this paper, we propose a fast, robust and innovative gait recognition algorithm based on motion and static spatio-temporal templates. We propose to recognize gaits using two gait feature templates in combination: motion silhouette contour templates (MSCTs) and static silhouette templates (SSTs). MSCTs and SSTs embed critical spatial and temporal information, and their use reduces the computation cost and the size of the database. The efficacy of the proposed method in indoor and outdoor environments has been demonstrated on the SOTON [18] and USF [19] data sets. The rest of this paper is organized as follows. In Section 2, we describe the details of the MSCT and SST. Section 3 provides details of the proposed recognition algorithm. Section 4 presents the experimental results and Section 5 offers our conclusion.

2. Feature extraction

In this section, we describe the details of the motion spatio-temporal template, the MSCT, and the static spatio-temporal template, the SST.

2.1. Motivation

The main motivation of the proposed templates is to construct discriminative representations from the motion and static characteristics of the walking sequence for recognition. As mentioned in the previous section, there are different approaches to gait recognition, such as the holistic and model-based approaches. One common approach is to extract features from each frame of a walking sequence and then generate a sequence of features [10,12,13,16]. Unlike these methods, in this paper we propose a method that simply extracts two feature templates, an exemplar MSCT and an exemplar SST, from a sequence of silhouette images for recognition. We consider the motion characteristic of gait to be the parts of the body in motion during walking, such as the hands and legs; this characteristic is captured using the contour of the silhouette images. We take the static characteristic of gait to be the torso, which remains steady during walking; this characteristic is captured using the silhouette images directly. Compared with existing research works, instead of creating only one representation for recognition, two representations are constructed in our proposed algorithm. The proposed templates are simple and computationally efficient: they can be computed without the need to generate a sequence of features or perform any transformation. Unlike the model-based approach, it is not necessary to construct any model for recognition. The template construction process is simple and the construction time is short, so the approach is suitable for real-time gait recognition. The following sections describe how to construct the exemplar MSCT and exemplar SST.

2.2. Motion silhouette contour template (MSCT) and static silhouette template (SST)

The basis of these templates is a sequence of silhouette images. An MSCT contains information about the movement characteristics of a human gait and an SST contains information about the static characteristics. These templates are used together for gait recognition. First, the silhouettes are


Fig. 1. Flow diagram of the proposed gait recognition algorithm.

extracted and normalized to a fixed size. Then, the gait period is estimated from the silhouette sequence, and the sequence is divided into several cycles according to the estimated gait period. In each cycle, two templates, an MSCT and an SST, are computed, so there are a number of MSCTs and SSTs in each silhouette sequence. For ease of computation, an exemplar MSCT and an exemplar SST are computed by averaging the MSCTs and SSTs of each sequence. Fig. 1 shows a flow diagram of the proposed gait recognition algorithm.

2.2.1. Preprocessing

In our proposed algorithm, silhouettes are the basis of gait recognition. The silhouettes are extracted by simple background subtraction and thresholding [20]. The binarization process renders the image in black and white: the background is black and the foreground is white. The bounding box of the silhouette in each frame is computed, the silhouette image is extracted according to the size of the bounding box, and the extracted image is resized to a fixed size (128 × 88 pixels). The purpose of this normalization is to eliminate the scaling effect. Fig. 2 shows examples of the normalized silhouette images.
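The crop-and-resize step above can be sketched as follows. This is a minimal NumPy sketch, not the authors' code: the function name `normalize_silhouette` and the nearest-neighbour resampling are our own assumptions; the paper only specifies cropping to the bounding box and resizing to 128 × 88.

```python
import numpy as np

def normalize_silhouette(mask, out_h=128, out_w=88):
    """Crop a binary silhouette to its bounding box and resize it to a
    fixed out_h x out_w grid using nearest-neighbour sampling."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:                      # empty frame: nothing to crop
        return np.zeros((out_h, out_w), dtype=np.uint8)
    crop = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape
    # Nearest-neighbour index maps from the output grid back to the crop.
    row_idx = np.arange(out_h) * h // out_h
    col_idx = np.arange(out_w) * w // out_w
    return crop[np.ix_(row_idx, col_idx)].astype(np.uint8)

# Toy example: a 5x3 foreground blob embedded in a larger frame.
frame = np.zeros((20, 20), dtype=np.uint8)
frame[8:13, 6:9] = 1
norm = normalize_silhouette(frame)
print(norm.shape)  # (128, 88)
```

Because the crop is stretched independently in each axis, the scaling effect mentioned above is removed regardless of the subject's distance from the camera.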

Fig. 2. Samples of normalized silhouette.

2.2.2. Gait period estimation

Human walking repeats its motion at a stable frequency. Since our proposed gait feature templates depend on the gait period, we must estimate the number of frames in each walking cycle. A single walking cycle can be regarded as the period in which a person moves from the mid-stance position (both legs closest together) to a double-support position (both legs furthest apart), back to mid-stance, through the other double-support position, and finally back to mid-stance. Fig. 3 shows samples of silhouette images in one cycle.

The gait period Pgait can then be estimated by counting the number of foreground pixels in each silhouette image [19]. In the mid-stance position, the silhouette contains the smallest number of foreground pixels; in the double-support position, it contains the greatest number. Because sharp changes in the gait cycle are most obvious in the lower part of the body, gait period estimation uses only the lower half of the silhouette image, with the gait period taken as the median of the distances between consecutive minima.
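The estimation procedure above can be sketched as follows. This is a simplified illustration under our own assumptions: the function name, the strict-minimum test, and the synthetic oscillating silhouettes are ours; a real sequence would likely need smoothing of the pixel-count signal before the minima are located.

```python
import numpy as np

def estimate_gait_period(silhouettes):
    """Estimate Pgait as the median gap between consecutive minima of the
    lower-half foreground-pixel count. `silhouettes` is (T, H, W) binary."""
    T, H, _ = silhouettes.shape
    counts = silhouettes[:, H // 2:, :].reshape(T, -1).sum(axis=1)
    # Local minima of the count signal correspond to mid-stance frames.
    minima = [t for t in range(1, T - 1)
              if counts[t] < counts[t - 1] and counts[t] <= counts[t + 1]]
    gaps = np.diff(minima)
    return int(np.median(gaps)) if gaps.size else T

# Synthetic sequence whose lower-half area oscillates with period 10 frames.
T, H, W = 60, 128, 88
sils = np.zeros((T, H, W), dtype=np.uint8)
for t in range(T):
    width = 10 + 2 * int(np.rint(10 * (1 - np.cos(2 * np.pi * t / 10)) / 2))
    sils[t, H // 2:, W // 2 - width // 2: W // 2 + width // 2] = 1
period = estimate_gait_period(sils)
print(period)  # 10
```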

2.2.3. Generating exemplar MSCTs

An MSCT contains the motion information about the human gait and is constructed in three steps. First, the silhouette images are extracted and normalized to a fixed size of 128 × 88 pixels. Then, the sequence of silhouettes is used to estimate the gait period Pgait. Finally, the silhouette image sequence is divided into several cycles according to the estimated gait period, and an MSCT is created for each cycle. The exemplar MSCT is the average of these MSCTs.

The MSCT is generated from a sequence of silhouette contours. The contour of the silhouette CSi is obtained by subtracting the eroded silhouette ESi from the original silhouette Si, as in Eq. (1). The eroded silhouette ESi is computed by the erosion operation, as in Eq. (2):

CSi = Si − ESi = Si − (Si ⊖ S), (1)

ESi = Si ⊖ S = ∩_{s∈S} (Si)_{−s}, (2)

where Si is the original silhouette to be eroded, ESi is the eroded silhouette, CSi is the silhouette contour, S is the structuring element and ⊖ is the erosion operator. (Si)_{−s} represents the translation of the silhouette image Si by s.

Fig. 3. Samples of silhouette images in one walking cycle.

Fig. 4. The 3 × 3 all-ones structuring element adopted for the erosion operation.

The structuring element S is a set of coordinate points. The foreground and background pixels are represented by 1's and 0's, respectively. Fig. 4 shows the structuring element we adopted for the erosion operation. The erosion operator superimposes the structuring element on the input image: if every nonzero element of the structuring element is contained in the input image, the output pixel is 1; otherwise, the output pixel is 0 [21]. Fig. 5 shows an original silhouette, the eroded silhouette and the silhouette contour computed by the erosion operation.
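Eqs. (1)–(2) with the 3 × 3 all-ones structuring element can be sketched directly in NumPy. This is our own illustrative implementation (the function names `erode3x3` and `silhouette_contour` are assumptions); it keeps a pixel only when its whole 3 × 3 neighbourhood is foreground, which is exactly the erosion rule described above.

```python
import numpy as np

def erode3x3(sil):
    """Binary erosion with the 3x3 all-ones structuring element: an output
    pixel is 1 only if every pixel in its 3x3 neighbourhood is foreground
    (border pixels are always eroded away)."""
    out = np.zeros_like(sil)
    core = np.ones_like(sil[1:-1, 1:-1], dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            core &= sil[1 + dy: sil.shape[0] - 1 + dy,
                        1 + dx: sil.shape[1] - 1 + dx].astype(bool)
    out[1:-1, 1:-1] = core
    return out

def silhouette_contour(sil):
    """CS = S - (S eroded by the structuring element), Eq. (1):
    a one-pixel-wide outline of the silhouette."""
    return sil - erode3x3(sil)

sil = np.zeros((7, 7), dtype=np.uint8)
sil[1:6, 1:6] = 1                # a 5x5 solid square
contour = silhouette_contour(sil)
print(contour.sum())             # 16 perimeter pixels remain
```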

Eq. (3) is then applied to the sequence of silhouette contour images:

MSCTi(x, y, t) = { 255, if CSi(x, y, t) = 1; max(0, MSCTi(x, y, t − 1) − δ), otherwise }, (3)

where i is the cycle number in the gait sequence, δ is the intensity decay parameter and MSCTi is the ith motion silhouette contour template. The intensity decay parameter δ is computed as

δ = 255 / Pgait, (4)

where Pgait is the estimated gait period. The use of a dynamic decay value rather than a fixed intensity decay parameter eliminates the effect of walking speed. Fig. 6 shows some example MSCTs.
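The recursion in Eqs. (3)–(4) can be sketched as follows. This is a minimal sketch under our own naming (`build_msct`, and a toy one-pixel "moving contour"): a pixel jumps to 255 whenever it lies on the current contour and otherwise decays by δ = 255/Pgait per frame, so recent motion stays bright while older motion fades.

```python
import numpy as np

def build_msct(contours, p_gait):
    """Accumulate one MSCT over a cycle of binary contour images,
    following Eqs. (3)-(4)."""
    delta = 255.0 / p_gait                      # dynamic decay, Eq. (4)
    msct = np.zeros(contours[0].shape, dtype=np.float64)
    for cs in contours:
        # Eq. (3): refresh contour pixels, decay everything else.
        msct = np.where(cs == 1, 255.0, np.maximum(0.0, msct - delta))
    return msct

# Toy cycle: a single contour pixel sweeping left to right.
p_gait = 5
contours = []
for t in range(p_gait):
    c = np.zeros((4, 8), dtype=np.uint8)
    c[2, t] = 1
    contours.append(c)
msct = build_msct(contours, p_gait)
print(msct[2, :p_gait])  # [ 51. 102. 153. 204. 255.] - a fading trail
```

The fading trail makes the direction and recency of motion visible in a single image, which is what gives the MSCT its temporal content.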

The number of MSCTs in a sequence depends on the gait period and the number of frames in the walking sequence. The fact that different subjects may produce different numbers of MSCTs may increase the computational complexity. To reduce this complexity, an exemplar MSCT is obtained as the average of the MSCTs MSCTi in each walking sequence:

MSCT = (Σ_{i=1}^{n} MSCTi) / n, (5)

where n is the number of MSCTs in the sequence. Fig. 7 shows some examples of exemplar MSCTs.

A great advantage of using MSCT is that the contour imagesfrom which they are formed are an order of magnitude smallerthan silhouette images and are thus more computationally effi-cient. However, if the silhouettes are extracted at a low quality,an MSCT may embed irrelevant information which affects therecognition rate. In the following section we describe how thiserror can be reduced by using the SST.

2.2.4. Generating exemplar SSTs

SSTs are used in our recognition algorithm in conjunction with MSCTs as a way of reducing the recognition error. An SST is generated in much the same way as an MSCT except that it uses the entire silhouette image. The SST is generated by the following rule:

SSTi(x, y, t) = { 1, if Si(x, y, t) = Si(x, y, t − 1); 0, otherwise }, (6)

where i is the cycle number in the gait sequence and SSTi is the ith static silhouette template. Fig. 8 shows examples of SSTs.
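One reading of Eq. (6) can be sketched as follows. This is our own interpretation, stated as an assumption: a pixel is kept when it is foreground and unchanged between consecutive frames throughout the cycle, so only the steady torso region survives. The function name and toy frames are ours.

```python
import numpy as np

def build_sst(silhouettes):
    """SST sketch per Eq. (6): keep the foreground pixels that stay
    unchanged from frame to frame across the cycle (the steady torso).
    The accumulation over the whole cycle is an assumption here."""
    sst = silhouettes[0].astype(bool)
    for prev, cur in zip(silhouettes[:-1], silhouettes[1:]):
        sst &= (prev == cur) & cur.astype(bool)
    return sst.astype(np.uint8)

# A torso block stays fixed while a "leg" pixel moves every frame.
frames = []
for t in range(4):
    f = np.zeros((6, 6), dtype=np.uint8)
    f[1:4, 2:4] = 1      # static 3x2 torso block
    f[5, t] = 1          # moving leg pixel
    frames.append(f)
sst = build_sst(frames)
print(sst.sum())  # 6 - only the torso block survives
```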

As in the generation of MSCTs, the number of SSTs SSTi in the sequence depends on the gait period and the number of frames in the sequence. We further obtain the exemplar SST by averaging the SSTi in each walking sequence:

SST = (Σ_{i=1}^{n} SSTi) / n, (7)


Fig. 5. (a) Original silhouette, (b) eroded silhouette, (c) silhouette contour.

Fig. 6. Examples of motion silhouette contour template (MSCT).

where n is the number of cycles in the sequence. Fig. 9 shows some examples of exemplar SSTs.

3. Recognition

The similarity score represents the level of similarity between the testing data and the training data. In this section, we explain the details of the similarity measures in our proposed algorithm. For ease of understanding, gallery means training and probe means testing. Suppose there are Ngallery subjects in the gallery data set and Nprobe subjects in the probe data set, and each subject has one walking sequence. A probe sequence is captured and its similarity score against each gallery sequence is measured.

Suppose u is the probe sequence and v is the gallery sequence. A probe sequence Sequ = {Sequ(1), Sequ(2), ..., Sequ(P)} from the probe data set and a gallery sequence Seqv = {Seqv(1), Seqv(2), ..., Seqv(Q)} from the gallery data set are used for calculating the similarity score, where P and Q are, respectively, the numbers of frames in the probe and gallery sequences. We calculate the gait period of each subject in the probe and gallery sequences and follow the procedures described in Section 2 to create the exemplar MSCT and SST for each. After that, each subject has two templates, MSCTu and SSTu, for the probe sequence and another two templates, MSCTv and SSTv, for the gallery sequence.

Fig. 7. Examples of exemplar MSCT.

Fig. 8. Examples of static silhouette template (SST).

Our algorithm makes use of two similarity scores. To measure the similarity between the gallery and probe MSCTs, we calculate SimScoreMSCT; to measure the similarity between the gallery and probe SSTs, we calculate SimScoreSST. Both scores are based on the Euclidean distance. SimScoreMSCT is computed by Eq. (8) and SimScoreSST by Eq. (9):

SimScoreMSCT(MSCTu, MSCTv) = ‖MSCTu − MSCTv‖ / MeanScoreMSCT, (8)

SimScoreSST(SSTu, SSTv) = ‖SSTu − SSTv‖ / MeanScoreSST, (9)

Fig. 9. Examples of exemplar SST.

where MSCTu and SSTu are the exemplar MSCT and exemplar SST of the probe sequence u, MSCTv and SSTv are the exemplar MSCT and exemplar SST of the gallery sequence v, and MeanScoreMSCT and MeanScoreSST are the mean similarity scores of the exemplar MSCTs and exemplar SSTs, respectively. They are computed by Eqs. (10) and (11):

MeanScoreMSCT = (Σ_{i=1}^{Ngallery} Σ_{j=1}^{Nprobe} ‖MSCTi − MSCTj‖) / (Ngallery × Nprobe), (10)

MeanScoreSST = (Σ_{i=1}^{Ngallery} Σ_{j=1}^{Nprobe} ‖SSTi − SSTj‖) / (Ngallery × Nprobe), (11)

where Ngallery is the number of subjects in the gallery set and Nprobe is the number of subjects in the probe set. The final similarity score SimScore between two subjects is calculated as follows:

SimScore(u, v) = SimScoreMSCT(MSCTu, MSCTv) + SimScoreSST(SSTu, SSTv). (12)

In our proposed recognition algorithm, a nearest neighbor (NN) classifier is adopted for classification. For a testing sample u, we calculate the final similarity score SimScore against each subject in the gallery data set by Eq. (12), giving Ngallery final similarity scores. The sample u is classified as subject v when its final score is the minimum over all training patterns, i.e., the testing sample u is classified as subject v if

min_i SimScore(u, i) = SimScore(u, v), (13)

where i = 1, 2, ..., Ngallery.
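The fused matching of Eqs. (8)–(13) can be sketched as follows. This is a simplified sketch, not the authors' code: it handles a single probe, so the normalising means of Eqs. (10)–(11) are taken over that probe's gallery distances only (an assumption), and templates are passed as flattened vectors.

```python
import numpy as np

def nn_classify(probe_msct, probe_sst, gallery_mscts, gallery_ssts):
    """Fused NN matching per Eqs. (8)-(13): normalise each Euclidean
    distance by the mean distance for that template type, sum the two
    normalised scores, and pick the gallery subject with the minimum."""
    d_msct = np.array([np.linalg.norm(probe_msct - g) for g in gallery_mscts])
    d_sst = np.array([np.linalg.norm(probe_sst - g) for g in gallery_ssts])
    fused = d_msct / d_msct.mean() + d_sst / d_sst.mean()   # Eq. (12)
    return int(np.argmin(fused))                            # Eq. (13)

# Three synthetic gallery subjects with 16-dimensional templates.
rng = np.random.default_rng(0)
gallery_mscts = [rng.random(16) for _ in range(3)]
gallery_ssts = [rng.random(16) for _ in range(3)]
# The probe is a slightly perturbed copy of gallery subject 1.
probe_msct = gallery_mscts[1] + 0.01
probe_sst = gallery_ssts[1] + 0.01
pred = nn_classify(probe_msct, probe_sst, gallery_mscts, gallery_ssts)
print(pred)  # 1
```

Normalising each distance by its mean puts the MSCT and SST scores on a comparable scale before they are summed, so neither template dominates the fusion.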

4. Experiments

In this section, we show the performance of the proposed gait recognition algorithm on two data sets, the SOTON data set [18] and the USF data set [19]. The SOTON data set was captured in an indoor environment and the USF data set in an outdoor environment. Fig. 10 shows some silhouette images from these two data sets. For the evaluation, we adopted the FERET scheme [1] and measured the identification rate and the verification rate using cumulative match characteristics (CMCs). All experiments were implemented in Matlab and run on a P4 2.26 GHz computer with 512 MB of memory.
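The CMC evaluation used below can be sketched as follows. This is a generic illustration of the metric rather than the authors' evaluation code: entry r of the curve is the fraction of probes whose correct gallery subject appears within the top r + 1 matches when subjects are sorted by ascending similarity score (lower = better). The function name and toy scores are ours.

```python
import numpy as np

def cmc_curve(score_matrix, true_ids):
    """Cumulative match characteristic from an (n_probe, n_gallery)
    matrix of similarity scores, where a lower score is a better match."""
    n_probe, n_gallery = score_matrix.shape
    ranks = []
    for p in range(n_probe):
        order = np.argsort(score_matrix[p])                 # best match first
        ranks.append(int(np.where(order == true_ids[p])[0][0]))
    ranks = np.array(ranks)
    return np.array([(ranks <= r).mean() for r in range(n_gallery)])

# 3 probes against 4 gallery subjects.
scores = np.array([[0.1, 0.5, 0.9, 0.7],   # correct subject 0: rank 1
                   [0.4, 0.2, 0.8, 0.6],   # correct subject 2: rank 4
                   [0.3, 0.6, 0.5, 0.2]])  # correct subject 3: rank 1
cmc = cmc_curve(scores, [0, 2, 3])
print(cmc)  # rank-1 rate 2/3, rising to 1.0 by rank 4
```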

4.1. Recognition on the SOTON data set

For the SOTON data set, we had to make a number of adjustments. Since there were insufficient frames in each walking sequence of the SOTON data set to estimate the gait period, we fixed the gait period at 30 frames to generate the proposed feature templates. There were varying numbers of walking sequences for each subject, so we constructed three further data sets: data sets A, B, and C. In data set A, 50% of the image sequences of each subject were selected for training and the remainder were used for testing. In data set B, 75% were selected for training and the remainder for testing. In data set C, 90% were selected for training and the remainder for testing.

The proposed algorithm was tested to determine its ability to recognize using an MSCT and an SST together, an MSCT alone, and an SST alone. The NN classifier was used in these tests. Table 1 shows that the algorithm achieved its best result with the combined templates, with a rank 1 recognition rate above 86% for all three subsets. Fig. 11 shows the recognition rate for the three data sets plotted as CMC curves. The algorithm performs well with both the MSCT and the SST, but the MSCT is the better of the two.

4.2. Recognition on the USF data set

In this experiment, the proposed algorithm was evaluated on the outdoor USF data set. The USF version 2.1 data set contains one gallery set and 12 probe sets, A–L. This data set offers experimental challenges in that it contains a number of covariates such as shoe type, surface type and viewing angle. We compared our proposed algorithm with the baseline algorithm [19] and the UMD Hidden Markov Model (HMM) algorithm [22].


Fig. 10. Sample silhouette images from (a) SOTON data set. (b) USF data set.

Table 1
Performance on SOTON HiD data set

                             Rank 1 (%)   Rank 5 (%)   Rank 10 (%)
(I) MSCT and SST
  (1) 50% train, 50% test      86.41        94.75        96.16
  (2) 75% train, 25% test      88.95        94.30        95.54
  (3) 90% train, 10% test      89.56        95.18        95.58
(II) MSCT
  (1) 50% train, 50% test      81.16        90.25        93.72
  (2) 75% train, 25% test      85.03        91.80        94.65
  (3) 90% train, 10% test      88.35        93.98        95.18
(III) SST
  (1) 50% train, 50% test      77.69        89.50        93.25
  (2) 75% train, 25% test      81.64        91.09        93.58
  (3) 90% train, 10% test      80.32        92.37        93.98

Fig. 11. Cumulative match characteristic for the SOTON data set (curves for the 90%/10%, 75%/25% and 50%/50% train/test splits; rank on the horizontal axis, cumulative match score on the vertical axis).

To ease our explanation, we placed the probe sets of the USF version 2.1 data set under three group headings: (I) intrinsic difference, (II) surface difference, and (III) extrinsic difference. Table 2 provides more detailed information about these groupings. The recognition performance is shown in Table 3.

The experimental results show that the proposed algorithm is worse than the baseline algorithm only in group (II).

Table 2
USF HiD probe sets (version 2.1)

Group                       Probe   Data set covariates             Number of samples
(I) Intrinsic difference    A       View                            122
                            B       Shoe                            54
                            C       View, shoe                      54
(II) Surface difference     D       Surface                         121
                            E       Surface, shoe                   60
                            F       Surface, view                   121
                            G       Surface, shoe, view             60
(III) Extrinsic difference  H       Briefcase                       120
                            I       Shoe, briefcase                 47
                            J       View, briefcase                 70
                            K       Time, shoe, clothing            33
                            L       Surface, time, shoe, clothing   33

Compared with the baseline algorithm, the proposed algorithm has a better recognition rate for groups (I) and (III). The rank 1 performance of the proposed algorithm in group (I) is slightly worse than that of the UMD HMM algorithm, by 2%. However, there is a clear gap between the performance of the proposed algorithm and the UMD HMM approach in groups (II) and (III).

This gives rise to a number of interesting observations. Itwould seem that the proposed templates are insensitive in dif-ferent viewing angle and shoe types. In group (I), the recogni-tion rate is around 66% by using baseline algorithm. Comparedwith baseline algorithm, there is average 14% improvement inrecognition rate in group (I) by our proposed algorithm. Theperformance of the UMD HMM algorithm and the proposed al-gorithms is nearly the same. The rank 1 recognition rate of theUMD HMM algorithm and the proposed algorithms are 82%and 80%, respectively. In group (III) (probes H–L), the averageidentification rate of proposed algorithm is higher than baselinealgorithm by 2% with a significantly high recognition rate inprobes K and L. Although there is a distance between our pro-posed algorithm with the UMD HMM algorithm, there is alsoa high recognition rate in probe K. This would indicate that theproposed templates retain their discriminative power over time.The fact that the proposed algorithm does not work very wellin group (II) (probes D–G, surface differences) indicates thatthe proposed templates are sensitive to the surface type. Fig.12 shows the recognition rate of proposed gait recognition al-

Page 9: Human gait recognition by the fusion of motion and static spatio temporaltemplates

T.H.W. Lam et al. / Pattern Recognition 40 (2007) 2563–2573 2571

Table 3
The match scores (%) of the proposed algorithm and other algorithms on the USF data set (version 2.1)

Group  Probe        Baseline         UMD HMM          MSCT             SST              MSCT and SST
                    Rank 1  Rank 5   Rank 1  Rank 5a  Rank 1  Rank 5   Rank 1  Rank 5   Rank 1  Rank 5
I      A            73      88       89      —        71      92       77      90       80      94
       B            78      93       88      —        82      96       83      93       89      94
       C            48      78       68      —        59      85       69      81       72      87
       Mean score   66      86       82      —        71      91       76      88       80      92
II     D            32      66       35      —        15      46       12      41       14      41
       E            22      55       28      —        10      40       13      32       10      35
       F            17      42       15      —         8      22        9      29       10      26
       G            17      38       21      —         8      25       12      28       13      28
       Mean score   22      50       25      —        10      33       12      33       12      33
III    H            61      85       85      —        54      83       38      63       49      78
       I            57      78       80      —        55      82       33      58       43      75
       J            36      62       58      —        37      67       19      42       30      61
       K             3      12       17      —        42      55       39      42       39      55
       L             3      15       15      —        12      42        9      24        9      36
       Mean score   32      50       51      —        40      66       28      46       34      61

a Since the Rank 5 performance is not reported in Ref. [22], it is omitted from the comparison.

Fig. 12. Recognition rate by using MSCT and SST together in the USF HiD data set (cumulative match score vs. rank, probes A–L).

Fig. 12 shows the recognition rate of the proposed gait recognition algorithm on the USF data set with respect to different ranks. In the illustration, rank n means that the individual is matched with one of the top n samples in the ordered similarity scores.
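The cumulative match scores plotted in Figs. 12–14 can be computed directly from a probe-to-gallery similarity matrix. The sketch below is a generic illustration of this evaluation protocol, not the authors' code; the function name and the similarity-matrix layout are our own assumptions.

```python
import numpy as np

def cmc(similarity, gallery_ids, probe_ids, max_rank=25):
    """Cumulative match characteristic: percentage of probes whose true
    gallery identity appears among the top-n most similar gallery samples."""
    similarity = np.asarray(similarity, dtype=float)
    hits = np.zeros(max_rank)
    for i, pid in enumerate(probe_ids):
        order = np.argsort(-similarity[i])       # gallery indices, best match first
        ranked_ids = [gallery_ids[j] for j in order]
        rank = ranked_ids.index(pid)             # 0-based rank of the true match
        if rank < max_rank:
            hits[rank:] += 1                     # a hit at rank r counts for all n >= r
    return 100.0 * hits / len(probe_ids)         # percentages, as plotted in Fig. 12
```

With the USF protocol, `gallery_ids` would hold one identity per gallery sequence and `probe_ids` the true identity of each probe sequence.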

We also applied each feature template individually for gait recognition on the USF data set; the recognition rates are also recorded in Table 3. The recognition rate of using the two templates together is higher than that of using either feature template individually. MSCT had a higher recognition rate than SST in group (III), which means that MSCT retains more distinctive information than SST under the carrying, clothing and time covariates. Figs. 13 and 14 show the recognition rates of MSCT and SST when applied to the USF data set individually.
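The paper's exact fusion rule is defined in its method section and is not reproduced here. Purely as a hedged illustration, one common way to combine two template matchers at the score level is to normalize each similarity matrix and take a weighted sum; the function name and the weight `w` are our own assumptions.

```python
import numpy as np

def fused_similarity(sim_msct, sim_sst, w=0.5):
    """Score-level fusion of two similarity matrices (one per template)."""
    def norm(s):
        # Min-max normalization so the two templates' scores are comparable.
        s = np.asarray(s, dtype=float)
        rng = s.max() - s.min()
        return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)
    return w * norm(sim_msct) + (1.0 - w) * norm(sim_sst)
```

The fused matrix can then be ranked exactly like the per-template scores when computing recognition rates.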

Compared with the baseline algorithm, the proposed algorithm achieves a significant improvement in groups (I) and (III). The

Fig. 13. Recognition rate by using MSCT in the USF HiD data set (cumulative match score vs. rank, probes A–L).

experiments showed that the algorithm does not work very well if the surface type is different from that of the gallery set. The extracted silhouette images may include noise, such as shadows, under different surface types, and these distorted silhouette images may lower the recognition rate. To further improve the recognition rate, we should find methods to reconstruct the distorted silhouette images into noise-free silhouette images. The UMD HMM approach uses a Hidden Markov Model for recognition, whereas our proposed algorithm uses the feature templates directly. Since our proposed algorithm does not perform any model construction or transformation before recognition, this probably affects the recognition performance. In the future, we would like to investigate combining an HMM or another statistical model with our proposed templates for gait recognition.
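One simple way to suppress the shadow speckle discussed above is a morphological opening of the binary silhouette. The sketch below is our own NumPy illustration of that standard operation, not the reconstruction method the paper leaves to future work.

```python
import numpy as np

def erode(m):
    # 3x3 erosion: a pixel survives only if its full 3x3 neighbourhood is on.
    p = np.pad(m, 1, constant_values=False)
    out = np.ones_like(m, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= p[1 + dy: 1 + dy + m.shape[0], 1 + dx: 1 + dx + m.shape[1]]
    return out

def dilate(m):
    # 3x3 dilation: a pixel turns on if any neighbour in its 3x3 window is on.
    p = np.pad(m, 1, constant_values=False)
    out = np.zeros_like(m, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= p[1 + dy: 1 + dy + m.shape[0], 1 + dx: 1 + dx + m.shape[1]]
    return out

def open_silhouette(mask):
    """Morphological opening (erosion then dilation): removes isolated
    noise pixels while largely preserving the body region."""
    return dilate(erode(np.asarray(mask, dtype=bool)))
```

On a real silhouette sequence this would be applied frame by frame before computing the MSCT and SST templates.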


Table 4
The match scores (%) of the proposed algorithm and the CMU algorithm on the USF data set (version 1.7)

Group  Probe        CMU              MSCT             SST              MSCT and SST
                    Rank 1  Rank 5   Rank 1  Rank 5   Rank 1  Rank 5   Rank 1  Rank 5
I      A            87      100      69      89       69      85       79      92
       B            81       90      54      76       59      71       61      76
       C            66       83      37      61       34      54       37      61
       Mean score   78       91      53      75       54      70       59      76
II     D            21       59      16      33       23      39       24      39
       E            19       50      18      34       11      34       20      41
       F            27       53      10      31       16      33       20      34
       G            23       43       5      34       14      30        9      32
       Mean score   23       51      12      33       16      34       18      37

Fig. 14. Recognition rate by using SST in the USF HiD data set (cumulative match score vs. rank, probes A–L).

We further compared our proposed algorithm with another research work, the CMU key frame algorithm [23], on the USF data set. In this experiment, we adopted the USF version 1.7 data set, in which the silhouette images are extracted by a parameterized algorithm. It contains one gallery set and seven probe sets (A–G). The data set covariates are similar to those of the version 2.1 data set. The CMU work uses the key frames of the walking sequence for gait recognition [23]. Table 4 shows the match scores of the proposed algorithm and the key frame approach (CMU) on the USF version 1.7 data set. The experimental results show that the proposed algorithm is comparable with the key frame algorithm. The performance of the proposed algorithm is better than the key frame approach in probes D and E, while its rank 1 recognition rate is slightly worse than the CMU algorithm in probes A and F. Unlike on the version 2.1 data set, the proposed representations, MSCT and SST, perform better in group (II). Since the silhouette images in USF version 1.7 are extracted by a parameterized algorithm, the quality of the extracted silhouettes depends on the parameter values. This reveals that the recognition performance depends on the quality of the silhouette images.

5. Conclusions

In this paper, we proposed a gait recognition algorithm for human identification by the fusion of motion and static spatio-temporal templates. The proposed algorithm has promising performance in indoor and outdoor environments. Two feature templates are proposed in this paper: the motion silhouette contour template (MSCT) and the static silhouette template (SST). These templates embed the motion and static characteristics of gait. The performance of the proposed algorithm was evaluated experimentally using the SOTON data set [18] and the USF data set [19]. In the experiments, the recognition rate is around 85% on the SOTON data set. On the USF data set (version 2.1), under the same surface type, the recognition rate of the proposed algorithm is higher than that of the baseline. The average recognition rates are 80% and 34% in group (I) (probes A–C) and group (III) (probes H–L), respectively. The experimental results showed that the performance of the proposed algorithm is promising in indoor and outdoor environments.

In our proposed algorithm, the two feature templates, MSCT and SST, are used together for gait recognition. These feature templates retain their discriminative power under various covariates such as shoe type, viewing angle and time. However, when the surface type of the probe set differs from that of the gallery set, the performance of our proposed algorithm is a little worse than the baseline algorithm and the UMD HMM algorithm. This shows that the discriminative power of these feature templates is affected by the surface type: the recognition rate is lowered by the distorted silhouette images. Since the human shadow differs under different surface types, this may affect the accuracy of the silhouette extraction. To further improve the recognition rate when the surface types differ, we will investigate algorithms for reconstructing silhouette images. Such an algorithm could create a noise-free silhouette image under different conditions, such as shoe, clothing and surface type differences.

In our proposed algorithm, the two templates are used directly for recognition without any model creation, parameter setting or transformation. The proposed algorithm is therefore simple and suitable for real-time recognition. Experiments showed that the average processing time is around 7.7 s. The performance is


comparable with some existing works. However, the algorithm still has room for improvement. In the future, we shall seek to apply dimension reduction techniques, such as kernel PCA, to reduce the computational complexity; such techniques could also further reduce the execution time. In addition, we would like to adopt other statistical models, such as the Hidden Markov Model, to further improve the recognition performance of our algorithm. Furthermore, we would like to find new gait feature templates for recognition.
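The kernel PCA step suggested above can be sketched in a few lines of NumPy. This is a generic textbook implementation with an RBF kernel, offered only as a sketch of the idea; the function name and the `gamma` parameter are our own choices, not the authors'.

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """RBF kernel PCA: project the samples in X onto the top eigenvectors
    of the double-centred kernel matrix (a dimension-reduction step)."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared distances, then the RBF (Gaussian) kernel matrix.
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    n = len(X)
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one    # double-centre the kernel
    vals, vecs = np.linalg.eigh(Kc)               # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]   # pick the largest ones
    vals, vecs = vals[idx], vecs[:, idx]
    return vecs * np.sqrt(np.maximum(vals, 0))    # projected coordinates
```

Applied to vectorized MSCT/SST templates, the projected coordinates could replace the raw templates in the matching step, trading a small preprocessing cost for cheaper comparisons.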

Acknowledgments

This work was partially supported by the iJADE projects B-Q569 and A-PF74 and the Cogito iJADE project PG50 of The Hong Kong Polytechnic University.

References

[1] M.P. Murray, A.B. Drought, R.C. Kory, Walking patterns of normal men, J. Bone Joint Surg. 46-A (2) (1964) 335–360.

[2] J. Cutting, L. Kozlowski, Recognizing friends by their walk: gaitperception without familiarity cues, Bull. Psychon. Soc. 9 (5) (1977)353–356.

[3] C. Barclay, J. Cutting, L. Kozlowski, Temporal and spatial factors ingait perception that influence gender recognition, Percept. Psychophys.23 (2) (1978) 145–152.

[4] N.V. Boulgouris, D. Hatzinakos, K.N. Plataniotis, Gait recognition: achallenging signal processing technology for biometric identification,IEEE Signal Process. Mag. 22 (6) (2005) 78–90.

[5] G. Johansson, Visual motion perception, Sci. Am. (1975) 75–88.

[6] S.A. Niyogi, E.H. Adelson, Analyzing and recognizing walking figures in XYT, Proc. Computer Vision Pattern Recognition (1994) 469–474.

[7] A. Johnson, A. Bobick, A multi-view method for gait recognitionusing static body parameters, in: Proceedings of the Third InternationalConference in Audio- and Video-based Biometric Person Authentication,2001, pp. 301–311.

[8] L. Lee, W.E.L. Grimson, Gait analysis for recognition and classification,in: Proceedings of IEEE International Conference in Automatic Faceand Gesture Recognition, 2002, pp. 148–155.

[9] D.K. Wagg, M.S. Nixon, On automated model-based extraction andanalysis of gait, in: Proceedings of IEEE International Conference inAutomatic Face and Gesture Recognition, 2004, pp. 11–16.

[10] H. Murase, R. Sakai, Moving object recognition in eigenspacerepresentation: gait analysis and lip reading, Pattern Recognition Lett.17 (1996) 155–162.

[11] M. Turk, A. Pentland, Face recognition using eigenfaces, Proc. Comput.Vision Pattern Recognition (1991) 586–591.

[12] P.S. Huang, C.J. Harris, M.S. Nixon, Human gait recognition in canonicalspace using temporal templates, IEE Proc. Vision Image Signal Process.146 (2) (1999) 93–100.

[13] J.P. Foster, M.S. Nixon, A. Prügel-Bennett, Automatic gait recognitionusing area-based metrics, Pattern Recognition Lett. 24 (2003) 2489–2497.

[14] J.B. Hayfron-Acquah, M.S. Nixon, J.N. Carter, Automatic gaitrecognition by symmetry analysis, Pattern Recognition Lett. 24 (2003)2175–2183.

[15] C. BenAbdelkader, R. Cutler, H. Nanda, L.S. Davis, EigenGait: motion-based recognition of people using image self-similarity, in: Proceedings of the International Conference on Audio- and Video-based Person Authentication (AVBPA), 2001.

[16] L. Wang, T. Tan, Silhouette analysis-based gait recognition for humanidentification, IEEE Trans. PAMI 25 (12) (2003) 1505–1518.

[17] Z. Liu, S. Sarkar, Simplest representation yet for gait recognition:averaged silhouette, in: Proceedings of International Conference onPattern Recognition, vol. 4, 2004, pp. 211–214.

[18] J.D. Shutler, M.G. Grant, M.S. Nixon, J.N. Carter, On a large sequence-based human gait database, in: Proceedings of the Fourth InternationalConference on Recent Advances in Soft Computing, 2002, pp. 66–71.

[19] S. Sarkar, P.J. Phillips, Z. Liu, I.R. Vega, P. Grother, K.W. Bowyer, ThehumanID gait challenge problem: data sets, performance, and analysis,IEEE Trans. PAMI 27 (2) (2005) 162–177.

[20] P.S. Huang, C.J. Harris, M.S. Nixon, Human gait recognition in canonicalspace using temporal templates, IEE Proc. Vision Image Signal Process.146 (2) (1999) 93–100.

[21] R. van den Boomgaard, R. van Balen, Methods for fast morphologicalimage transforms using bitmapped binary images, Comput. VisionGraphics Image Process.: Graphical Models Image Process. 54 (3) (1992)252–258.

[22] A. Kale, A. Sundaresan, A. Rajagopalan, N. Cuntoor, A. RoyChowdhury,V. Kruger, R. Chellappa, Identification of humans using gait, IEEE Trans.Image Process. (2004) 1163–1173.

[23] R.T. Collins, R. Gross, J. Shi, Silhouette-based human identification frombody shape and gait, in: Proceedings of the International Conference onAutomatic Face and Gesture Recognition, 2002, pp. 351–356.

About the Author—TOBY LAM graduated from the Department of Computing of The Hong Kong Polytechnic University in 2003. He is now a Ph.D. candidate working in the fields of pattern recognition, gait recognition, biometrics, intelligent agent technology and agent ontology.

About the Author—RAYMOND LEE received his B.Sc. from Hong Kong University in 1989, and his M.Sc. and Ph.D. from The Hong Kong Polytechnic University in 1997 and 2000, respectively. After graduating from Hong Kong University, he joined the Hong Kong Government's Hong Kong Observatory as a Meteorological Scientist for weather forecasting, and from 1989 to 1993 took part in the development of telecommunication systems for the provision of meteorological services. Prior to joining The Hong Kong Polytechnic University in September 1998, Raymond also worked as an MIS Manager and System Consultant in various business organizations in Hong Kong, where he developed various IS and e-commerce projects. His major research areas include: Intelligent Agent Technology (IAT), agent ontology, chaotic neural networks, pattern recognition, epistemology, visual perception and visual psychology, weather simulation and forecasting, and intelligent e-commerce systems.

About the Author—DAVID ZHANG (M'92–SM'95) graduated in Computer Science from Peking University in 1974. He received his M.Sc. and Ph.D. in Computer Science from the Harbin Institute of Technology (HIT) in 1982 and 1985, respectively. From 1986 to 1988 he was a Postdoctoral Fellow at Tsinghua University and then an Associate Professor at the Academia Sinica, Beijing. In 1994 he received his second Ph.D., in Electrical and Computer Engineering, from the University of Waterloo, Ont., Canada. Currently, he is a Chair Professor at The Hong Kong Polytechnic University, where he is the Founding Director of the Biometrics Technology Centre (UGC/CRC) supported by the Hong Kong SAR Government. He also serves as Adjunct Professor at Tsinghua University, Shanghai Jiao Tong University, Beihang University, the Harbin Institute of Technology, and the University of Waterloo. Professor Zhang is a Croucher Senior Research Fellow and a Distinguished Speaker of the IEEE Computer Society.