Active Conditional Models
Ying Chen1 and Fernando De la Torre2
1 School of IoT Engineering, Jiangnan University, China. e-mail: [email protected]
2 Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA. e-mail: [email protected]
Abstract— Matching images with large geometric and iconic changes (e.g. faces under different poses and facial expressions) is an open research problem in computer vision. There are two fundamental approaches to solve the correspondence problem in images: feature-based matching and model-based matching. Feature-based matching relies on the assumption that features are stable across view-points and iconic changes, and it uses unary, pair-wise or higher-order constraints as a measure of correspondence. On the other hand, model-based approaches such as Active Shape Models (ASMs) align appearance features with respect to a model learned from hand-labeled samples. However, model-based approaches typically suffer from a lack of generalization to untrained situations.
This paper proposes Active Conditional Models (ACMs), which combine the benefits of both approaches. ACMs learn the conditional relation (both in shape and appearance) between a reference view of the object and other view-points or iconic changes. The ACM generalizes better to untrained situations because it has fewer parameters (and is hence less prone to overfitting) and directly learns variations w.r.t. a reference image (similar to feature-based methods). Several examples in the context of facial feature matching across pose and expression illustrate the benefits of ACMs.
I. INTRODUCTION
Establishing correspondence between images is a fundamental problem in computer vision. Correspondence between images is needed in many computer vision algorithms such as object recognition [1], image registration [2], face detection and tracking [3], face recognition [5], and 3D reconstruction [4]. Approaches to solve for correspondence can be broadly divided into feature-based and model-based methods. Feature-based methods extract features in both images and use unary [6], [18], pair-wise [7] or higher-order constraints [10] to solve correspondence. However, feature-based approaches typically fail when there is a large change in camera motion or pose, or large iconic changes (e.g. eyes closed). As shown in Fig. 1(a), feature-based approaches fail to detect correspondence when there are strong changes in pose and expression. On the other hand, model-based approaches incorporate a priori information about iconic and view-point changes. Many methods, such as Snakes [12] and Parameterized Appearance Models (e.g. Active Shape Models (ASMs) [13], Active Appearance Models (AAMs) [14], [27], 3D Morphable Models (3DMMs) [15]), have been successfully applied to solve facial correspondence. However, these models typically suffer from a lack of generalization to untrained samples.
Fig. 1. (a) Feature-based (SIFT) method for matching two images. (b) Model-based approaches (Active Shape Models, Active Appearance Models, Morphable Models) solve the correspondence between image features and a model. (c) Active Conditional Models (ACMs) learn a mapping from a labeled neutral reference image (red box) to other images of the same subject with changes in pose and expression.
In this paper we propose a new statistical model, Active Conditional Models (ACMs), that benefits from both model-based and feature-based matching. ACMs learn the shape and appearance representation of one class of objects conditioned on a reference image. ACMs outperform Parameterized Appearance Models (PAMs) in generalization ability because they learn conditional information relative to a reference image. Unlike feature-based correspondence, ACMs constrain the solution with a shape and appearance model. Unlike model-based methods, ACMs incorporate a reference image that reduces the amount of person-specific variability. Fig. 1 illustrates the differences between the three approaches to image matching.
The remainder of this paper is structured as follows: Section 2 provides a brief overview of related work. Section 3 gives details of learning and inference in ACMs. In Section 4, ACMs are extended by adding appearance. The experimental results are presented and discussed in Section 5.
II. PREVIOUS WORK
This section reviews previous work on feature-based and model-based image matching.
A. Feature-based image matching
Image matching has been a central research topic in computer vision over the last few decades. Typical approaches to correspondence involve matching feature points between images. Lowe's SIFT descriptor [18] is one of the state-of-the-art methods to construct geometrically invariant features to match rigid objects. SIFT and its extensions [24], [19], [25], [6], [23] have been successfully applied to many problems.
Alternatively, the correspondence problem can be formulated as a graph matching problem, considering each feature point in the image as a node in a graph. Leordeanu and Hebert [7] proposed a spectral graph matching optimization algorithm using general unary and pairwise constraints. They built an affinity matrix between pairs of points, and the correspondence is found by thresholding the leading eigenvectors. The problem of hyper-graph matching has been further studied [20], [10], considering higher-order interactions between tuples of features beyond pairwise ones.
Recently, researchers have paid more attention to learning the optimal set of parameters for graph matching. Caetano et al. [11] made use of structural models to improve the graph matching solution. Torresani et al. [21] proposed an energy minimization approach to establish correspondences for non-rigid motion, in which the parameters of the error function are learned by the NIO algorithm [22]. Leordeanu and Hebert [9] introduced an unsupervised learning strategy to learn the weights in spectral matching.
B. Model-based image matching
Model-based methods are able to solve the correspondence in difficult situations because the model encodes prior knowledge of the expected shape and appearance of an object class. ASMs [13] have been proven an effective method to model the shape and appearance of objects. ASMs build a model of shape variation by performing PCA on a set of landmarks (after Procrustes alignment). For each landmark, the Mahalanobis distance between the sample and mean texture is used to assess the quality of the fit of a new shape. The fitting process is performed using a local search along the normal of the landmark. Then, the new positions are projected onto rigid and non-rigid bases. These two steps are alternated until convergence. Further extensions of ASMs include 3D Morphable Models [15], AAMs [14] and kernel generalizations [27], among others. The work closest to the proposed one is that of Asthana et al. [16]. This approach [16] utilizes the MPEG-4 based facial animation system to generate virtual images having different poses. The set of virtual images is used as training data for a view-based AAM, which later locates landmarks in previously unseen face images.
III. ACTIVE CONDITIONAL SHAPE MODELS (ACSMS)
This section describes the details of Active Conditional Shape Models (ACSMs). Unlike ASMs, ACSMs learn a mapping between a neutral shape and shape variations of the same individual under different poses and expressions.
A. ACSMs Energy Function
Let $\mathbf{F}_i \in \mathbb{R}^{3\times d_f}$ (see footnote for notation¹) be the homogeneous representation of the $d_f$ landmarks located in the neutral (or reference) view of the $i$th subject. Fig. 1(c) shows two examples of labeled face images with 66 landmarks under different expressions and poses. $\mathbf{F}_i$ has the following structure:

$$\mathbf{F}_i = \begin{pmatrix} u_{i1} & u_{i2} & \dots & u_{id_f} \\ v_{i1} & v_{i2} & \dots & v_{id_f} \\ 1 & 1 & \dots & 1 \end{pmatrix}, \qquad (1)$$

where $(u_{ij}, v_{ij})^T$ denotes the coordinates of the $j$th landmark of the $i$th subject.

ACSMs learn a mapping between the neutral (or reference) view of the object and the object under different rigid and non-rigid transformations (see Fig. 1(c)). Let $\mathbf{q}_i^b \in \mathbb{R}^{2d_g\times 1}$ denote the $i$th subject under the $b$th configuration, where $d_g$ denotes the number of landmarks, $i = 1,2,\dots,n$ indexes the person identity, $b = 1,2,\dots,p$ indexes the possible deformations (e.g. viewpoint or expression changes), and $n$ is the number of subjects. $\mathbf{q}_i^b$ has the following structure:

$$\mathbf{q}_i^b = \left( x_{i1}^b \;\; y_{i1}^b \;\; \dots \;\; x_{id_g}^b \;\; y_{id_g}^b \right)^T, \qquad (2)$$

where $(x_{ij}^b, y_{ij}^b)$, $j = 1,2,\dots,d_g$ ($d_g \leq d_f$), are the coordinates of the $j$th point in the $b$th deformed shape of the $i$th individual. The ACSM minimizes:
$$E_{ACSM}(\mathbf{A},\mathbf{B},\mathbf{C}) = \sum_{i=1}^{n}\sum_{b=1}^{p} \left\| \mathbf{q}_i^b - \Big(\sum_{j=1}^{k} c_{ij}^b \mathbf{B}_j\Big)\,\mathrm{vec}(\mathbf{A}_i^b \mathbf{F}_i) \right\|_2^2, \qquad (3)$$
where $\mathbf{A}_i^b \in \mathbb{R}^{2\times 3}$ is an affine matrix that compensates for geometric changes (e.g. rotation, scale). $\mathbf{B}_j \in \mathbb{R}^{2d_g\times 2d_f}$ is a basis such that a linear combination weighted by the coefficients $c_{ij}^b$ models non-rigid deformations and 3D rigid transformations (that an affine transformation cannot recover). Fig. 2 illustrates the deformation process from a neutral face $\mathbf{F}_i$ to faces under different viewpoints or non-rigid deformations due to changes in expression and pose.
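As a concrete reading of Eq. (3), the following NumPy sketch evaluates the ACSM energy for given parameters. The dictionary-based data layout and the column-major `vec` convention are assumptions made for this sketch, not the authors' implementation.

```python
import numpy as np

def acsm_energy(Q, F, A, B, C):
    """Evaluate the ACSM energy of Eq. (3).

    Q : dict {(i, b): shape vector of length 2*d_g}
    F : dict {i: 3 x d_f homogeneous neutral shape}
    A : dict {(i, b): 2 x 3 affine matrix}
    B : list of k bases, each of shape (2*d_g, 2*d_f)
    C : dict {(i, b): length-k coefficient vector}
    """
    energy = 0.0
    for (i, b), q in Q.items():
        # vec(A_i^b F_i): affine-transformed neutral shape, vectorized column-major
        u = (A[(i, b)] @ F[i]).flatten(order="F")
        # linear combination of the bases weighted by c_{ij}^b
        M = sum(c_j * B_j for c_j, B_j in zip(C[(i, b)], B))
        energy += np.sum((q - M @ u) ** 2)
    return energy
```

With a perfect reconstruction, the energy is exactly zero; otherwise it is the sum of squared residuals over all subjects and configurations.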
¹Bold capital letters denote matrices ($\mathbf{D}$), bold lower-case letters a column vector ($\mathbf{d}$). $\mathbf{d}_j$ represents the $j$th column of the matrix $\mathbf{D}$. All non-bold letters represent scalar variables. $d_{ij}$ denotes the scalar in row $i$ and column $j$ of the matrix $\mathbf{D}$, and $d_{ij}$ the $i$th scalar element of a column vector $\mathbf{d}_j$. diag is an operator that transforms a vector into a diagonal matrix or takes the diagonal of a matrix into a vector. $\mathbf{1}_k \in \mathbb{R}^{k\times 1}$ is a vector of ones. $\|\mathbf{d}\|_2^2$ denotes the squared Euclidean norm of the vector $\mathbf{d}$. $\odot$ denotes the Hadamard or point-wise product, $\otimes$ denotes the Kronecker product, vec(·) represents the vec operator that converts a matrix into a column vector, and mod(x, y) denotes the modulus after dividing $x$ by $y$.
Fig. 2. Shape deformation process: the neutral shape $\mathbf{F}_i$ is mapped by affine transformations $\mathbf{A}_i^b$ and basis combinations $\sum_{j=1}^{k} c_{ij}\mathbf{B}_j$ to deformed shapes.
Equation (3) can be re-written as:

$$E_{ACSM}(\mathbf{U},\mathbf{B},\mathbf{C}) = \sum_{l=1}^{m} \left\| \mathbf{q}_l - \Big(\sum_{j=1}^{k} c_{jl}\mathbf{B}_j\Big)\,\mathrm{vec}(\mathbf{A}_l\mathbf{F}_l) \right\|_2^2 = \sum_{l=1}^{m} \left\| \mathbf{q}_l - \Big(\sum_{j=1}^{k} c_{jl}\mathbf{B}_j\Big)\mathbf{u}_l \right\|_2^2, \qquad (4)$$

where $m = np$, $l$ is an index that runs over all subjects $i$ and all possible configurations $b$, satisfying $i = \lceil l/p \rceil$ and $b = \mathrm{mod}(l, p)$, and $\mathbf{u}_l = \mathrm{vec}(\mathbf{A}_l\mathbf{F}_l) \in \mathbb{R}^{2d_f\times 1}$.
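The flat re-indexing of Eq. (4) can be sketched as follows. For convenience this sketch uses 0-based indices, whereas the paper uses 1-based indices with $i = \lceil l/p \rceil$ and $b = \mathrm{mod}(l, p)$.

```python
def unravel_instance(l, p):
    """Map the flat instance index l (0-based) back to the subject index i
    and configuration index b of Eq. (4), where each of the n subjects
    contributes p configurations (m = n * p instances in total)."""
    return l // p, l % p
```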
B. Optimization
To learn the ACSM parameters we minimize Eq. (4) with respect to the non-rigid transformation basis $\mathbf{B}$, the local coefficients $\mathbf{c}_l$ for every subject and every instance of deformation, and the affine transformations $\mathbf{A}_l$. We use an alternated least squares (ALS) strategy that monotonically reduces the error in each step. ALS alternates between optimizing $\mathbf{B}$ while $\mathbf{A}_l$ and $\mathbf{c}_l$ are fixed, and optimizing $\mathbf{A}_l$ while $\mathbf{c}_l$ and $\mathbf{B}$ are fixed. The minimization scheme is shown in Algorithm 1, where:

$$\mathbf{Q} = [\mathbf{q}_1\ \mathbf{q}_2\ \dots\ \mathbf{q}_m], \quad \mathbf{B} = [\mathbf{B}_1\ \mathbf{B}_2\ \dots\ \mathbf{B}_k], \quad \mathbf{U} = [\mathbf{1}_k \otimes \mathbf{u}_1\ \ \mathbf{1}_k \otimes \mathbf{u}_2\ \dots\ \mathbf{1}_k \otimes \mathbf{u}_m],$$

$$\mathbf{C} = \begin{pmatrix} c_{11} & c_{12} & \dots & c_{1m} \\ c_{21} & c_{22} & \dots & c_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ c_{k1} & c_{k2} & \dots & c_{km} \end{pmatrix}. \qquad (5)$$
IV. ACTIVE CONDITIONAL APPEARANCE MODELS (ACAMS)
Similar to ACSMs, ACAMs learn the conditional relation between a reference (or neutral) appearance and the appearance of the same subject under different conditions. Similar to feature-based methods, they use a reference image.
Algorithm 1 Learning Active Conditional Shape Models
1. Initialize parameters $\mathbf{B}$ and $\mathbf{C}$.
2. For each instance $l$, minimize Eq. (4) over $\mathbf{A}_l$ for fixed $\mathbf{B}$ and $\mathbf{c}_l$:
$$\mathbf{A}_l^* = \Big(\sum_{j=1}^{k} c_{jl}\mathbf{B}_j\Big)^{-1}\mathbf{q}_l\mathbf{F}_l^T(\mathbf{F}_l\mathbf{F}_l^T)^{-1}. \qquad (6)$$
3. For each instance $l$, minimize Eq. (4) over $\mathbf{c}_l$ for fixed $\mathbf{A}_l$ (i.e. $\mathbf{u}_l$) and $\mathbf{B}$:
$$\mathbf{c}_l^* = (\mathbf{Z}_1^T\mathbf{Z}_1)^{-1}\mathbf{Z}_1^T\mathbf{q}_l, \qquad (7)$$
where $\mathbf{Z}_1 = \mathbf{B}\mathbf{P}$ and $\mathbf{P} = \mathrm{diag}(\mathbf{1}_k) \otimes \mathbf{u}_l$.
4. Minimize Eq. (4) over $\mathbf{B}$ for fixed $\mathbf{A}$ (i.e. $\mathbf{U}$) and $\mathbf{C}$:
$$\mathbf{B}^* = \mathbf{Q}\mathbf{Z}_2^T(\mathbf{Z}_2\mathbf{Z}_2^T)^{-1}, \qquad (8)$$
where $\mathbf{Z}_2 = (\mathbf{C} \otimes \mathbf{1}_{2d_f}) \odot \mathbf{U}$.
5. If not converged, return to step 2.
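Algorithm 1 can be sketched in NumPy as below. This is an illustrative implementation under stated assumptions, not the authors' code: the closed-form updates (6)-(8) are replaced by generic least-squares solves (each step still minimizes the same per-step objective, so the energy is non-increasing), and the small random initialization is an arbitrary choice.

```python
import numpy as np

def fit_acsm(Q, F, k, n_iter=20, rng=None):
    """Alternating least squares for the ACSM energy of Eq. (4).

    Q : (m, 2*d_g) array of deformed shape vectors q_l
    F : (m, 3, d_f) array of homogeneous neutral shapes F_l
    k : number of non-rigid bases B_j
    """
    rng = np.random.default_rng(rng)
    m, two_dg = Q.shape
    d_f = F.shape[2]
    Bs = [rng.standard_normal((two_dg, 2 * d_f)) * 0.01 for _ in range(k)]
    C = rng.standard_normal((k, m)) * 0.01
    A = [np.hstack([np.eye(2), np.zeros((2, 1))]) for _ in range(m)]

    for _ in range(n_iter):
        # Step 2: update each affine A_l, using vec(A F) = kron(F^T, I_2) vec(A)
        for l in range(m):
            M = sum(C[j, l] * Bs[j] for j in range(k))
            K = np.kron(F[l].T, np.eye(2))               # (2*d_f, 6)
            a, *_ = np.linalg.lstsq(M @ K, Q[l], rcond=None)
            A[l] = a.reshape(2, 3, order="F")
        U = np.stack([(A[l] @ F[l]).flatten(order="F") for l in range(m)])
        # Step 3: update the coefficients c_l
        for l in range(m):
            Z1 = np.column_stack([Bj @ U[l] for Bj in Bs])
            C[:, l], *_ = np.linalg.lstsq(Z1, Q[l], rcond=None)
        # Step 4: update the stacked basis B = [B_1 ... B_k]
        Z2 = np.stack([np.kron(C[:, l], U[l]) for l in range(m)])  # (m, k*2*d_f)
        Bmat, *_ = np.linalg.lstsq(Z2, Q, rcond=None)
        Bs = np.split(Bmat.T, k, axis=1)
    return A, Bs, C
```

The step-4 identity used here is that $(\sum_j c_{jl}\mathbf{B}_j)\mathbf{u}_l = [\mathbf{B}_1 \dots \mathbf{B}_k](\mathbf{c}_l \otimes \mathbf{u}_l)$, which is exactly the structure of $\mathbf{Z}_2$ in Eq. (8).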
ACAMs are similar to ACSMs, but no affine transformation matrix $\mathbf{A}_l$ is necessary. We use $l$ to index all possible instances of a subject $i$ under all possible configurations $b$, and $d$ is the dimension of the appearance features. Let $\mathbf{x}_l \in \mathbb{R}^{d\times 1}$ be the neutral appearance of the $l$th instance and $\mathbf{y}_l \in \mathbb{R}^{d\times 1}$ the deformed appearance of the $l$th instance; then the energy function is:

$$E_{ACAM}(\mathbf{B},\mathbf{C}) = \sum_{l=1}^{m} \left\| \mathbf{y}_l - \Big(\sum_{j=1}^{k} c_{jl}\mathbf{B}_j\Big)\mathbf{x}_l \right\|_2^2, \qquad (9)$$

where $m$ is the number of appearance samples over all possible deformations. The ALS algorithm to fit ACAMs is given in Algorithm 2.
Algorithm 2 Learning Active Conditional Appearance Models
1. Initialize parameter $\mathbf{B}$.
2. For each instance $l$, minimize Eq. (9) over $\mathbf{c}_l$ for fixed $\mathbf{B}$:
$$\mathbf{c}_l^* = (\mathbf{Z}_1^T\mathbf{Z}_1)^{-1}\mathbf{Z}_1^T\mathbf{y}_l, \qquad (10)$$
where $\mathbf{Z}_1 = \mathbf{B}\mathbf{P}$, $\mathbf{Y} = [\mathbf{y}_1\ \mathbf{y}_2\ \dots\ \mathbf{y}_m]$, and $\mathbf{P} = \mathrm{diag}(\mathbf{1}_k) \otimes \mathbf{x}_l$.
3. Minimize Eq. (9) over $\mathbf{B}$ for fixed $\mathbf{C}$:
$$\mathbf{B}^* = \mathbf{Y}\mathbf{Z}_2^T(\mathbf{Z}_2\mathbf{Z}_2^T)^{-1}, \qquad (11)$$
where $\mathbf{Z}_2 = (\mathbf{C} \otimes \mathbf{1}_d) \odot \mathbf{X}$ and $\mathbf{X} = [\mathbf{1}_k \otimes \mathbf{x}_1\ \ \mathbf{1}_k \otimes \mathbf{x}_2\ \dots\ \mathbf{1}_k \otimes \mathbf{x}_m]$.
4. If not converged, return to step 2.
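Algorithm 2 admits the same kind of sketch, and is simpler because no affine step is needed. As before, generic least squares replaces the closed forms (10)-(11), and the random initialization is an assumption of this sketch; in practice the $d \times d$ bases would be large for raw pixel features.

```python
import numpy as np

def fit_acam(Y, X, k, n_iter=20, rng=None):
    """ALS sketch for the ACAM energy of Eq. (9): each deformed appearance
    y_l is modelled as (sum_j c_jl B_j) x_l, with x_l the neutral appearance.

    Y, X : (m, d) arrays of deformed and neutral appearance features
    """
    rng = np.random.default_rng(rng)
    m, d = Y.shape
    Bs = [rng.standard_normal((d, d)) * 0.01 for _ in range(k)]
    C = np.zeros((k, m))
    for _ in range(n_iter):
        # Step 2: update c_l for fixed B
        for l in range(m):
            Z1 = np.column_stack([Bj @ X[l] for Bj in Bs])
            C[:, l], *_ = np.linalg.lstsq(Z1, Y[l], rcond=None)
        # Step 3: update the stacked basis B = [B_1 ... B_k] for fixed C
        Z2 = np.stack([np.kron(C[:, l], X[l]) for l in range(m)])
        Bmat, *_ = np.linalg.lstsq(Z2, Y, rcond=None)
        Bs = np.split(Bmat.T, k, axis=1)
    return Bs, C
```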
Fig. 3. Manual labels for one subject with changes in pose (L60, L30, L15, frontal, R15, R30, R60).
Fig. 4. Training error for ACSMs and ACAMs: (a) shape energy during training; (b) appearance energy during training.
V. EXPERIMENTS AND DISCUSSION
We evaluated the proposed ACMs on the problem of facial feature detection across pose and expression using the CMU Multi-PIE face dataset [17]. The CMU Multi-PIE database has a total of 337 subjects, with 129 subjects present in all four sessions, under 15 different viewpoints, 6 expressions and 19 illuminations. 108,566 images under different illuminations with different subject identities, viewpoints and expressions are manually labeled.
Section V-A describes the training procedure for ACMs. Section V-B compares the generalization performance of ACMs against ASMs using the same training data. Section V-C provides results of the fitting error.
A. Training Active Conditional Models
In all experiments, we used 8,240 labeled sample pairs (each pair contains one frontal image and one image under a different pose or expression) from the CMU Multi-PIE database. We used 360 pairs for testing and the rest for training and cross-validation. As shown in Fig. 3, 66 salient points were manually labeled for poses ranging from -45 to 45 degrees, and 38 points in the ranges from -90 to -45 and 45 to 90 degrees.
We used 3,729 sample pairs for training the ACSM (Algorithm 1) and the ACAM (Algorithm 2). Fig. 4 shows the training error for ACSMs and ACAMs. As expected, the error decreases monotonically with the number of iterations. Fig. 5 shows the synthesis of different shapes for different $\mathbf{A}_l$ and $\mathbf{c}_l$ parameters. It is worth pointing out that, unlike ASMs, a linear ACSM can represent strong changes in pose in a continuous manner (see Fig. 5).
In our experiments, the dimension of $\mathbf{B}$, i.e. the parameter $k$ in ACMs, is determined by 10-fold cross-validation, in which the 4,150 validation sample pairs are partitioned into 10 subsamples. On the cross-validation set, the optimal $k$ was estimated to be $k = 21$ for the ACSM and $k = 6$ for the ACAM. It is worth pointing out that, to make a fair comparison, ASMs were trained with exactly the same data as ACMs (including the neutral image). In the case of ASMs, the dimensions of the shape and appearance models are 30 and 58 respectively, which preserves 95% of the PCA energy.

Fig. 5. Synthesized shapes using ACSMs.
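The model-selection step for $k$ described above can be sketched generically. Here `fit(train, k)` and `score(model, val)` are hypothetical callables standing in for training an ACM with $k$ bases and measuring its validation error; they are assumptions of this sketch.

```python
import numpy as np

def select_k(pairs, candidate_ks, fit, score, n_folds=10, rng=None):
    """Choose the number of bases k by n_folds-fold cross-validation,
    as in Sec. V-A: each candidate k is scored by the mean validation
    error over the folds, and the k with the smallest mean error wins."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(pairs))
    folds = np.array_split(idx, n_folds)
    best_k, best_err = None, np.inf
    for k in candidate_ks:
        errs = []
        for f in range(n_folds):
            val = [pairs[i] for i in folds[f]]
            train = [pairs[i] for g in range(n_folds) if g != f for i in folds[g]]
            errs.append(score(fit(train, k), val))
        if np.mean(errs) < best_err:
            best_k, best_err = k, np.mean(errs)
    return best_k
```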
B. Generalization of ACMs
This section analyzes the generalization properties of ACSMs and compares them to ASMs. In this section, we assume that the landmarks of the testing images (from subjects not in the training set) are known.
We took 49 untrained subjects (360 sample pairs) for testing and compared the shape reconstruction performance of ASMs [14] and ACMs. That is, given the labeled landmarks in the testing sample $l$, we computed the shape reconstruction error as follows:
$$e_l^s = \frac{1}{d_g}\left\| \mathbf{g}_l^s - \mathbf{r}_l^s \right\|, \qquad (12)$$

where $\mathbf{g}_l^s$ is the ground-truth shape vector and $\mathbf{r}_l^s = (\sum_{j=1}^{k} c_{jl}\mathbf{B}_j)\,\mathrm{vec}(\mathbf{A}_l\mathbf{F}_l)$ is the shape vector reconstructed by the ACSM (after convergence). $\mathbf{F}_l$ is the shape matrix of the $l$th frontal sample, $\mathbf{B}$ is the learned basis, and $\mathbf{A}_l$ and $\mathbf{c}_l$ are computed iteratively using Eq. (6) and Eq. (7) with $\mathbf{q}_l = \mathbf{g}_l^s$.

Fig. 6 shows the shape reconstruction error for the 360 testing samples. We display the reconstruction error and the histogram of the error for the ACSM and the ASM. As can be observed, for most samples the shape error (assuming the landmarks are known) is lower for ACSMs than for ASMs.
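The per-landmark error of Eq. (12) is straightforward to compute; a minimal sketch:

```python
import numpy as np

def shape_reconstruction_error(g, r):
    """Shape reconstruction error of Eq. (12): the norm of the difference
    between the ground-truth shape vector g and the reconstructed shape
    vector r, divided by the number of landmarks d_g (both vectors have
    length 2*d_g)."""
    g, r = np.asarray(g, dtype=float), np.asarray(r, dtype=float)
    d_g = len(g) // 2
    return np.linalg.norm(g - r) / d_g
```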
Similarly, to evaluate the generalization of ACAMs to untrained samples, the appearance similarity error $e_{p,l}^a$ for the $p$th salient point ($p = 1,2,\dots,d_g$) in the $l$th testing sample is defined as:

$$e_{p,l}^a = \frac{E_{i\in N_1(p)}\{\|\mathbf{g}_{l,p,i}^a - \mathbf{r}_{l,p,i}^a\|\}}{E_{j\in\{N_2(p)-N_1(p)\}}\{\|\mathbf{g}_{l,p,j}^a - \mathbf{r}_{l,p,j}^a\|\}}, \qquad (13)$$

where $\mathbf{g}_{l,p,i}^a$ denotes the ground-truth appearance representation for the $i$th neighbor of point $p$ in the $l$th sample, $\mathbf{r}_{l,p,i}^a$ the synthesized one, $E\{\cdot\}$ is the mean operator, and $N_1(p)$ and
Fig. 6. Comparison of shape reconstruction error (shape error in pixels per shape point over the testing samples, with the corresponding error histogram; ASM vs. ACM).
Fig. 7. Appearance similarity error over the testing samples, with the corresponding histograms (ASM vs. ACM). (a) Size of N1(p) is 5×5 and size of N2(p) is 17×17; (b) size of N1(p) is 1×1 and size of N2(p) is 9×9.
$N_2(p)$ represent small-sized and large-sized neighborhoods of $p$. $\|\mathbf{g}^a - \mathbf{r}^a\|$ is the appearance error between the image patch and its reconstruction by the ACAM.
A smaller value of $e_{p,l}^a$ indicates that, for point $p$, the appearance difference over its close neighbors is smaller than over its distant neighbors, which means that the fitting/searching process is more likely to find the right position of $p$. The appearance similarity error for sample $l$, denoted $e_l^a$, is the average of the appearance similarity errors over all points $p$.
Fig. 7 shows two plots of the error for two different sizes of $N_1(p)$ and $N_2(p)$. In the first, $N_1(p)$ has size 5×5 and $N_2(p)$ has size 17×17. In the second, $N_1(p)$ has size 1×1 and $N_2(p)$ has size 9×9. It is clear from Fig. 7 that, independently of the neighborhood size, ACAMs have a smaller appearance similarity error. This typically translates into the ACAM being more likely to have a local minimum at the expected position of the landmark.
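The ratio of Eq. (13) can be sketched as follows. The representation of the neighborhoods as sets of candidate positions, and the `err` mapping from a position to its patch reconstruction error $\|\mathbf{g}^a - \mathbf{r}^a\|$, are assumptions made for this sketch.

```python
import numpy as np

def appearance_similarity_error(err, n1, n2):
    """Appearance similarity error of Eq. (13) for one landmark: the mean
    patch reconstruction error over the small neighborhood N1 divided by
    the mean over the ring N2 \\ N1.  `err` maps each candidate position
    to its error; `n1` and `n2` are sets with n1 a subset of n2."""
    near = np.mean([err[q] for q in n1])
    far = np.mean([err[q] for q in n2 - n1])
    return near / far
```

Values below 1 mean the reconstruction error grows away from the true landmark position, which is exactly the behavior that makes the local search well posed.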
C. Fitting Error in ACMs
The previous section has shown the generalization properties of ACMs assuming the landmarks in the testing images are known. This section compares the fitting performance when the landmarks in the testing images are unknown.
The ACM search for landmarks is similar to the ASM search. First, we use a face detector to locate the face and initialize the ACM with the mean shape. In each iteration, and for each landmark point, the ACM search is done along the normal of the shape profile, selecting the location with the smallest appearance error. The new locations are projected onto the ACSM. This process is repeated until the difference
Fig. 8. Shape deformation (shape error vs. iteration) during the search process.
Fig. 9. Correspondence results with different viewpoints and expressions (panels (a)-(h)).
between the current synthesized shape and the image points is less than an average of 0.3 pixels/point. We use the same graylevel representation to build the models for ASMs and ACMs.
Fig. 9 shows ACM matching results across pose and expression, given the landmarks in the neutral image. Images in Fig. 9(a)-(f) are from Multi-PIE, and images in Fig. 9(g)-(h) were taken with a regular camera in an unstructured environment. As can be seen, ACMs are able to successfully match across pose and expression. Fig. 8 illustrates how the shape varies with each iteration when fitting an untrained image.
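The fitting loop described above can be sketched generically. Here `best_along_normal` (the local appearance search along each profile normal) and `project_to_model` (the projection onto the ACSM) are hypothetical callables standing in for the two stages of the search; only the loop structure and the 0.3 pixel/point stopping rule come from the text.

```python
import numpy as np

def acm_search(init_shape, best_along_normal, project_to_model,
               tol=0.3, max_iter=50):
    """Sketch of the ACM fitting loop of Sec. V-C.  Starting from an
    initial (d_g, 2) shape, each landmark is moved to the position with
    the smallest appearance error along its profile normal, the moved
    shape is projected onto the shape model, and the loop stops when the
    mean per-point displacement drops below `tol` pixels."""
    shape = np.asarray(init_shape, dtype=float)
    for _ in range(max_iter):
        candidates = best_along_normal(shape)      # local appearance search
        new_shape = project_to_model(candidates)   # regularize with the ACSM
        if np.mean(np.linalg.norm(new_shape - shape, axis=1)) < tol:
            return new_shape
        shape = new_shape
    return shape
```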
Fig. 10 illustrates the correspondence error between facial features under different viewpoints (from -90 to 90 degrees) and different expressions (e.g., smile, disgust). We compared
Fig. 10. Matching accuracy comparison (correspondence error in pixels/point) for SIFT, ASM and ACM: (a) images under different viewpoints (L90-R90); (b) images under different expressions (smile, surprise, squint, disgust, scream).
three approaches: feature-based correspondence (SIFT), ASMs and ACMs. For feature matching, we used SIFT features to obtain sparse correspondences and interpolated them to estimate the landmarks at the expected locations. The correspondence error is defined as $e_l = \frac{1}{d_g}\|\mathbf{g}_l - \mathbf{r}_l\|$, where the subscript $l$ denotes the deformation instance (e.g. smile, surprise or a viewpoint change), $\mathbf{g}_l$ is the vector of ground-truth landmarks, and $\mathbf{r}_l$ is the shape reconstructed (and vectorized) from the neutral image.
Fig. 10 shows a quantitative evaluation of the shape error (the difference between the ground-truth labels and the landmarks provided by each algorithm). SIFT-based correspondence performs poorly under strong changes in pose or expression. As expected, ACMs outperform ASMs because they are less prone to overfitting and learn a conditional relation with a reference image.
VI. CONCLUSION
This paper proposed ACMs, a model that learns conditional relations between images of an object under different configurations and a reference image. ACMs learn a discriminative bilinear shape and appearance model. Experimental results on the CMU Multi-PIE face database show that ACMs outperform ASMs and feature-based methods (using SIFT) when matching face images undergoing changes in pose and expression. Further research is needed to avoid local minima in the fitting process.
VII. ACKNOWLEDGMENTS
Dr. De la Torre was partially supported by National Institute of Health Grant R01 MH 051435. Dr. Ying was supported by the China Scholarship Council Grant 2008110195.
REFERENCES
[1] M. F. Demirci, A. Shokoufandeh, Y. Keselman, L. Bretzner and S. Dickinson, "Object Recognition as Many-to-Many Feature Matching", International Journal of Computer Vision, vol. 69, 2006, pp. 203-222.
[2] D. Shen, "Fast Image Registration by Hierarchical Soft Correspondence Detection", Pattern Recognition, vol. 42, 2009, pp. 954-961.
[3] D. Boley, R. Maier, W. Zhang, Q. Wang and X. Tang, "Real Time Feature Based 3-D Deformable Face Tracking", in Proceedings of the 10th European Conference on Computer Vision, 2008, pp. 720-732.
[4] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz and R. Szeliski, "Building Rome in a Day", in IEEE International Conference on Computer Vision, 2009, pp. 72-79.
[5] D. R. Kisku, M. Tistarelli, J. K. Sing and P. Gupta, "Face Recognition by Fusion of Local and Global Matching Scores using DS Theory: An Evaluation with Uni-classifier and Multi-classifier Paradigm", in Proceedings IEEE Conf. on Computer Vision and Pattern Recognition, 2009, pp. 60-65.
[6] J. M. Morel and G. Yu, "ASIFT: A New Framework for Fully Affine Invariant Image Comparison", SIAM Journal on Imaging Sciences, vol. 2, 2009, pp. 438-469.
[7] M. Leordeanu and M. Hebert, "A Spectral Technique for Correspondence Problems using Pairwise Constraints", in International Conf. of Computer Vision, 2009, pp. 1482-1489.
[8] M. Leordeanu, M. Hebert and R. Sukthankar, "Beyond Local Appearance: Category Recognition from Pairwise Interactions of Simple Features", in Proceedings IEEE Conf. on Computer Vision and Pattern Recognition, 2007, pp. 1-8.
[9] M. Leordeanu and M. Hebert, "Unsupervised Learning for Graph Matching", in Proceedings IEEE Conf. on Computer Vision and Pattern Recognition, 2009, pp. 864-871.
[10] O. Duchenne, F. Bach, I. Kweon and J. Ponce, "A Tensor-Based Algorithm for High-Order Graph Matching", in Proceedings IEEE Conf. on Computer Vision and Pattern Recognition, 2009, pp. 1980-1987.
[11] T. S. Caetano, J. J. McAuley, L. Cheng, Q. V. Le and A. J. Smola, "Learning Graph Matching", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, 2009, pp. 1048-1058.
[12] M. Kass, A. Witkin and D. Terzopoulos, "Snakes: Active Contour Models", International Journal of Computer Vision, vol. 1, 1987, pp. 321-331.
[13] T. F. Cootes, C. J. Taylor, D. H. Cooper and J. Graham, "Active Shape Models: Their Training and Application", Computer Vision and Image Understanding, vol. 61, 1995, pp. 38-59.
[14] T. F. Cootes, G. J. Edwards and C. J. Taylor, "Active Appearance Models", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, 2001, pp. 681-685.
[15] V. Blanz and T. Vetter, "A Morphable Model for the Synthesis of 3D Faces", in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, 1999, pp. 187-194.
[16] A. Asthana, R. Goecke, N. Quadrianto and T. Gedeon, "Learning Based Automatic Face Annotation for Arbitrary Poses and Expressions from Frontal Images Only", in Proceedings IEEE Conf. on Computer Vision and Pattern Recognition, 2009, pp. 1635-1642.
[17] R. Gross, I. Matthews, J. Cohn, T. Kanade and S. Baker, "Guide to the CMU Multi-PIE Database", Carnegie Mellon University, Technical report, 2007.
[18] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, vol. 60, 2004, pp. 91-110.
[19] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool, "A Comparison of Affine Region Detectors", International Journal of Computer Vision, vol. 65, 2005, pp. 43-72.
[20] R. Zass and A. Shashua, "Probabilistic Graph and Hypergraph Matching", in IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1-8.
[21] L. Torresani, V. Kolmogorov and C. Rother, "Feature Correspondence Via Graph Matching: Models and Global Optimization", in European Conf. on Computer Vision, 2008, pp. 596-609.
[22] C. K. Liu, A. Hertzmann and Z. Popovic, "Learning Physics-Based Motion Style with Nonlinear Inverse Optimization", ACM Transactions on Graphics, vol. 24, 2005, pp. 1071-1081.
[23] E. Tola, V. Lepetit and P. Fua, "A Fast Local Descriptor for Dense Matching", in Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition, 2008, pp. 1-8.
[24] Y. Ke and R. Sukthankar, "PCA-SIFT: A More Distinctive Representation for Local Image Descriptors", in Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition, 2004, pp. 506-513.
[25] H. Bay, T. Tuytelaars and L. Van Gool, "SURF: Speeded Up Robust Features", in European Conference on Computer Vision, 2006, pp. 404-417.
[26] T. Cour, P. Srinivasan and J. Shi, "Balanced Graph Matching", in Advances in Neural Information Processing Systems, Vancouver, British Columbia, Canada, 2006, pp. 313-320.
[27] F. De la Torre and M. H. Nguyen, "Parameterized Kernel Principal Component Analysis: Theory and Applications to Supervised and Unsupervised Image Alignment", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.