
HR-Kinect - high-resolution dynamic 3D scanning for facial expression analysis

Tuur Stuyck Dirk Vandermeulen* Dirk Smeets* Peter Claes*

Abstract—Expression classification is a much-investigated research topic. This work develops a framework to statistically analyse spatio-temporal patterns in order to quantify the severity of functional defects. The aim is to assess the pre-surgery situation and post-surgery improvements by pinpointing abnormalities in the movement of facial geometry in time and space. A fully automated facial expression analysis pipeline is presented. The system is trained on the publicly available BU-4DFE dynamic facial expression database. The models are validated on self-recorded sequences of 11 people performing abnormal smiling expressions.

The main contribution of this work is the introduction of time normalisation. This normalisation imposes that all 4D sequences start and end with the neutral pose. In addition, the procedure makes all sequences of equal length and uniform in time. As a result, all expressions are in the same state at any given point in time, which makes the overall algorithm more robust. Consequently, the input data is far less restricted than in state-of-the-art methods.

The presented algorithm achieves a correct classification rate of 93.06% on the self-recorded data. This is slightly less than the state-of-the-art procedure [6], which obtains a 96.71% recognition rate on the BU-4DFE database when classifying three expressions. The proposed method, however, imposes far fewer restrictions on the input data and is therefore more generally applicable. The proposed time normalisation procedure can be integrated with state-of-the-art methods to further increase accuracy and to obtain a method superior to known algorithms. The time normalisation procedure is not foolproof, and the input data is still subject to some constraints; however, these are far less stringent than those required in [6]. Further research is in order.

Index Terms—facial expression analysis, 3D, 4D, time normalisation, facial anomaly detection, facial expression classification, 3D eigenfaces

I. INTRODUCTION

The goal of this work is to develop a framework to statistically analyse spatio-temporal patterns in order to quantify the severity of functional defects when performing facial expressions. The aim is to construct a framework able to assess the pre-surgery situation and post-surgery improvements by localising potential abnormalities in the movement of facial topology. Applications of such an algorithm can be found in maxillo-facial surgery. Oral and maxillo-facial surgery is the domain of medicine that involves treating diseases, injuries and defects in the head, neck, face and jaws.

* [email protected], [email protected], [email protected], [email protected], K.U. Leuven, Faculty of Engineering, ESAT/PSI, IBBT-K.U. Leuven Future Health Department, Medical Imaging Research Center, University Hospitals Gasthuisberg, Herestraat 49-bus 7003, B-3000 Leuven, Belgium

Facial expression recognition will act as a very important component in truly obtaining effective human-computer intelligent interaction (HCII). Such intelligent designs can provide services adapted to the current mood of their users. In addition, knowledge of the expression is an important advantage in automatic person recognition, which has important uses in security systems.

This research goes beyond the domain of facial expression recognition, which has already been exhaustively investigated. Until recently, research on facial expression recognition was limited to the 2-dimensional domain, one reason being a lack of data. Thanks to the introduction of the BU-3DFE (containing 3D static data) and BU-4DFE (containing 3D dynamic data) databases developed at Binghamton University, research on this topic has grown significantly [27]. The main advantage of 3D data over 2D data is invariance to pose and illumination changes. The invariance to pose is obtained by application of a superimposition routine.

Recognising human emotions from 3D imagery consists of several processing steps, which attempt to detect and localise informative features in the human face in order to train a model. Most research in the field has been limited to discrete 3D expression recognition. Analysis of the dynamics of facial expressions has provided the insight that human emotions are highly dynamic processes in all four dimensions. Incorporating the temporal aspect in the analysis is therefore likely to be crucial to improving recognition performance. This work focuses on the happy expression for the detection of anomalies. Extension to other expressions is straightforward and should pose no extra difficulties.

A model is trained to recognise normally performed expressions. For each expression, a different model is built. Following the scanning of subjects, the facial


geometry needs to undergo a number of preprocessing steps to obtain a signal-to-noise ratio adequate for training a model. For meaningful comparisons, facial geometry should be located, oriented and scaled to a normalised setting. Superimposition ascertains that only true differences in geometry are modelled.

Working with 4D data poses the additional problem of normalising sequences in time. This is an important issue that has not been addressed in other research. Time normalisation imposes that all 4D sequences start and end with the neutral pose. In addition, the procedure makes all sequences of equal length and uniform in time. As a result, all expressions are in the same state at any given point in time. The deviations in vertex positions from the neutral pose are taken as feature vectors to train the model.

II. RELATED WORK

The proposed method is most similar to the approach proposed by Kakadiaris et al. [6], called the advanced annotated face model approach. A predefined face model is mapped onto a database model to represent all facial geometry with a uniform topology. The facial expression analysis can then be performed on a point mesh with known dense correspondences. The time dimension is taken into account by working with the differences between consecutive frames, a so-called flow map [6]. These flow maps are analysed using statistical methods, e.g. Support Vector Machines or Hidden Markov Models. Fang solves this problem with the help of an adaptation of local binary patterns on three orthogonal planes (LBP-TOP) [6]. Local binary patterns (LBP) [2] have been used extensively in 2D facial expression recognition because of their effectiveness and ease of computation. The 3D variant LBP-TOP developed by [3] accounts for temporal evolutions in local image patterns by projecting not only on the xy-plane but also on the xt- and yt-planes. Le et al. use these patterns to train a support vector machine with radial basis functions for the classification of expressions [5].
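To make the LBP-TOP idea concrete, the following Python sketch computes basic 8-neighbour LBP codes and concatenates their histograms over the three orthogonal planes of a spatio-temporal volume. It is a deliberately simplified illustration (one central slice per plane, radius 1), not the full multi-slice descriptor of [3]:

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2D array."""
    c = img[1:-1, 1:-1]
    code = np.zeros(c.shape, dtype=np.uint8)
    # Each of the 8 neighbours contributes one bit: 1 if neighbour >= centre.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (neigh >= c).astype(np.uint8) << bit
    return code

def lbp_top_descriptor(volume):
    """Concatenated LBP histograms over the xy-, xt- and yt-planes of a
    (T, H, W) volume -- a single-slice simplification of LBP-TOP [3]."""
    T, H, W = volume.shape
    planes = [volume[T // 2],        # xy-plane at the middle frame
              volume[:, H // 2, :],  # xt-plane through the middle row
              volume[:, :, W // 2]]  # yt-plane through the middle column
    hists = [np.bincount(lbp_codes(p).ravel(), minlength=256) for p in planes]
    return np.concatenate(hists)     # 3 x 256 bins, fed to an SVM in [5]
```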

The proposed framework is compared with the state-of-the-art expression analysis technique proposed by Kakadiaris et al. [6]. Their goal was to build a classifier to distinguish six different expressions. They achieve an average recognition rate of 75.82% when tested on all six expressions from the BU-4DFE database. When performing classification on a subset of three expressions at a time, better results are obtained: a performance of 96.71% is attained, and when classifying the angry, happy and surprise expressions an average recognition rate of 95.75% is obtained.

III. METHODS

A model is constructed to describe regularly performed expressions. Following the 4D scanning, the facial

geometry needs to undergo a number of preprocessing steps to obtain a signal-to-noise ratio adequate for training a model. Every face is mapped onto a template face to ensure that all geometry is represented by the same topology. This procedure is known as robust statistical surface registration [14]. Subsequently, a procedure is performed to centre, scale and orient the geometry according to a standardised setting such that only true differences in geometry are modelled. This preprocessing step is known as robust Procrustes superimposition [15]. Besides normalisation in space, the sequences are subjected to time normalisation. Once done, the deformations of each vertex in the facial geometry are calculated throughout the sequence. These deformations, obtained from normally performed expressions, are used to train the model.

The models are trained on normal expressions from the BU-4DFE database. To validate the algorithm, a self-recorded database is used. Faces are scanned using the Dimensional Imaging dynamic facial capture system [26] made available by Medicim, a division of Nobel Biocare. The database contains multiple recordings of 11 people performing abnormal smiling expressions. Deformations that are not in accordance with normal expressions are found as deviations from a reference model at each point in time. To make a quantitative evaluation of the classifier, the data is manually labelled and a receiver operating characteristic is constructed.

A. Robust statistical surface registration

Before any meaningful analysis can be performed on the data, several preprocessing and normalisation steps are needed. Robust statistical surface registration is required to ensure that all facial scans are represented by the same topology. As a result, every mesh has the same vertex interconnectivity. The mapping procedure does not alter the shape of the scans. By doing so, a dense mesh correspondence is made available. The 3D meshes are mapped onto an anthropometric mask using the surface registration framework introduced in [14], which uses a combination of rigid and non-rigid transformations computed on manually indicated landmarks.

B. Normalisation

1) Robust Procrustes superimposition: Superimposition finds a transformation T that maps one set of landmarks onto another [15]. This transformation ensures that all models are centred at the same location with the same orientation and scale. A landmark configuration X_B is mapped onto a configuration X_A with transformation matrix T. E is the displacement, which represents the true difference in shape. The transformation matrix T encodes a translation, rotation and scaling, as illustrated in Figure 1. The procedure removes all non-shape information according to

$$X_A = TX_B + E$$
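A minimal sketch of this superimposition, assuming the landmark configurations are given as n x 3 NumPy arrays; the robust variant of [15] additionally downweights outlier landmarks, which is omitted here:

```python
import numpy as np

def procrustes_superimpose(XA, XB):
    """Least-squares estimate of the translation, rotation and isotropic
    scale mapping configuration XB onto XA, i.e. X_A = T(X_B) + E."""
    muA, muB = XA.mean(axis=0), XB.mean(axis=0)
    A, B = XA - muA, XB - muB                    # remove translation
    U, S, Vt = np.linalg.svd(B.T @ A)            # cross-covariance SVD (Kabsch)
    d = np.sign(np.linalg.det(U @ Vt))           # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt          # optimal rotation
    s = (S * np.array([1.0, 1.0, d])).sum() / (B ** 2).sum()  # optimal scale
    XB_mapped = s * B @ R + muA                  # T applied to X_B
    E = XA - XB_mapped                           # residual: true shape difference
    return XB_mapped, E
```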


Fig. 1. Illustration of the robust superimposition algorithm.

Fig. 2. An example of two sequences before normalisation. The red line denotes the original expression intensity in time and the green curve shows the expression intensity after time normalisation.

2) Time normalisation: Time normalisation is needed when working with 4D data to create a uniform representation of every expression sequence. There are three main issues with the data. (i) Not all sequences start and end with the neutral pose. (ii) Most sequences are of different length. (iii) Expressions are performed at various speeds. The result of time normalisation is a sequence consisting of a fixed number of frames, where every sequence starts and ends with the neutral pose. Moreover, all expressions are in the same state at any given point in time.

Figure 2 demonstrates the reason for time normalisation. Red corresponds to the sequence before normalisation and green to the sequence after time normalisation. Without normalisation, comparisons between sequences at a certain point in time would be meaningless. This is demonstrated by the dotted blue lines and the original expression intensities shown on the red curves. The expression shown in the upper part goes from neutral to maximum expression, after which the sequence ends. The lower part of the image shows a sequence that starts and ends with a neutral pose. The goal of time normalisation is to obtain expression intensities like those shown in the green curves. Comparisons at a certain point in time (blue dotted lines) are then meaningful.

Fig. 3. Example of a sequence projected in the PCA subspace. Only two dimensions are shown here.

The working of the procedure can be followed in Figure 3. The faces are decomposed using principal component analysis (PCA). Each coloured dot corresponds to the projection of a face model in the PCA space. Outliers in the first three dimensions are removed for robustness using the Thompson tau method [22], sketched below. The procedure finds the neutral pose in the sequence and uses it as the starting and ending point of a linearly interpolating curve.
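A sketch of this outlier-removal step, assuming the modified Thompson tau test is applied to each of the first three PCA dimensions in turn; the threshold details may differ from the exact variant used in the paper:

```python
import numpy as np
from scipy.stats import t as student_t

def thompson_tau_inliers(values, alpha=0.05):
    """Return a boolean mask of inliers for a 1D sample, iteratively
    removing the worst point while it fails the Thompson tau test [22]."""
    vals = np.asarray(values, dtype=float)
    keep = np.ones(len(vals), dtype=bool)
    while keep.sum() > 2:
        x = vals[keep]
        n = len(x)
        tcrit = student_t.ppf(1 - alpha / 2, n - 2)
        tau = tcrit * (n - 1) / (np.sqrt(n) * np.sqrt(n - 2 + tcrit ** 2))
        dev = np.abs(x - x.mean())
        worst = dev.argmax()
        if dev[worst] > tau * x.std(ddof=1):
            keep[np.flatnonzero(keep)[worst]] = False  # drop and re-test
        else:
            break
    return keep
```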

Figure 3 shows an example of a sequence with multiple expressions. The colour of the dots corresponds to time within the sequence: colours are linearly interpolated, with green as the starting colour and red representing the final time.

Time remapping is done separately for each sequence, rather than on all data at once, to avoid having to separate variations resulting from identity from those resulting from facial movements, which would otherwise be a rather difficult task. The identity is inherently present in every data point of a sequence, and no further processing is required. In addition, by performing the procedure separately for each sequence, we make sure that the variations important for a specific individual are kept when reconstructing the faces.

The data is projected onto a 28-dimensional space; the number of dimensions equals the length of the shortest sequence in the available database. Every face is then reconstructed as a linear combination of 28 vectors (principal components) representing the variations in the expression. By retaining only a limited number of principal components for the reconstruction of the faces, a noise reduction is performed on the data, as noise mostly resides in the principal components associated with smaller eigenvalues [23].
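In code, the projection and denoising could look as follows, assuming each face of a sequence is flattened into a row vector; D = 28 matches the shortest database sequence mentioned above:

```python
import numpy as np

def pca_project(frames, D=28):
    """Project a sequence (n_frames x 3*n_vertices) onto its first D
    principal components. Reconstructing from these coefficients discards
    the noise carried by the small-eigenvalue components [23]."""
    mean = frames.mean(axis=0)
    X = frames - mean
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    components = Vt[:D]                 # principal components e_i
    coeffs = X @ components.T           # one D-dim point per frame (cf. Fig. 3)
    denoised = mean + coeffs @ components
    return coeffs, components, mean, denoised
```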

Neutral pose detection

Ideally, the points in the PCA subspace lie on a closed curve, with the first and last few points representing a neutral pose. Some assumptions are made on the data. (i) The neutral pose is present in the sequence, either at the start or at the end. (ii) Most of the faces in the sequence are non-neutral. (iii) Expressions are performed at normal speeds.

The main idea of the algorithm is to find the neutral pose and use it as the beginning and end of a new sequence. The first assumption ensures that the neutral pose is present in the sequence so that it can be found. The second assumption is related to how the algorithm works: the neutral pose is found as the point with the maximal distance to the mean of all points in PCA space. If most of the faces were neutral, this mean would lie close to the neutral pose rather than to the points corresponding to an expression, and the criterion would fail. Mathematically, the neutral pose is found as the point with the maximum Euclidean distance to the mean of the sequence. When only the first point x_b and the last point x_e of the sequence are considered as potential candidates, we obtain the expression

$$x_n = \arg\max_{x_k \in \{x_b, x_e\}} \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - x_k \rVert_2^2$$

where x_n is selected as the neutral pose. The third and final assumption is needed to make the algorithm more robust: the onset and offset of expressions should be performed at regular speeds. For example, going from neutral to expression very swiftly and returning to the neutral pose very slowly will result in an incorrect time normalisation. The problem is that, in this special case, the sequence is not uniformly sampled when using equidistant points in time.
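The endpoint selection can be written compactly; coeffs is assumed to hold the PCA coefficients of one sequence, one row per frame:

```python
import numpy as np

def neutral_endpoint(coeffs):
    """Pick the neutral pose among the first and last frame: the endpoint
    with the largest mean squared distance to all frames in PCA space
    (equivalently, the one farthest from the sequence mean)."""
    def spread(x):
        return np.mean(np.sum((coeffs - x) ** 2, axis=1))
    first, last = 0, len(coeffs) - 1
    return first if spread(coeffs[first]) > spread(coeffs[last]) else last
```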

Interpolation

Interpolation is done by linearly connecting the data points. This is a simple but effective way to obtain a closed curve with minimal alteration to the scanned data. The number of frames should be chosen high enough to capture all fast variations in facial movement, yet low enough to avoid storing redundant information. More frames in a sequence would result in more points being sampled on the expression curve in the PCA space, which would increase the frequency resolution; the drawback is that more storage and processing time would be needed.
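A sketch of the resampling: the PCA-space points are connected linearly and the resulting curve is sampled at equidistant arc-length positions, so every normalised sequence gets the same fixed number of frames:

```python
import numpy as np

def resample_uniform(coeffs, n_frames=28):
    """Resample a piecewise-linear curve through the PCA-space points
    (n x D array) at n_frames equidistant arc-length positions."""
    seg = np.linalg.norm(np.diff(coeffs, axis=0), axis=1)   # segment lengths
    arclen = np.concatenate([[0.0], np.cumsum(seg)])        # cumulative length
    targets = np.linspace(0.0, arclen[-1], n_frames)
    out = np.empty((n_frames, coeffs.shape[1]))
    for d in range(coeffs.shape[1]):                        # interpolate per dim
        out[:, d] = np.interp(targets, arclen, coeffs[:, d])
    return out
```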

Reconstruction

The faces are reconstructed as a linear combination of the first D principal components e_i with coefficients c_i. The coefficients correspond to the samples in PCA space obtained from the time normalisation.

$$f \approx f_0 + \sum_{i=1}^{D} c_i e_i$$

C. Statistical model

A probability density function (pdf) describing a normal smile is constructed. This allows new data to be evaluated to obtain the probability of it being a correctly performed smile. A Gaussian Mixture Model (GMM) and a Kernel Density Estimator (KDE) are used to model the pdf. The pdf is estimated using the projection of the faces into the principal component space constructed from all available data. This PCA space is different from the intra-subject PCA space used in time normalisation, where PCA is performed for each sequence separately.
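A sketch of the model fitting with scikit-learn; the component count and kernel bandwidth are illustrative guesses, not values from the paper:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity

def fit_smile_pdfs(train_coeffs, n_components=3, bandwidth=0.5):
    """Fit a GMM and a KDE to the PCA coefficients of normally performed
    smiles (one row per frame, pooled over all training sequences)."""
    gmm = GaussianMixture(n_components=n_components).fit(train_coeffs)
    kde = KernelDensity(bandwidth=bandwidth).fit(train_coeffs)
    return gmm, kde

def is_normal_smile(model, coeffs, threshold):
    """A frame counts as a normal smile when its likelihood under the
    learned pdf exceeds a threshold tuned on labelled data (cf. Fig. 5)."""
    return np.exp(model.score_samples(coeffs)) >= threshold
```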

D. Localisation in space

For each frame, a standardised face is constructed. Other faces are compared to this template sequence to detect whether, and where in the face, possible irregularities are situated. The sequence is constructed from all the training data. For each frame, the average value of each dimension of the PCA coefficients is considered normal. To provide robustness to outliers, the median of the faces in PCA space is used instead of the mean. From these PCA coefficients, a mesh is reconstructed in 3D space. Given the dense mesh correspondence, the distance of each vertex to the corresponding vertex in the normalised face can be computed using z-scores. Given a stochastic variable X with expected value µ and standard deviation σ, the z-score is computed as

$$Z = \frac{X - \mu}{\sigma}$$
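A sketch of this localisation step, computed here directly in vertex space for simplicity, whereas the paper first takes the median in PCA space and reconstructs the template mesh from it:

```python
import numpy as np

def vertex_zscores(test_verts, train_verts):
    """Per-vertex deviation scores of a test frame against training frames.
    test_verts: (n_vertices, 3); train_verts: (n_frames, n_vertices, 3).
    Large scores flag vertices deviating from the normal expression."""
    mu = np.median(train_verts, axis=0)        # robust per-vertex template
    sigma = train_verts.std(axis=0) + 1e-12    # avoid division by zero
    z = (test_verts - mu) / sigma              # coordinate-wise z-scores
    return np.linalg.norm(z, axis=1)           # one score per vertex
```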

IV. RESULTS

The argumentation can be clarified with the videos on the following website: http://www.student.kuleuven.be/~s0200995/. The videos display the 3D mesh next to the visualisation of the deviations and the probabilities for the entire sequence.

A. Validation of the time normalisation

Figure 4 shows a time-normalised sequence in PCA space. The coloured dots correspond to the original faces. The initial faces (green) are neutral and the sequence evolves towards a smile; however, the original sequence does not return to a neutral pose. The time normalisation procedure constructs a closed curve and samples this curve with equidistant points. The normalised sequence displays the correct order of expressions.

B. Validation of the classification

Figure 5 shows a plot of the recognition rate versus the threshold value, with the Gaussian mixture model shown in green and the kernel density estimator in blue. The recognition rate is defined as the percentage of correctly classified data. The GMM produces better recognition rates than the KDE: a correct classification of 93.06% of the test data is obtained with the GMM at an optimal threshold value of 0.771, whereas the KDE achieves a classification rate of 88.93% at an optimal threshold value of 0.878.


Fig. 4. Visualisation of the first two dimensions of expression curves in the PCA space.

Fig. 5. Plot of the recognition rate of the Gaussian mixture model (green) and the kernel density estimator (blue) versus the threshold value.

The left side of Figure 6 shows the deviation from a reference face performing a normal expression. The size of the deviation is indicated by colour; red indicates a large deformation. The right side of the figure shows a photograph taken during the scanning. The top row shows the situation where a smile is performed whilst pushing the tongue into the cheek. This produces a noticeable bulge on the side of the face, and the algorithm is able to accurately pinpoint the problem area. In a normal smile, such a lump would not occur. The cheek area around this bulge is also affected, as can be seen from the light blue colour surrounding the red highlight. The colours are computed using z-scores.

A qualitative evaluation can be made by discussing the probability curve shown in Figure 7. This sequence corresponds to the one shown in Figure 3. At first, a low probability is assigned: the face displays a neutral expression during the first few frames. After some time, the probability peaks and remains fairly high, roughly between frames 15 and 25, where the scan displays a natural smiling expression.

Fig. 6. Visualisation of the distance to the standardised face, shown as a per-vertex colour. Red means a large distance and blue a small distance. Left: top view of the face; right: image of the scanned person.

After frame 25, a sudden drop in the probability is noticeable. This corresponds to the transition from a normal smile to an incorrect smile; the abnormal smile of frame 30 is shown. At frame 60, the face returns to performing a normal smile, as can be seen in both the geometry and the probability.

V. DISCUSSION

The proposed procedure achieves a recognition rate of 93.06% when classifying smiles extracted from a self-recorded database. This is about the same performance as the method proposed by Kakadiaris et al. [6]. In their paper, they explicitly state that they omitted data: they assumed that a sequence starts and ends with a neutral pose and reaches the expression in between, and they manually removed sequences that do not satisfy this assumption from the database. Additionally, they removed data with discontinuities. Our method is a fully automated facial expression analysis pipeline in which the assumptions are far less stringent: it simply assumes that the neutral expression is performed somewhere in the sequence, and no manual selection of the data is made. The proposed time normalisation procedure can be integrated with state-of-the-art methods to further increase accuracy and to obtain a method superior to known algorithms. The procedure is not foolproof and the input data is still subject to some constraints; however, these are far less stringent than those required in [6]. Further research is in order.

Future work

The time normalisation does not necessarily guarantee perfect normalisation, and some constraints are put on the input data. The constraint that most of the sequence be non-neutral can be checked and ensured automatically: long stretches of neutral expression show up as slowly varying points in the PCA space at the beginning or end of the sequence, or both. This can easily be detected by a simple test, and the sequences can be trimmed automatically so that this constraint can be dropped.


Fig. 7. Classification of a smiling expression interrupted by a non-smiling expression. The corresponding meshes of frames 15, 20, 25, 30, 60 and 65 of the sequence are shown.

In addition, the time normalisation procedure can be made more robust to outliers by constructing a least-squares interpolating spline instead of a linear connection of the samples. The PCA in the time normalisation procedure could also be performed on all faces at once, and a statistic about the neutral pose constructed; this would result in a much more robust method in which the neutral pose could be reconstructed even when it is not present in the original scan. We have taken a more pragmatic approach and imposed some restrictions on the scanned expression sequences.
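The suggested spline-based variant could be sketched per PCA dimension with SciPy; the smoothing factor is an illustrative guess:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def smooth_resample(coeffs, n_frames=28, smoothing=1.0):
    """Fit a least-squares smoothing spline through the PCA-space points,
    one spline per dimension, and resample it at equidistant times."""
    u = np.linspace(0.0, 1.0, len(coeffs))       # original time parameter
    u_new = np.linspace(0.0, 1.0, n_frames)      # equidistant resampling
    out = np.empty((n_frames, coeffs.shape[1]))
    for d in range(coeffs.shape[1]):
        spline = UnivariateSpline(u, coeffs[:, d], s=smoothing)
        out[:, d] = spline(u_new)
    return out
```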

VI. CONCLUSION

The aim was to construct a framework able to assess the pre-surgery situation and post-surgery improvements by pinpointing abnormalities in the movement of facial topology in time and space. Applications of such an algorithm can be found in maxillo-facial surgery. A fully automated facial expression analysis pipeline is presented. The system is trained on the publicly available BU-4DFE dynamic facial expression database. The models are validated using self-recorded sequences of 11 people performing abnormal happy expressions.

The main contribution of this work lies in the introduction of the concept of time normalisation. Application of this extra normalisation procedure makes the overall algorithm more robust. As a result, the input data is not as restricted as in state-of-the-art methods. A Gaussian mixture model and a kernel density estimator are fitted to the happy expression; other, more specialised state-of-the-art algorithms can be used to obtain better results. The contribution to the field opens a variety of doors for further research. Time normalisation removes the issues related to the temporal alignment of sequences; as a result, regular pattern recognition techniques can be used on every frame of the normalised sequence without any further consideration of the time aspect of the data.

The algorithm is capable of detecting with great accuracy when anomalies occur as well as where they are situated in the face. The constructed algorithm obtains a correct classification rate of 93.06% on the self-recorded data. This is slightly less than the state-of-the-art procedure [6], which obtains a 96.71% recognition rate on the BU-4DFE database when classifying three expressions. The proposed method, however, imposes far fewer restrictions on the input data and is therefore more generally applicable. The time normalisation procedure can be integrated with state-of-the-art methods to further increase accuracy and to obtain a method superior to known algorithms. The procedure is not foolproof and the data is still subject to some constraints, but these are far less stringent than in [6]. The procedure has room for improvement, but the technique displays large potential and proves to be a necessity in 4D analysis. Further research on this topic is certainly required.

REFERENCES

[1] I. Mpiperis, S. Malassiotis and M. Strintzis, Bilinear elastically deformable models with application to 3D face and facial expression recognition, FG, 2008, 1-8

[2] T. Ojala, M. Pietikainen and D. Harwood, A comparative study of texture measures with classification based on featured distributions, Pattern Recognition, 29(1):51-59, January 1996

[3] G. Zhao and M. Pietikainen, Dynamic texture recognition using local binary patterns with an application to facial expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):915-928, June 2007


[4] J. Wang, L. Yin, X. Wei and Y. Sun, 3D facial expression recognition based on primitive surface feature distribution, IEEE Conference on Computer Vision and Pattern Recognition, 2006, 1399-1406

[5] V. Le, H. Tang and T. S. Huang, Expression recognition from 3D dynamic faces using robust spatio-temporal shape features, Beckman Institute for Advanced Science and Technology, 2012

[6] T. Fang, X. Zhao, O. Ocegueda, S. K. Shah and I. A. Kakadiaris, 3D/4D facial expression analysis: An advanced annotated face model approach, Image and Vision Computing, 30 (2012), 738-749

[7] S. Berretti, A. del Bimbo and P. Pala, Real-time expression recognition from dynamic sequences of 3D facial scans, Eurographics Workshop on 3D Object Retrieval, 2012, 85-92

[8] H. Drira, B. Ben Amor, M. Daoudi, A. Srivastava and S. Berretti, 3D dynamic expression recognition based on a novel deformation vector field and random forest, 21st International Conference on Pattern Recognition, 2012

[9] G. Sandbach, S. Zafeiriou, M. Pantic and L. Yin, Static and dynamic 3D facial expression recognition: A comprehensive survey, Image and Vision Computing, 30 (2012), 683-697

[10] G. Sandbach, S. Zafeiriou, M. Pantic and D. Rueckert, Recognition of 3D facial expression dynamics, Image and Vision Computing, 30 (2012), 762-773

[11] J. Orozco, F. A. Garcia, J. L. Arcos and J. Gonzalez, Spatio-temporal reasoning for reliable facial expression interpretation, 5th International Conference on Computer Vision Systems, 2007

[12] S. Mika, B. Schölkopf, A. Smola, K.-R. Müller, M. Scholz and G. Rätsch, Kernel PCA and de-noising in feature spaces, Advances in Neural Information Processing Systems, 11, 536-542, 1999

[13] A. R. Teixeira, A. M. Tomé, K. Stadlthanner and E. W. Lang, KPCA denoising and the pre-image problem revisited, Digital Signal Processing, Volume 18, Issue 4, July 2008, 568-580

[14] P. Claes, P. Suetens (sup.) and D. Vandermeulen (sup.), A robust statistical surface registration framework using implicit function representations - application in craniofacial reconstruction, 2007

[15] K. Daniels, P. Suetens (sup.), D. Vandermeulen (sup.) and W. Van de Voorde (sup.), Modern techniques for craniofacial surgery: Robust Procrustes analysis, 2012

[16] H. Soyel and H. Demirel, Facial expression recognition using 3D facial feature distances, Image Analysis and Recognition, (2007), 831-838

[17] H. Tang and T. Huang, 3D facial expression recognition based on automatically selected features, Computer Vision and Pattern Recognition Workshops, CVPRW '08, IEEE Computer Society Conference on, IEEE, 2008, 1-8

[18] M. Botsch and O. Sorkine, On linear variational surface deformation methods, IEEE Transactions on Visualization and Computer Graphics, 14:213-230, January 2008

[19] B. Amberg, R. Knothe and T. Vetter, Expression invariant 3D face recognition with a morphable model, Automatic Face & Gesture Recognition, FG '08, 8th IEEE International Conference on, 1-6, 2008, IEEE

[20] D. Smeets, P. Claes, J. Hermans, D. Vandermeulen and P. Suetens, A comparative study of 3-D face recognition under expression variations, IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews

[21] L. Gralewski, N. Campbell and I. Penton-Voak, Using a tensor framework for the analysis of facial dynamics, Automatic Face and Gesture Recognition, FGR 2006, 7th International Conference on, 217-222, 2006, IEEE

[22] R. Thompson, A note on restricted maximum likelihood estimation with an alternative outlier model, Journal of the Royal Statistical Society, Series B (Methodological), 53-55, 1985

[23] L. Van Gool, G. Szekely and V. Ferrari, Computer Vision, 2011

[24] A. Gray and A. Moore, Very fast multivariate kernel density estimation via computational geometry, in Proceedings, Joint Statistical Meeting, 2003

[25] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, Wiley-Interscience, 2012

[26] Di3D, Inc., http://www.di3d.com

[27] L. Yin, X. Chen, Y. Sun, T. Worm and M. Reale, A high-resolution 3D dynamic facial expression database, Department of Computer Science, State University of New York at Binghamton

[28] W. Härdle, M. Müller, S. Sperlich and A. Werwatz, Nonparametric and Semiparametric Models, Springer, 2000

[29] M. Turk and A. Pentland, "Eigenfaces for recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991

[30] Kernel density estimation toolbox for MATLAB developed by A. Ihler, available at http://www.ics.uci.edu/~ihler/code/kde.html

[31] Dimensionality reduction toolbox developed by L. van der Maaten, available at http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html

[32] Y. Sun and L. Yin, Facial expression recognition based on 3D dynamic range model sequences, in Proc. 10th European Conference on Computer Vision, pages 58-71, Marseille, France, October 12-18, 2008

[33] G. Sandbach, S. Zafeiriou, M. Pantic and D. Rueckert, A dynamic approach to the recognition of 3D facial expressions and their temporal models, in Proc. 9th IEEE International Conference on Automatic Face and Gesture Recognition, Santa Barbara, CA, March 21-25, 2011

[34] V. Le, H. Tang and T. Huang, Expression recognition from 3D dynamic faces using robust spatio-temporal shape features, in Proc. 9th IEEE International Conference on Automatic Face and Gesture Recognition, Santa Barbara, CA, March 21-25, 2011