

Facial movement synergies and Action Unit detection from distal wearable Electromyography and Computer Vision

Monica Perusquía-Hernández, Member, IEEE, Felix Dollack, Member, IEEE, Chun Kwang Tan, Shushi Namba, Saho Ayabe-Kanamura, and Kenji Suzuki, Member, IEEE

Abstract—Distal facial Electromyography (EMG) can be used to detect smiles and frowns with reasonable accuracy. It capitalizes on volume conduction to detect relevant muscle activity, even when the electrodes are not placed directly on the source muscle. The main advantage of this method is to prevent occlusion and obstruction of facial expression production, whilst allowing EMG measurements. However, measuring EMG distally entails that the exact source of the facial movement is unknown. We propose a novel method to estimate specific Facial Action Units (AUs) from distal facial EMG and Computer Vision (CV). This method is based on Independent Component Analysis (ICA), Non-Negative Matrix Factorization (NNMF), and sorting of the resulting components to determine which is the most likely to correspond to each CV-labeled action unit (AU). Performance on the detection of AU06 (Orbicularis Oculi) and AU12 (Zygomaticus Major) was estimated by calculating the agreement with human coders. Our proposed algorithm achieved an accuracy of 81% and a Cohen's Kappa of 0.49 for AU6, and an accuracy of 82% and a Cohen's Kappa of 0.53 for AU12. This demonstrates the potential of distal EMG to detect individual facial movements. Using this multimodal method, several AU synergies were identified. We quantified the co-occurrence and timing of AU6 and AU12 in posed and spontaneous smiles using the human-coded labels and, for comparison, using the continuous CV-labels. The co-occurrence analysis was also performed on the EMG-based labels to uncover the relationship between muscle synergies and the kinematics of visible facial movement.

Index Terms—Electromyography, computer vision, smiles, FACS, posed and spontaneous smiles.


Fig. 1. Wearable used to measure distal EMG from four channels placed on the sides of the face, on both temples of the head. This configuration enables facial expression identification without obstructing the face. However, this makes identifying which muscle produced the measured activity challenging.

1 INTRODUCTION

FACIAL expressions often co-occur with affective experiences. Smiles are among the most ubiquitous facial expressions. They are characterized by the corner of the lips moving upwards, as a result of the activity of the Zygomaticus Major muscle (ZM). Although smiles have been deemed the prototypical expression of happiness, not all people who smile are happy.

• M.P-H. and F.D. are with NTT Communication Science Laboratories. E-mail: [email protected]; [email protected]
• C.K.T., S.A-K., and K.S. are with the University of Tsukuba.
• S.N. is with the University of Hiroshima.

Manuscript prepared January, 2020.

Fig. 2. Muscle activations measured with distal EMG precede camera-detected AU movements. The four channels of raw distal EMG activate on average 374 ms before the detected CV-based AU labels. The plot shows data from one participant posing smiles. The activation patterns of EMG and CV-based AUs are similar to each other, with EMG activity leading. The blue line shows the mean of the four EMG channels plus Standard Deviation (SD). AU6 and AU12 often co-occur, as shown by the CV-based output. This makes identifying which muscle produced the measured activity challenging, as EMG measures a mix of muscle activity throughout the face.

The so-called Duchenne marker, or movement from the Orbicularis Oculi muscle (OO), often co-occurs with the Zygomaticus Major activity. Whilst it has been claimed that the Duchenne marker is a signal of smile spontaneity [1], [2], [3], [4], other studies have found this marker in posed smiles as well [5], [6], [7]. Others have suggested that it might instead signal smile intensity rather than smile authenticity [6], [8], [9].

The Facial Action Coding System (FACS) [10] is a standardized method to label facial movements. It involves identifying Action Descriptors for movements involving multiple muscles from their onset to offset.


The advantage of using the FACS is that no subjective inferences about the underlying emotion are made during the facial movement identification. From these AU configurations, inferences can be made by experts within the frame of different theories of emotion. In the FACS, the lip corner pulling upwards is labeled as Action Unit 12 (AU12), and the movement around the eyes in the form of a cheek raiser is labeled as AU6. AU6 is also the AU associated with the Duchenne marker. AUs can be measured by visual inspection using video recordings, either by a human coder [11] or by using Computer Vision (CV) algorithms [12]. Additionally, the underlying muscle activity can be measured with Electromyography (EMG) [13], [14]. The standard method is to place the EMG electrodes directly on top of the relevant muscle to increase the Signal-to-Noise Ratio (SNR). More recently, several studies have proven the feasibility of measuring facial expressions with distal EMG [15], [16]. Distal EMG refers to measuring muscle activity from a body location that is distant from the relevant muscle. Distal EMG measurements are possible through volume conduction, whereby the electrical activity generated by each muscle spreads to adjacent areas [13]. By measuring EMG distally, the unnatural obstruction that the electrodes pose to the production of facial expressions is reduced. Despite this advantage, distal measurements make it difficult to know the exact location of the EMG activity source. Hence, current technology has only been used to identify grouped muscle activity such as smiles or frowns. Detecting such facial expressions from EMG has its own merits, such as high temporal resolution and robustness against occlusion. However, to compare the knowledge drawn using this technology to the large body of facial expression research that uses AUs as the basis of analysis, we need to identify muscle movement activity at the AU level.

Units of movement are often grouped to form full facial expressions. In movement science, muscle synergies are defined as joint movements produced by muscle groups [17]. Analogously, we define facial movement synergies as groups of muscles, or AUs, moving together. We hypothesize that muscle synergies might not correspond 1:1 to visibly observable synergies, due to the differences between muscle activity and the kinematics of the movement. Additionally, by using synergy analysis, we should be able to observe different synergies involving AU6 and AU12 in the context of posed and spontaneous smiles. This might provide more evidence for or against AU6 being a marker of spontaneity. If AU6 and AU12 move together in spontaneous smiles, but not in posed smiles, the Duchenne marker might be related to smile genuineness. On the other hand, if the Duchenne marker is not a signature of spontaneity, we should observe similar synergies in both posed and spontaneous smiles. We conducted this co-occurrence analysis using human-labeled AUs, CV-labeled AUs, and EMG-labeled AUs. The objective was to find common factors in how a smile is produced, and the relationship between visible movement and EMG synergies.

For the purpose of this analysis, we propose a sensing-source-synergy framework. Blind Source Separation (BSS) methods, such as Independent Component Analysis (ICA) or pre-trained OpenFace models, could be used to go from sensing to sources. Furthermore, analyses such as Non-Negative Matrix Factorization (NNMF) can be used to identify synergies from fine-grained movement units. We explore several algorithms within this space to shed light on the spatial and temporal elements that form posed and spontaneous smiles.

Moreover, it is possible to create an AU identification system that works in recording sessions where high movement or high facial occlusion are expected by combining CV- and EMG-based methods. In those cases, CV alone would struggle to continuously identify certain AUs. On the other hand, wearable distal EMG can deal with occlusion and movement, but it cannot disentangle AUs so easily. With our proposed method, CV can be used during a calibration period as a reference for automatic AU identification in EMG. Thus, we would be able to create a more robust system that enables fine-grained analysis of facial expression synergies and activation. A use case scenario would be when assessing children's behavior, or when developing applications for Virtual Reality. It would be possible to have a short calibration session with both camera and EMG recordings. Using this multimodal information, the EMG could be tagged and used to identify AUs without requiring constant camera surveillance while the children play and run around, or when the face is covered by a headset.

The main contributions of this work are:

1) A framework to analyze facial activity by estimating sensed signal sources and synergies.

2) A method to identify individual muscle activity sources linked to AUs 6 and 12 during smile production from a multimodal system. This system uses both CV and EMG during calibration, and EMG only for high-movement, high-occlusion situations. We use CV as a reference method to identify the different EMG components. This method is based on Independent Component Analysis (ICA) for Blind Source Separation (BSS) of the EMG readings to estimate the source muscle. Additionally, we use Non-Negative Matrix Factorization (NNMF) to identify AU synergies.

3) An analysis of facial movement synergies using AUs as sources. We present two selection methods to analyze co-occurring activation patterns of AU6 and AU12 in the context of posed and spontaneous smiles.

2 RELATED WORK

2.1 EMG-based identification

Compared to traditional EMG measurements, a reduced set of electrode positions has proven to yield high facial expression recognition rates of 87% accuracy for seven posed facial expressions, including sadness, anger, disgust, fear, happiness, surprise, and neutral expressions. This subset includes electrodes placed on the Corrugator and Frontalis on the forehead, and the ZM and Masseter on the cheeks [18]. Distal EMG has been used to identify different facial gestures using different electrode configurations. Two bipolar EMG channels placed on the Temporalis muscle on each side of the face, and one placed on the Frontalis muscle, provided the input to distinguish ten facial expressions. The achieved accuracy was 87% using a very fast versatile elliptic basis function neural network (VEBFNN) [15]. Although not all gestures were facial expressions of emotion, they did include symmetrical and asymmetrical smiling, raising eyebrows, and frowning.

Distal EMG has been implemented in a wearable designed to keep four EMG channels attached to the sides of the face at eye level (Fig. 1). With this placement, it is possible to reliably measure smiles in different situations without obstructing facial movement [19], [20]. This is possible because smile-related distal activity measured from the ZM is sufficiently large to be robust against non-affective facial movements such as chewing gum and biting [13], [16], [21]. Hence, the information picked up by the four channels is used to approximate different sources of muscular activity using ICA [22]. The separated muscle activity contains components for muscles involved in generating smiles and can be used to identify these [16]. This approach can be used offline for fast and subtle spontaneous smile identification [19] and is possible even in real time [23]. Finally, this device has also been used to analyze spatio-temporal features of a smile by fitting envelopes to the EMG's Independent Components (ICs), and later performing automatic peak detection on those envelopes [24], with performance similar to that achieved by Computer Vision [25]. Furthermore, four EMG leads placed around the eyes in a Head-Mounted Display (HMD) have also been used successfully to distally identify facial expressions even when the face is covered by the device. Facial expressions of anger, happiness, fear, sadness, surprise, neutral, clenching, kissing, asymmetric smiles, and frowning were identified with 85% accuracy [26]. Another recent work proposed the use of a thin, sticker-like hemifacial 16-electrode array pasted on one side of the face to identify ten distinct facial building blocks (FBB) of different voluntary smiles. Their electrode approach is novel, robust against occlusion, and provides a higher-density electrode array than the aforementioned arrangements. This enabled them to use ICA and clustering to define several FBB corresponding to a certain muscle [27]. Nevertheless, they require electrode placement proximal to each muscle. This entails that a large sticker needs to be placed on the skin, obstructing spontaneous facial movement. Moreover, the physical connection of the electrode array enhances artifact cross-talk between electrodes. To eliminate such cross-talk, ICA was used and the resulting clusters were derived manually.

2.2 CV-based identification

CV is the most widely used technique for identifying facial expressions [28], even at the individual Action Unit level. There are different approaches to extract relevant features for AU identification and intensity estimation, among them appearance-based, geometry-based, motion-based, and hybrid approaches. Several algorithms range between 0.45 and 0.57 F1 score for occurrence detection and between 0.21 and 0.41 for intensity estimation [29]. The OpenFace toolkit 2.0 [12] is a CV pipeline for facial and head behavior identification. Its behavior analysis pipeline includes landmark detection, head pose and eye gaze estimation, and facial action unit recognition. This algorithm detects AUs 1, 2, 4, 5, 6, 9, 12, 15, 17, 20, 25, and 26 with an average accuracy of 0.59 in a person-independent model. Moreover, the use of spatial patterns has been shown to achieve about 90% accuracy in the task of distinguishing between posed and spontaneous smiles [30]. Dynamic features based on lip and eye landmark movements have provided an identification accuracy of up to 92.90% [31]. Other algorithms using spatio-temporal features identified by restricted Boltzmann machines have been able to achieve up to 97.34% accuracy in distinguishing spontaneous vs. posed facial expressions [32].

2.3 Muscle synergies

The concept of synergy in facial activity recognition has been used mainly to describe synergies between different sensors [33], [34], [35]. However, muscle synergies refer to simultaneous muscle activation. Muscle synergies are typically expressed in the form of a spatial component (synergies) and a temporal component (activations). The spatial component describes the grouping and ratio of muscles that activate together for a given movement. The temporal component describes how each spatial component is activated in the time series. Two types of synergies have been proposed: "synchronous synergies" assume that there are no temporal delays between the different muscles forming the synergies (i.e., the synergies are consistent throughout the movement) while the activation of these components changes, whereas "time-varying synergies" consider the synergies themselves to change as well [17]. There is an ongoing debate on whether the Central Nervous System controls the activation of individual motor units, individual muscles, groups of muscles, or kinematic and automatic features [17]. This research is mainly done in the domain of motor control of wide and coordinated movements such as gait. Several researchers have used NNMF as a method to identify muscle synergies during posture and gait responses. This method is often used jointly with an analysis of the Variance-Accounted-For (VAF) by each synergy component when reconstructing the original EMG signal [36], [37], [38], [39]. In this method, the source signals are decomposed into as many components as there are degrees of freedom. The number of components with a VAF higher than a threshold is considered the number of synergies contained in the group of sources.

3 THE SENSING-SOURCE-SYNERGY FRAMEWORK

We propose a framework to analyze sensed signals by estimating their sources and synergies. Since AUs are closely related to individual muscle activity, we refer to them as "sources" (Fig. 3). Sources are facial movement units caused by a certain muscle. These individual sources often move in a synchronous manner to form visible facial expressions such as smiles. A group of muscles, or a group of AUs, moving together is called a synergy. Different transformations are necessary to go between sensed signals, source signals, and synergies. To go from a sensed signal mix to the source signals originating a movement, we can use ICA for EMG, and OpenFace for videos. Similarly, NNMF can be used to go from movement sources to synergy groups. A special case is when the sensed signal is very close to the source signal. For example, if EMG is measured directly from the muscle originating the activity, we can assume equivalence between measurement and source. In such cases, NNMF can be applied directly to the sensed signal to obtain synergies.

Fig. 3. (1) Sensing-Source-Synergy framework. Sensors often read raw signals that are a mixture of signals of interest and other sources that can be considered noise. In other cases, the measured signals can be considered directly as the sources. We use OpenFace and ICA to derive movement source units. (2) Multimodal AU identification. AU labels are extracted from CV and are used to assess facial expressions. Then, the information derived from CV is used to identify the EMG components that correspond to each AU type. (3) Synergy analysis. Groups of sources moving together were analyzed by looking at the different sensing modalities independently or in combination. (A) Six single and double activation patterns were proposed and identified from discrete AU labels. (B) Using NNMF and VAF methods, four AU synergies were found. (C) ICA was used to find the muscle sources and then NNMF was applied on the ICs to identify two synergies. (D) Temporal matching of each of the possible synergies with the ICs identified from EMG. (E) Spatial matching of the NNMF weights derived from EMG and four CV-labeled AUs selected from the VAF analysis.

4 DATA SET

This data is a subset of the data generated in a previous study [40]. Here, only a brief description is provided for informative purposes.

4.1 Participants
41 producers took part in the study (19 female, average age = 25.03 years, SD = 3.83).

4.2 Experiment design
The experiment consisted of several blocks. All the producers completed the experimental blocks in the same order. This was to keep the purpose of the experiment hidden during the spontaneous block.

4.2.1 Spontaneous Block (S-B)
A positive affective state was induced using a 90 s humorous video. After the stimulus, producers answered a standardized scale assessing emotional experience. Next, producers were asked to tag any facial expressions that they had made.

4.2.2 Posed Block (P-B)

Producers were requested to make smiles similar to those they made in the S-B. However, this time, a 90 s slightly negative video was presented instead. Their instruction was: "Please perform the smiles you video coded. This is for a contest. We are going to show the video we record to another person, who is unknown to you, and if she or he cannot guess what video you were watching, then you are a good actor. Please do your best to beat the evaluator". After watching the video and performing the task, they completed the same standardized scale assessing emotional experience. They were also asked to tag their own expressions.

4.3 Measurements

• Smile-reader. In total, four channels of distal facial EMG were measured from both sides of the face using dry active electrodes (Biolog DL4000, S&ME Inc.) sampled at 1 kHz (Fig. 1).

• Video recordings. A video of the producers' facial expressions was recorded using a Canon Ivis 52 camera at 30 FPS.

• Self video coding. The producers tagged the onset and offset of their own facial expressions using Dartfish 3.2. They labeled each expression as spontaneous or posed, and indicated whether or not it was a smile.


• Third-person video coding. Two independent raters labeled the videos with Dartfish 3.2. They coded the start frame and the duration of every smile, and AUs 1, 2, 4, 5, 6, 9, 10, 12, 14, 15, 17, 18, 25, 26, and 28.

5 DATA ANALYSIS

We introduced a novel framework to estimate muscle movement sources from distal EMG. Using this framework, we analyzed facial muscle synergies in an automatic manner. Two types of data analysis were performed (Fig. 3). The first type is multimodal identification, which aims to identify different sources or muscle groups from the recorded EMG and assesses their similarity to AUs detected using CV. We rely on feature engineering using blind source separation, cross-correlation, and dimension reduction. That is, feature engineering is applied to a continuous time series to transform the data into more relevant information that can be thresholded by its SD. The second type is synergy analysis, aimed at identifying the spatial and temporal structures of the facial expressions present in the data. The synergy analyses were conducted both on single modalities and at the multimodal level. Both types of analysis used the same EMG pre-processing.

EMG pre-processing. The four EMG channels were first passed through a custom Hanning window with a ramp time of 0.5 s to avoid the introduction of artificial frequencies by the filtering at the start and the end of the signal. Afterwards, the signals were (1) linearly detrended, (2) transformed to have zero mean and unit standard deviation, (3) band-pass filtered from 15 to 490 Hz, (4) rectified, and (5) low-pass filtered at 4 Hz.
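A minimal sketch of this pre-processing chain with SciPy is shown below. The cut-off frequencies, the 0.5 s ramp, and the 1 kHz sampling rate follow the text; the filter orders, the exact window shape, and the (samples x channels) array layout are assumptions.

```python
import numpy as np
from scipy.signal import butter, detrend, filtfilt, windows

FS = 1000  # EMG sampling rate in Hz (from the paper)

def preprocess_emg(emg, fs=FS, ramp_s=0.5):
    """Pre-process raw EMG (samples x channels): ramped window, linear
    detrend, z-scoring, 15-490 Hz band-pass, rectification, 4 Hz envelope."""
    emg = np.asarray(emg, dtype=float)
    n = emg.shape[0]

    # Hann-shaped ramps of 0.5 s at both ends to avoid edge artifacts.
    ramp = int(ramp_s * fs)
    taper = np.ones(n)
    han = windows.hann(2 * ramp)
    taper[:ramp] = han[:ramp]
    taper[-ramp:] = han[ramp:]
    emg = emg * taper[:, None]

    # (1) Linear detrend and (2) z-score each channel.
    emg = detrend(emg, axis=0, type="linear")
    emg = (emg - emg.mean(axis=0)) / emg.std(axis=0)

    # (3) Band-pass 15-490 Hz (4th-order Butterworth, zero-phase).
    b, a = butter(4, [15, 490], btype="bandpass", fs=fs)
    emg = filtfilt(b, a, emg, axis=0)

    # (4) Full-wave rectification and (5) 4 Hz low-pass envelope.
    emg = np.abs(emg)
    b, a = butter(4, 4, btype="lowpass", fs=fs)
    return filtfilt(b, a, emg, axis=0)
```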

CV-based AU labeling using OpenFace. The Facial Behavior Analysis Toolkit OpenFace 2.0 was used to identify several facial features, including AUs. AU identification is given both as a continuous output, or intensity rating, and as a binary output indicating AU presence. The intensity and presence predictors have been trained separately and on slightly different datasets, which means that they are not always consistent [12]. In this work, we chose to use the continuous or the binary rating depending on the requirements of our algorithm. Both the binary and the continuous CV AU labels are extracted and upsampled from 30 Hz to 1 kHz to match the EMG sampling frequency.
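The upsampling can be done by holding each 30 Hz label over the EMG samples that fall within the corresponding video frame; the paper does not specify the interpolation scheme, so this nearest-previous-frame hold is an assumption.

```python
import numpy as np

def upsample_labels(labels_30hz, n_emg_samples, video_fps=30, emg_fs=1000):
    """Repeat 30 Hz OpenFace labels so they align with 1 kHz EMG samples.

    Works for binary presence labels and continuous intensity ratings alike.
    """
    labels_30hz = np.asarray(labels_30hz)
    # Map each EMG sample index to the video frame it falls into.
    frame_idx = np.floor(np.arange(n_emg_samples) * video_fps / emg_fs).astype(int)
    frame_idx = np.clip(frame_idx, 0, len(labels_30hz) - 1)
    return labels_30hz[frame_idx]
```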

Blind source separation. ICA [41] was used to automatically estimate different muscle activity sources. The wearable used to collect the data has four channels. Thus, we set the number of decomposed components to three.
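A minimal sketch using FastICA from scikit-learn; the paper does not state which ICA implementation was used, so the estimator choice and its settings are assumptions.

```python
from sklearn.decomposition import FastICA

def estimate_sources(emg_preprocessed, n_components=3, seed=0):
    """Estimate three muscle-activity sources from the four EMG channels.

    `emg_preprocessed` is a (samples x 4) array; returns (samples x 3) ICs.
    """
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    sources = ica.fit_transform(emg_preprocessed)
    return sources, ica
```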

Synergy identification. NNMF [42] is a dimensionality reduction method aimed at uncovering synchronized muscle movements. We expect it to be able to identify source activity happening at the same time, either from CV-AUs or EMG. If AU6 and AU12 happen at different times, they should be categorized as belonging to different synergies. The dataset used contains both posed and spontaneous smiles. Thus, we hypothesize that NNMF will be able to identify whether AU6 and AU12 belong to the same synergy or not. The number of synergies found primarily depends on the number of available sources. In the case of CV, 17 AUs were identified; hence, the degrees of freedom were a maximum of 16. In the case of EMG, the degrees of freedom are three when applying NNMF on the raw measurements, and two when applying it on the estimated ICs. Fig. 3-3 shows the processing in detail.
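A corresponding synergy-extraction sketch with scikit-learn's NMF follows; the number of components would be 16 for the CV-labeled AUs, three for the raw EMG, or two for the ICs, as described above. The input must be non-negative (AU intensities or rectified EMG envelopes), and the initialization and iteration settings are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

def extract_synergies(activity, n_synergies):
    """Decompose non-negative source activity (samples x sources) into
    temporal activations and spatial synergy weights."""
    model = NMF(n_components=n_synergies, init="nndsvda", max_iter=500)
    activations = model.fit_transform(np.clip(activity, 0, None))  # temporal (samples x n)
    weights = model.components_                                    # spatial (n x sources)
    return activations, weights
```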

Multimodal identification with component matching to CV-generated labels. A matching method was used to assess the similarity between EMG components and CV-based labels (Fig. 3-2). We assume the EMG signal to contain AU6, AU12, and noise. Noise is defined as electrical interference as well as other muscular sources. First, we calculated the cross-correlation between the three ICA components and the continuous AU6 and AU12 OpenFace CV-labels, as well as a uniformly distributed random noise signal. Since AU12 stems from the large and strong ZM muscle, the IC with the maximum correlation is chosen to correspond to AU12. The other two ICs are assigned to AU6 and noise in order of maximum correlation value. Afterwards, a threshold method was used to determine active samples. Further smoothing is applied to the individual ICs by means of a first-order Savitzky-Golay filter with length 301. An initial period of ≈ 1 s, or 30 samples, of the IC is used to calculate the baseline signal average and standard deviation. The whole signal is then turned into a binary vector where samples that cross the threshold of m + kσ with k = 2 are set to one. The values set to one are taken to correspond to activity of the AU assigned to that IC during the process of AU identification. In this process, we were careful not to use the ground-truth labels (human coding) as input for our algorithm. The human labels were only used to assess how well the EMG threshold method approximates the visible activity tagged by the human coders.
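A sketch of this matching and thresholding step is given below, assuming `ics` is a (samples x 3) array of estimated ICs and `au12_cv`, `au6_cv` are the upsampled continuous OpenFace ratings. The Savitzky-Golay length (301), k = 2, and the roughly 1 s baseline follow the text; the function names, the normalization inside the cross-correlation, and treating the remaining IC as noise by elimination are illustrative assumptions.

```python
import numpy as np
from scipy.signal import correlate, savgol_filter

def match_and_threshold(ics, au12_cv, au6_cv, fs=1000, k=2.0, baseline_s=1.0):
    """Assign ICs to AU12 and AU6 by cross-correlation with the continuous
    CV labels (the remaining IC is treated as noise), then binarize each
    assigned IC with an m + k*sigma threshold from a baseline period."""

    def max_xcorr(x, y):
        x = (x - x.mean()) / (x.std() + 1e-12)
        y = (y - y.mean()) / (y.std() + 1e-12)
        return correlate(x, y, mode="full", method="fft").max() / len(x)

    n_ics = ics.shape[1]
    # AU12 first (strongest source), then AU6 among the remaining ICs.
    corr_au12 = [max_xcorr(ics[:, i], au12_cv) for i in range(n_ics)]
    au12_ic = int(np.argmax(corr_au12))
    remaining = [i for i in range(n_ics) if i != au12_ic]
    corr_au6 = [max_xcorr(ics[:, i], au6_cv) for i in remaining]
    au6_ic = remaining[int(np.argmax(corr_au6))]

    labels = {}
    for name, idx in (("AU12", au12_ic), ("AU6", au6_ic)):
        ic = savgol_filter(ics[:, idx], window_length=301, polyorder=1)
        base = ic[: int(baseline_s * fs)]
        labels[name] = (ic > base.mean() + k * base.std()).astype(int)
    return labels
```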

Synergy analysis. In some contexts, AUs might appear together more often. This might be the case for the co-occurrence of AU6 and AU12 in spontaneous smiles, should the Duchenne marker truly be a marker of smile spontaneity. In contrast, posed smiles would be characterized by less synchronized AU6 and AU12 activity. Hence, several co-occurring activation patterns were analyzed using different modalities (Fig. 3-3-A). We propose six activation patterns: (1) AU6 only; (2) AU12 only; (3) AU12 inside AU6; (4) AU6 inside AU12; (5) AU12 before AU6; and (6) AU6 before AU12. An algorithm was designed to detect these from binary labels. These labels can be generated by human coding, CV, EMG, or a combination of them. This method relies on subtracting the labels of one AU from the other, and then identifying the differences within each block for each activation pattern.
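The sketch below classifies a single co-activation event from a pair of binary label vectors into one of the six patterns. It compares onsets and offsets directly rather than using the label subtraction described above, so it is an illustrative variant of the idea, not the exact algorithm used in the paper.

```python
import numpy as np

def classify_pattern(au6, au12):
    """Classify one event window of binary AU6/AU12 labels into one of the
    six proposed activation patterns."""
    au6, au12 = np.asarray(au6), np.asarray(au12)
    if au6.any() and not au12.any():
        return "AU6 only"
    if au12.any() and not au6.any():
        return "AU12 only"
    # Onset and offset indices of each AU within the window.
    s6, e6 = np.flatnonzero(au6)[[0, -1]]
    s12, e12 = np.flatnonzero(au12)[[0, -1]]
    if s12 >= s6 and e12 <= e6:
        return "AU12 inside AU6"
    if s6 >= s12 and e6 <= e12:
        return "AU6 inside AU12"
    return "AU12 before AU6" if s12 < s6 else "AU6 before AU12"
```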

NNMF was used to extract muscle synergies from EMG, and AU synergies from the continuous CV-labels. The total variance accounted for (VAF) was used as a metric to determine the ideal number of synergies [38]. First, we used the AUs derived from CV (Fig. 3-3-B) and decomposed them into between 1 and 16 NNMF components per experimental block. Afterwards, the ideal number of synergies was determined by ensuring that this number of synergies would reconstruct the original signal with less than 15% error for all participants in both posed and spontaneous blocks. Furthermore, we compared the EMG-based and CV-based synergy detection to match them across modalities. The matching was done in the temporal domain using cross-correlation (Fig. 3-3-D), and in the spatial domain using cosine similarity to sort and match the cross-modal NNMF component weights (Fig. 3-3-E). To use cosine similarity, it is necessary to have components with an equal number of weights in both modalities. Thus, we decomposed three synergies from only four smile-related CV-labeled AUs (AU 6, 7, 10, and 12). This was to match the four EMG channels. We would have preferred to do a cascading transformation by first determining the sources using ICA, and then applying NNMF to the ICs (Fig. 3-3-C). However, due to the limited number of EMG channels and the evidence from the multimodal identification that AU12 and AU6 are strong in the raw signal, we considered the EMG signal to be equivalent to the sources for this analysis only (Fig. 3-3-E). Given this, the spatial matching is an alternative to the multimodal identification described in Fig. 3-2.
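A sketch of the VAF criterion and of a cosine-similarity matching of synergy weight vectors across modalities follows; the 15% reconstruction-error threshold follows the text, while the greedy pairing strategy and the helper names are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

def vaf_for_n_synergies(activity, n):
    """Variance accounted for when reconstructing non-negative `activity`
    (samples x sources) with n NNMF synergies. Accept n if VAF >= 0.85."""
    model = NMF(n_components=n, init="nndsvda", max_iter=500)
    w = model.fit_transform(activity)
    recon = w @ model.components_
    return 1.0 - np.sum((activity - recon) ** 2) / np.sum(activity ** 2)

def cosine_match(weights_a, weights_b):
    """Greedily pair synergy weight vectors (rows, same length) across
    two modalities by maximal cosine similarity."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
    pairs, used = [], set()
    for i, wa in enumerate(weights_a):
        sims = [(cos(wa, wb), j) for j, wb in enumerate(weights_b) if j not in used]
        best_sim, best_j = max(sims)
        used.add(best_j)
        pairs.append((i, best_j, best_sim))
    return pairs
```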

Delay between raw EMG and CV-based AU detection. The delay between the EMG signals and the CV-based AU detection was calculated by looking at the lag at which the maximum cross-correlation between the EMG channels and the AU labels appeared. A similar method was used to calculate the delay between AU6 and AU12 from the CV-based binary labels.
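A sketch of the lag estimation via the peak of the cross-correlation, assuming both signals are already at 1 kHz; the sign interpretation follows SciPy's correlation_lags convention.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def lead_lag_ms(emg_env, au_label, fs=1000):
    """Lag (ms) at which the cross-correlation peaks; negative values mean
    the first signal (EMG envelope) leads the second (AU label)."""
    x = (emg_env - emg_env.mean()) / (emg_env.std() + 1e-12)
    y = (au_label - au_label.mean()) / (au_label.std() + 1e-12)
    xc = correlate(x, y, mode="full", method="fft")
    lags = correlation_lags(len(x), len(y), mode="full")
    return 1000.0 * lags[np.argmax(xc)] / fs
```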

Agreement with human coders. Human-coded labels, CV-labels, and EMG-labels were transformed to a matching sampling rate. Then, the agreement between the different measurements was calculated using Cohen's Kappa [43]. Additionally, we report accuracy, precision, and recall. The advantage of using Cohen's Kappa is that it penalizes for the larger number of no-AU samples in the set, given that participants did not smile all the time.
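These agreement metrics can be computed with scikit-learn once all label streams share one sampling rate; a minimal sketch:

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             precision_score, recall_score)

def agreement(reference, prediction):
    """Sample-wise agreement between two binary AU label vectors."""
    return {
        "kappa": cohen_kappa_score(reference, prediction),
        "accuracy": accuracy_score(reference, prediction),
        "precision": precision_score(reference, prediction),
        "recall": recall_score(reference, prediction),
    }
```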

6 RESULTS

6.1 Delay between CV-EMG signals
The delay between raw EMG and CV-based labels was 374 ms on average (median = 450 ms, SD = 366 ms, Fig. 2).

6.2 Action Unit identification

6.2.1 Ground truth
The inter-coder agreement was a Cohen's Kappa of 0.78 for AU06 and 0.84 for AU12. For further processing, a single human-coded label was set to active when either of the coders thought there was an AU. The agreement between CV and the human labeling was a Cohen's Kappa of 0.42 for AU06 and 0.43 for AU12.

6.2.2 Multimodal identification
The Cohen's Kappa between human-coded labels and EMG-based labels was 0.49 for AU6 and 0.53 for AU12. Furthermore, the accuracy reached 81% for AU6 and 82% for AU12 (Tab. 1). Although the correlation between CV-based labels and EMG-based labels is an important step for the selection of the EMG components, the selected EMG-based labels might not coincide perfectly with the CV-based labels (Fig. 4).

6.3 Synergy analysis

6.3.1 Co-occurrence analysis
To assess the co-occurrence frequency between AU6 and AU12, we evaluated how much agreement the AU6 and AU12 labels have with each other. High agreement indicates that both AUs appear at the same time.

Fig. 4. The threshold method applied to the ICA components identified as AU6 and AU12 for a block with spontaneous smiles (left) and posed smiles (right). The ICA components are shown as blue lines.

TABLE 1
Agreement of AU6 and AU12 between human-coded and CV-based, and between human-coded and CV-EMG detected AUs.

                     Coder1 vs. Coder2   Coders vs. CV   Coders vs. EMG
AU6    Agreement κ        0.78               0.42            0.49
       Accuracy           -                  0.83            0.81
       Precision          -                  0.93            0.62
       Recall             -                  0.34            0.60
AU12   Agreement κ        0.84               0.43            0.53
       Accuracy           -                  0.83            0.82
       Precision          -                  0.90            0.63
       Recall             -                  0.35            0.68

For the human-labeled data, we see a Cohen's Kappa of 0.84 (Tab. 2). Similarly, the CV-based labels show a Cohen's Kappa of 0.62. Finally, the AUs detected from EMG show a Cohen's Kappa of 0.34. A Wilcoxon rank sum test on the AU co-occurrence pattern analysis (Fig. 5) showed that AU6 and AU12 co-occur more often than not (W = 6519.5, p < .01), regardless of the experimental block (W = 3362, p > .5). Furthermore, AU6 tends to start before AU12 most of the time (W = 7385, p < .01). An analogous analysis of the CV data yielded no significant differences between posed and spontaneous blocks (W = 1390, p > .5), but showed opposite results regarding the order of occurrence between AU6 and AU12. According to CV, AU12 occurred before AU6 more often (W = 0.71, p < .001). To better understand the temporal relationship between both AUs, we performed a frame-by-frame comparison of AU6 and AU12 activation from human-coded AUs, CV-extracted AUs, and AUs labeled with our method (Fig. 6). Each frame was selected with respect to the onset of AU12. A frame started 0.5 seconds before the onset and had a duration of one second. Whereas human coders rated an almost perfect co-occurrence, EMG and CV coded a higher probability of AU6 being active before the AU12 onset.

6.3.2 NNMF VAF analysis on CV-based AU continuous labels
After iteratively decomposing the synergies from the CV-AU labels using NNMF, the VAF analyses showed that four synergies account for 85% or more of the variance in both blocks for all participants. A Wilcoxon rank sum test showed no differences in weights between spontaneous and posed blocks (W = 3968700, p > .05).


TABLE 2
Agreement of co-occurrence between AU6 and AU12 for human-coded, CV-based, and EMG-detected AUs.

         Agreement κ   Accuracy   Precision   Recall
Human    0.84          0.96       0.77        0.77
CV       0.62          0.91       0.57        0.57
EMG      0.34          0.75       0.48        0.48

Fig. 5. Occurrence ratio of the AU6-only, AU12-only, and AU12 + AU6 activation patterns for human-coded labels and CV-labels, in the spontaneous and posed blocks. According to human coders, AU6 and AU12 co-occur most of the time. This differs from the labels according to OpenFace, as expected from the low agreement between computer vision and human labels.

A closer look at the weights of those four synergies showed that not all AUs have high weights. Therefore, we selected the AUs whose weights contributed most to the four synergies. The selected AUs are 4, 6, 7, 10, 12, 14, 17, 25, 26, and 45 (Fig. 7). The resulting weights were clustered per participant and AU within each synergy. AU6 and AU12 were grouped together in the first two components, which accounted for most of the variance. AU4 was often present, but not clustered with other AUs. In the third component, AU6 was clustered with AU10. Finally, blinks (AU45) emerged in the last component.

6.3.3 ICA-NNMF source-synergy analysis
Since distal EMG is not measured close to the muscle originating the movement, we proposed to use ICA first to approximate the movement sources before applying NNMF. After ICA, three sources were approximated and labelled using our matching algorithm. Next, only two NNMF components can be approximated due to the loss of one degree of freedom after ICA. The weights for the two resulting synergy components are shown in Fig. 8.

Fig. 6. Frame-by-frame comparison of AU6 with respect to the onset of AU12 from human-coded AUs, CV-extracted AUs, and AUs labeled with our method. The left figure shows results from spontaneous blocks, while the right figure shows the results from posed blocks. Human-coded AUs are depicted in pink, CV-extracted AUs in green, and AUs from our method in blue.

Fig. 7. Weight averages per AU (AU4, AU6, AU7, AU10, AU12, AU14, AU17, AU25, AU26, AU45) for the four components selected using the NNMF VAF as criterion. AU6 and AU12 are grouped in components 1 and 2.

Fig. 8. AU synergies (NNMF weights) identified per experimental block and NNMF component. The horizontal axis shows the AUs detected by our matching algorithm (AU6, AU12, Other); the vertical axis shows the component weights per AU.

Interestingly, the difference in the synergy weights of AU6 and AU12 between experimental blocks was significant according to a Wilcoxon rank sum test (W = 9886.5, p < .005). In the spontaneous block, one synergy comprises AU12 and other activity, and the second synergy is the joint activity of AU6 and AU12. On the contrary, the posed block yielded two distinct synergies, one corresponding to AU6 and the other to AU12, which suggests that both facial movements are jointly executed only during spontaneous smiles. Finally, the "Other" label comprises any other AUs detected by the EMG electrodes.

6.3.4 Temporal matching

Fig. 9 shows the CV-derived components per AU that are most correlated to the EMG-identified AU activations per experimental block. A Wilcoxon rank sum test showed no significant difference between posed and spontaneous experimental blocks (W = 3968700, p > .05). Fig. 10 shows a heatmap depicting the synergy weights per AU and participant. The AUs with weight loads close to zero were excluded. Since no differences were found between experimental blocks, the weights of the two blocks were averaged. AU6 and AU12 were clustered together in two components, whilst in the first component AU6 was grouped with AU25, suggesting that the participants smiled with lips apart. Furthermore, AU12 and AU10 were clustered together, which might be due to a confusion of the OpenFace algorithm.


Fig. 9. CV-labeled AU weights after temporal matching with EMG components per experimental block. Components are grouped along the horizontal axis by the AUs detected by OpenFace and by experimental block. Each bar corresponds to the weight of the CV component with maximal correlation to the corresponding EMG component.

Fig. 10. CV synergies (three components) cross-correlated with the EMG IC matching for selected AUs. Spontaneous and posed blocks were averaged given the lack of significant differences.

6.3.5 Spatial matching

To match the dimensions of the EMG signal, which has four channels, four AUs were selected: AU6, AU7, AU10, and AU12. These are the smile-related AUs with the largest weights from the previous analyses. Given the high occurrence of AU7 and AU10, we hypothesized that OpenFace might be confusing them with AU6 and AU12, respectively. A Wilcoxon rank sum test showed no differences in weights between spontaneous and posed blocks (W = 27872, p > .05). Figure 11 shows the synergy weights per AU. We can observe that AU6 and AU12 are clustered together in components 2 and 3. In component 1, AU6 is grouped with AU10 for most participants.

7 DISCUSSION

We observed co-varying activation patterns between the pre-processed EMG and the continuous AUs extracted from CV. There was a delay between the CV AU activation and the EMG activation, with EMG activation leading by 374 ms. This was expected, as the EMG activity originates the skin displacement. This delay was larger than that observed from proximal EMG measurements (average of 230 ms) [44].

We showed that AU6 and AU12 can be detected with a multimodal algorithm that labels EMG signatures estimated with ICA using labels generated automatically with CV. These EMG signatures do not always correspond to those dictated by the CV algorithm during the calibration. However, they yield a slightly higher agreement than if CV was used alone. The main reason why the accuracy of the CV labels did not constrain the accuracy of the EMG-derived labels is that CV is used only for the initial identification of the EMG components that are most correlated to each AU. Afterwards, the EMG-based identification is independent of the OpenFace labels.

Fig. 11. Four smile-related AUs (AU6, AU7, AU10, AU12) were decomposed into three synergies for CV and EMG independently. The top plot shows the average weight loads per AU. The heatmaps below show the weight loads per participant and action unit per component. From the heatmaps, AU6 and AU12 were clustered together in components 2 and 3.

We proposed the Sensing-Source-Synergy framework. We distinguish the measurements from the sources of the movement, and we use those sources to estimate synergies. We suggested using ICA to search for activity from different muscle sources when measuring with EMG. On the other hand, NNMF attempts to find the joint activation of different muscles. Several analysis pipelines in different sensing modalities were suggested to quantify the synergies between multiple sources. The human-coded co-occurring blocks of AU6 and AU12 showed that the Duchenne marker appeared simultaneously with the lip movements about 90% of the time, independently of the experimental block. Interestingly, AU6 leading AU12 occurred in more cases than the opposite. A similar analysis on the binary CV-based labels showed a lower percentage of simultaneous activation, followed by an even lower simultaneous activation depicted by the EMG labels. This is probably because the automatic algorithms tend to detect gaps in between AU6 activations where human coders do not report them. Nevertheless, in the cases where the Duchenne marker and the lip movement co-occurred, AU6 was active before AU12 above 80% of the time in the spontaneous block, and above 60% of the time in the posed block. The agreement between AU6 and AU12 labels per modality was highest for human coding, followed by CV and EMG. EMG probably had the least co-occurrence agreement because of its higher temporal resolution and, most importantly, because muscle activations lead visible activity. This suggests that, indeed, muscle synergies do not perfectly match visible facial activity.

To better interpret the number of AU synergies, NNMF was also performed on the AU labels derived from CV, and a VAF metric was used to determine the ideal number of synergies. The results suggested four synergies. The weights suggested that not all AUs contributed to these synergies, and no differences between posed and spontaneous experimental blocks were found.


Another method proposed to select synergies was temporal matching with the EMG IC components. This method yielded three synergies with characteristics similar to the ones selected with the VAF criterion. Also, no significant differences were found between posed and spontaneous blocks. This entails that even though AU6 and AU12 weights are often clustered together within the selected synergies, there were no differences between posed and spontaneous smiles. According to these analyses, participants also displayed other AUs, notably AU4. This was expected, especially in the posed block, where there might be a conflict between what the participants felt and the happiness they were asked to convey with a smile. It does not mean that those AUs are part of a smile, but that they were also present in the block. This suggests that antagonistic AU synergies might also occur, as an attempt to inhibit or mask facial expressions other than the intended one. Furthermore, we observed that OpenFace often led to high weights in AUs that are similar to AU6 and AU12. For example, AU6 might be confused with AU7, and AU12 with AU10. Thus, when we had to further reduce the AUs to match the EMG results, we chose these. The results showed that AU6 and AU12 are often grouped together, and in one case, AU6 was grouped with AU10. Thus, whilst CV-based identification has a higher spatial resolution to identify several AUs, it still confuses several labels. Additionally, NNMF was used to identify joint muscle activity directly from the pre-processed EMG in a spatial matching algorithm. The results were similar to those found by applying NNMF on the CV AUs only. This is in line with the hypothesis that NNMF considers the ZM and the OO to move as one single synergy. A closer look into the NNMF results suggested that the aforementioned muscles move in a single synergy picked up by the lower electrodes of the wearable. Thus, the synergy is probably dominated by the ZM. This is in line with results showing the significant strength of the ZM when compared to other muscles, and it might be related to muscle length [13].

To reiterate, our proposed method uses OpenFace as a rough guide for EMG labeling. Our algorithm does not need OpenFace to be perfect, as we consider only the dynamics of the facial activations as input for the cross-correlation. Since the EMG data contains the movement sources measured distally, and ICA estimates them blindly, CV is only used for (soft) labeling. In later stages, the algorithm is completely independent of the CV accuracy. Evidence of this is the superior performance of EMG with respect to CV in AU detection. In other words, the performance of the OpenFace algorithm did not constrain our overall detection accuracy.

We also used the results of our proposed multimodal identification of AUs using EMG as input to the NNMF algorithm. Although the number of synergies that can be identified using this method is only two, the results were surprising. We had hypothesized that a challenge for ICA might be to disentangle multiple AUs from the mixed EMG given the joint activation of the ZM and the OO. However, the results showed that ICA actually boosted the NNMF synergy detection by transforming the data to the source space first. Only in this case did we find a strong difference between posed and spontaneous smile AU synergies. In posed smiles, AU6 and AU12 seem to operate independently, whereas a joint movement of AU6 and AU12 was observed in spontaneous smiles. This is somewhat in line with the Duchenne marker hypothesis. Even though the appearance of the Duchenne marker can be simulated voluntarily, the underlying muscle synergies are distinct. The success of EMG in recognizing subtle differences might be due to the fact that (1) with our matching algorithm, the EMG already contains information from the CV-based labeling; and (2) the forced use of a reduced set of synergies might have made the differences more salient.

In summary, our framework uses feature engineering to estimate AUs as muscle movement sources from distal EMG, overcoming the source muscle identification limitation. It has an advantage over relying only on visual cues because muscle activation does not translate 1:1 to movement kinematics. We used the results to analyze facial muscle synergies (NNMF) in an automatic manner, and compared them to CV-annotated and human-annotated data to understand the potential and limitations of this method. We conducted the co-occurrence analysis using human-coded, CV, and EMG data. The objective was to find common factors in how a smile is produced, and the relationship between visible movement and EMG synergies.

8 LIMITATIONS OF THIS STUDY

One of the limitations of this study was the number of electrodes provided by the EMG wearable. Whilst four electrodes provide a good trade-off between wearability and smile and AU detection, they are limiting for muscle synergy analysis. Therefore, we opted to determine the optimal number of synergies using CV only, together with a synergy-matching strategy between EMG and CV. Increasing the number of electrodes will enable us to explore synergies containing more facial expressions. In this case, we opted to model mainly AU6 and AU12, and to consider other AUs in the EMG as "noise". Furthermore, synergy analysis based on NNMF requires continuous labels. The human labeling of AUs was performed frame-by-frame, but the coders only indicated presence, not intensity. Thus, we opted for a block analysis of single and simultaneous activation. Labeling AU intensity as well would have made it possible to apply our NNMF method directly on the ground-truth labels. Fortunately, OpenFace derives both continuous and binary AU labels, which were useful for our method. However, the CV-only detection could still be improved. In particular, we found that CV might have confused AU10 for AU12, and AU6 for AU7. Finally, we conducted this study on data aimed at eliciting smiles. It would be interesting to assess the synergies present in other types of facial expressions.

9 CONCLUSIONS AND FUTURE DIRECTIONS

Our Sensing-Source-Synergy approach led to good AU identification from a multimodal system, and aided fine-grained AU synergy analysis. Our results suggest that the Duchenne marker can be displayed in both posed and spontaneous smiles, but the underlying synergy differs. However, these results should be assessed carefully. Our method is computationally inexpensive and performs better at estimating AUs than OpenFace alone.


The EMG already contains the mixed activity of the movement sources measured distally. ICA estimates them blindly. OpenFace's AU dynamics are inputs for the cross-correlation with EMG, leading to a CV-aided AU label approximation. The final outcome is determined by a threshold on the EMG activity of the approximated source. Accordingly, the EMG AU detection performance is not constrained by OpenFace's detection accuracy. Nevertheless, the agreement with human coders (i.e., the ground truth) can still be improved. Future work should aim to achieve better agreement to refine the synergy analysis presented here. Another alternative would be to use our proposed algorithm with human-coded labels only, instead of the OpenFace algorithm, in a cross-validation scheme that ensures generalizability of the results, given that those labels would also be the ground truth.

REFERENCES

[1] P. Ekman, W. Friese, and R. Davidson, “The Duchenne Smile: Emo-tional Expression And Brain Physiology II,” Journal of Personalityand Social Psychology, vol. 58, no. 2, pp. 342–353, 1988.

[2] P. Ekman, W. V. Friesen, and M. O'Sullivan, "Smiles when lying," Journal of Personality and Social Psychology, vol. 54, no. 3, pp. 414–420, 1988.

[3] P. Ekman, "Basic Emotions," in Handbook of cognition and emotion, T. Dalgleish and M. Power, Eds. John Wiley & Sons, Ltd., 1999, ch. 3, pp. 45–60.

[4] H. Guo, X.-h. Zhang, J. Liang, and W.-j. Yan, "The Dynamic Features of Lip Corners in Genuine and Posed Smiles," Frontiers in Psychology, vol. 9, pp. 1–11, 2018.

[5] K. Schmidt, S. Bhattacharya, and R. Denlinger, "Comparison of deliberate and spontaneous facial movement in smiles and eyebrow raises," Nonverbal Behaviour, vol. 33, no. 1, pp. 35–45, 2009.

[6] E. G. Krumhuber and A. S. R. Manstead, "Can Duchenne smiles be feigned? New evidence on felt and false smiles," Emotion, vol. 9, no. 6, pp. 807–820, 2009.

[7] S. Namba, S. Makihara, R. S. Kabir, M. Miyatani, and T. Nakao, "Spontaneous Facial Expressions Are Different from Posed Facial Expressions: Morphological Properties and Dynamic Sequences," pp. 1–13, 2016.

[8] D. S. Messinger, "Positive and negative: Infant facial expressions and emotions," Current Directions in Psychological Science, vol. 11, no. 1, pp. 1–6, feb 2002.

[9] J. M. Girard, G. Shandar, Z. Liu, J. F. Cohn, L. Yin, and L.-P. Morency, "Reconsidering the Duchenne Smile: Indicator of Positive Emotion or Artifact of Smile Intensity?" PsyArXiv, 2019.

[10] P. Ekman and W. P. Friesen, "Measuring facial movement with the Facial Action Coding System," in Emotion in the human face, 2nd ed., P. Ekman, Ed. Cambridge University Press, 1982, ch. 9, pp. 178–211.

[11] P. Ekman, W. Friesen, and J. Hager, "FACS investigator's guide," 2002.

[12] T. Baltrusaitis, A. Zadeh, Y. C. Lim, and L.-P. Morency, "OpenFace 2.0: facial behavior analysis toolkit," in 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018). IEEE, may 2018, pp. 59–66.

[13] A. van Boxtel, "Facial EMG as a Tool for Inferring Affective States," in Proceedings of Measuring Behavior, A. Spink, F. Grieco, O. E. Krips, L. Loijens, L. Noldus, and P. Zimmerman, Eds., Eindhoven, 2010, pp. 104–108.

[14] K. L. Schmidt and J. F. Cohn, "Dynamics of facial expression: Normative characteristics and individual differences," in IEEE Proceedings of International Conference on Multimedia and Expo. Tokyo: IEEE, 2001, pp. 728–731.

[15] M. Hamedi, S.-H. Salleh, M. Astaraki, and A. M. Noor, "EMG-based facial gesture recognition through versatile elliptic basis function neural network," Biomedical Engineering Online, vol. 12, no. 1, p. 73, 2013.

[16] A. Gruebler and K. Suzuki, "Design of a Wearable Device for Reading Positive Expressions from Facial EMG Signals," IEEE Transactions on Affective Computing, vol. PP, no. 99, pp. 1–1, 2014.

[17] M. C. Tresch and A. Jarc, "The case for and against muscle synergies," pp. 601–607, dec 2009.

[18] I. Schultz and M. Pruzinec, "Facial Expression Recognition using Surface Electromyography," Ph.D. dissertation, Karlsruhe Institute of Technology, 2010.

[19] M. Perusquía-Hernández, M. Hirokawa, and K. Suzuki, "A wearable device for fast and subtle spontaneous smile recognition," IEEE Transactions on Affective Computing, vol. 8, no. 4, pp. 522–533, 2017.

[20] A. Funahashi, A. Gruebler, T. Aoki, H. Kadone, and K. Suzuki, "Brief report: The smiles of a child with autism spectrum disorder during an animal-assisted activity may facilitate social positive behaviors - Quantitative analysis with smile-detecting interface," Journal of Autism and Developmental Disorders, vol. 44, no. 3, pp. 685–693, 2014.

[21] L. Oberman, P. Winkielman, and V. Ramachandran, "Face to face: blocking facial mimicry can selectively impair recognition of emotional expressions," Social Neuroscience, vol. 2, no. 3-4, pp. 167–178, sep 2007.

[22] P. Comon, "Independent component analysis, A new concept?" Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.

[23] Y. Takano and K. Suzuki, "Affective communication aid using wearable devices based on biosignals," in Proceedings of the 2014 Conference on Interaction Design and Children - IDC '14. New York, New York, USA: ACM Press, 2014, pp. 213–216.

[24] M. Perusquía-Hernández, M. Hirokawa, and K. Suzuki, "Spontaneous and posed smile recognition based on spatial and temporal patterns of facial EMG," in Affective Computing and Intelligent Interaction, 2017, pp. 537–541.

[25] M. Perusquía-Hernández, S. Ayabe-Kanamura, K. Suzuki, and S. Kumano, "The Invisible Potential of Facial Electromyography: A Comparison of EMG and Computer Vision when Distinguishing Posed from Spontaneous Smiles," in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems - CHI '19. New York, New York, USA: ACM Press, 2019, pp. 1–9.

[26] H. S. Cha, S. J. Choi, and C. H. Im, "Real-time recognition of facial expressions using facial electromyograms recorded around the eyes for social virtual reality applications," IEEE Access, vol. 8, pp. 62065–62075, 2020.

[27] L. Inzelberg, M. David-Pur, E. Gur, and Y. Hanein, "Multi-channel electromyography-based mapping of spontaneous smiles," Journal of Neural Engineering, vol. 17, no. 2, apr 2020.

[28] V. Bettadapura, "Face expression recognition and analysis: the state of the art," CoRR, pp. 1–27, 2012.

[29] B. Martinez, M. Valstar, B. Jiang, and M. Pantic, "Automatic Analysis of Facial Actions: A Survey," IEEE Transactions on Affective Computing, jul 2017.

[30] S. Wang, C. Wu, and Q. Ji, "Capturing global spatial patterns for distinguishing posed and spontaneous expressions," Computer Vision and Image Understanding, vol. 147, pp. 69–76, jun 2016.

[31] H. Dibeklioglu, A. A. Salah, and T. Gevers, "Recognition of genuine smiles," IEEE Transactions on Multimedia, vol. 17, no. 3, pp. 279–294, 2015.

[32] J. Yang and S. Wang, "Capturing spatial and temporal patterns for distinguishing between posed and spontaneous expressions," in Proceedings of the 2017 ACM on Multimedia Conference - MM '17. New York, New York, USA: ACM Press, 2017, pp. 469–477.

[33] M. Kostinger, P. M. Roth, and H. Bischof, "Synergy-based learning of facial identity," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7476 LNCS. Springer, Berlin, Heidelberg, 2012, pp. 195–204.

[34] M. Haris Khan, J. McDonagh, and G. Tzimiropoulos, "Synergy between face alignment and tracking via Discriminative Global Consensus Optimization," in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 3791–3799.

[35] R. P. Henry and R. Alfred, "Synergy in Facial Recognition Extraction Methods and Recognition Algorithms," in Lecture Notes in Electrical Engineering, vol. 488. Springer Verlag, nov 2018, pp. 358–369.

[36] G. Torres-Oviedo and L. H. Ting, "Subject-specific muscle synergies in human balance control are consistent across different biomechanical contexts," Journal of Neurophysiology, vol. 103, no. 6, pp. 3084–3098, jun 2010.

[37] G. Torres-Oviedo and L. H. Ting, "Muscle synergies characterizing human postural responses," Journal of Neurophysiology, vol. 98, no. 4, pp. 2144–2156, oct 2007.

[38] C. K. Tan, H. Kadone, H. Watanabe, A. Marushima, M. Yamazaki, Y. Sankai, and K. Suzuki, "Lateral Symmetry of Synergies in Lower Limb Muscles of Acute Post-stroke Patients After Robotic Intervention," Frontiers in Neuroscience, vol. 12, p. 276, apr 2018.

[39] C. K. Tan, H. Kadone, H. Watanabe, A. Marushima, Y. Hada, M. Yamazaki, Y. Sankai, A. Matsumura, and K. Suzuki, "Differences in Muscle Synergy Symmetry Between Subacute Post-stroke Patients With Bioelectrically-Controlled Exoskeleton Gait Training and Conventional Gait Training," Frontiers in Bioengineering and Biotechnology, vol. 8, p. 770, jul 2020.

[40] M. Perusquía-Hernández, S. Ayabe-Kanamura, and K. Suzuki, "Human perception and biosignal-based identification of posed and spontaneous smiles," PLOS ONE, vol. 14, no. 12, p. e0226328, dec 2019.

[41] A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411–430, 2000.

[42] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, oct 1999.

[43] J. Cohen, "Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit," Psychological Bulletin, vol. 70, no. 4, pp. 213–220, 1968.

[44] J. F. Cohn and K. Schmidt, "The timing of facial motion in posed and spontaneous smiles," International Journal of Wavelets, Multiresolution and Information Processing, vol. 2, pp. 121–132, 2004.

Monica Perusquía-Hernández received the BSc in electronic systems engineering (2009) from the Tecnológico de Monterrey, Mexico; the MSc in human-technology interaction (2012) and the professional doctorate in engineering in user-system interaction (2014) from the Eindhoven University of Technology, the Netherlands. In 2018, she obtained the PhD in Human Informatics from the University of Tsukuba, Japan. Her research interests are affective computing, biosignal processing, augmented human technology, and artificial intelligence.

Felix Dollack received the BEng degree in hearing technology and audiology from the Jade University of Applied Science in Oldenburg, Germany, in 2013, and the MSc degree in hearing technology and audiology from the Carl-von-Ossietzky University in Oldenburg, Germany, in 2015. In 2020, he obtained the PhD degree in Human Informatics at the Empowerment Informatics Program, University of Tsukuba, Japan. His research interests include psychophysics, biosignal processing, and artificial intelligence.

Chun Kwang Tan received his MSc in Autonomous Systems from the University of Applied Science Bonn-Rhein-Sieg, Sankt Augustin, Germany, in 2016. In 2020, he received the PhD degree in Human Informatics at the Empowerment Informatics Program, University of Tsukuba, Japan. His research interests include EMG analysis, gait analysis, and neural motor control.

Shushi Namba received the Ph.D. degree in psychology from Hiroshima University, Japan, in 2018. He is currently an Assistant Professor at Hiroshima University. His research interests include genuine facial displays, human computing, and social cognition.

Saho Ayabe-Kanamura received the PhD degree in psychology from the University of Tsukuba, Japan, in 1998. Since 2013, she has been a professor at the Faculty of Human Sciences, University of Tsukuba. Her research interests include human chemosensory perception, perceptual learning, and cognitive neuroscience.

Kenji Suzuki received the Ph.D. degree in pure and applied physics from Waseda University, Tokyo, Japan, in 2003. He is currently Professor with the Faculty of Engineering, Information and Systems, and also Principal Investigator with the Artificial Intelligence Laboratory, University of Tsukuba, Tsukuba, Japan. Prior to joining the University of Tsukuba in 2005, he was a Research Associate with the Department of Applied Physics, Waseda University. He was also a Visiting Researcher with the Laboratory of Physiology of Perception and Action, Collège de France, Paris, France, in 2009, and with the Laboratory of Musical Information, University of Genoa, Genoa, Italy, in 1997. His research interests include artificial intelligence, cybernetics, assistive and rehabilitation robotics, computational behavior science, and affective computing.