
Missing samples estimation in electromagnetic articulography data using equality constrained Kalman smoother

Sujith P1, Prasanta Kumar Ghosh2

1 Department of Electrical Communication Engineering, IISc, Bangalore, India
2 Department of Electrical Engineering, IISc, Bangalore, India
[email protected], [email protected]

Abstract

Electromagnetic articulography (EMA) data provides the movement of sensors attached to different articulators of a subject while the subject is speaking. EMA data often contains missing segments due to sensor failure. In this work, we propose an equality constrained Kalman smoother to estimate the missing samples in the EMA data. We incorporate the dynamics of the articulatory movement into missing samples estimation by treating the EMA data vector as the observation from a linear dynamical system. The proposed approach gives a 41% reduction in the root mean square error of the estimates compared to the minimum mean square error estimator, which does not utilize the dynamics of the articulatory movement. When compared to maximum a-posteriori estimation with continuity constraints (MAPC), which incorporates smoothness of the articulatory trajectory during estimation, the proposed approach gives an average performance improvement of 4.8%.

Index Terms: Constrained Kalman smoother, Electromagnetic Articulography, Missing samples estimation

1. Introduction

Articulatory data provides information about the articulatory dynamics of a subject during speech production [1, 2]. Several techniques are available for articulatory data collection, such as X-ray [3], X-ray microbeam [4], magnetic resonance imaging (MRI) [5] and electromagnetic articulography (EMA) [6]. In this work, we focus on EMA data, which provides the movement of articulators in the form of the position vectors of the sensors attached to different articulators. The EMA data thus collected may contain missing segments due to sensor failure or sensor detachment [2]. Since the data collection process is expensive, missing samples estimation techniques can be used to reconstruct the missing segments rather than discarding the data that contains them [7].

In previous work on missing samples estimation, Qin et al. [7] proposed a minimum mean squared error (MMSE) estimator in which the missing samples in a frame are estimated from the known samples of that frame. In maximum a-posteriori estimation with continuity constraints (MAPC) [8], the missing samples in an articulatory trajectory are jointly estimated by maximizing an auxiliary function that contains a posterior probability term and the negative of the energy of the high-frequency content of the trajectory, which ensures that the estimated and known segments form a continuous trajectory. Fang et al. [9] used dynamical features for the estimation of missing samples in EMA data.

In the MMSE estimator, the dynamics of the articulatory movement is not used for the estimation. In the MAPC estimator, the missing samples of the articulators are estimated by considering one articulator at a time, and the dynamics of the respective articulatory trajectory is incorporated in the form of a smoothness criterion [8]. In this work, we propose a Kalman smoother with an equality constrained state to estimate the missing samples. The Kalman smoother does not work on one articulator at a time; rather, it incorporates the dynamics of the articulatory system by considering the articulatory data as the observation from a linear dynamical system. In order to estimate missing samples in the EMA data of an utterance, the Kalman smoother utilizes the known samples at the missing location as well as at the remaining locations. We assume that the known samples do not contain noise, and we consider those samples to be the observations from a linear dynamical system with zero measurement noise. The system parameters are estimated using the maximum likelihood criterion from training data which does not contain any missing samples. In the estimation process, the state corresponding to a frame is calculated using all known samples of an utterance, and the missing samples are estimated from the estimated state using the observation model. We begin with the description of the dataset used in this work.

Work supported by the Department of Science and Technology (DST), Govt. of India.

2. Dataset and preprocessing

For the experiments in this work, we use the Multichannel Articulatory (MOCHA) [6] database, which contains both articulatory and speech data of 460 utterances, each spoken by one male and one female subject. The articulatory data is collected at a sampling rate of 500 Hz. We follow the preprocessing steps outlined in [10] and thus obtain the articulatory data at a rate of 100 Hz. We use a 14-dimensional EMA feature vector in each frame, with components denoted by ULx, LLx, JAWx, TTx, TBx, TDx, VELx, ULy, LLy, JAWy, TTy, TBy, TDy and VELy.

3. Constrained Kalman smoother

In this section, we briefly describe the formulation of the constrained Kalman smoother, which will be used for missing sample estimation.

3.1. Kalman Smoother

The Kalman smoother [11] is a forward-backward algorithm which gives the optimal estimate of the states of a linear dynamical system in the presence of Gaussian noise. In the Kalman smoother algorithm, the forward pass uses the Kalman filter, and in the backward pass the states are modified using the states from succeeding time indices. The Kalman filter provides the optimal state estimate of a linear dynamical system at an instant, considering all observations up to that instant, in the presence of Gaussian noise [12]. Consider a discrete-time linear time-invariant system which is represented by a pair of equations as follows:

$$s_k = A s_{k-1} + w_{k-1} \tag{1}$$

$$x_k = H s_k + v_k, \tag{2}$$

where $s_k$ is the state of the system at the $k$-th time instant and $s_{k-1}$ is the state of the system at the $(k-1)$-th time instant, $x_k$ is the observation, $w_k$ is the process noise and $v_k$ is the observation noise at the $k$-th time instant. Both $w_k$ and $v_k$ are assumed to be zero-mean Gaussian noises. $A$ is the state transition model and $H$ is the observation model. Let $Q$ and $R$ be the covariance matrices of the process noise and the observation noise respectively.

Copyright © 2014 ISCA. INTERSPEECH 2014, 14-18 September 2014, Singapore.

Kalman filtering can be considered as a two-step process [13]: at each time instant $k$, the prediction step gives an a-priori estimate of the state, $\hat{s}_k^-$, from the previously estimated states, and in the correction step the a-priori estimate is modified to $\hat{s}_k$ using the current observation. After running the Kalman filter in the forward direction, the state at a time index is modified using the states at the succeeding time indices. This is referred to as the backward step in the Kalman smoother [11].

3.2. Constrained Kalman smoother

Consider the case where we have perfect measurements from a linear dynamical system for all observation samples at all time indices. Then the system can be represented as an equality constrained linear dynamical system as follows:

$$s_k = A s_{k-1} + w_{k-1} \tag{3}$$

$$x_k = H s_k \tag{4}$$

Several approaches have been proposed for the state estimation of the equality constrained dynamical system using a constrained Kalman filter [14, 15, 16].

The steps in the proposed constrained Kalman smoother are summarized below.

Forward pass

The prediction and correction steps of the Kalman filter [13] for the constrained linear dynamical system can be expressed as follows.

Prediction step:

$$\hat{s}_k^- = A \hat{s}_{k-1} \tag{5}$$

$$P_k^- = E\left[(s_k - \hat{s}_k^-)(s_k - \hat{s}_k^-)^T\right] = A P_{k-1} A^T + Q \tag{6}$$

Correction step:

$$\hat{s}_k = \hat{s}_k^- + K_k (x_k - H \hat{s}_k^-) \tag{7}$$

$$K_k = \arg\min_{K_k} E\left[(s_k - \hat{s}_k)^T (s_k - \hat{s}_k)\right]$$

Substituting $\hat{s}_k$ from (7), differentiating with respect to $K_k$ and setting the result to zero, we obtain

$$K_k = P_k^- H^T (H P_k^- H^T)^{-1} \tag{8}$$

$$P_k = E\left[(s_k - \hat{s}_k)(s_k - \hat{s}_k)^T\right] = (I - K_k H) P_k^-, \tag{9}$$

where $\hat{s}_k^-$ is the a-priori estimate of the state, $\hat{s}_k$ is the updated state estimate and $K_k$ is the Kalman gain. $P_k^-$ and $P_k$ are the error covariance matrices of the a-priori state estimate and the updated state estimate respectively. It is easy to verify from (7) (using (8) for $K_k$) that $H\hat{s}_k = x_k$.
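The prediction and correction steps (5)-(9) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `predict` and `correct`, the toy dimensions and the random test values are all assumptions made for the example. It also checks numerically that the corrected state satisfies the equality constraint $H\hat{s}_k = x_k$.

```python
import numpy as np

def predict(s_prev, P_prev, A, Q):
    """Prediction step, Eqs. (5)-(6)."""
    s_prior = A @ s_prev
    P_prior = A @ P_prev @ A.T + Q
    return s_prior, P_prior

def correct(s_prior, P_prior, x, H):
    """Correction step with zero measurement noise, Eqs. (7)-(9)."""
    S = H @ P_prior @ H.T                      # H P^- H^T (no R term)
    K = P_prior @ H.T @ np.linalg.inv(S)       # Kalman gain, Eq. (8)
    s_post = s_prior + K @ (x - H @ s_prior)   # Eq. (7)
    P_post = (np.eye(len(s_prior)) - K @ H) @ P_prior  # Eq. (9)
    return s_post, P_post, K

# Toy example (hypothetical sizes: 4-dimensional state, 2-dimensional observation).
rng = np.random.default_rng(0)
d, m = 4, 2
A = 0.9 * np.eye(d)
H = rng.standard_normal((m, d))
Q = 0.1 * np.eye(d)

s_prior, P_prior = predict(np.zeros(d), np.eye(d), A, Q)
x = rng.standard_normal(m)
s_post, P_post, K = correct(s_prior, P_prior, x, H)

# The corrected state reproduces the observation exactly.
print(np.allclose(H @ s_post, x))
```

The explicit inverse mirrors Eq. (8) for readability; it requires $H P_k^- H^T$ to be invertible, which holds here because $H$ has full row rank and $P_k^-$ is positive definite.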

Backward pass

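The backward pass in the constrained Kalman smoother [11] can be represented using the following equations:

$$L_k = P_k A^T (P_{k+1}^-)^{-1} \tag{10}$$

$$\hat{s}^{s}_k = \hat{s}_k + L_k \left(\hat{s}^{sc}_{k+1} - \hat{s}^-_{k+1}\right) \tag{11}$$

$\hat{s}^{sc}_k$ is obtained by projecting $\hat{s}^{s}_k$ onto the constrained space by solving the following optimization problem [15]:

$$\hat{s}^{sc}_k = \arg\min_{s} \, (s - \hat{s}^{s}_k)^T (s - \hat{s}^{s}_k), \quad \text{subject to } Hs = x_k$$

This gives the constrained state estimate as [15]

$$\hat{s}^{sc}_k = \hat{s}^{s}_k - H^T (H H^T)^{-1} (H \hat{s}^{s}_k - x_k) \tag{12}$$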
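The backward recursion and the projection step can be sketched as follows. The projection function implements the closed-form minimizer of the constrained least-squares problem stated above, that is, the point closest to the smoothed state that satisfies $Hs = x_k$. The function names and the small hand-checkable example are assumptions for illustration only.

```python
import numpy as np

def smoothed_state(s_filt, P_filt, s_prior_next, P_prior_next, s_sc_next, A):
    """Backward update, Eqs. (10)-(11), for a single time index."""
    L = P_filt @ A.T @ np.linalg.inv(P_prior_next)   # smoother gain, Eq. (10)
    return s_filt + L @ (s_sc_next - s_prior_next)   # Eq. (11)

def project_onto_constraint(s, H, x):
    """Euclidean projection of s onto the set {s : H s = x}."""
    residual = H @ s - x
    return s - H.T @ np.linalg.solve(H @ H.T, residual)

# Hand-checkable example: the constraint fixes the first two state components.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
x = np.array([1.0, 2.0])
s_s = np.array([0.0, 0.0, 5.0])

s_sc = project_onto_constraint(s_s, H, x)
print(s_sc)  # the first two components move to (1, 2); the third is untouched
```

A state that already satisfies the constraint is left unchanged by the projection, so applying it twice gives the same result as applying it once.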

4. Missing samples estimation using constrained Kalman smoother

We model the articulatory system by an equality constrained linear dynamical system with the EMA data vector as the observation. In the case of a data vector with missing samples, we model the known samples as the observation from the equality constrained linear dynamical system by selecting only the known samples from the entire data vector. Let $x^p_k$ be the vector of known samples at the time index $k$. It is represented by the matrix-vector equation $x^p_k = W x_k$, where $x_k$ is the full observation vector. $W$ is a $|x^p_k| \times |x_k|$ matrix obtained from a $|x_k| \times |x_k|$ identity matrix by removing the rows whose indices correspond to the indices of the missing samples.

Thus, at the time index $k$, the equality constrained linear dynamical system representing the articulatory dynamics can be represented as

$$s_k = A s_{k-1} + w_{k-1} \tag{13}$$

$$x^p_k = W x_k = W H s_k = \tilde{H} s_k, \quad \text{where } \tilde{H} = WH \tag{14}$$
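As a small illustration of how $W$ can be built, the sketch below removes rows from an identity matrix for a frame in which one sensor is missing. The channel ordering follows the 14-dimensional feature vector listed in Section 2; the helper name `selection_matrix` and the choice of the TT sensor as the missing one are assumptions for the example.

```python
import numpy as np

CHANNELS = ["ULx", "LLx", "JAWx", "TTx", "TBx", "TDx", "VELx",
            "ULy", "LLy", "JAWy", "TTy", "TBy", "TDy", "VELy"]

def selection_matrix(missing_idx, dim):
    """Identity matrix of size dim with the rows of the missing samples removed."""
    missing = set(missing_idx)
    keep = [i for i in range(dim) if i not in missing]
    return np.eye(dim)[keep, :]

# Hypothetical frame in which the tongue-tip (TT) sensor has failed.
missing_idx = [CHANNELS.index("TTx"), CHANNELS.index("TTy")]
W = selection_matrix(missing_idx, len(CHANNELS))

x_k = np.arange(14, dtype=float)   # stand-in for a full EMA frame
x_p = W @ x_k                      # known samples only, as in Eq. (14)

print(W.shape)   # 12 known samples out of 14
```

In practice the same effect is obtained by boolean row indexing of $H$ and $x_k$, which avoids forming $W$ explicitly.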

[Figure 1 block diagram: training data (without missing samples) → ML estimation of system parameters ((13)-(14)) → A, H, Q → constrained Kalman smoother, which also takes the test data (with missing samples) and outputs $\hat{s}^{sc}_k$ → observation model $\hat{x}_k = H\hat{s}^{sc}_k$ → $\hat{x}_k$.]

Figure 1: Missing sample estimation using constrained Kalman smoother

Figure 1 shows the block diagram of the proposed approach for missing sample estimation using the constrained Kalman smoother. It consists of two main components: 1) estimation of the system parameters from the training data, and 2) estimation of the missing samples using the constrained Kalman smoother. The system parameters are estimated from the training data using the maximum likelihood (ML) criterion. Note that the training data does not contain any missing samples. From the test data containing missing segments, the state estimate is obtained with the constrained Kalman smoother ((5)-(12)) using the system parameters obtained from the training data. The estimates of the missing samples are obtained from the estimated states using the observation model.

4.1. ML estimation of system parameters

We use the Expectation Maximization (EM) algorithm [17], typically used for estimating the parameters of a general linear dynamical system [18, 19], to obtain the maximum likelihood estimates of the equality constrained system's parameters $(A, H, Q)$. Since we assume that the equality constraint holds for all the observations, we do not have to estimate $R$. Let $X = [x_1, x_2, \ldots, x_N]$ be the set of all observations and $S = [s_1, s_2, \ldots, s_N]$ be the set of all states corresponding to the observations. The ML estimates of the parameters are then obtained by maximizing the likelihood function $p(S, X \mid A, H, Q)$. The steps in the ML estimation of the parameters are summarized in Algorithm 1.

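Algorithm 1: ML estimation of equality constrained linear dynamical system parameters

Initialize the system parameters $A$, $H$, $Q$
The number of states $= K + 1$
for $i = 1$ to ITER do
    Expectation step
    for $k = 1$ to $K + 1$ do
        Obtain $\hat{s}_k$, $P_k$, $P^-_{k+1}$ using (5)-(9)
    end for
    for $k = K + 1$ to $1$ do
        Obtain $\hat{s}^{s}_k$ using (10)-(11) and $\hat{s}_k$, $P_k$, $P^-_{k+1}$
    end for
    Maximization step

    $$A = \left(\sum_{k=1}^{K} \hat{s}^{s}_{k+1} \hat{s}^{sT}_{k}\right)\left(\sum_{k=1}^{K} \hat{s}^{s}_{k} \hat{s}^{sT}_{k}\right)^{-1}$$

    $$H = \left(\sum_{k=1}^{K} x_{k} \hat{s}^{sT}_{k}\right)\left(\sum_{k=1}^{K} \hat{s}^{s}_{k} \hat{s}^{sT}_{k}\right)^{-1}$$

    $$Q = \frac{1}{K}\left(\sum_{k=1}^{K} \hat{s}^{s}_{k+1} \hat{s}^{sT}_{k+1} - \hat{s}^{s}_{k+1} \hat{s}^{sT}_{k} A^T - A \hat{s}^{s}_{k} \hat{s}^{sT}_{k+1} + A \hat{s}^{s}_{k} \hat{s}^{sT}_{k} A^T\right)$$

end for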

In the ML estimation, the parameters are initialized with random values. In the maximization step, we use the unconstrained state estimate obtained from (11) for the estimation of the parameters. We find ITER = 10 to be sufficient for convergence.
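The maximization step of Algorithm 1 reduces to a few matrix products once the smoothed states are stacked row-wise. The sketch below is an illustration under stated assumptions: the function name `m_step`, the array layout (one state or observation per row) and the noise-free toy system used as a sanity check are not from the paper. $Q$ is computed in residual form, as the average outer product of $\hat{s}^{s}_{k+1} - A\hat{s}^{s}_{k}$.

```python
import numpy as np

def m_step(S, X):
    """Parameter updates from smoothed states S (K+1 x d) and observations X (K+1 x m)."""
    S_prev, S_next = S[:-1], S[1:]            # s_k and s_{k+1} for k = 1..K
    K = S_prev.shape[0]
    C_inv = np.linalg.inv(S_prev.T @ S_prev)  # (sum_k s_k s_k^T)^{-1}
    A = (S_next.T @ S_prev) @ C_inv           # sum_k s_{k+1} s_k^T, times C^{-1}
    H = (X[:-1].T @ S_prev) @ C_inv           # sum_k x_k s_k^T, times C^{-1}
    resid = S_next - S_prev @ A.T             # rows are s_{k+1} - A s_k
    Q = resid.T @ resid / K
    return A, H, Q

# Sanity check on a noise-free toy system (hypothetical parameters).
A_true = np.array([[0.9, 0.2],
                   [-0.2, 0.9]])
H_true = np.array([[1.0, 0.5],
                   [0.0, 2.0],
                   [1.0, -1.0]])
S = np.zeros((21, 2))
S[0] = [1.0, 0.0]
for k in range(20):
    S[k + 1] = A_true @ S[k]
X = S @ H_true.T

A_est, H_est, Q_est = m_step(S, X)
print(np.allclose(A_est, A_true), np.allclose(H_est, H_true))
```

With noise-free states the updates recover the generating parameters and $Q$ collapses to zero, which is a convenient way to test an implementation before running the full EM loop.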

4.2. Missing samples estimation

Once the system parameters are learnt, the missing segment of a test observation sequence is estimated by first estimating the states using the constrained Kalman smoother. In the Kalman filtering stage, $\hat{s}_k^-$ from (5) is updated using $x^p_k$ in (7). Thus, the correction steps of the Kalman filter ((7)-(9)) are used with $x_k$ and $H$ replaced by $x^p_k$ and the reduced observation model $\tilde{H} = WH$ respectively.

In the backward pass of the Kalman smoother, the state estimated using (10) and (11) is projected onto the constrained space using (12), with $x_k$ replaced by $x^p_k$ and $H$ by $\tilde{H}$. Finally, the estimate of the missing samples is obtained from the state estimate using the observation model as follows:

$$\hat{x}_k = H \hat{s}^{sc}_k \tag{15}$$
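Putting the pieces together, the whole estimation procedure can be sketched as a single function operating on one utterance. This is a minimal sketch under several assumptions that are not from the paper: the names `ecks_estimate` and `project`, the boolean mask `known` marking observed samples, the use of `s0` and `P0` as the a-priori estimate for the first frame, and the synthetic data in the demo. Selecting the rows of $H$ with the mask plays the role of $WH$.

```python
import numpy as np

def project(s, H_known, x_known):
    """Closest state to s satisfying H_known @ s = x_known (projection step)."""
    if H_known.shape[0] == 0:              # no sample observed in this frame
        return s
    r = H_known @ s - x_known
    return s - H_known.T @ np.linalg.solve(H_known @ H_known.T, r)

def ecks_estimate(X, known, A, H, Q, s0, P0):
    """Return H @ s_sc for every frame; missing entries of X are never read."""
    N, d = X.shape[0], A.shape[0]
    s_pri = np.zeros((N, d)); P_pri = np.zeros((N, d, d))
    s_fil = np.zeros((N, d)); P_fil = np.zeros((N, d, d))
    for k in range(N):                     # forward pass, Eqs. (5)-(9)
        if k == 0:
            s_pri[k], P_pri[k] = s0, P0    # assumption: prior for first frame
        else:
            s_pri[k] = A @ s_fil[k - 1]
            P_pri[k] = A @ P_fil[k - 1] @ A.T + Q
        Hk, xk = H[known[k]], X[k, known[k]]
        if Hk.shape[0] == 0:
            s_fil[k], P_fil[k] = s_pri[k], P_pri[k]
            continue
        K = P_pri[k] @ Hk.T @ np.linalg.inv(Hk @ P_pri[k] @ Hk.T)
        s_fil[k] = s_pri[k] + K @ (xk - Hk @ s_pri[k])
        P_fil[k] = (np.eye(d) - K @ Hk) @ P_pri[k]
    s_sc = np.zeros((N, d))                # backward pass, Eqs. (10)-(12)
    s_sc[-1] = project(s_fil[-1], H[known[-1]], X[-1, known[-1]])
    for k in range(N - 2, -1, -1):
        L = P_fil[k] @ A.T @ np.linalg.inv(P_pri[k + 1])
        s_s = s_fil[k] + L @ (s_sc[k + 1] - s_pri[k + 1])
        s_sc[k] = project(s_s, H[known[k]], X[k, known[k]])
    return s_sc @ H.T                      # observation model, Eq. (15)

# Synthetic demo (hypothetical 4-dimensional system, 50 frames).
rng = np.random.default_rng(0)
d = 4
A = 0.95 * np.eye(d)
H = np.eye(d) + 0.05 * rng.standard_normal((d, d))
Q = 0.01 * np.eye(d)
states = np.zeros((50, d))
states[0] = rng.standard_normal(d)
for k in range(1, 50):
    states[k] = A @ states[k - 1] + 0.1 * rng.standard_normal(d)
X_true = states @ H.T

known = np.ones(X_true.shape, dtype=bool)
known[20:30, [1, 3]] = False               # black out two channels for 10 frames
X_obs = np.where(known, X_true, np.nan)    # missing samples are unavailable

X_hat = ecks_estimate(X_obs, known, A, H, Q, np.zeros(d), np.eye(d))
print(np.allclose(X_hat[known], X_true[known]))
```

Because of the projection step, the returned trajectory agrees with the known samples exactly, and only the entries where `known` is false are actual estimates.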

5. Experiments and results

5.1. Experimental setup

We compare the performance of the proposed equality constrained Kalman smoother (ECKS) with the MMSE and MAPC approaches. For this purpose, we reconstruct artificially blacked-out portions of the articulators' trajectories at random locations. The blacked-out portion is treated as the missing segment and estimated from the remaining known portions. To evaluate the effect of the missing segment duration on the performance, we consider four different missing segment durations (MSD): 100 ms, 200 ms, 400 ms and 800 ms. Since we have both X and Y trajectories for each sensor, we simulate sensor failure by blacking out both the X and Y trajectories of one sensor and estimating them from the data of the remaining 6 sensors. We consider a five-fold cross-validation set-up, using 368 sentences for training and the remaining 92 as the test set in each fold. Note that the different missing sample estimators utilize the dynamics of the articulatory trajectories in various ways. For example, ECKS exploits the dynamics of the system by modeling it as a linear dynamical system, whereas the MMSE estimator does not use any dynamical constraint in the estimation process. The MAPC estimator uses the dynamics of the trajectory whose missing segments are to be estimated, in the form of a continuity constraint.
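The black-out protocol and the five-fold split can be sketched as below. The details are assumptions for illustration: the paper does not specify how the random locations or the folds are drawn, so the sketch uses a uniformly random start frame and contiguous folds, and the helper names `blackout_mask` and `five_fold_splits` are hypothetical. The frame rate of 100 Hz and the channel ordering follow Section 2.

```python
import numpy as np

FRAME_RATE_HZ = 100
SENSORS = ["UL", "LL", "JAW", "TT", "TB", "TD", "VEL"]   # x channels, then y channels

def blackout_mask(n_frames, sensor, msd_ms, rng):
    """Boolean mask (True = known) with one sensor's X and Y blacked out."""
    n_missing = int(round(msd_ms * FRAME_RATE_HZ / 1000))
    start = rng.integers(0, n_frames - n_missing + 1)     # assumed: uniform start
    known = np.ones((n_frames, 2 * len(SENSORS)), dtype=bool)
    j = SENSORS.index(sensor)
    known[start:start + n_missing, [j, j + len(SENSORS)]] = False
    return known

def five_fold_splits(n_utts=460, n_folds=5):
    """Yield (train, test) utterance indices; contiguous folds are an assumption."""
    idx = np.arange(n_utts)
    for fold in np.array_split(idx, n_folds):
        yield np.setdiff1d(idx, fold), fold

mask = blackout_mask(300, "TT", 200, np.random.default_rng(0))
splits = list(five_fold_splits())
print((~mask).sum(), len(splits[0][0]), len(splits[0][1]))
```

At 100 Hz, the four MSDs of 100 ms, 200 ms, 400 ms and 800 ms correspond to 10, 20, 40 and 80 missing frames respectively.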

Figure 2: RMSE (average ± one standard deviation) of estimation for different articulators for different MSD for male speaker.

For the proposed missing sample estimation using ECKS, we choose the dimension of the state vector as 14. In the MMSE and MAPC estimators, the joint distribution is modeled using a GMM with 8 components, and the MAPC estimator uses a high-pass filter of order 30 with a cut-off frequency of 20 Hz, as in [8].

5.2. Results and discussions

We use the root mean square error (RMSE) to evaluate the performance of the various approaches. Figure 2 and Figure 3 show the average RMSE of the reconstructed trajectories using the different approaches for MSDs of 100 ms, 200 ms, 400 ms and 800 ms, for the male and female speakers respectively. It is clear from the figures that, for most of the MSDs, the proposed ECKS gives a better estimate than the MMSE estimator in terms of average RMSE. This could be because the articulatory dynamics is well captured by the linear dynamical system modeling in the ECKS, whereas no such dynamic modeling is done in the MMSE estimator.

Figure 3: RMSE (average ± one standard deviation) of estimation for different articulators for different MSD for female speaker.

For the male speaker, the ECKS gives a performance improvement of 70.9%, 47.9%, 26.3% and 12.5% over the MMSE estimate for MSDs of 100 ms, 200 ms, 400 ms and 800 ms respectively. Compared to the MAPC estimate, ECKS gives an improvement of 29.5%, 16.9% and 7.5% for MSDs of 200 ms, 400 ms and 800 ms respectively, but for 100 ms the performance decreases by 40.4%. In the case of the female speaker, the performance benefits are 75.0%, 51.9%, 30.6% and 14.5% over the MMSE estimate and -40.2%, 34.2%, 21.0% and 9.9% over the MAPC estimate. The information about the missing segment could be obtained either from the known articulatory trajectories during the missing portion (due to inter-articulator correlation) or from the known portion of the trajectories which contain missing segments (due to the continuity of the trajectory). While ECKS utilizes the former, MAPC exploits the latter. It could be that for a small MSD (e.g., 100 ms), the benefit of using the information from the known trajectories during the missing segment is less than that from the known portion of the trajectory which contains the missing segment.
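The per-speaker, per-MSD figures quoted above appear consistent with the averages reported in the abstract, assuming those are simple means over both speakers and all four MSDs (the paper does not state the averaging explicitly). A short check, with variable names chosen for the example:

```python
# Relative improvements (%) of ECKS for MSD = 100, 200, 400, 800 ms.
male_vs_mmse = [70.9, 47.9, 26.3, 12.5]
female_vs_mmse = [75.0, 51.9, 30.6, 14.5]
male_vs_mapc = [-40.4, 29.5, 16.9, 7.5]
female_vs_mapc = [-40.2, 34.2, 21.0, 9.9]

def mean(values):
    return sum(values) / len(values)

# Assumption: unweighted mean over the two speakers and four MSDs.
avg_vs_mmse = mean(male_vs_mmse + female_vs_mmse)
avg_vs_mapc = mean(male_vs_mapc + female_vs_mapc)

print(round(avg_vs_mmse, 1), round(avg_vs_mapc, 1))  # 41.2 4.8
```

The small average gain over MAPC is therefore the net of sizeable gains at 200 ms and above and a large loss at 100 ms.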

Figure 4: Reconstructed and original articulatory trajectories for different utterances from the female speaker of the MOCHA database.

Figure 4 illustrates the original and reconstructed trajectories of some articulators estimated using the different approaches for an MSD of 200 ms in the case of the female speaker. From the figure, we can see that the ECKS and MAPC estimates form smoother trajectories than the MMSE estimates.

6. Conclusions

We have proposed an equality constrained Kalman smoother for missing samples estimation in EMA data. It is found that, on average, the proposed approach performs better than the MMSE and MAPC estimators for MSDs of 200 ms, 400 ms and 800 ms. However, for a small missing segment length such as 100 ms, MAPC gives a better estimate than the constrained Kalman smoother. As future work, it will be interesting to compare the performance of the constrained Kalman smoother and the hidden Markov model (HMM) [20] for the missing sample estimation task.


7. References

[1] J. Westbury, P. Milenkovic, G. Weismer, and R. Kent, “X-ray microbeam speech production database,” The Journal of the Acoustical Society of America, vol. 88, p. S56, 1990.

[2] A. A. Wrench, “A multi-channel/multi-speaker articulatory database for continuous speech recognition research,” Phonus, vol. 5, pp. 1–13, 2000.

[3] Y. Laprie and M.-O. Berger, “Extraction of tongue contours in x-ray images with minimal user interaction,” in Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on, vol. 1, 1996, pp. 268–271.

[4] J. Westbury, P. Milenkovic, G. Weismer, and R. Kent, “X-ray microbeam speech production database,” The Journal of the Acoustical Society of America, vol. 88, p. S56, 1990.

[5] E. Bresch, Y.-C. Kim, K. Nayak, D. Byrd, and S. Narayanan, “Seeing speech: Capturing vocal tract shaping using real-time magnetic resonance imaging,” IEEE Signal Processing Magazine, vol. 25, no. 3, pp. 123–132, 2008.

[6] P. W. Schonle, K. Grabe, P. Wenig, J. Hohne, J. Schrader, and B. Conrad, “Electromagnetic articulography: Use of alternating magnetic fields for tracking movements of multiple points inside and outside the vocal tract,” Brain and Language, vol. 31, no. 1, pp. 26–35, 1987.

[7] C. Qin and M. A. Carreira-Perpinan, “Estimating missing data sequences in x-ray microbeam recordings,” in Proc. INTERSPEECH, 2010, pp. 1592–1595.

[8] P. Sujith and P. K. Ghosh, “Maximum a-posteriori estimation of missing samples with continuity constraint in electromagnetic articulography data,” in ICASSP, 2014.

[9] Q. Fang, J. Wei, F. Hu, A. Li, and H. Wang, “Estimating the position of mistracked coil of EMA data using GMM-based methods,” in Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific. IEEE, 2013, pp. 1–4.

[10] P. K. Ghosh and S. Narayanan, “A generalized smoothness criterion for acoustic-to-articulatory inversion,” The Journal of the Acoustical Society of America, vol. 128, pp. 2162–2172, 2010.

[11] S. Sarkka, A. Vehtari, and J. Lampinen, “Time series prediction by Kalman smoother with cross-validated noise density,” in Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on, vol. 2. IEEE, 2004, pp. 1653–1657.

[12] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, 1960.

[13] G. Welch and G. Bishop, “An introduction to the Kalman filter,” 1995.

[14] J. De Geeter, H. Van Brussel, J. De Schutter, and M. Decreton, “A smoothly constrained Kalman filter,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 19, no. 10, pp. 1171–1177, 1997.

[15] D. Simon, “Kalman filtering with state constraints: a survey of linear and nonlinear algorithms,” Control Theory & Applications, IET, vol. 4, no. 8, pp. 1303–1318, 2010.

[16] D. Simon and T. L. Chia, “Kalman filtering with state equality constraints,” Aerospace and Electronic Systems, IEEE Transactions on, vol. 38, no. 1, pp. 128–136, 2002.

[17] A. P. Dempster, N. M. Laird, D. B. Rubin et al., “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, vol. 39, no. 1, pp. 1–38, 1977.

[18] V. Digalakis, J. R. Rohlicek, and M. Ostendorf, “ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition,” Speech and Audio Processing, IEEE Transactions on, vol. 1, no. 4, pp. 431–442, 1993.

[19] R. H. Shumway and D. S. Stoffer, “An approach to time series smoothing and forecasting using the EM algorithm,” Journal of Time Series Analysis, vol. 3, no. 4, pp. 253–264, 1982.

[20] J. A. Gonzalez, A. M. Peinado, N. Ma, A. M. Gomez, and J. Barker, “MMSE-based missing-feature reconstruction with temporal modeling for robust speech recognition,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 21, no. 3, pp. 624–635, 2013.
