

Recursive PCA for adaptive process monitoring

Weihua Li, H. Henry Yue, Sergio Valle-Cervantes, S. Joe Qin *

Department of Chemical Engineering, The University of Texas at Austin, Austin, TX 78712, USA

Abstract

While principal component analysis (PCA) has found wide application in process monitoring, slow and normal process changes often occur in real processes, which lead to false alarms for a fixed-model monitoring approach. In this paper, we propose two recursive PCA algorithms for adaptive process monitoring. The paper starts with an efficient approach to updating the correlation matrix recursively. The algorithms, using rank-one modification and Lanczos tridiagonalization, are then proposed and their computational complexity is compared. The number of principal components and the confidence limits for process monitoring are also determined recursively. A complete adaptive monitoring algorithm that addresses the issues of missing values and outliers is presented. Finally, the proposed algorithms are applied to a rapid thermal annealing process in semiconductor processing for adaptive monitoring. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Recursive principal component analysis; Adaptive process monitoring; Rank-one modification; Lanczos tridiagonalization

1. Introduction

Principal component analysis (PCA) has been successfully applied to the monitoring of industrial processes, including chemical and microelectronics manufacturing processes [1,2]. Formulated as a multivariate statistical process control (MSPC) task, these applications use PCA to extract a few independent components from highly correlated process data and use the components to monitor the process operation. Typically, two major monitoring indices are calculated, the squared prediction error (SPE) and the Hotelling T^2 index. An abnormal situation will cause at least one of the two indices to exceed the control limits. Multi-way PCA, as a variation of PCA, was successfully applied to batch monitoring, in which the data have three dimensions: batches, variables, and samples in a batch [3].

A major limitation of PCA-based monitoring is that the PCA model, once built from the data, is time-invariant, while most real industrial processes are time-varying [4]. The time-varying characteristics of industrial processes include: (i) changes in the mean; (ii) changes in the variance; and (iii) changes in the correlation structure among variables, including changes in the number of significant principal components (PCs). When a time-invariant PCA model is used to monitor processes with the aforementioned normal changes, false alarms often result, which significantly compromise the reliability of the monitoring system.

Industrial processes commonly demonstrate slow time-varying behaviors, such as catalyst deactivation, equipment aging, sensor and process drifting, and preventive maintenance and cleaning. So far, little has been reported on the development of adaptive process monitoring. Gallagher et al. [4] pointed out the need to monitor microelectronics manufacturing processes adaptively. Wold [5] discussed the use of exponentially weighted moving average (EWMA) filters in conjunction with PCA and PLS. Rigopoulos et al. [6] use a similar moving window scheme to identify significant modes in a simulated paper machine profile. Rannar et al. [7] use a hierarchical PCA for adaptive batch monitoring in a way that is similar to EWMA-based PCA. However, we still lack a recursive PCA that resembles the adaptation mechanism of recursive PLS [8]. A complete recursive PCA scheme should consider the following issues:

1. Recursive update of the mean for both covariance-based PCA and correlation-based PCA.

2. Efficient algorithms for the PCA calculation, including sample-wise update and block-wise update.

0959-1524/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.

PII: S0959-1524(00)00022-6

Journal of Process Control 10 (2000) 471–486

www.elsevier.com/locate/jprocont

* Corresponding author. Tel.: +1-512-471-4417; fax: +1-512-471-7060. E-mail address: [email protected] (S.J. Qin).


3. Recursive determination of the number of principal components, as the significant number of components changes; and

4. Recursive determination of the confidence limits for SPE and T^2 in real time to facilitate adaptive monitoring.

Here we propose two recursive PCA (RPCA) algorithms to adapt for normal process changes and thus reduce false alarms. The challenge is to efficiently make use of the old model to calculate the new model. Since the variables in PCA are usually scaled to zero mean and unit variance, changes in the variable mean and variance complicate the recursive computation. While in recursive regression, such as the recursive PLS in Qin [8], one can simply use a bias term to adapt for the mean changes, the same technique does not work for recursive PCA. Let x^0 ∈ R^m be a vector of sensor measurements with possibly changing means. A recursive PLS scheme can use the augmented vector [x^0T 1] as a regressor to adapt for the effect of the time-varying mean [8]. However, this augmentation in recursive PCA does not yield a model that is consistent with one from mean-centered data. For example, consider as a limiting case a process with only one variable whose mean is shifted from 0 to 0.5. As shown in Fig. 1(a), a proper adaptation of the mean will result in a PCA model with one component that coincides with the x'_1 axis. With the augmented data as shown in Fig. 1(b), two PCs are required. The first PC goes through the origin; the second PC must be orthogonal to the first. However, neither captures the data variation correctly.

We offer an efficient recursive calculation for the correlation matrix with time-varying mean and variance (Section 2) and propose two numerically efficient algorithms to update the PCA model: (i) use of rank-one modification for sample-wise recursion (Section 3); and (ii) use of Lanczos tridiagonalization for block-wise recursion (Section 4). Section 5 compares the computational costs of the two RPCA algorithms. The recursive determination of the number of PCs is presented in Section 6, where the variance of reconstruction error (VRE) method, Akaike information criterion (AIC), minimum description length (MDL) and other methods are compared. Section 7 investigates the recursive determination of confidence limits for SPE and T^2. Section 8 illustrates an application to a rapid thermal annealing process in semiconductor processing, and we draw our conclusions in Section 9.

2. Recursive update for the correlation matrix

2.1. Conventional PCA

Fig. 1. The significance of mean adaptation in RPCA.

In the conventional batch-wise PCA, the raw data matrix X^0 ∈ R^{n×m} of n samples (rows) and m variables (columns) is first normalized to a matrix X with zero mean and unit variance. Then the normalized matrix X is decomposed as follows:

$$X = TP^T + \tilde{X} = TP^T + \tilde{T}\tilde{P}^T = \begin{bmatrix} T & \tilde{T} \end{bmatrix}\begin{bmatrix} P & \tilde{P} \end{bmatrix}^T \quad (1)$$

where T ∈ R^{n×l} and P ∈ R^{m×l} are the PC scores and loadings, respectively. The PCA projection reduces the original set of variables to l principal components. The residual matrix X̃ can be further decomposed into T̃P̃^T if desired. The decomposition is made such that [T T̃] is orthogonal and [P P̃] is orthonormal.

The correlation matrix of the variables can be approximated as:

$$R = \frac{1}{n-1}X^TX \quad (2)$$

The columns of P are actually eigenvectors of R associated with the l largest eigenvalues, and the columns of P̃ are the remaining eigenvectors, converting the calculation of the PCA model (P) to an eigenvector problem. If only the P matrix is needed, very efficient numerical algorithms based on the Lanczos method are available [9]. Occasionally, the covariance matrix is used to derive a PCA model. In this case, the data is scaled to zero mean, but the variance is unscaled. The variance scaling affects the relative weighting of all variables [10].
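As an illustration, the batch-wise procedure of Eqs. (1) and (2) can be sketched as follows (the code, its names, and the test data are ours, not part of the original paper):

```python
import numpy as np

def batch_pca(X0, l):
    """Conventional batch-wise, correlation-based PCA: normalize the raw
    data to zero mean and unit variance, then take the eigenvectors of the
    correlation matrix associated with the l largest eigenvalues."""
    n = X0.shape[0]
    X = (X0 - X0.mean(axis=0)) / X0.std(axis=0, ddof=1)   # normalized data
    R = X.T @ X / (n - 1)                                 # Eq. (2)
    eigvals, eigvecs = np.linalg.eigh(R)                  # ascending order
    order = np.argsort(eigvals)[::-1]                     # sort descending
    P = eigvecs[:, order[:l]]                             # loadings P (m x l)
    T = X @ P                                             # scores T (n x l)
    return P, T, eigvals[order]

rng = np.random.default_rng(0)
X0 = rng.normal(size=(200, 5))
X0[:, 1] = X0[:, 0] + 0.1 * rng.normal(size=200)          # one correlated pair
P, T, lam = batch_pca(X0, 2)
```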

2.2. Recursive correlation matrix calculation

If a block of process data has been used to build an initial PCA model, we need to update the PCA model when a new block of data becomes available. Let X^0_1 ∈ R^{n_1×m} be the raw initial data block. Then the mean of each column is given in the vector

$$b_1 = \frac{1}{n_1}\left(X^0_1\right)^T 1_{n_1} \quad (3)$$

where 1_{n_1} = [1, 1, ..., 1]^T ∈ R^{n_1}. The data scaled to zero mean and unit variance is given by

$$X_1 = \left(X^0_1 - 1_{n_1} b_1^T\right)\Sigma_1^{-1} \quad (4)$$

where

$$\Sigma_1 = \mathrm{diag}\left(\sigma_1(1), \ldots, \sigma_1(m)\right) \quad (5)$$

whose ith element is the standard deviation of the ith sensor (i = 1, ..., m). The correlation matrix is

$$R_1 = \frac{1}{n_1 - 1}X_1^T X_1 \quad (6)$$

The new block of data is expected to augment the data matrix and calculate the correlation matrix recursively. Assume that b_k, X_k and R_k have been calculated when the kth block of data is collected. The task for recursive calculation is to calculate b_{k+1}, X_{k+1} and R_{k+1} when the next block of data X^0_{n_{k+1}} ∈ R^{n_{k+1}×m} is available. Denoting

$$X^0_{k+1} = \begin{bmatrix} X^0_k \\ X^0_{n_{k+1}} \end{bmatrix} \quad (7)$$

for all the k+1 blocks of data, the mean vector b_{k+1} is related to b_k by the following relation:

$$\left(\sum_{i=1}^{k+1} n_i\right) b_{k+1} = \left(\sum_{i=1}^{k} n_i\right) b_k + \left(X^0_{n_{k+1}}\right)^T 1_{n_{k+1}} \quad (8)$$

Denoting $N_k = \sum_{i=1}^{k} n_i$, Eq. (8) yields the following recursive calculation:

$$b_{k+1} = \frac{N_k}{N_{k+1}}\,b_k + \frac{1}{N_{k+1}}\left(X^0_{n_{k+1}}\right)^T 1_{n_{k+1}} \quad (9)$$

The recursive calculation of X_{k+1} is given by

$$\begin{aligned} X_{k+1} &= \left(X^0_{k+1} - 1_{k+1} b^T_{k+1}\right)\Sigma^{-1}_{k+1} \\ &= \left(\begin{bmatrix} X^0_k \\ X^0_{n_{k+1}} \end{bmatrix} - 1_{k+1} b^T_{k+1}\right)\Sigma^{-1}_{k+1} \\ &= \begin{bmatrix} X^0_k - 1_k b^T_k - 1_k \Delta b^T_{k+1} \\ X^0_{n_{k+1}} - 1_{n_{k+1}} b^T_{k+1} \end{bmatrix}\Sigma^{-1}_{k+1} \\ &= \begin{bmatrix} X_k \Sigma_k \Sigma^{-1}_{k+1} - 1_k \Delta b^T_{k+1}\Sigma^{-1}_{k+1} \\ X_{n_{k+1}} \end{bmatrix} \end{aligned} \quad (10)$$

where

$$\begin{aligned} X_k &= \left(X^0_k - 1_k b^T_k\right)\Sigma^{-1}_k \\ X_{n_{k+1}} &= \left(X^0_{n_{k+1}} - 1_{n_{k+1}} b^T_{k+1}\right)\Sigma^{-1}_{k+1} \\ \Sigma_j &= \mathrm{diag}\left(\sigma_j(1), \ldots, \sigma_j(m)\right), \quad j = k,\, k+1 \\ \Delta b_{k+1} &= b_{k+1} - b_k \end{aligned} \quad (11)$$

Note that 1_k = [1, ..., 1]^T ∈ R^{N_k}.

The recursive computation of the standard deviation, derived in Appendix A, has the following relationship:

$$\left(N_{k+1} - 1\right)\sigma^2_{k+1}(i) = \left(N_k - 1\right)\sigma^2_k(i) + N_k\,\Delta b^2_{k+1}(i) + \left\|X^0_{n_{k+1}}(:,i) - 1_{n_{k+1}} b_{k+1}(i)\right\|^2 \quad (12)$$

where X^0_{n_{k+1}}(:,i) is the ith column of the associated matrix, and b_{k+1}(i) and Δb_{k+1}(i) are the ith elements of the associated vectors.

Similarly, the recursive calculation of the correlation matrix, derived in Appendix B, has the following form:


$$\begin{aligned} R_{k+1} &= \frac{1}{N_{k+1}-1}\,X^T_{k+1} X_{k+1} \\ &= \frac{N_k - 1}{N_{k+1}-1}\,\Sigma^{-1}_{k+1}\Sigma_k R_k \Sigma_k \Sigma^{-1}_{k+1} + \frac{N_k}{N_{k+1}-1}\,\Sigma^{-1}_{k+1}\Delta b_{k+1}\Delta b^T_{k+1}\Sigma^{-1}_{k+1} \\ &\quad + \frac{1}{N_{k+1}-1}\,X^T_{n_{k+1}} X_{n_{k+1}} \end{aligned} \quad (13)$$

Regarding the above recursive relation, note that:

1. The effect of mean changes Δb_{k+1} on the correlation matrix is only a rank-one modification.

2. As a special case, if the model needs to be updated after every new sample is taken, the recursive relation reduces to:

$$R_{k+1} = \frac{k-1}{k}\,\Sigma^{-1}_{k+1}\Sigma_k R_k \Sigma_k \Sigma^{-1}_{k+1} + \Sigma^{-1}_{k+1}\Delta b_{k+1}\Delta b^T_{k+1}\Sigma^{-1}_{k+1} + \frac{1}{k}\,x_{k+1}x^T_{k+1} \quad (14)$$

which is two rank-one modifications. Algorithms are available for the eigenvector calculation of rank-one modifications [11], which will be elaborated later in this paper (Section 3).

3. Often old data are exponentially ignored as they do not represent the current process. The recursive calculations for Eqs. (9), (12) and (13) with a forgetting factor are:

$$b_{k+1} = \eta\, b_k + (1-\eta)\,\frac{1}{n_{k+1}}\left(X^0_{n_{k+1}}\right)^T 1_{n_{k+1}} \quad (15)$$

$$\sigma^2_{k+1}(i) = \eta\left(\sigma^2_k(i) + \Delta b^2_{k+1}(i)\right) + (1-\eta)\,\frac{1}{n_{k+1}}\left\|X^0_{n_{k+1}}(:,i) - 1_{n_{k+1}} b_{k+1}(i)\right\|^2 \quad (16)$$

and

$$R_{k+1} = \eta\,\Sigma^{-1}_{k+1}\left(\Sigma_k R_k \Sigma_k + \Delta b_{k+1}\Delta b^T_{k+1}\right)\Sigma^{-1}_{k+1} + (1-\eta)\,\frac{1}{n_{k+1}}\,X^T_{n_{k+1}} X_{n_{k+1}} \quad (17)$$

for N_k ≫ 1. In the above relations, 0 < η ≤ N_k/N_{k+1} < 1 is the forgetting factor. A smaller η tends to forget old data more quickly, whereas

$$\eta = \frac{N_k}{N_{k+1}} \quad (18)$$

recovers the case of no forgetting. Similar to the window size in a moving window approach, the forgetting factor is a tuning parameter that varies depending on how fast the normal process can change.
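The recursions above can be checked numerically. The following sketch (our own code; data and variable names are illustrative) applies Eqs. (9), (12) and (14) sample by sample and confirms that, without forgetting, the result equals the batch correlation matrix of all data seen so far:

```python
import numpy as np

# Sample-wise recursion: mean by Eq. (9), variance by Eq. (12),
# correlation matrix by Eq. (14). Starting from an initial block,
# each new sample updates (b, sigma^2, R).
rng = np.random.default_rng(3)
data = rng.normal(size=(60, 3))
data[:, 2] = data[:, 0] + 0.2 * rng.normal(size=60)   # correlated columns

N = 15                                        # initial block size
b = data[:N].mean(axis=0)
var = data[:N].var(axis=0, ddof=1)
X1 = (data[:N] - b) / np.sqrt(var)
R = X1.T @ X1 / (N - 1)                       # initial model, Eq. (6)

for x in data[N:]:
    N1 = N + 1
    b1 = (N / N1) * b + x / N1                                     # Eq. (9)
    db = b1 - b                                                    # Delta b_{k+1}
    var1 = ((N - 1) * var + N * db**2 + (x - b1)**2) / (N1 - 1)    # Eq. (12)
    s1 = np.sqrt(var1)
    ratio = np.sqrt(var) / s1                 # diagonal of Sigma_{k+1}^{-1} Sigma_k
    xs = (x - b1) / s1                        # newly scaled sample
    # Eq. (14): R_{k+1} = ((k-1)/k) S1i Sk R Sk S1i + S1i db db^T S1i + (1/k) xs xs^T
    R = ((N - 1) / N) * (ratio[:, None] * R * ratio[None, :]) \
        + np.outer(db / s1, db / s1) + np.outer(xs, xs) / N
    b, var, N = b1, var1, N1

# batch reference over all the data
Xb = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
R_batch = Xb.T @ Xb / (len(data) - 1)
```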

3. RPCA by rank-one modification

If one chooses to update the RPCA model after each sample, the recursive correlation matrix becomes

$$R_{k+1} = \eta\,\Sigma^{-1}_{k+1}\left(\Sigma_k R_k \Sigma_k + \Delta b_{k+1}\Delta b^T_{k+1}\right)\Sigma^{-1}_{k+1} + (1-\eta)\,x_{k+1}x^T_{k+1} \quad (19)$$

While the change in mean is only a rank-one modification to the correlation matrix, the change in variance (Σ_{k+1}) completely changes the eigenstructure. Note that the role of the variance update is different from that of the mean update. If the mean changes but is not updated, as shown in Fig. 1, the calculated principal component direction (p_1) can be very different from the actual principal component direction (p'_1) after the mean is updated. On the other hand, the variance update affects only the relative weighting of each variable. If the variance does not change dramatically, we can use the initial variance to scale the data and not update the variance. For the case of covariance-based PCA, there is no need to scale the variance at all. Therefore, Eq. (19) without variance updating reduces to

$$R_{k+1} = \eta\left(R_k + \Sigma^{-1}_1\Delta b_{k+1}\Delta b^T_{k+1}\Sigma^{-1}_1\right) + (1-\eta)\,x_{k+1}x^T_{k+1} \quad (20)$$
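For illustration, the following sketch (our own code; the data and the simulated mean shift are made up) applies Eq. (20) sample-wise with a forgetting factor and a fixed scaling Σ_1 = I; the recursive mean follows the shifted operating point:

```python
import numpy as np

# Sample-wise updating with forgetting factor eta and fixed scaling
# (Sigma_1 = I, i.e. the covariance-based case). With eta < 1 the
# recursive model tracks a slow change in the process mean.
rng = np.random.default_rng(4)
eta, m = 0.98, 2
b = np.zeros(m)
R = np.eye(m)

for t in range(2000):
    x_raw = rng.normal(size=m) + (0.0 if t < 1000 else 2.0)  # mean shifts at t = 1000
    b_new = eta * b + (1.0 - eta) * x_raw    # Eq. (15) with n_{k+1} = 1
    db = b_new - b                           # Delta b_{k+1}
    x = x_raw - b_new                        # newly centered sample
    # Eq. (20): R_{k+1} = eta (R_k + db db^T) + (1 - eta) x x^T
    R = eta * (R + np.outer(db, db)) + (1.0 - eta) * np.outer(x, x)
    b = b_new
```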

In other words, R_{k+1} is modified with two rank-one matrices, Σ^{-1}_1Δb_{k+1}Δb^T_{k+1}Σ^{-1}_1 and x_{k+1}x^T_{k+1}. Others have studied the rank-one updating of the eigenproblem and have proposed related special algorithms in the symmetric case [11,12].

The problem of rank-one modification can be summarized as follows: given a diagonal matrix Λ* = diag(λ*_1, ..., λ*_m), we seek the eigenpairs (p, λ) of Λ* + μ^2 zz^T such that (Λ* + μ^2 zz^T)p = λp, where z is of unit norm. This problem leads to the system

$$\left(\Lambda^* - \lambda I + \mu^2 zz^T\right)p = 0 \quad (21)$$

The eigenvalues λ_i for i = 1, ..., m can be obtained by finding the roots of the following secular equation [12]:

$$f(\lambda) = 1 + \mu^2\sum_{j=1}^{m}\frac{z_j^2}{\lambda^*_j - \lambda} = 0 \quad (22)$$

where z_j denotes the jth entry of z. Once the eigenvalues λ_i are known, the corresponding eigenvectors can be calculated from Eq. (21) as:


$$p_i = \frac{\left(\Lambda^* - \lambda_i I\right)^{-1}z}{\left\|\left(\Lambda^* - \lambda_i I\right)^{-1}z\right\|}, \quad 1 \le i \le m \quad (23)$$

where the vector (Λ* − λ_i I)^{-1}z has previously been calculated in finding the roots of Eq. (22). To compute each p_i, we need only an additional 2m multiplications; computing p_1, ..., p_m thus needs an additional 2m^2 multiplications. Bunch et al. [11] reported that the average number of iterations required to find a single root is about 4.4, making the computational cost of the eigenpairs of Λ* + μ^2 zz^T very low. Using the above algorithms, we may compute the

eigenpairs of R_{k+1} by applying the rank-one modification algorithm twice. Since R_k = P_kΛ_kP_k^T is known in the existing model, where Λ_k is a diagonal matrix containing all the eigenvalues, Eq. (20) can be rewritten as follows:

$$R_{k+1} = \eta\,P_k\left(\Lambda_k + \Delta\bar{b}_{k+1}\Delta\bar{b}^T_{k+1}\right)P_k^T + (1-\eta)\,x_{k+1}x^T_{k+1} \quad (24)$$

where Δb̄_{k+1} = P_k^TΣ^{-1}_1Δb_{k+1}. After the first rank-one modification,

$$\eta\left(\Lambda_k + \Delta\bar{b}_{k+1}\Delta\bar{b}^T_{k+1}\right) = P_{k*}\Lambda_{k*}P^T_{k*} \quad (25)$$

we obtain

$$\begin{aligned} R_{k+1} &= P_k P_{k*}\Lambda_{k*}P^T_{k*}P^T_k + (1-\eta)\,x_{k+1}x^T_{k+1} \\ &= \bar{P}_{k*}\Lambda_{k*}\bar{P}^T_{k*} + (1-\eta)\,x_{k+1}x^T_{k+1} \\ &= \bar{P}_{k*}\left(\Lambda_{k*} + (1-\eta)\,z_{k+1}z^T_{k+1}\right)\bar{P}^T_{k*} \end{aligned} \quad (26)$$

where z_{k+1} = P̄^T_{k*}x_{k+1} and P̄_{k*} = P_kP_{k*} is an orthogonal matrix because P_k and P_{k*} are orthogonal. Applying the rank-one modification algorithm again,

$$\Lambda_{k*} + (1-\eta)\,z_{k+1}z^T_{k+1} = V_{k+1}\Lambda_{k+1}V^T_{k+1} \quad (27)$$

the eigendecomposition of the correlation matrix is

$$R_{k+1} = \bar{P}_{k*}V_{k+1}\Lambda_{k+1}V^T_{k+1}\bar{P}^T_{k*} = P_{k+1}\Lambda_{k+1}P^T_{k+1} \quad (28)$$

where P_{k+1} = P̄_{k*}V_{k+1}.

As shown above, computing the eigenpairs of R_{k+1} requires two rank-one modifications and two matrix-matrix products, i.e. P̄_{k*} = P_kP_{k*} and P_{k+1} = P̄_{k*}V_{k+1}. The computational cost of the two rank-one modifications is negligible compared with the cost of computing the two matrix-matrix products, which requires O(2m^3) multiplications. Therefore, based on the known eigenstructure of R_k, we require O(2m^3) multiplications to compute the eigenstructure of R_{k+1}.

A major drawback of the rank-one modification algorithm is that all the eigenpairs have to be calculated, although only a few principal eigenpairs of the correlation matrix are needed. This is an excessive computational burden. Moreover, if one needs to update the model based on a block of data instead of a sample, a costly sequence of rank-one modifications (depending on the rank of the matrix X^T_{n_{k+1}}X_{n_{k+1}}) has to be applied. The following alternative method avoids these drawbacks.
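The complete sample-wise update of this section can be sketched as follows. The helper `rank_one_eigh` (our own plain-bisection stand-in for the optimized algorithm of Bunch et al. [11]; it assumes strictly increasing diagonal entries and nonzero z components) solves the secular equation, Eq. (22), and forms eigenvectors by Eq. (23); applying it twice realizes Eqs. (24)-(28):

```python
import numpy as np

def rank_one_eigh(d, rho, z, iters=80):
    """Eigenpairs of diag(d) + rho * z z^T (rho > 0) by bisection on the
    secular equation f(lam) = 1 + rho * sum_j z_j^2 / (d_j - lam) of
    Eq. (22), with eigenvectors from Eq. (23)."""
    d, z = np.asarray(d, float), np.asarray(z, float)
    m = d.size

    def f(lam):
        return 1.0 + rho * np.sum(z**2 / (d - lam))

    lams = np.empty(m)
    for i in range(m):
        # the ith root lies strictly between the poles d_i and d_{i+1};
        # the last root lies between d_m and d_m + rho * ||z||^2
        lo = d[i] + 1e-12
        hi = d[i + 1] - 1e-12 if i + 1 < m else d[-1] + rho * (z @ z)
        for _ in range(iters):               # f is increasing between poles
            mid = 0.5 * (lo + hi)
            lo, hi = (lo, mid) if f(mid) > 0.0 else (mid, hi)
        lams[i] = 0.5 * (lo + hi)

    vecs = z[:, None] / (d[:, None] - lams[None, :])   # (Lambda* - lam_i I)^{-1} z
    return lams, vecs / np.linalg.norm(vecs, axis=0)

# Two-stage update of Eqs. (24)-(28), starting from a known model
# R_k = P_k diag(L_k) P_k^T; eta is the forgetting factor, db stands for
# Sigma_1^{-1} (b_{k+1} - b_k), x is the newly scaled sample (all made up).
rng = np.random.default_rng(5)
m, eta = 4, 0.95
A = rng.normal(size=(m, m)); Rk = A @ A.T / m
Lk, Pk = np.linalg.eigh(Rk)                  # ascending eigenvalues
db = 0.1 * rng.normal(size=m)
x = rng.normal(size=m)

db_bar = Pk.T @ db                                   # Eq. (24)
L_star, V1 = rank_one_eigh(eta * Lk, eta, db_bar)    # Eq. (25)
P_bar = Pk @ V1                                      # orthogonal product
z = P_bar.T @ x                                      # Eq. (26)
L_new, V2 = rank_one_eigh(L_star, 1.0 - eta, z)      # Eq. (27)
P_new = P_bar @ V2                                   # Eq. (28)

# must agree with forming R_{k+1} directly via Eq. (20)
R_direct = eta * (Rk + np.outer(db, db)) + (1.0 - eta) * np.outer(x, x)
```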

4. RPCA by the Lanczos tridiagonalization

The Lanczos algorithms for symmetric matrices were introduced in the 1950s and are described, for instance, in Golub and Van Loan [13]. The algorithms received more attention following Paige's results [14] and were studied in detail by Parlett [15]. The main advantages of the Lanczos tridiagonalization methods come from the following:

1. The original matrix is not overwritten;

2. Little storage is required, since only matrix-vector products are computed; and

3. Only a few (l_k) of the largest eigenvalues are calculated. Usually l_k is much smaller than the dimension of the matrix R_k.

These features make the Lanczos procedure useful for large matrices, especially if they are sparse.

4.1. Lanczos tridiagonalization

Applying the Lanczos procedure to the symmetric correlation matrix R_k ∈ R^{m×m} yields a symmetric tridiagonal matrix Γ, i.e.

$$\Gamma = \Phi^T R_k \Phi = \begin{bmatrix} \alpha_1 & \beta_1 & \cdots & 0 \\ \beta_1 & \alpha_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \beta_{m_1-1} \\ 0 & \cdots & \beta_{m_1-1} & \alpha_{m_1} \end{bmatrix} \quad (29)$$

where Φ ∈ R^{m×m_1} is a matrix containing the orthogonal Lanczos vectors and m_1 < m. Although Γ has a smaller dimension than R_k, their extreme eigenvalues are approximately the same [8,9].

Note that because of roundoff errors, a loss of orthogonality among the computed Lanczos vectors occurs after some eigenpairs have already been calculated, leading to several undesirable phenomena [14,15]. To cope with this problem, a re-orthogonalization procedure is usually introduced. Therefore, we compute the principal eigenpairs of the correlation matrix using the tridiagonalization


algorithm with complete re-orthogonalization. The whole procedure includes two stages:

1. The tridiagonalization of the correlation matrix R_k using the Lanczos algorithm with a re-orthogonalization procedure.

2. The calculation of the principal eigenpairs of the resulting tridiagonal matrix.

A typical algorithm to compute the diagonal elements α_i and the sub-diagonal elements β_i for i = 1, ..., m_1 is given in Golub and Van Loan [13]. Approximations to the eigenvectors of R_k require the storage of the generated Lanczos vectors and the computation of the eigenvectors of the matrix Γ. The algorithm terminates as soon as m (or fewer) orthogonal Lanczos vectors have been generated. In each step, the complexity is dominated by a matrix-vector product.
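A minimal sketch of the first stage follows (our own code, assuming a random starting vector; not the paper's implementation):

```python
import numpy as np

def lanczos_tridiag(R, m1, seed=0):
    """Lanczos tridiagonalization of a symmetric matrix R with complete
    re-orthogonalization (Section 4.1): returns the Lanczos vectors Phi
    (orthonormal columns) and the tridiagonal Gamma = Phi^T R Phi."""
    m = R.shape[0]
    rng = np.random.default_rng(seed)
    Phi = np.zeros((m, m1))
    alpha, beta = np.zeros(m1), np.zeros(max(m1 - 1, 0))
    q = rng.normal(size=m)
    q /= np.linalg.norm(q)
    for j in range(m1):
        Phi[:, j] = q
        w = R @ q                        # matrix-vector product dominates cost
        alpha[j] = q @ w
        w = w - alpha[j] * q
        if j > 0:
            w = w - beta[j - 1] * Phi[:, j - 1]
        # complete re-orthogonalization against all previous Lanczos vectors
        w = w - Phi[:, :j + 1] @ (Phi[:, :j + 1].T @ w)
        if j < m1 - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    return Phi, np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

# with m1 = m, Gamma is orthogonally similar to R, so the eigenvalues
# coincide; with m1 < m the extreme eigenvalues are well approximated
rng = np.random.default_rng(7)
A = rng.normal(size=(8, 8))
R = A @ A.T
Phi, Gamma = lanczos_tridiag(R, 8)
```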

4.2. Termination of the Lanczos algorithm

The dimension m_1 of the tridiagonal matrix Γ, which is usually smaller than the dimension of the correlation matrix R_k, must be determined in the Lanczos tridiagonalization algorithm. A properly selected m_1 should guarantee that the eigenvalues of Γ are approximately the same as the m_1 largest eigenvalues of R_k. In this paper, we propose a practical criterion associated with the eigenvalues of R_k for the termination of the Lanczos algorithm. It is known that trace(R_k) is equal to the sum of all the eigenvalues of R_k. If Γ includes all significant eigenvalues of R_k, trace(Γ) will be very close to trace(R_k). Therefore, we terminate the Lanczos algorithm and determine the value of m_1 if

$$\frac{\mathrm{trace}(\Gamma)}{\mathrm{trace}(R_k)} \ge \gamma$$

where γ is a selected threshold. For example, γ = 0.995 means that 99.5% of the variance in R_k is represented by the m_1 eigenvalues.

With the computed Γ, we can calculate its first few largest eigenvalues λ_1, ..., λ_{l_k}, which are approximately equal to the l_k principal eigenvalues of R_k, and the associated eigenvectors C_{l_k} = [c_1, ..., c_{l_k}] with c_i ∈ R^{m_1} for i = 1, ..., l_k. Consequently, the PCA loading matrix for R_k is

$$P_{l_k} = \Phi\,C_{l_k} \quad (30)$$

4.3. Computation of the principal eigenvalues

In this subsection, we present an efficient approach to computing the principal eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_{l_k} of Γ in descending order, with l_k ≤ m_1. Let Γ_ν denote the leading ν × ν principal sub-matrix of the tridiagonal matrix Γ and p_ν(λ) = det(Γ_ν − λI) denote the associated characteristic polynomial for ν = 1, ..., m_1. A simple determinantal expansion [13] shows that

$$p_\nu(\lambda) = \left(\alpha_\nu - \lambda\right)p_{\nu-1}(\lambda) - \beta^2_{\nu-1}\,p_{\nu-2}(\lambda), \quad \nu = 2, \ldots, m_1 \quad (31)$$

where p_0(λ) = 1. Denote the upper and lower limits of λ by λ_max and λ_min, respectively. For any λ_min ≤ λ ≤ λ_max, using Eq. (31) with p_0(λ) = 1, we can generate a sequence

$$p_0(\lambda),\ p_1(\lambda),\ \ldots,\ p_{m_1}(\lambda) \quad (32)$$

Let a(λ) denote the number of sign changes in this sequence. Then, based on the Gershgorin theorem and the Sturm sequence property [13], we develop an algorithm to calculate the principal eigenvalues of Γ, given in Appendix C. We evaluate the overall complexity to compute P_{l_k} as

follows:

1. The updating of the correlation matrix R_k in Eq. (19) requires n_k m^2 + 4m^2 + 4m multiplications.

2. The Lanczos tridiagonalization to compute Γ and Φ requires 4m^2 m_1 - 1.5m m_1^2 + 6.5m m_1 - m^2 - 2m_1^2 multiplications [13].

3. The computation of l_k eigenvalues requires 3m_1 l_k s multiplications, where s is the number of iterations required to find an eigenvalue and is evaluated in Appendix C.

4. The computation of P_{l_k} from Φ and C_{l_k} requires m m_1 l_k multiplications.

Therefore, the overall complexity is approximately equal to n_k m^2 + 4m^2 m_1 - 1.5m m_1^2 + m m_1 l_k + 3m_1 l_k s multiplications, where l_k ≤ m_1 ≤ m. Obviously, the complexity reaches its maximum value n_k m^2 + 2.5m^3 + m^2 l_k + 3m l_k s when m_1 = m.
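The second stage can be sketched as follows (our own simplified code: it brackets eigenvalues with the Gershgorin bounds and bisects on the Sturm count derived from Eq. (31); unlike Appendix C, it computes all eigenvalues, in ascending order):

```python
import numpy as np

def sturm_count(alpha, beta, lam):
    """Number of eigenvalues of the symmetric tridiagonal matrix with
    diagonal alpha and off-diagonal beta that are less than lam, from the
    Sturm sequence of Eq. (31) in ratio form (avoids overflow of the raw
    characteristic polynomials)."""
    count, d = 0, 1.0
    for i in range(len(alpha)):
        b2 = beta[i - 1] ** 2 if i > 0 else 0.0
        denom = d if d != 0.0 else 1e-300
        d = alpha[i] - lam - b2 / denom
        if d < 0.0:
            count += 1
    return count

def tridiag_eigvalsh(alpha, beta, tol=1e-10):
    """All eigenvalues by bisection on the Sturm count, bracketed with the
    Gershgorin bounds."""
    m = len(alpha)
    pad = np.concatenate(([0.0], np.abs(beta), [0.0]))
    radius = pad[:-1] + pad[1:]              # Gershgorin row radii
    lo0, hi0 = np.min(alpha - radius), np.max(alpha + radius)
    eigs = np.empty(m)
    for k in range(1, m + 1):                # k-th smallest eigenvalue
        lo, hi = lo0, hi0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if sturm_count(alpha, beta, mid) >= k:
                hi = mid
            else:
                lo = mid
        eigs[k - 1] = 0.5 * (lo + hi)
    return eigs

rng = np.random.default_rng(8)
alpha = rng.normal(size=6)
beta = rng.normal(size=5)
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
vals = tridiag_eigvalsh(alpha, beta)
```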

5. Comparison of computational complexity

In this section, we compare the computational costs of rank-one modification, Lanczos tridiagonalization, and the standard singular value decomposition (SVD) algorithm for the RPCA calculation. For the block-wise case, since X^T_{n_k}X_{n_k} can be decomposed into a sum of n_k rank-one matrices, using rank-one modification requires (n_k + 1)m^3 multiplications to compute the PCA loading matrix. In this case, RPCA by Lanczos tridiagonalization is much more efficient. Because it is recognized as one of the most efficient algorithms among the SVD algorithms [13], we choose for comparison the symmetric QR, which also includes two steps: (1) the tridiagonalization of the correlation matrix R_k; and (2) the calculation of the eigenpairs of the computed tridiagonal matrix. Based on Golub and Van Loan [13] (Algorithm 8.3.1), the tridiagonalization of R_k using the symmetric QR algorithm needs (8/3)m^3 multiplications. As pointed out previously, even in the worst case (m_1 = m), tridiagonalizing R_k using the Lanczos approach needs 2.5m^3 multiplications. The computational costs can be further reduced if the complete re-orthogonalization of the Lanczos vectors is replaced by partial re-orthogonalization [14]. In addition, the second step of the symmetric QR algorithm requires m^2 l_k + 3m l_k s multiplications. Therefore, the Lanczos tridiagonalization approach for RPCA is more efficient than SVD based on the symmetric QR algorithm. The overall complexity of these three approaches is listed in Table 1.

In the special case of sample-wise updating, i.e. n_k = 1, computing the PCA loading matrix P_{l_k} by RPCA with Lanczos tridiagonalization needs approximately 2.5m^2 m_1 multiplications, while the rank-one modification requires O(2m^3) multiplications. We have compared the computational costs among the aforementioned three RPCA approaches when they are applied to correlation matrices with different dimensions. As depicted in Fig. 2, RPCA by rank-one modification is the most efficient, RPCA by Lanczos tridiagonalization is the second most efficient, and the standard SVD is most costly. The Lanczos algorithm can be more efficient when there are many variables that are highly correlated.

6. Recursive determination of the number of PCs

Since the number of significant principal components can change over time, it is necessary to determine this number recursively in RPCA modeling. There are many ways of determining the number of PCs in batch-wise PCA, including:

• Autocorrelation [17]
• Cross-validation [18,19]
• Cumulative percent variance [20]
• Scree test [21]
• Average eigenvalues
• Imbedded error function [20]
• Xu and Kailath's approach [9]
• Akaike information criterion [22,23]
• Minimum description length criterion [23,24]
• Variance of reconstruction error [25]

However, not all the approaches are suitable for recursive PCA. For example, the cross-validation approach is not suitable because old data are not representative of the current process. Therefore, we consider only the following methods to determine the number of PCs recursively.

1. Cumulative percent variance (CPV): The CPV is a measure of the percent variance captured by the first l_k PCs:

$$\mathrm{CPV}(l_k) = \frac{\sum_{j=1}^{l_k}\lambda_j}{\sum_{j=1}^{m}\lambda_j}\,100\%$$

Since only the first l_k eigenvalues are calculated in the block-wise RPCA, we use the following formula:

$$\mathrm{CPV}(l_k) = \frac{\sum_{j=1}^{l_k}\lambda_j}{\mathrm{trace}(R_k)}\,100\% \quad (33)$$

The number of PCs is chosen as the value of l_k at which the CPV reaches a predetermined limit, say 95%.
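For illustration (our own code; the data, with two highly correlated variable pairs, and the 95% limit are made up):

```python
import numpy as np

# Choosing the number of PCs with the CPV criterion, Eq. (33): pick the
# smallest l_k whose leading eigenvalues capture the preset fraction of
# trace(R_k).
rng = np.random.default_rng(9)
X = rng.normal(size=(500, 6))
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=500)
X[:, 4] = X[:, 1] + 0.05 * rng.normal(size=500)
Xn = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = Xn.T @ Xn / (len(X) - 1)

eigvals = np.linalg.eigvalsh(R)[::-1]            # descending
cpv = 100.0 * np.cumsum(eigvals) / np.trace(R)
n_pcs = int(np.argmax(cpv >= 95.0) + 1)          # smallest l_k with CPV >= 95%
```

With two near-duplicate variable pairs, four components carry essentially all the variance, so the criterion selects four PCs here.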

2. Average eigenvalue (AE): The AE approach selects the eigenvalues greater than the mean of all the eigenvalues as the principal eigenvalues and discards those smaller than the mean. Therefore, for R_k, only the eigenvalues greater than trace(R_k)/m are used as the principal eigenvalues.

3. Imbedded error function (IEF): The IEF is given by:

$$\mathrm{IEF}(l_k) = \sqrt{\frac{\xi_1}{N_k m \left(m - l_k\right)}} \quad (34)$$

where

$$\xi_1 = \sum_{j=l_k+1}^{m}\lambda_j = \mathrm{trace}(R_k) - \sum_{j=1}^{l_k}\lambda_j = \mathrm{trace}(\Gamma) - \sum_{j=1}^{l_k}\lambda_j \quad (35)$$

and N_k is the length of the data set used to compute R_k. If a forgetting factor η is employed, N_k is estimated from Eq. (18) as N_k = n_k/(1 − η). If the noise in each variable has the same variance, IEF(l_k) will have a minimum corresponding to the number of PCs. Note that IEF works for covariance-based PCA only.

Table 1
Complexity of the RPCA computations

Approach used: Overall complexity
Rank-one modification: (n_k + 1)m^3
Lanczos tridiagonalization: n_k m^2 + 4m^2 m_1 - 1.5m m_1^2 + m m_1 l_k + 3m_1 l_k s
Symmetric QR algorithm: n_k m^2 + (8/3)m^3 + m^2 l_k + 3m l_k s


4. Xu and Kailath's approach: Xu and Kailath [9] determine the number of PCs for a covariance matrix based on a few of the largest eigenvalues. This approach also assumes that the smallest eigenvalues of the covariance matrix $R_k$ are equal. If the remaining eigenvalues after $\ell_k$ are identical,

$$\Phi_{\ell_k} = \log\!\left(\frac{\left(\dfrac{\theta_2}{m-\ell_k}\right)^{1/2}}{\dfrac{\theta_1}{m-\ell_k}}\right)$$

should be approximately zero, where

$$\theta_2 = \sum_{j=\ell_k+1}^{m}\lambda_j^2 = \mathrm{trace}\!\left(R_k^2\right) - \sum_{j=1}^{\ell_k}\lambda_j^2 = \mathrm{trace}\!\left(\Gamma^2\right) - \sum_{j=1}^{\ell_k}\lambda_j^2 \qquad (36)$$

Xu and Kailath [9] showed that if the observation vector $x_k$ is sampled from a zero-mean multivariate normal process, then $N_k(m-\ell_k)\Phi_{\ell_k}$ is asymptotically chi-square distributed with $\frac{1}{2}(m-\ell_k)(m-\ell_k+1) - 1$ degrees of freedom. Therefore, with a chosen level of significance $\alpha$, we can select the threshold $\chi^2_\alpha$ for $\Phi_{\ell}$. If $N_k(m-\ell_k)\Phi_{\ell} \leq \chi^2_\alpha$, we accept the null hypothesis that $\ell = \ell_k$. This approach is valid only for covariance-based PCA.
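As a small numerical illustration (the function name and eigenvalue set are ours), $\Phi_{\ell_k}$ is the log of the RMS-to-mean ratio of the residual eigenvalues, so it is exactly zero when they are equal and strictly positive otherwise:

```python
import numpy as np

def phi(eigvals, l):
    """Xu-Kailath statistic: log of RMS over mean of the residual eigenvalues."""
    resid = eigvals[l:]
    n = len(resid)
    theta1 = resid.sum()          # as in Eq. (35)
    theta2 = (resid ** 2).sum()   # as in Eq. (36)
    return np.log(np.sqrt(theta2 / n) / (theta1 / n))

eigvals = np.array([8.0, 3.0, 0.2, 0.2, 0.2])  # equal tail after l = 2
print(phi(eigvals, 2))  # ~0: residual eigenvalues are identical
print(phi(eigvals, 1))  # > 0: the tail still contains a dominant eigenvalue
```

In practice $N_k(m-\ell_k)\Phi_{\ell_k}$ would then be compared with a chi-square threshold at the chosen significance level (e.g. via a chi-square table or a statistics library).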

5. Information theoretic criteria: Wax and Kailath [23] propose to determine the number of PCs for a covariance matrix based on the AIC [22] and MDL [24] criteria. This approach assumes that the observation vector $x_k$ is sampled from a zero-mean multivariate normal process and that the covariance matrix has repeated smallest eigenvalues. Wax and Kailath [23] propose a practical criterion and show that the following two functions will be minimized when $\ell_k$ is equal to the number of PCs.

$$\mathrm{AIC}(\ell_k) = -2\log\!\left(\frac{\prod_{j=\ell_k+1}^{m}\lambda_j^{\,1/(m-\ell_k)}}{\dfrac{\theta_1}{m-\ell_k}}\right)^{(m-\ell_k)N_k} + 2M$$

Fig. 2. Comparison of computational costs in the sample-wise case.



$$\mathrm{MDL}(\ell_k) = -2\log\!\left(\frac{\prod_{j=\ell_k+1}^{m}\lambda_j^{\,1/(m-\ell_k)}}{\dfrac{\theta_1}{m-\ell_k}}\right)^{(m-\ell_k)N_k} + M\log(N_k)$$

where

$$\prod_{j=\ell_k+1}^{m}\lambda_j = \frac{\det(R_k)}{\prod_{j=1}^{\ell_k}\lambda_j} = \frac{\det(\Gamma)}{\prod_{j=1}^{\ell_k}\lambda_j}$$

For real valued signals,

$$M = \frac{\ell_k\,(2m-\ell_k+1)}{2}$$

and for complex valued signals,

$$M = \ell_k\,(2m-\ell_k)$$

Wax and Kailath [23] point out that MDL gives a consistent estimate but AIC usually overestimates the number of PCs when $N_k$ is very large. In addition, the AIC and MDL criteria are valid only for covariance-based PCA and fail for rank deficient covariance matrices, e.g. more variables than samples.
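The two criteria above can be evaluated in log form to avoid overflow from the $(m-\ell_k)N_k$ exponent. This is a sketch following the formulas as written here (function name and eigenvalue set are ours); the penalty terms use the real-signal $M$:

```python
import numpy as np

def aic_mdl(eigvals, N):
    """AIC and MDL curves over l = 0..m-1 for covariance-based PCA."""
    m = len(eigvals)
    aic, mdl = [], []
    for l in range(m):
        resid = eigvals[l:]
        # log of the geometric/arithmetic-mean ratio of residual eigenvalues
        log_ratio = np.mean(np.log(resid)) - np.log(np.mean(resid))
        loglik = N * (m - l) * log_ratio          # log of the bracketed term
        M = l * (2 * m - l + 1) / 2               # free parameters, real signals
        aic.append(-2 * loglik + 2 * M)
        mdl.append(-2 * loglik + M * np.log(N))
    return np.array(aic), np.array(mdl)

eigvals = np.array([8.0, 3.0, 0.2, 0.2, 0.2])     # equal tail after two PCs
aic, mdl = aic_mdl(eigvals, N=1000)
print(int(np.argmin(aic)), int(np.argmin(mdl)))   # both pick 2
```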

6. Variance of reconstruction error (VRE): Qin and Dunia [25] propose selecting the number of PCs based on the best reconstruction of the process variables. An important feature of this approach is that the index has a minimum corresponding to the best reconstruction and is valid for both covariance- and correlation-based PCA, no matter how the observation vector $x_k$ is distributed. The VRE method assumes that each sensor (or group of sensors) is faulty and reconstructed from the other sensors based on the PCA model, $P_{\ell_k}$. The variance of the reconstruction error for the $i$th sensor is [26]:

$$u_i = \frac{r_k(i,i) - 2\,c_k^T(:,i)\,r_k(:,i) + c_k^T(:,i)\,R_k\,c_k(:,i)}{\left(1 - c_k(i,i)\right)^2},\qquad i = 1,\dots,m \qquad (37)$$

where $c_k(:,i)$ and $c_k(i,i)$ are the $i$th column and $ii$th element of the matrix $P_{\ell_k}P_{\ell_k}^T$, respectively. Similarly, $r_k(:,i)$

Fig. 3. SPE and $T^2$ calculated from a fixed model.



and $r_k(i,i)$ are the $i$th column and $ii$th element of the matrix $R_k$, respectively. The VRE is defined to account for all sensors,

$$\mathrm{VRE}(\ell_k) = \sum_{i=1}^{m}\frac{u_i}{\mathrm{var}(x_i)} \qquad (38)$$

where $\mathrm{var}(x_i)$ stands for the variance of the $i$th element of the observation vector. Therefore, $\mathrm{VRE}(\ell_k)$ can be calculated recursively using only the eigenvalues and eigenvectors of $R_k$ up to $\ell_k$. The VRE method selects the number of PCs that gives the minimum VRE.
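Eqs. (37) and (38) translate directly into code. The sketch below (names and the rank-two test matrix are ours) uses correlation-based PCA, so $\mathrm{var}(x_i)=1$; for an exactly rank-two correlation matrix, two PCs reconstruct every sensor with essentially zero error while one PC does not:

```python
import numpy as np

def vre_curve(R):
    """VRE of Eqs. (37)-(38) for l = 1..m-1 (correlation-based, var(x_i)=1)."""
    m = len(R)
    eigvals, eigvecs = np.linalg.eigh(R)
    P = eigvecs[:, ::-1]                    # loadings, descending eigenvalues
    vre = []
    for l in range(1, m):
        C = P[:, :l] @ P[:, :l].T           # c_k = P_l P_l^T, Eq. (37)
        u = np.empty(m)
        for i in range(m):
            num = R[i, i] - 2 * C[:, i] @ R[:, i] + C[:, i] @ R @ C[:, i]
            u[i] = num / (1 - C[i, i]) ** 2
        vre.append(u.sum())
    return np.array(vre)

# Rank-two "noise-free" correlation matrix built from two latent variables.
A = np.array([[1.0, 0.2], [0.8, -0.5], [0.3, 1.0], [-0.6, 0.7], [0.9, 0.4]])
S = A @ A.T
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)
vre = vre_curve(R)
print(vre.round(6))  # near zero from l = 2 on; l = 1 leaves real error
```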

7. Recursive process monitoring

When PCA is used to monitor an industrial process, the squared prediction error $Q$ and the Hotelling $T^2$ are usually used. For slowly time-varying processes, the confidence limits for these detection indices will change with time, making adaptation of the limits necessary for on-line monitoring. In this section, we discuss the recursive calculation of the confidence limits for these two statistics, $Q$ and $T^2$, using the RPCA models. Since our objective is to use recursive PCA for adaptive process monitoring in real time, we must consider the presence of outliers and missing values. A complete monitoring algorithm that handles these practical issues will be discussed in this section.

7.1. Adaptation of the alarm thresholds

The Q statistic is given by:

$$Q_k = x^T\left(I - P_{\ell_k}P_{\ell_k}^T\right)x \qquad (39)$$

where $x$ is a new observation vector to be monitored. The $Q$ statistic indicates the extent to which each sample conforms to the PCA model. It is a measure of the amount of variation not captured by the principal component model.

Based on the work by Jackson and Mudholkar [27], the upper limit for the $Q$ statistic is given by,

Fig. 4. The recursively calculated SPE and $T^2$.



$$Q_{k,\alpha} = \theta_1\left[\frac{c_\alpha\sqrt{2\theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0(h_0-1)}{\theta_1^2}\right]^{1/h_0} \qquad (40)$$

where

$$h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^2} \qquad (41)$$

and $c_\alpha$ is the normal deviate corresponding to the upper $(1-\alpha)$ percentile. The calculation of $\theta_1$ and $\theta_2$ has been shown in Eqs. (35) and (36), respectively. Further,

$$\theta_3 = \sum_{j=\ell_k+1}^{m}\lambda_j^3 = \mathrm{trace}\!\left(R_k^3\right) - \sum_{j=1}^{\ell_k}\lambda_j^3 = \mathrm{trace}\!\left(\Gamma^3\right) - \sum_{j=1}^{\ell_k}\lambda_j^3 \qquad (42)$$
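Eqs. (40)-(42) can be sketched as below. Here `resid_eigvals` stands for the discarded eigenvalues $\lambda_{\ell_k+1},\dots,\lambda_m$ and `z_alpha` for the normal deviate $c_\alpha$ (e.g. 1.645 for 95%, 2.326 for 99%); the names and the example eigenvalues are ours:

```python
import numpy as np

def q_limit(resid_eigvals, z_alpha):
    """Upper control limit for Q, Eqs. (40)-(42) (Jackson-Mudholkar)."""
    th1 = resid_eigvals.sum()
    th2 = (resid_eigvals ** 2).sum()
    th3 = (resid_eigvals ** 3).sum()
    h0 = 1 - 2 * th1 * th3 / (3 * th2 ** 2)            # Eq. (41)
    return th1 * (z_alpha * np.sqrt(2 * th2 * h0 ** 2) / th1
                  + 1 + th2 * h0 * (h0 - 1) / th1 ** 2) ** (1 / h0)

resid = np.array([0.3, 0.2, 0.1])   # residual eigenvalues of R_k
print(q_limit(resid, 1.645))        # 95% limit
print(q_limit(resid, 2.326))        # 99% limit is larger
```

In the recursive setting, only the three sums $\theta_1$, $\theta_2$, $\theta_3$ need updating, via the trace identities of Eqs. (35), (36) and (42).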

In the recursive implementation, $\theta_1$, $\theta_2$, $\theta_3$ and $h_0$ are updated using the $\ell_k$ largest eigenvalues after each new data block, making the limit $Q_{k,\alpha}$ time-varying.

The $T^2$ statistic is defined by

$$T_k^2 = x^T P_{\ell_k}\Lambda_{\ell_k}^{-1}P_{\ell_k}^T\,x \qquad (43)$$

where

$$\Lambda_{\ell_k} = \mathrm{diag}\left(\lambda_1,\dots,\lambda_{\ell_k}\right)$$

$T_k^2$ follows a chi-square distribution with $\ell_k$ degrees of freedom [10]. Given a level of significance $\alpha$, the upper limit for the $T^2$ statistic is $\chi^2_\alpha(\ell_k)$, which also needs to be updated for each new data block because $\ell_k$ can be time-varying.
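Eq. (43) is a short computation once the loadings and eigenvalues are at hand. A minimal sketch (the loadings, eigenvalues and observation below are illustrative, not from the paper; the chi-square value is the tabulated $\chi^2_{0.01}(2)$):

```python
import numpy as np

def t2_statistic(x, P, eigvals):
    """Hotelling T^2 of Eq. (43) for one scaled observation x."""
    t = P.T @ x                       # scores on the l_k retained loadings
    return float(t @ (t / eigvals))   # t^T Lambda^{-1} t

# l_k = 2 retained PCs; P has orthonormal columns.
P = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
eigvals = np.array([2.0, 0.5])        # lambda_1, lambda_2
x = np.array([1.0, 1.0, 0.3])
t2 = t2_statistic(x, P, eigvals)
chi2_99_df2 = 9.210                   # chi-square limit, alpha = 0.01, 2 dof
print(t2, t2 > chi2_99_df2)           # -> 2.5 False
```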

7.2. A complete adaptive monitoring scheme

To make the proposed recursive PCA algorithms useful in real time, we propose an adaptive monitoring scheme that addresses the issues of missing values and outliers, frequently seen in industrial processes with a large number of sensors.

Outliers are invalid sensor readings that violate the normal PCA model. Therefore, we can detect outliers using the $Q$ and $T^2$ indices before the PCA model is updated.

Fig. 5. SPE and $T^2$ calculated from a fixed model for 1000 wafers.



The sensor reconstruction and identification methods in Dunia et al. [26] are easily used to differentiate an occasional outlier (invalid sensor reading) from an abnormal process fault. If there are missing values in the current measurement, the same reconstruction algorithm in Dunia et al. [26] can reconstruct the missing values; the reconstructed measurement is then used for process monitoring.

With the tolerance limits $Q_{k,\alpha}$ and $\chi^2_\alpha(\ell_k)$ and the computed PCA loading matrix $P_{\ell_k}$, a complete procedure for adaptive process monitoring, including normal data adaptation and fault detection, can be implemented in real time, as follows:

1. For an initial data block ($k=1$), calculate $P_{\ell_k}$, $\ell_k$, $Q_{k,\alpha}$ and $\chi^2_\alpha(\ell_k)$.

2. Collect new data samples. If there are missing values in the samples, reconstruct them using the algorithms in Dunia et al. [26].

3. Calculate $Q_k$ and $T_k^2$ from Eqs. (39) and (43). If $Q_k > Q_{k,\alpha}$ or $T_k^2 > \chi^2_\alpha(\ell_k)$, conduct sensor reconstruction and identification to see whether the out-of-limit situation is due to outliers or sensor faults. If they are outliers, reconstruct them using Dunia et al. [26] and accumulate them until a new block size ($n_{k+1}$) is reached. Otherwise, the out-of-limit situation is due to an abnormal process condition; model updating is terminated and a process alarm is triggered.

4. Update the PCA model based on the new data block if no process alarms have occurred. Calculate $P_{\ell_{k+1}}$, $Q_{k+1,\alpha}$ and $\chi^2_\alpha(\ell_{k+1})$. Set $k = k+1$ and go to step 2.
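The steps above can be sketched as a simplified loop. This sketch deliberately departs from the paper in several ways, all of which are our simplifications: the model is refit from scratch on each accepted block rather than updated recursively, a fixed placeholder $Q$ limit replaces Eq. (40), and outlier reconstruction, missing values and the $T^2$ check are omitted:

```python
import numpy as np

def pca_model(X, cpv=0.95):
    """Correlation-based PCA: scaling, loadings P_l and eigenvalues."""
    mean, std = X.mean(0), X.std(0, ddof=1)
    Xs = (X - mean) / std
    w, V = np.linalg.eigh(Xs.T @ Xs / (len(X) - 1))
    w, V = w[::-1], V[:, ::-1]                    # descending order
    l = int(np.searchsorted(np.cumsum(w) / w.sum(), cpv) + 1)
    return mean, std, V[:, :l], w[:l]

def spe(x, mean, std, P):
    """Q statistic of Eq. (39) for one raw observation x."""
    xs = (x - mean) / std
    r = xs - P @ (P.T @ xs)
    return float(r @ r)

# Two latent variables drive four sensors; small measurement noise.
rng = np.random.default_rng(1)
z = rng.standard_normal((200, 2))
data = z @ rng.standard_normal((2, 4)) + 0.2 * rng.standard_normal((200, 4))

mean, std, P, eig = pca_model(data[:50])          # step 1: initial model
q_lim = 25.0                                      # placeholder for Eq. (40)
history, block, alarms = list(data[:50]), [], 0
for x in data[50:]:
    if spe(x, mean, std, P) > q_lim:              # step 3: index check
        alarms += 1                               # would trigger diagnosis
        continue
    block.append(x)                               # accumulate normal samples
    if len(block) == 20:                          # step 4: block-wise update
        history += block
        block = []
        mean, std, P, eig = pca_model(np.asarray(history))
print(alarms, len(history))
```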

A potential adaptation problem is that the principal component directions could change with minor perturbations in the data [5]. This can happen because of the bilinear decomposition, $X = TP^T + E = (-T)(-P)^T + E$, which can switch the signs of the loadings and scores simultaneously. While this switching could introduce instability if one monitors the scores in an SPC chart, it does not affect the results if one uses fault directions and a subspace approach [25,26].

Fig. 6. The recursively calculated SPE and $T^2$ for 1000 wafers using a forgetting factor $\mu = 0.995$.



8. Application to a rapid thermal annealing process

To illustrate the recursive PCA and adaptive monitoring method, data from a rapid thermal annealing (RTA) process in semiconductor processing are used. The objective of this application is to build an RPCA model that adapts to slow, normal process changes and detects abnormal operations. The RTA process experiences normal drifts from wafer to wafer due to machine cleaning and contamination build-up. RTA is also subject to frequent operational errors, which usually result in off-specification products. The RTA operation is a batch process that lasts about 2 min. Six process measurements including temperature, flow and power are relevant to the monitoring task. Due to the batch nature of this process, the RPCA model is a three-way PCA that augments the batch time dimension as variables. Overall, 1000 wafers (batches) of data are used to demonstrate how to update the RPCA models, determine the number of PCs, and calculate the confidence limits for SPE and $T^2$ recursively.

The first experiment emulates the case of limited samples to initialize the monitoring procedure. An initial PCA model is built using the first 50 wafers. When this initial PCA model is used to monitor the process and is not updated with new data blocks (Fig. 3), the calculated SPE and $T^2$ frequently exceed their respective confidence limits even though the process is operating normally. Fig. 4 shows the SPE and $T^2$ calculated from the recursively updated PCA model as well as the 95 and 99% confidence limits. Wafers 23 and 48, outside the $T^2$ limit, are deleted from the initial modeling. During the recursive updating, wafer 61 is outside the SPE limit and wafer 71 is outside the $T^2$ limit. For all other wafers, both indices are within their confidence limits, significantly eliminating false alarms.

The second experiment proceeds with the monitoring of the entire 1000 wafers. From here on, the RPCA model uses a forgetting factor of 0.995 to discount old data. The first 220 wafers are used to build the initial PCA model; the model is then updated using the block-wise Lanczos algorithm, every five wafers, over the remaining 780 wafers. The monitoring results without model updating appear in Fig. 5, and those with RPCA model updating are shown in Fig. 6. Without model updating, the SPE and $T^2$ demonstrate an obvious trend that tends to violate

Fig. 7. The recursively calculated number of PCs using the VRE method.



the control limits. The results with model updating effectively capture this trend in the model and significantly reduce false alarms. Notice that in Fig. 6 the control limits are also updated based on new data. The number of significant principal components is also calculated recursively using the VRE method and is shown in Fig. 7, where 156 model updating iterations (780 wafers) are used. The number of PCs calculated during the first 20 iterations fluctuates because the associated data (from wafers 220 to 300) change abruptly. A snapshot of the VRE in one model iteration is shown in Fig. 8. Although the VRE value calculated by Eq. (38) reaches a minimum when the number of PCs is equal to 21, the VRE is rather flat near the minimum. Therefore, slight fluctuations of the VRE will change the number of PCs.

9. Conclusions

Two recursive PCA algorithms based on rank-one modification and Lanczos tridiagonalization are proposed to adapt to normal process changes such as drifting. The number of PCs and the confidence limits for process monitoring are also calculated recursively. The application of the proposed algorithms to a rapid thermal annealing process demonstrates the feasibility and effectiveness of the recursive algorithms for adaptive process monitoring. As most industrial processes experience slow and normal changes such as equipment aging, sensor drifting, catalyst deactivation, and periodic cleaning, the adaptive monitoring scheme is expected to have broad applicability in industry.

Acknowledgements

Financial support for this project from the National Science Foundation under CTS-9814340, DuPont, and Advanced Micro Devices is gratefully acknowledged.

Appendix A. Updating the standard deviation

We follow the notation used in Section 2.2. All the $k+1$ blocks of data, after being mean centered, are described by

Fig. 8. The number of PCs calculated using the VRE method.



$$X_{k+1}^0 - \mathbf{1}_{k+1}b_{k+1}^T = \begin{bmatrix} X_k^0 - \mathbf{1}_k\,\Delta b_{k+1}^T - \mathbf{1}_k b_k^T \\[4pt] X_{n_{k+1}}^0 - \mathbf{1}_{n_{k+1}} b_{k+1}^T \end{bmatrix}$$

The standard deviation for the $i$th variable is

$$\sigma_{k+1}^2(i) = \frac{\left\|\begin{matrix} X_k^0(:,i) - \mathbf{1}_k\,\Delta b_{k+1}(i) - \mathbf{1}_k b_k(i) \\ X_{n_{k+1}}^0(:,i) - \mathbf{1}_{n_{k+1}} b_{k+1}(i) \end{matrix}\right\|^2}{N_{k+1}-1}$$

Expanding the squared norm for the $i$th standard deviation, we are led to

$$(N_{k+1}-1)\,\sigma_{k+1}^2(i) = \left\|X_k^0(:,i) - \mathbf{1}_k b_k(i)\right\|^2 + N_k\,\Delta b_{k+1}^2(i) + \left\|X_{n_{k+1}}^0(:,i) - \mathbf{1}_{n_{k+1}} b_{k+1}(i)\right\|^2$$

where two relationships, $\mathbf{1}_k^T\left(X_k^0(:,i) - \mathbf{1}_k b_k(i)\right) = 0$ and $\mathbf{1}_k^T\mathbf{1}_k = N_k$, have been applied. Further, considering the definition

$$\sigma_k^2(i)\,(N_k-1) = \left\|X_k^0(:,i) - \mathbf{1}_k b_k(i)\right\|^2$$

we have Eq. (12) instantly.
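The resulting recursive variance update can be checked numerically against a direct computation on the stacked data (variable names and the synthetic blocks below are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
X_old = rng.standard_normal((40, 3)) + 5.0        # previous data, X_k^0
X_new = rng.standard_normal((10, 3)) + 5.2        # new block, X^0_{n_{k+1}}

Nk, n_new = len(X_old), len(X_new)
b_k = X_old.mean(0)                               # old mean b_k
b_k1 = (Nk * b_k + X_new.sum(0)) / (Nk + n_new)   # updated mean b_{k+1}
db = b_k1 - b_k                                   # Delta b_{k+1}

# Recursive update implied by the Appendix A derivation:
var_old = X_old.var(0, ddof=1)
rhs = (Nk - 1) * var_old + Nk * db**2 + ((X_new - b_k1) ** 2).sum(0)
var_rec = rhs / (Nk + n_new - 1)

# Direct computation on the stacked data agrees:
var_dir = np.vstack([X_old, X_new]).var(0, ddof=1)
print(np.allclose(var_rec, var_dir))  # -> True
```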

Appendix B. Updating the correlation matrix

Substituting $X_{k+1}$ given by Eq. (10) into the definition

$$R_{k+1} = \frac{1}{N_{k+1}-1}X_{k+1}^T X_{k+1}$$

directly yields

$$\begin{aligned}(N_{k+1}-1)\,R_{k+1} ={}& \Sigma_{k+1}^{-1}\Sigma_k X_k^T X_k \Sigma_k\Sigma_{k+1}^{-1} + \mathbf{1}_k^T\mathbf{1}_k\,\Sigma_{k+1}^{-1}\Delta b_{k+1}\Delta b_{k+1}^T\Sigma_{k+1}^{-1} \\ &- 2\,\Sigma_{k+1}^{-1}\Delta b_{k+1}\mathbf{1}_k^T X_k\Sigma_k\Sigma_{k+1}^{-1} + X_{n_{k+1}}^T X_{n_{k+1}}\end{aligned}$$

Using the facts that

$$\mathbf{1}_k^T X_k = 0$$

$$\mathbf{1}_k^T\mathbf{1}_k = N_k$$

$$(N_k-1)\,R_k = X_k^T X_k$$

in the above equation gives Eq. (13) immediately.
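The same kind of numerical check works for the correlation recursion: the cross term vanishes because the columns of the centered old data sum to zero, so $R_{k+1}$ can be rebuilt from $R_k$, the two means, the two scalings and the new block alone. A sketch with our own names and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
X_old = rng.standard_normal((60, 3)) * np.array([1.0, 2.0, 0.5]) + 1.0
X_new = rng.standard_normal((15, 3)) * np.array([1.1, 1.9, 0.6]) + 1.1

Nk, n_new = len(X_old), len(X_new)
b_k, b_k1 = X_old.mean(0), np.vstack([X_old, X_new]).mean(0)
db = b_k1 - b_k                                   # Delta b_{k+1}
s_k = X_old.std(0, ddof=1)                        # diagonal of Sigma_k
s_k1 = np.vstack([X_old, X_new]).std(0, ddof=1)   # diagonal of Sigma_{k+1}

# Old scaled correlation matrix and the scaled new block:
Xk = (X_old - b_k) / s_k
Rk = Xk.T @ Xk / (Nk - 1)
Xn = (X_new - b_k1) / s_k1

# Recursive form implied by Appendix B (1_k^T X_k = 0 kills the cross term):
inner = (Nk - 1) * (s_k[:, None] * Rk * s_k) + Nk * np.outer(db, db)
Rk1_rec = (inner / np.outer(s_k1, s_k1) + Xn.T @ Xn) / (Nk + n_new - 1)

# Direct computation on the stacked, rescaled data agrees:
Xall = (np.vstack([X_old, X_new]) - b_k1) / s_k1
Rk1_dir = Xall.T @ Xall / (Nk + n_new - 1)
print(np.allclose(Rk1_rec, Rk1_dir))  # -> True
```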

Appendix C. Computing the principal eigenvalues of $\Gamma$

Given a tridiagonal matrix $\Gamma\in\mathbb{R}^{m_1\times m_1}$ and an error limit $\varepsilon$, e.g. $\varepsilon = 10^{-10}$, the following algorithm calculates the $\ell_k$ largest eigenvalues of $\Gamma$ with $\lambda_1\geq\lambda_2\geq\dots\geq\lambda_{\ell_k}$:

    λ_min = max{ min_i(α_i − |β_i| − |β_{i−1}|), 0 }
    λ_max = min{ max_i(α_i + |β_i| + |β_{i−1}|), trace(Γ) }, with 1 ≤ i ≤ m_1
    for i = 1 to ℓ_k
        x_1 = λ_min; x_2 = λ_max
        while |x_1 − x_2| > ε(|x_1| + |x_2|)
            λ = (x_1 + x_2)/2
            calculate a(λ)
            if a(λ) > m_1 − i
                x_2 = λ
            else
                x_1 = λ
            end
        end
        λ_max = λ
    end

Here $\alpha_i$ and $\beta_i$ are the diagonal and off-diagonal elements of $\Gamma$, so the bounds are Gershgorin bounds, and $a(\lambda)$ is the number of eigenvalues of $\Gamma$ smaller than $\lambda$, obtained from the Sturm sequence of $\Gamma - \lambda I$.

In this algorithm, the bisection approach is used to search for an eigenvalue, and the search process will be terminated after $s$ steps if

$$\frac{\lambda_{\max}-\lambda_{\min}}{2^s}\leq\varepsilon$$

or equivalently

$$s \geq \frac{\log(\lambda_{\max}-\lambda_{\min}) - \log\varepsilon}{\log 2}$$

Since $\lambda_{\max}-\lambda_{\min}\leq\lambda_{\max}\leq\mathrm{trace}(R_{k+1}) = m$, it is clear that

$$s \leq \frac{\log(m) - \log\varepsilon}{\log 2}.$$
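The bisection of Appendix C can be sketched as runnable code. The count $a(\lambda)$ is implemented here as a Sturm-sequence sign count of the $LDL^T$ pivots (our implementation choice; the appendix does not spell out $a(\lambda)$), and distinct eigenvalues are assumed:

```python
import numpy as np

def count_below(alpha, beta, lam):
    """a(lam): number of eigenvalues below lam, via the Sturm sign count."""
    count, d = 0, 1.0
    for i in range(len(alpha)):
        b2 = beta[i - 1] ** 2 if i > 0 else 0.0
        d = alpha[i] - lam - b2 / d     # next LDL^T pivot of (Gamma - lam I)
        if d == 0.0:
            d = 1e-300                  # nudge off an exact pivot
        if d < 0:
            count += 1
    return count

def largest_eigs(alpha, beta, lk, eps=1e-10):
    """Bisection for the lk largest eigenvalues of a symmetric tridiagonal."""
    m1 = len(alpha)
    pad = np.concatenate([[0.0], np.abs(beta), [0.0]])
    lo = max((alpha - pad[:-1] - pad[1:]).min(), 0.0)   # Gershgorin bounds
    hi = (alpha + pad[:-1] + pad[1:]).max()
    eigs = []
    for i in range(1, lk + 1):
        x1, x2 = lo, hi
        while abs(x1 - x2) > eps * (abs(x1) + abs(x2)):
            lam = (x1 + x2) / 2
            if count_below(alpha, beta, lam) > m1 - i:
                x2 = lam                # too many eigenvalues below lam
            else:
                x1 = lam
        hi = lam                        # the next eigenvalue is smaller
        eigs.append(lam)
    return np.array(eigs)

alpha = np.array([2.0, 3.0, 1.0, 4.0])  # diagonal of the tridiagonal matrix
beta = np.array([0.5, 0.4, 0.3])        # off-diagonal
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
print(largest_eigs(alpha, beta, 2))
print(np.sort(np.linalg.eigvalsh(T))[::-1][:2])  # agrees
```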

References

[1] B.M. Wise, N.L. Ricker, Recent advances in multivariate statistical process control: improving robustness and sensitivity, in: Proceedings of the IFAC ADCHEM Symposium, 1991, pp. 125–130.

[2] J.F. MacGregor, T. Kourti, Statistical process control of multivariate processes, Control Engineering Practice 3 (1995) 403–414.

[3] P. Nomikos, J.F. MacGregor, Multivariate SPC charts for monitoring batch processes, Technometrics 37 (1) (1995) 41–59.

[4] N.B. Gallagher, B.M. Wise, S.W. Butler, D.D. White, G.G. Barna, Development and benchmarking of multivariate statistical process control tools for a semiconductor etch process: improving robustness through model updating, in: Proc. of ADCHEM 97, Banff, Canada, 9–11 June 1997, pp. 78–83.

[5] S. Wold, Exponentially weighted moving principal component analysis and projection to latent structures, Chemometrics and Intelligent Laboratory Systems 23 (1994) 149–161.

[6] A. Rigopoulos, Y. Arkun, F. Kayihan, E. Hanezyc, Identification of paper machine full profile disturbance models using adaptive principal component analysis, in: Chemical Process Control V, Tahoe, 7–12 January 1996, pp. 275–279.

[7] S. Rannar, J.F. MacGregor, S. Wold, Adaptive batch monitoring using hierarchical PCA, in: AIChE Annual Meeting, Los Angeles, 16–21 November 1997.

[8] S.J. Qin, Recursive PLS algorithms for adaptive data modeling, Comput. Chem. Eng. 22 (1998) 503–514.

[9] G. Xu, T. Kailath, Fast estimation of principal eigenspace using Lanczos algorithm, SIAM J. Matrix Anal. Appl. 15 (1994) 974–994.

[10] J.E. Jackson, A User's Guide to Principal Components, Wiley-Interscience, New York, 1991.

[11] J.R. Bunch, C.P. Nielsen, D.C. Sorensen, Rank-one modification of the symmetric eigenproblem, Numer. Math. 31 (1978) 31–48.

[12] G.H. Golub, Some modified matrix eigenvalue problems, SIAM Rev. 15 (1973) 318–334.

[13] G.H. Golub, C.F. Van Loan, Matrix Computations, 3rd Edition, The Johns Hopkins University Press, 1996.

[14] C.C. Paige, Accuracy and effectiveness of the Lanczos algorithm for the symmetric eigenproblem, Linear Algebra and Its Appl. 34 (1980) 235–258.

[15] B.N. Parlett, The Symmetric Eigenvalue Problem, Prentice Hall, Englewood Cliffs, NJ, 1980.

[16] J. Cullum, R.A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. I, Theory, Birkhäuser, Boston, 1985.

[17] R.I. Shrager, R.W. Hendler, Titration of individual components in a mixture with resolution of difference spectra, pKs, and redox transitions, Analytical Chemistry 54 (7) (1982) 1147–1152.

[18] S. Wold, Cross validatory estimation of the number of components in factor and principal component analysis, Technometrics 20 (1978) 397–406.

[19] D.W. Osten, Selection of optimal regression models via cross-validation, Journal of Chemometrics 2 (1988) 39–48.

[20] F.R. Malinowski, Factor Analysis in Chemistry, Wiley-Interscience, New York, 1991.

[21] R.B. Cattell, The scree test for the number of factors, Multivariate Behav. Res. 1 (1966) 245.

[22] H. Akaike, A new look at the statistical model identification, IEEE Trans. Auto. Control 19 (1974) 716–723.

[23] M. Wax, T. Kailath, Detection of signals by information theoretic criteria, IEEE Trans. Acoust. Speech, Signal Processing ASSP-33 (1985) 387–392.

[24] J. Rissanen, Modeling by shortest data description, Automatica 14 (1978) 465–471.

[25] R. Dunia, S.J. Qin, A subspace approach to multidimensional fault identification and reconstruction, AIChE Journal 44 (1998) 1813–1831.

[26] R. Dunia, S.J. Qin, T.F. Edgar, T.J. McAvoy, Sensor fault identification and reconstruction using principal component analysis, in: 13th IFAC World Congress, San Francisco, 1996, pp. N2959–264.

[27] J.E. Jackson, G. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics 21 (1979) 341–349.
