Liu, R. Y. (1995). Control Charts for Multivariate Processes

Control Charts for Multivariate Processes
Author(s): Regina Y. Liu
Source: Journal of the American Statistical Association, Vol. 90, No. 432 (Dec., 1995), pp. 1380-1387
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2291529


Control Charts for Multivariate Processes Regina Y Liu

This article uses the concept of data depth to introduce several new control charts for monitoring processes of multivariate quality measurements. For any dimension of the measurements, these charts are in the form of two-dimensional graphs that can be visualized and interpreted just as easily as the well-known univariate $X$, $\bar{X}$, and CUSUM charts. Moreover, they have several significant advantages. First, they can detect simultaneously the location shift and scale increase of the process, unlike the existing methods, which can detect only the location shift. Second, their construction is completely nonparametric; in particular, it does not require the assumption of normality for the quality distribution, which is needed in standard approaches such as the $\chi^2$ and Hotelling's $T^2$ charts. Thus these new charts generalize the principle of control charts to multivariate settings and apply to a much broader class of quality distributions.

KEY WORDS: Control charts; Q chart; Quality control; r chart; S chart; Statistical process control.

1. INTRODUCTION

Control charts are useful tools for monitoring and controlling a manufacturing process. With properly chosen control limits, a control chart can detect a shift from a "good" quality distribution to a "bad" one. When the measurement, denoted by $X$, of a particular characteristic of a product is used to gauge the quality of the product, the most commonly used charts are the $X$ chart, the $\bar{X}$ chart (or Shewhart chart), and the cumulative sum (CUSUM) chart. These charts are easy to construct, visualize, and interpret, and most important, have been proven effective in practice. However, they are usually suitable only when the observation $X$ is univariate, and their validity often relies on the assumption of normality of $X$, which is not always realistic.

In real life, we often encounter multivariate quality measurements rather than univariate, since the overall quality of a product is usually determined by more than one quality characteristic. For example, the quality of a certain type of tablets may be determined by weight, degree of hardness, thickness, width, and length. These quality characteristics are clearly correlated, and control charts for monitoring individual quality characteristics may not be adequate for detecting changes in the overall quality of the product. Thus it is desirable to have control charts that can monitor multivariate measurements directly.

There are some methods for constructing multivariate control charts in the literature (see, for example, Alt and Smith 1988 for a thorough survey and for further references). However, these methods are usually restricted to the case of normal distributions and are difficult to visualize and interpret. The main idea behind our control charts is to reduce each multivariate measurement to a univariate index, namely its relative center-outward ranking induced by a data depth (cf. Sec. 2). Representing the original quality measurements by their corresponding univariate ranks, we are able to develop control charts based on these ranks following the same principles as for the univariate $X$, $\bar{X}$, and CUSUM charts. The geometric nature of the notion of data depth makes it easy to interpret the values of statistics derived from those ranks and to visualize their plots. This approach is completely nonparametric, and thus the resulting charts are valid without parametric assumptions on the process model. Moreover, these charts allow us to detect simultaneously the location change and the scale increase in a process. In Section 3 three types of control charts (the r, Q, and S charts) are proposed and justified. They can be viewed as data-depth-based multivariate generalizations of the univariate $X$, $\bar{X}$, and CUSUM charts. Their names are suggested respectively by the relative ranks of sample points with respect to a reference sample, by the quality index introduced in Liu and Singh (1993), and by a plot of sums of deviations. In Section 2 a brief description of data depths and definitions of the relevant statistics suitable for plotting in control charts are presented. A simulated bivariate data set is used to demonstrate the construction of the proposed charts. The results, presented in Figures 1-5, appear to support our methods. A detailed discussion of the simulation is given in Section 4, and some concluding remarks are presented in Section 5.

Regina Y. Liu is Professor, Department of Statistics, Rutgers University, New Brunswick, NJ 08903. The author gratefully acknowledges support from National Science Foundation Grants DMS-90-04658 and DMS-90-22126. The author thanks Kay Tatsuoka for his computing assistance and the referees, associate editor, and editor for their helpful comments.

2. SOME STATISTICS DERIVED FROM DATA DEPTH

Assume that $k$ ($k > 1$) characteristics of each product are used to determine the quality of the product. The process is considered to be in control if the measurements follow a prescribed quality distribution (required by customers or designing engineers). Let $G$ denote the prescribed $k$-dimensional distribution, and let $Y_1, \ldots, Y_m$ be $m$ random observations from $G$. The sample $Y_1, \ldots, Y_m$ is generally referred to as a reference sample in the context of quality control and is regarded as the measurements of products produced by an in-control process. Let $X_1, X_2, \ldots$ be the new observations from the manufacturing process. Assume that the $X_i$'s follow a distribution $F$. Based on the observations $X_i$, we would like to determine whether the quality of the product has deteriorated or whether the process is out of control. This would mean that the $X_i$'s are not meeting the prescribed $G(\cdot)$ in a certain sense. Thus we need to compare $F$ with $G$. The statistics that we use to characterize certain aspects of the difference between $G$ and $F$ are based on the notion of data depth, so we begin by describing some concepts of data depth.

© 1995 American Statistical Association. Journal of the American Statistical Association, December 1995, Vol. 90, No. 432, Theory and Methods.

Figure 1. r Chart.

For any point $y$ in $\mathbb{R}^k$, the simplicial depth (Liu 1990) of $y$ with respect to $G$ is defined to be

$$SD_G(y) = P_G\{y \in s[Y_1, \ldots, Y_{k+1}]\}, \qquad (1)$$

where $s[Y_1, \ldots, Y_{k+1}]$ is the open simplex whose vertices $Y_1, \ldots, Y_{k+1}$ are $(k + 1)$ random observations from $G$. The value of $SD_G$ is a measure of how "deep," or how "central," $y$ is with respect to $G$. When $G$ is unknown and only a sample $\{Y_1, \ldots, Y_m\}$ is given, the sample simplicial depth of $y$ is defined as

$$SD_{G_m}(y) = \binom{m}{k+1}^{-1} \sum_{(*)} I(y \in s[Y_{i_1}, \ldots, Y_{i_{k+1}}]), \qquad (2)$$

which measures how deep $y$ is within the data cloud $\{Y_1, \ldots, Y_m\}$. Here $I(\cdot)$ is the indicator function; that is, $I(A) = 1$ if $A$ occurs and $I(A) = 0$ otherwise. The function $G_m(\cdot)$ denotes the empirical distribution of $\{Y_1, \ldots, Y_m\}$, and $(*)$ runs over all possible subsets of $\{Y_1, \ldots, Y_m\}$ of size $(k + 1)$. A fuller motivation together with the basic properties of $SD_G(\cdot)$ can be found in an earlier work (Liu 1990), where it was shown in particular that $SD_G(\cdot)$ is affine invariant and that $SD_{G_m}(\cdot)$ converges uniformly and strongly to $SD_G(\cdot)$. The affine invariance will ensure that our proposed control charts are coordinate free, and the convergence of $SD_{G_m}$ to $SD_G$ will allow us to approximate $SD_G(\cdot)$ by $SD_{G_m}(\cdot)$ when $G$ is not specified.

Another notion of depth is based on the Mahalanobis distance. Here how deep a point $y$ is with respect to a given distribution $G$ is measured by how small its quadratic distance is to the mean:

$$MD_G(y) = 1/[1 + (y - \mu_G)' \Sigma_G^{-1} (y - \mu_G)], \qquad (3)$$

where $\mu_G$ and $\Sigma_G$ denote the mean and the covariance matrix of $G$, the prime $'$ denotes the transpose of a $(k \times 1)$ vector, and the superscript $-1$ denotes the inverse of a matrix. The empirical version of $MD_G(y)$ is

$$MD_{G_m}(y) = 1/[1 + (y - \bar{Y})' S^{-1} (y - \bar{Y})], \qquad (4)$$

where $\bar{Y}$ is the sample mean of $Y_1, \ldots, Y_m$, and $S$ is the sample covariance matrix. We observe that $MD_G(\cdot)$ is also affine invariant.
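As a concrete illustration of (2) and (4) in the bivariate case ($k = 2$), the following Python sketch computes both sample depths. The function names are hypothetical and do not come from the article, and the simplicial depth is computed by brute-force enumeration of all triangles of reference points, not by the faster algorithm of Rousseeuw and Ruts (1992) used in Section 4.

```python
from itertools import combinations


def _cross(o, a, b):
    """Orientation test: positive if o -> a -> b turns counterclockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def simplicial_depth_2d(y, ref):
    """Sample simplicial depth (2) for k = 2: fraction of open triangles
    with vertices in `ref` that contain the point y (brute force)."""
    inside = total = 0
    for a, b, c in combinations(ref, 3):
        total += 1
        d1, d2, d3 = _cross(a, b, y), _cross(b, c, y), _cross(c, a, y)
        # y is strictly inside when all three orientations share a sign.
        if (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0):
            inside += 1
    return inside / total


def mahalanobis_depth_2d(y, ref):
    """Sample Mahalanobis depth (4) for k = 2, using the sample mean and
    the sample covariance matrix (divisor m - 1) of `ref`."""
    m = len(ref)
    mx = sum(p[0] for p in ref) / m
    my = sum(p[1] for p in ref) / m
    sxx = sum((p[0] - mx) ** 2 for p in ref) / (m - 1)
    syy = sum((p[1] - my) ** 2 for p in ref) / (m - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in ref) / (m - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = y[0] - mx, y[1] - my
    # Quadratic form (y - mean)' S^{-1} (y - mean), 2x2 inverse written out.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + q)


# Toy reference sample: the four corners of a square (illustrative only).
square = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
print(simplicial_depth_2d((1.0, 2.0), square))   # 0.5
print(mahalanobis_depth_2d((2.0, 2.0), square))  # 1.0
```

In the toy example, the point (1, 2) lies inside two of the four triangles formed by the corners, and the sample mean (2, 2) attains the maximum Mahalanobis depth of 1.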

There are several other affine-invariant notions of data depth, including Tukey's depth (Tukey 1975) and the majority depth of Singh (Liu and Singh 1993). As a matter of fact, all control charts proposed herein are also valid for these two depths. (See Liu and Singh 1993 for a fuller discussion of various notions of data depth.) The simplicial depth and the Mahalanobis depth suffice for our purposes, because they illustrate well the contrasting properties of probabilistic geometry and metric distances. Henceforth we use the same notation $D_G(\cdot)$ to denote either notion of depth, unless indicated otherwise. We also assume that $G$ and $F$ are two absolutely continuous distributions.

Figure 2. Q Chart (n = 4).

Figure 3. Q Chart (n = 10).

Clearly, a data depth induces a center-outward ordering of the sample points if depth values for all points are computed and compared. More specifically, if we arrange all $D_G(Y_i)$'s in ascending order and use $Y_{[j]}$ to denote the sample point associated with the $j$th smallest depth value, then $Y_{[1]}, Y_{[2]}, \ldots, Y_{[m]}$ are the order statistics of the $Y_i$'s, with $Y_{[m]}$ being the most central point. The smaller the order (or the rank) of a point, the more outlying that point is with respect to the underlying distribution $G(\cdot)$. We now proceed to list some statistics derived from data depth that are used in the next section to construct control charts. We write $Y \sim G$ to indicate that the random variable $Y$ follows the distribution $G$, and set

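$$r_G(y) = P\{D_G(Y) \le D_G(y) \mid Y \sim G\} \qquad (5)$$

and

$$r_{G_m}(y) = \#\{Y_j \mid D_{G_m}(Y_j) \le D_{G_m}(y),\ j = 1, \ldots, m\}/m. \qquad (6)$$

Let $F_n(\cdot)$ denote the empirical distribution of the sample $\{X_1, \ldots, X_n\}$. We can now define

$$Q(G, F) = P\{D_G(Y) \le D_G(X) \mid Y \sim G,\ X \sim F\} = E_F[r_G(X)], \qquad (7)$$

$$Q(G, F_n) = \frac{1}{n} \sum_{i=1}^{n} r_G(X_i), \qquad (8)$$

and

$$Q(G_m, F_n) = \frac{1}{n} \sum_{i=1}^{n} r_{G_m}(X_i). \qquad (9)$$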
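The empirical statistics (6) and (9) depend on the data only through depth values, so they can be sketched in a few lines of Python once the depths have been computed. The function names below are hypothetical, and the depth values in the example are made up purely to show the arithmetic.

```python
def rank_stat(depth_x, ref_depths):
    """r_{G_m}(x) of (6): proportion of reference points whose depth is
    less than or equal to the depth of x."""
    return sum(d <= depth_x for d in ref_depths) / len(ref_depths)


def q_stat(x_depths, ref_depths):
    """Q(G_m, F_n) of (9): average of r_{G_m}(X_i) over the n new points."""
    return sum(rank_stat(d, ref_depths) for d in x_depths) / len(x_depths)


# Made-up depths of m = 4 reference points, all computed with respect to G_m.
ref_depths = [0.05, 0.10, 0.20, 0.30]
print(rank_stat(0.10, ref_depths))       # 0.5
print(q_stat([0.10, 0.30], ref_depths))  # 0.75
```

A new point that is at least as deep as every reference point receives rank 1, and a point shallower than all of them receives rank 0.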

3. CONTROL CHARTS BASED ON DATA DEPTH

We now introduce three control charts (the r chart, Q chart, and S chart), which can be viewed as the $X$ chart, $\bar{X}$ chart, and CUSUM chart after the multivariate data have been transformed into univariate data by data depth. In principle, a control chart consists of critical values, the upper control limit (UCL) and the lower control limit (LCL), for a sample quality measurement. Between the two control limits is the center line (CL), which represents no deviation from the prescribed distribution. Samples from the manufacturing process are recorded in time order, and their measurements are plotted on the chart. By convention, those sample points are connected by straight line segments, so that the sequence of activities over time can be easily visualized. The region above UCL or below LCL is termed the out-of-control region. A sample point falling in the out-of-control region is interpreted as evidence that the process is out of control, and a proper corrective action is sought. If the process is declared out of control when in fact it is not, we say that we have a "false alarm." The UCL and LCL are chosen so that the false alarm rate is small, say $\alpha$. Thus a control chart at every plotted point is a visualization of an $\alpha$-level test with the null hypothesis $H_0$: $G = F$. The rejection region in this test corresponds to the out-of-control region in the control chart. (A more detailed discussion of control charts can be found in, for example, Banks 1989 and Wadsworth, Stephen, and Godfrey 1986.)

3.1 The r Charts

The r chart introduced in this section is similar to the $X$ chart for univariate data. It is based on the statistics $r_G(\cdot)$ and $r_{G_m}(\cdot)$ of (5) and (6). First we discuss the $X$ chart. Assume that the observations $Y_1, \ldots, Y_m$ and $X_1, \ldots, X_n$ are univariate and that our main concern is a possible shift in the mean of the $X_i$'s. If $G$ is a normal distribution with mean $\mu$ and standard deviation $\sigma$, then the following is a typical $X$ chart of the $X_i$'s:

[Schematic X chart: $X_i$ plotted against time $i = 1, \ldots, 10$, with dashed horizontal lines at UCL and LCL and a center line at CL.]

In this example, $\mathrm{UCL} = \mathrm{CL} + z_{\alpha/2}\sigma$, $\mathrm{LCL} = \mathrm{CL} - z_{\alpha/2}\sigma$, and $\mathrm{CL} = \mu$ if $\mu$ is known and $= \bar{Y}$ otherwise. Here $z_\alpha$ indicates the upper $\alpha$ critical value of the standard normal distribution; that is, $\alpha = P(Z > z_\alpha)$, where $Z \sim N(0, 1)$. The $X$ chart allows us to detect a possible mean shift from the prescribed value $\mu$ or the existence of any trend or pattern in the sequence of observations. It is a simple but effective tool for monitoring a univariate process; however, it does not generalize easily to the multivariate case. For bivariate normal $G$, a bivariate $X$ chart with elliptical contours as control limits, also called control ellipses, was studied by Alt and Smith (1988). Besides the restriction of normality, it is also difficult to visualize and detect any pattern or trend, because the chronological order of the observations is lost in the plot. Furthermore, when the dimension $k$ goes beyond 3, it does not seem possible to follow the same idea to construct charts that are easy to visualize.

Figure 4. S Chart.

Our r chart is constructed as follows. Compute $\{r_G(X_1), r_G(X_2), \ldots\}$ (or $\{r_{G_m}(X_1), r_{G_m}(X_2), \ldots\}$ if only $Y_1, \ldots, Y_m$ are available, but not $G$), following (5) (or (6)). The r chart is the plot of the $r_G(X_i)$'s (or $r_{G_m}(X_i)$'s) against time $i$, with $\mathrm{CL} = .5$ and the control limit $\alpha$. The process is declared out of control if $r_G(\cdot)$ falls below $\alpha$. Recall that $\alpha$ is the false alarm rate, which generally is close to zero, so the r chart has only $\mathrm{LCL} = \alpha$ and no UCL. The motivation and justification of the r chart as a control chart are given next.
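The decision rule of the r chart reduces to comparing each plotted rank with $\alpha$. A minimal Python sketch follows; the function name `r_chart_signals` is hypothetical, and the example values are the ranks reported for $X_{39}, \ldots, X_{44}$ in Table 1.

```python
def r_chart_signals(r_values, alpha=0.025, start=1):
    """Return the time indices i whose rank r(X_i) falls below LCL = alpha."""
    return [i for i, r in enumerate(r_values, start=start) if r < alpha]


# Ranks of X_39, ..., X_44 as listed in Table 1.
r_39_to_44 = [0.920, 0.876, 0.022, 0.022, 0.022, 0.194]
print(r_chart_signals(r_39_to_44, alpha=0.025, start=39))  # [41, 42, 43]
```

Only values below the LCL are flagged; large ranks never signal, which reflects the absence of a UCL.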

The expression (6) shows that $r_{G_m}(X)$ is an indication of how outlying $X$ is with respect to the data cloud of the $Y_i$'s. A very small value of $r_{G_m}(X)$ means that only a very small proportion of the $Y_i$'s are more outlying than $X$. Thus $X$ is at the "outskirts" and does not conform to most of the central part of the good data set. Assuming that $X \sim F$, a small value of $r_{G_m}(X)$ then suggests a possible deviation from $G$ to $F$. Since $r_{G_m}(\cdot)$ is defined according to data depth, the possible deviation here can be a shift in "center" and/or an increase in scale. (A detailed mathematical justification of this interpretation can be derived from Liu and Singh 1993, sec. 3.) Thus the r chart with $\mathrm{LCL} = \alpha$ corresponds to an $\alpha$-level test of the following hypotheses:

$$H_0: F = G \quad \text{vs.} \quad H_a: \text{there is a location shift and/or a scale increase from } G \text{ to } F. \qquad (10)$$

We observe that the alternative hypothesis is particularly suitable for detecting quality deterioration in quality control, as it represents a loss of accuracy and/or a loss of precision. This also justifies viewing the process as out of control when $H_0$ is rejected or, equivalently, when an observation falls below $\alpha$ in the r chart.

To explain the choice of $\mathrm{CL} = .5$ and $\mathrm{LCL} = \alpha$ in the r chart, we require the properties of $r_G(X)$ and $r_{G_m}(X)$ established by Liu and Singh (1993) and listed in Proposition 3.1.

Proposition 3.1. Assume that $F = G$ and $X \sim F$. Let $U[0, 1]$ denote a uniform distribution supported on $[0, 1]$, and let the notation $\to_L$ stand for convergence in law. If $D_G(X)$ has a continuous distribution, then

a. $r_G(X) \sim U[0, 1]$, and
b. as $m \to \infty$, $r_{G_m}(X) \to_L U[0, 1]$ along almost all $\{Y_1, \ldots, Y_m\}$ sequences, provided that $D_{G_m}(\cdot)$ converges to $D_G(\cdot)$ uniformly as $m \to \infty$.

Remark 3.1. The uniform convergence of $D_{G_m}(\cdot)$ holds for the simplicial depth if $G$ is absolutely continuous, and for the Mahalanobis depth if $G$ has a bounded second absolute moment.

Under $H_0$: $F = G$, Proposition 3.1 implies that the expected value of $r_G(X)$ is .5 and that of $r_{G_m}(X)$ is .5 almost surely for all sequences $\{Y_1, \ldots, Y_m\}$ for large $m$. This justifies choosing .5 to be the CL of the r chart. When $r_G(X)$ (or $r_{G_m}(X)$) is much smaller than .5, there is doubt about $H_0$ and evidence to support $H_a$, signaling a possible quality deterioration. When $r_G(X)$ (or $r_{G_m}(X)$) is larger than .5, there is indication of a decrease in scale with perhaps a negligible location shift. This is seen as an improvement in quality, termed a gain in precision, and thus the process should not be viewed as out of control. Therefore, there is only an LCL in the r chart. Because a $U[0, 1]$ variable falls below $\alpha$ with probability exactly $\alpha$, the uniform distribution of $r_G(X)$ (or $r_{G_m}(X)$) implies that the LCL should be $\alpha$.

Figure 5. S* Chart.

Table 1. Simplicial Depth Values and Ranks

  i   D(X_i)  r(X_i)      i   D(X_i)  r(X_i)
  1   .0028   .082       41   0       .022
  2   .2263   .948       42   0       .022
  3   .1794   .840       43   0       .022
  4   .0196   .256       44   .0107   .194
  5   .1144   .670       45   0       .022
  6   .0025   .074       46   .0041   .100
  7   .0115   .196       47   0       .022
  8   .0443   .392       48   0       .022
  9   .0389   .358       49   .0111   .194
 10   .0268   .296       50   .0261   .290
 11   0       .022       51   0       .022
 12   .1962   .888       52   0       .022
 13   .1651   .812       53   0       .022
 14   .1835   .852       54   0       .022
 15   .0249   .280       55   0       .022
 16   .0583   .446       56   0       .022
 17   .1106   .658       57   0       .022
 18   .0022   .068       58   0       .022
 19   .2315   .962       59   0       .022
 20   .0366   .348       60   0       .022
 21   .0711   .502       61   .0932   .588
 22   .0645   .472       62   0       .022
 23   .0103   .186       63   0       .022
 24   .0797   .542       64   0       .022
 25   .0870   .566       65   0       .022
 26   .0051   .114       66   0       .022
 27   .0518   .424       67   0       .022
 28   0       .022       68   .0123   .202
 29   .0044   .102       69   0       .022
 30   .0903   .576       70   .1984   .896
 31   .1900   .866       71   .0250   .280
 32   .1621   .800       72   .0087   .160
 33   .1499   .768       73   0       .022
 34   .0757   .528       74   0       .022
 35   .0514   .420       75   0       .022
 36   .0581   .444       76   0       .022
 37   .1096   .656       77   0       .022
 38   .0570   .436       78   0       .022
 39   .2082   .920       79   0       .022
 40   .1927   .876       80   0       .022

Remark 3.2. Even though the r chart does not have the UCL to make its CL the center line of the in-control region, the CL here does serve as a reference point to allow us to observe whether a pattern or trend is developing in a sequence of samples.

3.2 The Q Charts

The idea behind the Q chart is similar to that of the univariate $\bar{X}$ chart. When $X_1, X_2, \ldots$ are univariate and $G$ is normal, the $\bar{X}$ chart plots the averages of consecutive subsets of the $X_i$'s. The $\bar{X}$ chart may prevent a false alarm when the process is actually in control but some individual sample point falls outside the control limits merely due to random fluctuations. This is an advantage over the $X$ chart. In the multivariate setting we propose to plot the averages of subsets of the $r_G(X_i)$'s (or $r_{G_m}(X_i)$'s). Assume that each subset has size $n$. In the notation of (8) and (9), the averages of the $r_G(X_i)$'s and $r_{G_m}(X_i)$'s are given by $Q(G, F_n^j)$ and $Q(G_m, F_n^j)$. Here $F_n^j$ is the empirical distribution of the $X_i$'s in the $j$th subset, $j = 1, 2, \ldots$. The Q chart plots

$$\{Q(G, F_n^1), Q(G, F_n^2), \ldots\}$$

or

$$\{Q(G_m, F_n^1), Q(G_m, F_n^2), \ldots\}$$

if only $Y_1, \ldots, Y_m$ are available.

The main issue now is to set the correct values for CL and LCL in this Q chart. This depends on the choice of $n$. We shall see that when $n$ is large, in view of the approximations described in Proposition 3.2, CL should be .5, whereas LCL should be $.5 - z_\alpha (12n)^{-1/2}$ for plotting the $Q(G, F_n^j)$'s and $.5 - z_\alpha \sqrt{(1/12)[(1/m) + (1/n)]}$ for plotting the $Q(G_m, F_n^j)$'s (cf. Fig. 3). This approximation seems to be quite reasonable even when $n$ is as small as 5. In practice, however, $n$ can be even smaller, say 3 or 4. In this case, we may use the exact distribution of $Q(G, F_n)$ given in Proposition 3.3. It turns out that for a small $\alpha$ value the Q chart should have $\mathrm{CL} = .5$ and $\mathrm{LCL} = (n!\,\alpha)^{1/n}/n$.

First we describe the large-$n$ asymptotics. The Q chart corresponds to the $\alpha$-level test based on $Q(G, F_n)$ (or $Q(G_m, F_n)$) for testing the same set of hypotheses in (10). These are actually two of the several multivariate rank tests studied by Liu (1992) and Liu and Singh (1993). Their main asymptotic properties are as follows.

Proposition 3.2. Assume that the conditions in Proposition 3.1 hold. Then

a. as $n \to \infty$, $[Q(G, F_n) - 1/2] \to_L N(0, 1/(12n))$; and
b. as $\min(m, n) \to \infty$, $[Q(G_m, F_n) - 1/2] \to_L N(0, [(1/m) + (1/n)]/12)$, under the following additional condition: if $MD(\cdot)$ is used to define $Q(\cdot, \cdot)$, $G$ has a bounded fourth absolute moment; if $SD(\cdot)$ is used to define $Q(\cdot, \cdot)$, $G$ is a one-dimensional distribution and its density is bounded above and below in a neighborhood of the median (or center).

Statement (a) is a straightforward application of the central limit theorem, because $Q(G, F_n)$ is just the average of $n$ iid uniform random variables, each with variance $1/12$. Statement (b) has been established by Liu and Singh (1993). Although (b) has been proven only for $\mathbb{R}^1$ in the case of $SD$, it was conjectured by Liu and Singh (1993), with the support of simulation results, that it actually holds for any $k$-dimensional $G$. It is now evident that CL and LCL should be set to the values indicated earlier when $n$ is large.
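The large-$n$ control limits implied by Proposition 3.2 are simple to evaluate numerically. The following Python sketch (the function name is hypothetical) uses the standard library's `statistics.NormalDist` to obtain $z_\alpha$ and reproduces the value .3193 quoted in Section 4 for $m = 500$, $n = 10$, and $\alpha = .025$.

```python
from math import sqrt
from statistics import NormalDist


def q_chart_lcl_asymptotic(alpha, n, m=None):
    """Large-n LCL for the Q chart: 0.5 - z_alpha * sd. The variance is
    1/(12n) when G is known (m is None) and [(1/m) + (1/n)]/12 when only
    a reference sample of size m is available."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper-alpha critical value
    var = 1.0 / (12.0 * n) if m is None else (1.0 / m + 1.0 / n) / 12.0
    return 0.5 - z_alpha * sqrt(var)


print(round(q_chart_lcl_asymptotic(0.025, n=10, m=500), 4))  # 0.3193
```

Passing `m=None` gives the slightly higher limit that applies when $G$ itself is specified, since the extra variability from estimating the ranks with a reference sample disappears.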

When $n$ is small, the foregoing asymptotic results may not be applicable. Since LCL in this case is the $\alpha$th quantile of the distribution of $Q(G, F_n) = (1/n) \sum_{i=1}^{n} r_G(X_i)$, we need the distribution of the average of uniform random variables (cf. Prop. 3.1). This follows directly from the formula for the distribution of the sum of uniform random variables provided in Proposition 3.3.

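Proposition 3.3. Let $\{U_1, \ldots, U_n\}$ be an iid sample from $U[0, 1]$, and let $H_n(t)$ be the distribution function of $\sum_{i=1}^{n} U_i$; that is, $H_n(t) = P\{\sum_{i=1}^{n} U_i \le t\}$. Then for each $n = 1, 2, \ldots$, $H_n(t) = 0$ for $t < 0$ and

$$H_n(t) = \frac{1}{n!} \sum_{j=0}^{n} (-1)^j \binom{n}{j} (t - j)_+^n \quad \text{for } t \ge 0, \qquad (11)$$

where

$$(x)_+^n = 0 \ \text{if } x < 0; \qquad = x^n \ \text{if } x \ge 0.$$

This formula has been derived by Feller (1971). The expression (11) shows that $H_n(\cdot)$ is a piecewise polynomial. For our purpose, the most relevant part of the polynomial is

$$H_n(t) = \frac{1}{n!}\, t^n, \quad \text{if } 0 \le t < 1;$$

$$\phantom{H_n(t)} = \frac{1}{n!} \left( t^n - n (t - 1)^n \right), \quad \text{if } 1 \le t < 2;$$

$$\phantom{H_n(t)} = \frac{1}{n!} \left( t^n - n (t - 1)^n + \binom{n}{2} (t - 2)^n \right), \quad \text{if } 2 \le t < 3. \qquad (12)$$

Table 2. Q-values (n = 4)

j = 1-10:    .5315  .3330  .3910  .5975  .5090  .4255  .2815  .5860  .5400  .7220
j = 11-20:   .0650  .0415  .1320  .0220  .0220  .1635  .0670  .3395  .0220  .0220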

To determine LCL for our Q chart for small $n$, we need to find the value $w_\alpha$ such that $P\{(1/n) \sum_{i=1}^{n} U_i \le w_\alpha\} = \alpha$ or, equivalently, $H_n(n w_\alpha) = \alpha$. Formula (12) implies that for $\alpha < 1/n!$, $(n w_\alpha)^n / n! = \alpha$. Consequently, $w_\alpha = (n!\,\alpha)^{1/n}/n$. This justifies our choice of LCL for the Q chart. For example, when $n = 4$ and $\alpha = .025$, then $w_{.025} = [24(.025)]^{1/4}/4 = .220$. This value is used as the LCL for the Q chart in Figure 2, where the $X_i$'s are grouped in sets of 4. It is also clear that CL here should be .5, because it is the expected value of the average of $n$ iid $U[0, 1]$ random variables. Note that in practical situations in quality control, $\alpha$ is usually chosen to be .0027 or smaller. Thus when $n$ is not greater than 4, the LCL $w_\alpha$ is given by $(n!\,\alpha)^{1/n}/n$ as shown earlier. However, if for whatever reason $\alpha$ is chosen to be greater than $1/n!$, then the proper piecewise formula in (12) should be used to determine the value of $w_\alpha$. For example, for $n = 4$ and $\alpha = .1$, we would need to solve the equation $(1/4!)\,[(4 w_{.1})^4 - 4 (4 w_{.1} - 1)^4] = .1$. The solution is unique, because $H_n(\cdot)$ is a strictly increasing function on $[0, n]$. In general, there are no convenient closed forms for solutions of polynomial equations of high order. However, they can be easily obtained by using Newton's method or by using computer algorithms in, say, Mathematica.
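The exact small-$n$ control limit can also be computed directly from (11). The Python sketch below is one way to do so under stated assumptions: the function names are hypothetical, the distribution of a sum of iid uniforms is referred to by its common name (Irwin-Hall), and bisection is used in place of the Newton's method or Mathematica mentioned in the text, relying only on the fact that $H_n$ is increasing.

```python
from math import comb, factorial


def irwin_hall_cdf(t, n):
    """H_n(t) of (11): distribution function of a sum of n iid U[0, 1]."""
    if t <= 0:
        return 0.0
    if t >= n:
        return 1.0
    # Only terms with j <= t contribute, since (t - j)_+ vanishes otherwise.
    total = sum((-1) ** j * comb(n, j) * (t - j) ** n for j in range(int(t) + 1))
    return total / factorial(n)


def q_chart_lcl_exact(alpha, n, tol=1e-12):
    """Solve H_n(n * w) = alpha for w, the exact LCL of the Q chart."""
    if alpha < 1.0 / factorial(n):
        return (factorial(n) * alpha) ** (1.0 / n) / n  # closed form from (12)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # bisection: H_n(n * w) is increasing in w
        mid = (lo + hi) / 2.0
        if irwin_hall_cdf(n * mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(q_chart_lcl_exact(0.025, 4), 3))  # 0.22
w = q_chart_lcl_exact(0.1, 4)                 # alpha > 1/4!, solved numerically
print(abs(irwin_hall_cdf(4 * w, 4) - 0.1) < 1e-9)  # True
```

The first call reproduces the value .220 used for Figure 2; the second handles the $\alpha = .1$ case, where the closed form no longer applies.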

3.3 The S Charts

We shall use the univariate CUSUM chart to motivate the S chart. When the $X_i$'s are univariate, the simplest CUSUM chart is basically the plot of $\sum_{i=1}^{n} (X_i - \mu)$, which reflects the pattern of the total deviation from the expected value. It is more effective than the $X$ chart or the $\bar{X}$ chart in detecting small process changes and is perhaps the most used chart. In the multivariate setting, the idea of the CUSUM chart naturally suggests plotting the values $S_n(G)$ and $S_n(G_m)$ defined by

$$S_n(G) = \sum_{i=1}^{n} \left[ r_G(X_i) - \tfrac{1}{2} \right] \qquad (13)$$

and

$$S_n(G_m) = \sum_{i=1}^{n} \left[ r_{G_m}(X_i) - \tfrac{1}{2} \right]. \qquad (14)$$

Since $S_n(G) = n[Q(G, F_n) - 1/2]$ and $S_n(G_m) = n[Q(G_m, F_n) - 1/2]$, we can immediately deduce the following from Proposition 3.2.

Proposition 3.4. Under the conditions described in Proposition 3.2, we have

a. $S_n(G) \to_L N(0, n/12)$ as $n \to \infty$, and
b. $S_n(G_m) \to_L N(0, n^2[(1/m) + (1/n)]/12)$ as $\min(m, n) \to \infty$.

Proposition 3.4 implies that the LCL for the S chart based on $S_n(G)$ is $-z_\alpha (n/12)^{1/2}$ and the LCL for the S chart based on $S_n(G_m)$ is $-z_\alpha \sqrt{n^2[(1/m) + (1/n)]/12}$. We observe that the control limit here is a curve rather than a line, as shown in Figure 4. In fact, the control limit curves downward as $n$ increases, following $\sqrt{n}$ for the chart based on $S_n(G)$. When $n$ is large, the S chart can easily exceed the standard paper size, which is impractical. Thus it is convenient to standardize all the CUSUMs to have a straight-line control limit (see Fig. 5). This means plotting $S_n^*(G) = S_n(G)/\sqrt{n/12}$ or $S_n^*(G_m) = S_n(G_m)/\sqrt{n^2[(1/m) + (1/n)]/12}$ for $n = 1, 2, \ldots$. This S* chart has $\mathrm{CL} = 0$ and $\mathrm{LCL} = -z_\alpha$.
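The cumulative sums and their standardized versions can be sketched in Python as follows. The function name `s_chart` is hypothetical; the example uses the first three ranks of Table 1 with $m = 500$, and the output agrees with the leading entries of Tables 4 and 5.

```python
from math import sqrt


def s_chart(r_values, m=None):
    """Return (S_n, S_n*) for n = 1, 2, ...: the cumulative sums of
    r(X_i) - 1/2 and the same sums divided by their asymptotic standard
    deviation (G known if m is None, reference sample of size m otherwise)."""
    s_values, s_star = [], []
    running = 0.0
    for n, r in enumerate(r_values, start=1):
        running += r - 0.5
        var = n / 12.0 if m is None else n * n * (1.0 / m + 1.0 / n) / 12.0
        s_values.append(running)
        s_star.append(running / sqrt(var))
    return s_values, s_star


# First three ranks from Table 1, with m = 500 reference points.
s, s_star = s_chart([0.082, 0.948, 0.840], m=500)
print([round(v, 3) for v in s])       # [-0.418, 0.03, 0.37]
print([round(v, 3) for v in s_star])  # [-1.447, 0.073, 0.738]
```

A standardized value below $-z_\alpha$ (for example, below $-1.96$ when $\alpha = .025$) would be plotted under the LCL of the S* chart.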

4. SIMULATION RESULTS

In this section we use a bivariate data set to illustrate the construction of the control charts discussed earlier. The simulation is carried out using the S language on a SUN workstation.

The data set is obtained as follows. Let $G = N\left( \binom{0}{0}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right)$. We generate 540 sample points from $G$, labeling the first 500 as $Y_1, \ldots, Y_{500}$ and the last 40 as $X_1, \ldots, X_{40}$. We also generate 40 sample points from the distribution $N\left( \binom{2}{2}, \begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix} \right)$ and label these 40 sample points as $X_{41}, \ldots, X_{80}$. The distributions here have been chosen to be normal just to make the evaluation of the outcome easier. Normality is not required for the applicability of the charts. Note that there is a clear mean shift and a scale increase in the distribution for the last 40 $X_i$'s. In principle, we should expect all our charts to detect this change. As Figures 1-5 show, this is indeed the case.
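A data set of the same design can be generated with the Python standard library, as in the sketch below. It is only an illustration under stated assumptions: the in-control distribution is taken to be the standard bivariate normal and the out-of-control distribution to be bivariate normal with mean $(2, 2)'$ and covariance $4I$; the function names and the seed are hypothetical, and the draws will not reproduce the article's S-language sample.

```python
import random


def draw_bivariate_normal(rng, mean, sd, count):
    """Draw `count` points with independent N(mean, sd^2) coordinates."""
    return [(rng.gauss(mean, sd), rng.gauss(mean, sd)) for _ in range(count)]


def simulate_two_phase_data(seed=0):
    """Return (reference, new_points): 500 in-control reference points, then
    40 in-control and 40 shifted new points (assumed N((2,2)', 4I))."""
    rng = random.Random(seed)
    reference = draw_bivariate_normal(rng, 0.0, 1.0, 500)
    in_control = draw_bivariate_normal(rng, 0.0, 1.0, 40)
    shifted = draw_bivariate_normal(rng, 2.0, 2.0, 40)  # sd 2, variance 4
    return reference, in_control + shifted


reference, new_points = simulate_two_phase_data(seed=1)
print(len(reference), len(new_points))  # 500 80
```

The returned lists play the roles of $Y_1, \ldots, Y_{500}$ and $X_1, \ldots, X_{80}$, and can be passed to any depth function to produce the ranks plotted in the charts.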

For each $X_i$, we compute its simplicial depth, using the FORTRAN algorithm developed by Rousseeuw and Ruts (1992). This algorithm is highly efficient, because it requires only $O(m \log m)$ steps in computing the simplicial depths for $m$ data points, instead of the $O(m^4)$ steps required by direct computation based on solving systems of linear equations. The simplicial depth values of the $X_i$'s are recorded in the first column of Table 1. Based on these values we can compute all $r_{G_m}(X_i)$ using (6) and record them in the second column of Table 1. Figure 1 gives the plot of the $r_{G_m}(X_i)$'s with $\mathrm{CL} = .5$ and $\mathrm{LCL} = .025$, which is the $\alpha$ value that we choose for all five charts. It clearly shows that the process is out of control in the second half, with most of the $r_{G_m}(X_i)$'s falling below LCL. The few false alarms in the first half of the $X_i$'s should be attributed to random fluctuations in the same manner that false alarms are characterized in a univariate $X$ chart.

Table 3. Q-values (n = 10)

j = 1-8:   .4112  .5336  .3506  .6714  .0910  .0220  .1840  .0616

Table 4. S-values

n = 1-10:     -.418    .030    .370    .126    .296   -.130   -.434   -.542   -.684   -.888
n = 11-20:   -1.366   -.978   -.666   -.314   -.534   -.588   -.430   -.862   -.400   -.552
n = 21-30:    -.550   -.578   -.892   -.850   -.784  -1.170  -1.246  -1.724  -2.122  -2.046
n = 31-40:   -1.680  -1.380  -1.112  -1.084  -1.164  -1.220  -1.064  -1.128   -.708   -.332
n = 41-50:    -.810  -1.288  -1.766  -2.072  -2.550  -2.950  -3.428  -3.906  -4.212  -4.422
n = 51-60:   -4.900  -5.378  -5.856  -6.334  -6.812  -7.290  -7.768  -8.246  -8.724  -9.202
n = 61-70:   -9.114  -9.592 -10.070 -10.548 -11.026 -11.504 -11.982 -12.280 -12.758 -12.362
n = 71-80:  -12.582 -12.922 -13.400 -13.878 -14.356 -14.834 -15.312 -15.790 -16.268 -16.746

Figures 2 and 3 show the Q charts with the group sizes $n = 4$ and $n = 10$. The values $\{Q(G_m, F_n^j),\ j = 1, 2, \ldots\}$ are computed according to the definition (9) and are recorded in Tables 2 and 3. For Figure 2, the CL has been set to .5 and the LCL has been set to .220, following Proposition 3.3. In Figure 3, the results in Proposition 3.2 lead to the choice of $\mathrm{CL} = .5$ and $\mathrm{LCL} = .5 - z_\alpha \sqrt{(1/12)[(1/m) + (1/n)]}$, which turns out to be .3193 when $\alpha = .025$. Both plots clearly show that the process is out of control in the second half. We also observe that the averaging of the $r_{G_m}(\cdot)$'s in $Q$ has eliminated the random fluctuations appearing in the first half of the r chart in Figure 1. In principle, because the underlying distribution here is specified, we can use, for example, the computing package Mathematica to compute the exact values of the $D_G(\cdot)$'s and hence $Q(G, F_n^j)$, $j = 1, 2, \ldots$, and give the corresponding Q chart. The difference between this chart and our Figure 2 appears to be negligible.
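The grouping step behind Tables 2 and 3 amounts to averaging consecutive, non-overlapping blocks of ranks. A short Python sketch (the function name is hypothetical) applied to the ranks of $X_1, \ldots, X_8$ from Table 1 reproduces the first two entries of Table 2.

```python
def q_chart_values(r_values, n):
    """Average consecutive, non-overlapping groups of n rank values;
    an incomplete trailing group is dropped."""
    return [sum(r_values[i:i + n]) / n
            for i in range(0, len(r_values) - n + 1, n)]


# Ranks of X_1, ..., X_8 as listed in Table 1.
r_first_8 = [0.082, 0.948, 0.840, 0.256, 0.670, 0.074, 0.196, 0.392]
print([round(q, 4) for q in q_chart_values(r_first_8, 4)])  # [0.5315, 0.333]
```

Each average is then compared with the LCL appropriate to the group size: the exact limit .220 for $n = 4$, or the asymptotic limit .3193 for $n = 10$.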

Figure 4 illustrates the S chart of the $S_n(G_m)$ values in Table 4. Since the S values are not standardized here, the LCL is $-z_\alpha \sqrt{(n^2/12)[(1/m) + (1/n)]}$. To keep the chart within standard paper size, we need to adopt a much smaller scale for the S axis. By contrast, in Figure 5, the S values have been standardized, and hence no severe rescaling is needed. The standardized S values are recorded in Table 5, labeled as S*. The control limit LCL is a straight line at $-z_\alpha$, which is $-1.96$ in this case. For both figures, CL equals zero.

In the simulation here, we have chosen m = 500. Clearly, larger values of m give better approximations to the limiting distributions stated in Propositions 3.1, 3.2, and 3.4 and to LCL's for the r, Q, and S charts. Our experience shows that the approximation results are reasonable when m is as small as 50 in the bivariate case. We would recommend larger values for higher-dimensional observations.

5. CONCLUDING REMARKS

In addition to the $X$, $\bar{X}$, and CUSUM charts, there are more complicated control charts for monitoring a univariate process mean change, such as the moving average chart, the EWMA chart, and the CUSUM chart with a V mask (cf. Wetherill 1977). It would be interesting to develop our charts further along these lines. For example, a moving average chart based on the $r_G(\cdot)$ or $r_{G_m}(\cdot)$ values in (5) or (6) can be readily constructed. To obtain proper control limits for this chart, one may apply the moving blocks bootstrap techniques of Liu and Singh (1992) to develop the distributions of the moving averages.

As discussed by Alt and Smith (1988), the classical multivariate control charts based on the $\chi^2$ or Hotelling's $T^2$ statistics (Hotelling 1949) are valid only when the process follows a normal distribution and can be used to detect a mean shift only. When the process is bivariate, a control ellipse may be used instead of the foregoing two charts. The control ellipse approach also requires the normality assumption for the underlying process, and it loses the chronological order of the plotted observations. In a different direction, one may use separate X charts for individual component variables and then apply Bonferroni's inequality to provide a bound for the level of the combined test. As pointed out by Alt (1982), this inequality is not sharp enough to give an accurate level unless the component variables are independent. More precisely, this approach tends to overestimate the probability of asserting that the process is in control.

Since the sample Mahalanobis depth defined in (4) and Hotelling's $T^2$ both measure the quadratic distance of a point to its mean, one may attempt to equate Hotelling's $T^2$ chart to our r or Q charts when Mahalanobis depth is used. Note that in our approach, Mahalanobis depth serves only as a stepping stone to reduce the observations to "ranks." What we chart here are the "ranks" and not the Mahalanobis depth values themselves. The determination of the control limit in Hotelling's $T^2$ plot requires the exact sampling distribution of Hotelling's $T^2$ statistic, whereas this is not needed in our charts due to the further transformation of statistics into ranks. Consequently, our charts based on Mahalanobis depth are different from the Hotelling $T^2$ plots. Regarding the choice of data depth for our charts, we note that if the underlying distribution is close to elliptical, then it is more efficient to use Mahalanobis depth. Otherwise, the more geometric types of depth, such as majority depth, simplicial depth, and Tukey's depth, may be more desirable, because they do not require moment conditions.

Table 5. S*-values

n = 1-10:    -1.447    .073    .738    .217    .456   -.183   -.564   -.659   -.783   -.963
n = 11-20:   -1.411   -.966   -.632   -.287   -.471   -.501   -.355   -.691   -.312   -.419
n = 21-30:    -.407   -.418   -.630   -.587   -.530   -.775   -.809  -1.098  -1.327  -1.257
n = 31-40:   -1.014   -.819   -.649   -.623   -.659   -.680   -.585   -.611   -.378   -.175
n = 41-50:    -.421   -.661   -.895  -1.037  -1.261  -1.442  -1.656  -1.866  -1.989  -2.066
n = 51-60:   -2.264  -2.459  -2.650  -2.837  -3.020  -3.200  -3.377  -3.550  -3.721  -3.889
n = 61-70:   -3.816  -3.980  -4.142  -4.300  -4.457  -4.610  -4.762  -4.840  -4.987  -4.794
n = 71-80:   -4.840  -4.932  -5.075  -5.216  -5.355  -5.492  -5.627  -5.760  -5.892  -6.022

[Received September 1993. Revised January 1995.]

REFERENCES

Alt, F. (1982), "Multivariate Quality Control: State of the Art," ASQC Annual Quality Congress Transactions, pp. 886-893.

Alt, F., and Smith, N. (1988), "Multivariate Process Control," in Handbook of Statistics, 7, eds. P. R. Krishnaiah and C. R. Rao, Amsterdam: Elsevier, pp. 333-351.

Banks, J. (1989), Principles of Quality Control, New York: John Wiley.

Feller, W. (1971), Introduction to Probability Theory and Its Applications (2nd ed.), New York: John Wiley.

Hotelling, H. (1949), "Multivariate Quality Control," in Techniques in Statistical Analysis, eds. C. Eisenhart, M. W. Hastay, and W. A. Wallis, New York: McGraw-Hill.

Liu, R. (1990), "On a Notion of Data Depth Based on Random Simplices," The Annals of Statistics, 18, 405-414.

Liu, R. (1992), "Data Depth and Multivariate Rank Tests," in L1-Statistical Analysis and Related Methods, ed. Y. Dodge, Amsterdam: Elsevier, pp. 279-294.

Liu, R., and Singh, K. (1992), "Moving Blocks Bootstrap and Jackknife Capture Weak Dependence," in Exploring the Limits of Bootstrap, eds. R. LePage and L. Billard, New York: John Wiley, pp. 225-248.

Liu, R., and Singh, K. (1993), "A Quality Index Based on Data Depth and Multivariate Rank Tests," Journal of the American Statistical Association, 88, 252-260.

Mahalanobis, P. C. (1936), "On the Generalized Distance in Statistics," Proceedings of the National Academy India, 12, 49-55.

Rousseeuw, P. J., and Ruts, I. (1992), "Bivariate Simplicial Depth," technical report, University of Antwerp, Dept. of Mathematics and Computer Science.

Tukey, J. W. (1975), "Mathematics and Picturing Data," Proceedings of the 1975 International Congress of Mathematics, 2, 523-531.

Wadsworth, H., Stephen, K. S., and Godfrey, A. B. (1986), Modern Methods for Quality Control and Improvement, New York: John Wiley.

Wetherill, G. B. (1977), Sampling Inspection and Quality Control (2nd ed.), New York: Chapman and Hall.
