8/18/2019 A Review and Analysis of the Mahalanobis—Taguchi
A Review and Analysis of the Mahalanobis–Taguchi System
William H. Woodall and Rachelle Koudelik
Department of Statistics
Virginia Polytechnic Institute
and State University
Blacksburg, VA 24061
Kwok-Leung Tsui and Seoung Bum Kim
School of Industrial and Systems Engineering
Georgia Institute of Technology
Atlanta, GA 30332
Zachary G. Stoumbos
Department of Management Science
and Information Systems and Rutgers Center
for Operations Research (RUTCOR)
Rutgers, The State University of New Jersey
Piscataway, NJ 08854
Christos P. Carvounis, MD
State University of New York at Stony Brook
Nassau University Medical Center
East Meadow, NY 11554
The Mahalanobis–Taguchi system (MTS) is a relatively new collection of methods proposed for
diagnosis and forecasting using multivariate data. The primary proponent of the MTS is Genichi Taguchi, who
is very well known for his controversial ideas and methods for using designed experiments. The MTS
results in a Mahalanobis distance scale used to measure the level of abnormality of “abnormal” items
compared to a group of “normal” items. First, it must be demonstrated that a Mahalanobis distance
measure based on all available variables on the items is able to separate the abnormal items from the
normal items. If this is the case, then orthogonal arrays and signal-to-noise ratios are used to select
an “optimal” combination of variables for calculating the Mahalanobis distances. Optimality is defined
in terms of the ability of the Mahalanobis distance scale to match a prespecified or estimated scale
that measures the severity of the abnormalities. In this expository article, we review the methods of
the MTS and use a case study based on medical data to illustrate them. We identify some conceptual,
operational, and technical issues with the MTS that lead us to advise against its use.
KEY WORDS: Classification analysis; Discriminant analysis; Medical diagnosis; Multivariate analysis; Pattern recognition; Signal-to-noise ratio; Taguchi methods.
1. INTRODUCTION
Genichi Taguchi is most well known for his work on
the design of experiments. His ideas have generated a
considerable amount of discussion and controversy and his
methods are widely used (see, e.g., Taguchi and Wu 1980;
Box 1996; Montgomery 1992; Nair 1992; Tsui 1996; Wu and
Hamada 2000; Taguchi, Chowdhury, and Taguchi 2000). The
general consensus, among statisticians at least, seems to be
that although many of Taguchi’s overall ideas on experimental
design are very important and influential, the techniques that he proposed should be replaced with simpler, more effective
statistical methods.
It is not as well known that Taguchi also proposed on-line
quality control methods (Taguchi 1981; Taguchi, Elsayed, and
Hsiang 1989). Adams and Woodall (1989) and Nayebpour
and Woodall (1993), among others, have studied these on-line
methods. Taguchi’s off-line ideas have had a much greater
impact than his ideas on on-line quality control.
We study a new set of methods proposed by Taguchi,
Chowdhury, and Wu (2001) and Taguchi and Rajesh (2000)
collectively referred to as the Mahalanobis–Taguchi system
(MTS). The MTS is proposed as a diagnosis and forecasting
method using multivariate data. In this approach, the multivariate
data must be available on a “healthy” or “normal”
group of items and a number of “abnormal” items that may
sometimes be classified into groups based on the severity
levels of the abnormalities. In the MTS, it must first be
confirmed that the relative sizes of the Mahalanobis distances
(MDs) based on the standardized variables of the healthy
group can discriminate between normal and abnormal items.
Once this fact is established, the number of variables used
is reduced, if possible, using orthogonal arrays (OAs) and
signal-to-noise (S/N) ratios to evaluate the contribution of
each variable. Each row of the OA determines a subset of
the original variables. The recommended S/N ratio measures
the ability of the MDs, corresponding to the abnormal items
and calculated using this subset of variables, to reflect a
prespecified or estimated measure of the severity of the
abnormalities. Only those variables with effects that show
an increase in the average S/N ratio are retained. The MD
scale using these variables has a number of stated purposes,
including diagnosis and forecasting.
© 2003 American Statistical Association and
the American Society for Quality
TECHNOMETRICS, FEBRUARY 2003, VOL. 45, NO. 1
DOI 10.1198/004017002188618626
Taguchi et al. (2001) listed a number of areas of application
for the MTS, including inspection and sensor systems in manufacturing,
patient monitoring, fire detection, earthquake forecasting,
weather forecasting, credit scoring, and voice recognition.
They also described case studies involving engineering
applications of the MTS in many large companies, including
Nissan Motor, Mitsubishi Space Software, Xerox, Delphi
Automotive Systems, ITT Industries, Ford Motor, Fuji Photo Film, and others.
We review the MTS by explaining the approach and calculations
in Section 2. In Section 3 we discuss the MTS and identify
some conceptual, operational, and technical issues associated
with the methods. We present a detailed case study in
Section 4. We discuss other aspects of the MTS in Section 5,
and present concluding remarks in Section 6. A primary conclusion
is that the methods of the MTS are, in some respects,
not well defined conceptually or operationally.
2. DESCRIPTION OF THE
MAHALANOBIS–TAGUCHI SYSTEM
In this section we provide a detailed explanation of the MTS
and the required computations, as presented by Taguchi and
Rajesh (2000). These authors break the MTS into four stages.
In stage 1, the variables that define the “healthiness” of an
item are identified. Data are collected on the healthy or normal
group. As described later, the variables are standardized and
the MDs calculated for the normal items. These values define
the “Mahalanobis space” used as a frame of reference for the
MTS measurement scale.
We refer to the variables collected on each item to determine
its “healthiness” as $V_i$, $i = 1, 2, \ldots, p$. We denote by
$V_{ij}$ the observation of the $i$th variable on the $j$th item,
$i = 1, 2, \ldots, p$, $j = 1, 2, \ldots, m$. Thus the $p \times 1$ data vectors for
the normal group are denoted by $\mathbf{v}_j$, $j = 1, 2, \ldots, m$.
Each individual variable in each data vector is standardized
by subtracting the mean of the variable and dividing by its
standard deviation, with both statistics calculated using data
on the variable in the normal group. Thus we have the standardized
values
$$Z_{ij} = \frac{V_{ij} - \bar{V}_i}{S_i}, \quad i = 1, 2, \ldots, p, \quad j = 1, 2, \ldots, m, \tag{1}$$
where
$$\bar{V}_i = \frac{1}{m} \sum_{j=1}^{m} V_{ij}$$
and
$$S_i = \sqrt{\frac{\sum_{j=1}^{m} (V_{ij} - \bar{V}_i)^2}{m - 1}}.$$
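Next, the values of the MDs, $MD_j$, $j = 1, 2, \ldots, m$, are calculated
for the normal items using
$$MD_j = \frac{1}{p}\, \mathbf{z}_j^{T} \mathbf{S}^{-1} \mathbf{z}_j, \tag{2}$$
where $\mathbf{z}_j^{T} = [Z_{1j}, Z_{2j}, \ldots, Z_{pj}]$ and $\mathbf{S}$ is the sample correlation
matrix calculated as
$$\mathbf{S} = \frac{1}{m - 1} \sum_{j=1}^{m} \mathbf{z}_j \mathbf{z}_j^{T}.$$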
Taguchi and Rajesh (2000) stated that the $MD_j$ values in (2)
have an average value of unity. For this reason, they also refer
to the Mahalanobis space as the unit space.
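The stage 1 calculations in (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six two-variable "normal" items are made up, and the helper names (`standardize`, `correlation_matrix`, `solve`, `mahalanobis`) are hypothetical, not taken from Taguchi and Rajesh (2000). The linear solve stands in for the explicit inverse of the correlation matrix.

```python
import math

def standardize(data):
    """Standardize each column with its sample mean and (m - 1)-denominator
    standard deviation, as in equation (1). Returns (z, means, sds)."""
    m, p = len(data), len(data[0])
    means = [sum(row[i] for row in data) / m for i in range(p)]
    sds = [math.sqrt(sum((row[i] - means[i]) ** 2 for row in data) / (m - 1))
           for i in range(p)]
    z = [[(row[i] - means[i]) / sds[i] for i in range(p)] for row in data]
    return z, means, sds

def correlation_matrix(z):
    """Sample correlation matrix S = (1/(m-1)) * sum_j z_j z_j^T."""
    m, p = len(z), len(z[0])
    return [[sum(row[a] * row[b] for row in z) / (m - 1) for b in range(p)]
            for a in range(p)]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def mahalanobis(zj, corr):
    """Scaled MD of equation (2): (1/p) * z^T S^{-1} z."""
    x = solve(corr, zj)  # x = S^{-1} z
    return sum(zi * xi for zi, xi in zip(zj, x)) / len(zj)

# Made-up normal group: m = 6 items, p = 2 variables (illustration only).
normal = [[5.0, 1.2], [6.1, 1.9], [5.5, 1.4],
          [6.8, 2.4], [5.9, 1.5], [6.3, 2.2]]
z, means, sds = standardize(normal)
corr = correlation_matrix(z)
md_normal = [mahalanobis(zj, corr) for zj in z]
mean_md = sum(md_normal) / len(md_normal)
print(md_normal)
print(mean_md)
```

For these data the average MD comes out as $(m-1)/m = 5/6$ rather than exactly 1, which is the point taken up in Section 3.3.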
In stage 2, abnormal items must be selected. There is no
uncertainty incorporated into the MTS regarding the status of
each item used for determining the MTS measurement scale.
As in discriminant analysis, it is assumed that each item is
known to be either normal or abnormal.
The MDs of the abnormals with data vectors denoted by $\mathbf{v}_j$,
$j = m+1, m+2, \ldots, m+t$, are calculated after the variables
are standardized using the normal-group means and standard
deviations. Thus we have $MD_j$, $j = m+1, m+2, \ldots, m+t$,
with $MD_j$ defined in (2), where the $i$th element of $\mathbf{z}_j$ in
(2), $Z_{ij}$, is calculated using (1), for $i = 1, 2, \ldots, p$ and
$j = m+1, m+2, \ldots, m+t$.
According to the MTS, the resulting MD scale is good if
the $MD_j$ values for the abnormal items are higher than those
for the normal items.
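As a rough illustration of stage 2, the sketch below computes MDs for abnormal items using only normal-group means, standard deviations, and correlation. It assumes $p = 2$, where the inverse of the correlation matrix has a simple closed form; the data and the function names `mean_sd` and `md_two_variables` are hypothetical.

```python
import math

def mean_sd(xs):
    """Sample mean and standard deviation with the (m - 1) denominator."""
    m = len(xs)
    mean = sum(xs) / m
    return mean, math.sqrt(sum((x - mean) ** 2 for x in xs) / (m - 1))

def md_two_variables(item, normal):
    """Scaled MD of equation (2) for p = 2, using normal-group statistics only."""
    m1, s1 = mean_sd([row[0] for row in normal])
    m2, s2 = mean_sd([row[1] for row in normal])
    zn = [((row[0] - m1) / s1, (row[1] - m2) / s2) for row in normal]
    r = sum(a * b for a, b in zn) / (len(normal) - 1)  # off-diagonal of S
    z1, z2 = (item[0] - m1) / s1, (item[1] - m2) / s2
    # For S = [[1, r], [r, 1]]:
    #   z^T S^{-1} z = (z1^2 - 2 r z1 z2 + z2^2) / (1 - r^2)
    quad = (z1 * z1 - 2 * r * z1 * z2 + z2 * z2) / (1 - r * r)
    return quad / 2  # divide by p = 2

# Made-up data: six normal items and two abnormal items (illustration only).
normal = [[5.0, 1.2], [6.1, 1.9], [5.5, 1.4],
          [6.8, 2.4], [5.9, 1.5], [6.3, 2.2]]
abnormal = [[9.0, 1.0], [7.5, 4.0]]

md_normal = [md_two_variables(v, normal) for v in normal]
md_abnormal = [md_two_variables(v, normal) for v in abnormal]
print(max(md_normal), min(md_abnormal))
```

In this toy example every abnormal MD exceeds every normal MD, which is the strictest reading of the "higher than" requirement.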
In stage 3, OAs and S/N ratios are used to identify the most
useful set of variables. An OA is a design matrix that contains
the levels of various factors in the runs of an experiment to
investigate the effects of the variables on a response of interest.
Each factor of the experiment is assigned to a column of
the OA, and the rows of the matrix correspond to the experimental
runs. The MTS has $p$ factors in the experiment, each
with two levels. The level of a factor signifies the inclusion
or exclusion of a variable in the MTS analysis. The $p$ factors
are assigned to the first $p$ columns of the OA, with the other
columns ignored. Thus the OA selected must initially have
at least $p$ columns. Each row of the OA determines which
variables are included in any given experimental run. For each
of these runs, the MD values are calculated for the abnormals
as in stage 2, but using only the indicated variables. These
MD values are then used to calculate the value of a S/N ratio,
which becomes the response for the run.
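To make the role of the OA concrete, the following sketch builds a small two-level orthogonal array and reads off the variable subset for each run. Several things here are assumptions rather than part of the MTS descriptions: the array is generated by a standard parity construction, so its column order need not match published Taguchi tables; level 1 is taken to mean "variable included"; and the names `two_level_oa`, `is_orthogonal`, and `subsets_from_oa` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def two_level_oa(k):
    """Two-level orthogonal array with 2**k runs and 2**k - 1 columns.
    The entry for run r and column c is 1 or 2, set by the parity of the
    bits that r and c have in common."""
    runs = 2 ** k
    return [[1 + bin(r & c).count("1") % 2 for c in range(1, runs)]
            for r in range(runs)]

def is_orthogonal(oa):
    """Check that every pair of columns shows each of the four level
    combinations equally often."""
    for a, b in combinations(range(len(oa[0])), 2):
        counts = Counter((row[a], row[b]) for row in oa)
        if len(counts) != 4 or len(set(counts.values())) != 1:
            return False
    return True

def subsets_from_oa(oa, p, include_level=1):
    """Use the first p columns; a factor at include_level means the
    corresponding variable (numbered from 1) is used in that run."""
    return [[i + 1 for i in range(p) if row[i] == include_level]
            for row in oa]

oa = two_level_oa(3)                 # 8 runs, 7 columns
subsets = subsets_from_oa(oa, p=5)   # 5 variables, remaining columns ignored
for run, used in enumerate(subsets, start=1):
    print(run, used)
```

Only 8 of the $2^5 - 1 = 31$ possible nonempty subsets are visited here, which illustrates why the array acts as a partial search rather than an exhaustive one.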
Many different S/N ratios are used in Taguchi’s analysis
of designed experiments. These are defined in such a way
that larger S/N ratio values are preferred. One option mentioned
in the MTS is to use Taguchi’s larger-is-better S/N ratio,
defined as
$$-10 \log\left[\frac{1}{t} \sum_{j=m+1}^{m+t} \left(\frac{1}{MD_j}\right)^{2}\right],$$
because larger MD values further separate the abnormals from
the normal group. Taguchi and Rajesh (2000) recommended
using the dynamic type S/N ratio instead. For the dynamic
S/N ratio to be calculated, the severity value of each abnormal
item must be established. These severity levels are denoted by
$M_j$, $j = m+1, m+2, \ldots, m+t$. Larger values of $M_j$ indicate
a greater degree of abnormality. The goal of this stage is to
select a subset of the original variables such that the resulting
$MD_j$ values of the abnormals most appropriately reflect
the levels of severity $M_j$. If the values of $M_j$ are unknown,
Taguchi and Rajesh (2000) recommended grouping the abnormal
items into classes based on a general level of severity,
perhaps obtained subjectively. The value of $M_j$ used for each
member of a class is the average value of the square roots
of the MDs for the members in the class. These MDs are
[...]
not understood in the context of a meaningful sampling (and
conceptual) framework.”
In addition, in our view, the use of the MTS measurement
scale has never been clearly explained. Taguchi and Rajesh
(2000), for example, stated that the problem of the MTS is not
one of classification of a future observation into one of two
populations corresponding to normal and abnormal. Taguchi
et al. (2001, p. 7) stated that the MD values should be used
“in continuous mode rather than discrete mode.” Nevertheless,
a university admission process is given as an application of
the MTS that would seem to require classification. Also, the
use of a threshold for MD in the MTS seems to imply classification.
It is clear, however, that the MTS results in an MD
measurement scale that should measure the degree of abnormality
of the items. Use of the MD scale is similar to that of a
discriminant function in discriminant analysis. This similarity
is discussed further in the case study in Section 4. Another statistical
option would be to use standard model-fitting methods,
such as ordinal logistic regression, with the level of severity as
the dependent variable and the variables $V_i$, $i = 1, 2, 3, \ldots, p$,
as the explanatory variables.
3.2 Operational Issues
In stage 2, it must be shown that the MD values of the
abnormal items are higher than those for the normal items. No
operational definition is given, however, for “higher than.” If
the criterion means that the smallest MD value for the abnormal
items must be higher than the largest value for the normal
items, as in the case study in Section 4, then this would appear
to limit the usefulness of the approach. If normal and abnormal
items are not clearly distinguishable, then it seems that
misclassification probabilities must be considered, something
not possible under the MTS framework that eschews the use
of probability.
A designed fractional factorial experiment is used as a
search algorithm for optimization in the MTS. The run for
which all factors are at their low levels is not a valid run,
however, because at least one variable must be used in the
analysis. Thus an OA containing this run could not be used.
The OA and the experimental design methods are used as an
optimization technique to find the combination of variables
that maximizes the S/N ratio. As illustrated in the case study
in Section 4, this optimal combination is not always obtained.
Fractional factorial designs are used in industry to reduce the
number of runs, because each run is often expensive. This
goal seems much less important in an optimization application
involving only computations. Of course, the MTS approach
could be modified to include a better search algorithm for
the optimal combination of variables or another S/N ratio,
e.g., one based on a rank correlation coefficient that would
lead to an MTS scale that would match, to the greatest extent
possible, the order of the given severity levels of the abnormal
items.
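Because each "run" is only a computation, one simple alternative to the OA search is to score every nonempty subset of variables directly. The sketch below does this with the larger-is-better S/N ratio of Section 2. It is a minimal illustration, not a method proposed in the MTS literature: the data are made up, the logarithm is taken to base 10 (the usual decibel convention for Taguchi S/N ratios, assumed here), and the function names are hypothetical.

```python
import math
from itertools import combinations

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def md_values(normal, items, cols):
    """Scaled MDs (equation (2)) of `items` using only the variables in
    `cols`, standardized with normal-group means and standard deviations."""
    m, p = len(normal), len(cols)
    means = [sum(r[c] for r in normal) / m for c in cols]
    sds = [math.sqrt(sum((r[c] - mu) ** 2 for r in normal) / (m - 1))
           for c, mu in zip(cols, means)]

    def z(row):
        return [(row[c] - mu) / sd for c, mu, sd in zip(cols, means, sds)]

    zn = [z(r) for r in normal]
    corr = [[sum(v[a] * v[b] for v in zn) / (m - 1) for b in range(p)]
            for a in range(p)]
    out = []
    for row in items:
        zj = z(row)
        out.append(sum(u * v for u, v in zip(zj, solve(corr, zj))) / p)
    return out

def sn_larger_is_better(mds):
    """Larger-is-better S/N ratio: -10 log10[(1/t) * sum (1/MD_j)^2]."""
    return -10 * math.log10(sum(1 / d ** 2 for d in mds) / len(mds))

def best_subset(normal, abnormal):
    """Score all 2**p - 1 nonempty subsets; return (best, number scored)."""
    p = len(normal[0])
    scored = []
    for k in range(1, p + 1):
        for cols in combinations(range(p), k):
            sn = sn_larger_is_better(md_values(normal, abnormal, cols))
            scored.append((sn, cols))
    return max(scored), len(scored)

# Made-up data: 8 normal items, 3 abnormal items, p = 3 variables.
normal = [[5.0, 1.2, 30], [6.1, 1.9, 28], [5.5, 1.4, 35], [6.8, 2.4, 31],
          [5.9, 1.5, 27], [6.3, 2.2, 33], [5.2, 1.6, 29], [6.6, 2.0, 34]]
abnormal = [[9.0, 1.0, 45], [7.5, 4.0, 50], [8.2, 3.5, 20]]

best, n_scored = best_subset(normal, abnormal)
print(n_scored, best)
```

With 17 variables, as in the case study, the same loop would score $2^{17} - 1 = 131{,}071$ subsets, which is a modest computation.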
3.3 Technical Issues
Taguchi and Rajesh (2000) stated that the expected value of
$MD_j$ in (2) for the normal items is unity. This is an approximation,
however, evidently based on a chi-squared distribution
with $p$ degrees of freedom. This is the probability distribution
of $p\,MD_j$, provided that sampling is from a multivariate normal
distribution and the mean vector and variance-covariance
matrix are assumed to be known and used in the calculations
instead of the estimates. Under the assumption of multivariate
normality and estimation of the mean vector and variance-covariance
matrix, Tracy, Young, and Mason (1992) reported
that the marginal distribution of $MD_j$ is related to a beta
distribution and has a mean of $(m-1)/m$, not unity. The mean
of $MD_j$ is also $(m-1)/m$ if the $m$ observations in the normal
group represent the entire population of normal items. Finally,
it can be shown using matrix algebra that the average MD
value for the $m$ items in the normal group is always exactly
$(m-1)/m$.
Moreover, Taguchi and Rajesh (2000) stated that $\hat{\beta}$ from (4)
is 1 when working averages are used to fit the regression line
through the origin. This is true, however, only if the working
averages are calculated using the variables included in the
particular run being considered. It is not reasonable to use just
the variables in each run to calculate the working averages,
because this would cause the measure of the degree of severity
of abnormal items, and their relative rankings, to vary from
run to run. Although descriptions of the MTS do not specify
explicitly the variables used to obtain the working averages,
all of the variables are used to obtain the working averages in
the medical data case study of Taguchi and Rajesh (2000).
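The matrix-algebra claim above, that the average MD over the normal group is exactly $(m-1)/m$, follows in a few lines from definition (2) and the definition of $\mathbf{S}$. The derivation below is a sketch that assumes $\mathbf{S}$ is nonsingular; $\operatorname{tr}$ denotes the trace.

```latex
\begin{align*}
\frac{1}{m}\sum_{j=1}^{m} MD_j
  &= \frac{1}{mp}\sum_{j=1}^{m} \mathbf{z}_j^{T}\mathbf{S}^{-1}\mathbf{z}_j
   = \frac{1}{mp}\operatorname{tr}\!\Bigl(\mathbf{S}^{-1}\sum_{j=1}^{m}\mathbf{z}_j\mathbf{z}_j^{T}\Bigr) \\
  &= \frac{1}{mp}\operatorname{tr}\!\bigl(\mathbf{S}^{-1}(m-1)\,\mathbf{S}\bigr)
   = \frac{(m-1)\,p}{mp}
   = \frac{m-1}{m}.
\end{align*}
```

The second equality uses $\mathbf{z}^{T}\mathbf{A}\mathbf{z} = \operatorname{tr}(\mathbf{A}\mathbf{z}\mathbf{z}^{T})$, and the third uses $\sum_{j} \mathbf{z}_j \mathbf{z}_j^{T} = (m-1)\mathbf{S}$. No distributional assumption is needed.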
4. A MEDICAL CASE STUDY
Taguchi and Rajesh (2000) and Taguchi et al. (2001) justified
their MTS approach solely through the use of case studies.
In this section we consider a medical diagnosis case study
of Taguchi and Rajesh (2000) involving liver disease. The
study group comprised a healthy group of 200 people and
an unhealthy group of 17 people. This healthy group was
also used in a case study presented by Taguchi et al. (2001,
chap. 3).
The data variables consist of age ($V_1$), gender ($V_2$), and the
15 blood test measurements listed in Table 1. The data are
available in EXCEL format from the first author.
4.1 Results of the MTS
As described by Taguchi and Rajesh (2000), the MD values
were calculated in stage 1 for the healthy group, forming
the Mahalanobis space. The reported MD values ranged from
.3784 to 2.3581. The average MD value is given as .9951,
which is, apart from rounding error, equal to $(m-1)/m = 199/200 = .995$,
as expected. In stage 2, the MD values calculated
using the observations from the unhealthy group were
higher, ranging from 7.7274 to 135.6978, so the measurement
scale was said to be good. We note that such a wide, clear
separation between the groups of interest is often not possible
in many applications of traditional statistical methods.
Because there were 17 variables, Taguchi and Rajesh
(2000) selected an $L_{32}(2^{31})$ OA in stage 3. This fractional
factorial design can accommodate up to 31 factors with 32
runs. Taguchi and Rajesh assigned the 17 variables to the first
17 columns of the array. The remaining columns are ignored.
The MD values were calculated for all 17 unhealthy patients,
Table 1. The Case Study Blood Test Variables With Normal Ranges
Variable | Symbol | Acronym | Normal range | Taguchi et al. (2001) normal range
Total protein in blood | $V_3$ | TP | 6.0–8.3 g/dL | 6.5–7.5 g/dL
Albumin in blood | $V_4$ | Alb | 3.4–5.4 g/dL | 3.5–4.5 g/dL
Cholinesterase (pseudocholinesterase) | $V_5$ | ChE | Depends on technique; 8–18 U/mL | .60–1.00 dpH
Glutamate O transaminase (aspartate aminotransferase) | $V_6$ | GOT | 10–34 IU/L | 2–25 U
Glutamate P transaminase (alanine transaminase) | $V_7$ | GPT | 6–59 U/L | 0–22 U
Lactic dehydrogenase | $V_8$ | LDH | 105–333 IU/L | 130–250 U
Alkaline phosphatase | $V_9$ | Alp | 0–250 U/L, normal; 250–750 U/L, moderate elevation | 2.0–10.0 U
r-glutamyl transpeptidase (gamma-glutamate transferase) | $V_{10}$ | r-GPT | 0–51 IU/L | 0–68 U
Leucine aminopeptidase | $V_{11}$ | LAP | Serum: male, 80–200 U/mL; female, 75–185 U/mL |
Total cholesterol | $V_{12}$ | TCh
Figure 1. Dotplot of MD Values (horizontal axis, 0 to 125) for MTS, OA Optimal, and Optimal Combinations by Group (1 = mild; 2 = moderate).
next examination or the loss increase after having subjective
symptoms followed by taking a complete examination, and
$D^{*}$ is the “mid-value” of the MD of a patient group having
the subjective symptoms. It is pointed out that $T$ will vary by
disease, because the costs will vary by disease. The terms used
in (6) are not clearly defined, however, because the meaning of
“subjective symptoms” is not clear. It is important to note that
statistical approaches based on misclassification costs would
disease, given the data on a subject (see, e.g., Zielezny and
Dunn 1975).
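As a hedged illustration of what a cost-based statistical rule looks like, the sketch below applies the standard minimum-expected-cost decision: act when the posterior probability of disease times the cost of a missed case exceeds the probability of no disease times the cost of a false alarm. The function names, the action labels, and the cost figures are hypothetical, and the posterior probability is assumed to be supplied by some fitted model; none of this is taken from the MTS descriptions or from Zielezny and Dunn (1975).

```python
def decide(posterior_disease, cost_missed, cost_false_alarm):
    """Minimum-expected-cost rule for a two-action decision.
    posterior_disease: P(disease | data) for one subject, from any model.
    cost_missed: loss if a diseased subject is not examined.
    cost_false_alarm: loss if a healthy subject is examined."""
    expected_loss_if_dismissed = posterior_disease * cost_missed
    expected_loss_if_examined = (1 - posterior_disease) * cost_false_alarm
    if expected_loss_if_dismissed > expected_loss_if_examined:
        return "examine"
    return "dismiss"

def probability_threshold(cost_missed, cost_false_alarm):
    """Posterior probability above which 'examine' has the lower expected loss."""
    return cost_false_alarm / (cost_missed + cost_false_alarm)

# Hypothetical costs: a missed case is taken to be 10 times as costly
# as an unnecessary examination.
print(probability_threshold(10, 1))
print(decide(0.20, 10, 1))
print(decide(0.05, 10, 1))
```

The threshold depends on the costs, but the decision also depends explicitly on the probability of disease given the data, which is the ingredient absent from an MD threshold.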
4.2 Results Using Standard Methods
Descriptions of the MTS do not mention graphical displays
of the raw data. Our first step in the analysis of the medical
data, however, was to plot each variable by status (healthy = 1;
mild disease = 2; medium disease = 3). These plots are shown
in the Appendix.
A key aspect of medical diagnosis involves noting which
variables fall outside their corresponding normal ranges. Normal
ranges are calculated to include 95% of the measurements
on all healthy patients. Taguchi et al. (2001, p. 3) discounted
the usefulness of these ranges based on the work of Kanetaka
(1990), stating that they are arbitrarily determined by test
chemical manufacturers or, in extreme cases, textbook values
used without modification. From the discussion of Harris
(1981), however, it seems that considerable effort has gone
into the determination of normal ranges. The standard practice
of using normal ranges in medical diagnosis does have prob-
lems, however, as listed by Begg (1991), one of which is the
fact that “normalcy is an inherently multivariate concept.”
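A small calculation suggests why univariate normal ranges are awkward when many tests are read together. If each range covers 95% of healthy people, the chance that a healthy person falls outside at least one of $k$ ranges grows quickly with $k$. The sketch assumes the $k$ tests are independent, which is an idealization made only for this illustration (the blood tests in the case study are correlated); the function name is hypothetical.

```python
def prob_any_outside(k, coverage=0.95):
    """P(at least one of k independent tests falls outside its normal range)
    for a healthy subject, when each range covers `coverage` of healthy values."""
    return 1 - coverage ** k

for k in (1, 5, 15):
    print(k, round(prob_any_outside(k), 3))
```

Under this independence assumption, with the 15 blood tests of the case study, slightly more than half of healthy subjects would show at least one value outside its range.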
The normal ranges that we obtained from the National
Library of Medicine (2001) are given in Table 1. The normal
range for alkaline phosphatase ($V_9$) was obtained from
Neuschwander-Tetri (1995). The ranges given by Taguchi
et al. (2001, p. 36) for several of the variables are also shown
in Table 1.
Note that the pair of normal ranges for cholinesterase ($V_5$)
in Table 1 do not match each other and are inconsistent
with the values of this variable given in the dataset. Thus
we do not consider the normal range for this variable. In
addition, the normal range given by Taguchi et al. (2001) for
alkaline phosphatase ($V_9$) does not match the values given in
the dataset. It can be noted that the normal ranges given by
Taguchi et al. (2001) do not exactly match those given by
the National Library of Medicine for the other variables. It is
not unusual for different sources to give somewhat different
normal ranges. Also, the original study was done in Japan,
so there could be differences in the normal ranges for the
Japanese and the U.S. populations. The normal range for a
variable also depends on the measurement method used. We
have no information on the measurement methods used in
this case study.
Table 3 lists, for each unhealthy patient, the variables
that are well outside the corresponding normal ranges.
We use the normal ranges from the
National Library of Medicine, with the exception of alkaline
phosphatase ($V_9$), because we have the ranges for all variables
and, for the most part, they cover more of the corresponding
values of the healthy group. Note that subjects 2 and 3 do
not have any variables clearly outside any of the normal
ranges, but they differ considerably from the healthy group
with respect to $V_5$. The relevance of the various variables to
the diagnosis of liver disease is discussed in Section 4.3.
Table 3. Variables for Unhealthy Patients Well Outside Normal Ranges
Subject number | Variable number
1 | 12, 13
2 | None
3 | None
4 | 13
5 | 10
6 | 7
7 | 7
8 | 13
9 | 12, 13
10 | 4, 12
11 | 10, 12
12 | 10
13 | 10
14 | 10, 13
15 | 6, 7, 13
16 | 3, 6, 7, 10, 12
17 | 6, 7, 8, 10, 13
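The kind of screening summarized in Table 3 can be sketched as a simple range check. The ranges below are the National Library of Medicine values quoted in Table 1; everything else is an assumption for illustration: the patient values are made up, the function name `outside_ranges` is hypothetical, and the `margin` parameter is one possible way to express "well outside," for which the text gives no operational definition.

```python
# National Library of Medicine normal ranges as quoted in Table 1.
NORMAL_RANGES = {
    "TP": (6.0, 8.3),      # g/dL
    "Alb": (3.4, 5.4),     # g/dL
    "GOT": (10, 34),       # IU/L
    "GPT": (6, 59),        # U/L
    "LDH": (105, 333),     # IU/L
    "r-GPT": (0, 51),      # IU/L
}

def outside_ranges(values, ranges=NORMAL_RANGES, margin=0.0):
    """Return the names of variables outside their normal range.
    margin is a hypothetical tolerance: a value must exceed a limit by
    more than margin * (range width) to be flagged."""
    flagged = []
    for name, x in values.items():
        low, high = ranges[name]
        slack = margin * (high - low)
        if x < low - slack or x > high + slack:
            flagged.append(name)
    return flagged

# Made-up patient record (not from the case study dataset).
patient = {"TP": 7.1, "Alb": 4.0, "GOT": 80, "GPT": 120, "LDH": 200}
print(outside_ranges(patient))
print(outside_ranges(patient, margin=0.25))
```

A check of this kind treats each variable separately, so it cannot flag a subject whose values are individually in range but jointly unusual.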
The following conclusions can be reached by considering
the raw data, the dotplots, and the normal ranges:
1. We note from Figure A.1 that the unhealthy patients are
on average 10 years older than the healthy patients. If the
medical variables vary naturally by age, then it would seem
important to have roughly the same range of ages in the two
groups.
2. From Figure A.14, there is a large difference between
the abnormals and the healthy group for phospholipid ($V_{14}$),
but all values of this variable are within the normal range.
3. It is not clear from the univariate dotplots in Figures
A.15 and A.17 why creatine ($V_{15}$) and uric acid ($V_{17}$) should
be declared to be useful variables for the MTS.
4. Some variables dropped under the MTS could be useful
in the diagnosis for particular patients. In particular, this
appears true for variables $V_6$ and $V_7$ for subjects numbered 15,
16, and 17 in the unhealthy group.
The scatterplot of cholinesterase ($V_5$) and r-GPT ($V_{10}$)
shows a clear separation between the healthy and unhealthy
patients. This plot is shown in Figure 2, with healthy subjects
represented by 1, those with mild disease by 2, and those
with medium disease by 3. All outlying points correspond
to unhealthy patients with two values plotted at the point
(318, 44).
Similarly, the unhealthy patients also show up in the scatterplot
of PL ($V_{14}$) versus TCh ($V_{12}$). This is illustrated in
Figure 3. Taguchi et al. (2001, p. 37) give the correlation
matrix for the variables for the healthy group that shows the
variables $V_{12}$ and $V_{14}$ as the most highly correlated pair.
There are some significant differences in variation by gender
over all groups. This is illustrated in Figure 4 by the r-GPT
($V_{10}$).
There has been an extensive amount of research on the use
of statistical modeling for medical diagnosis (see, e.g., Sahai
and Khurshid 1991). We applied the methods of discriminant
analysis to the medical data, as discussed by Albert and Harris
(1987, pp. 101–115). Interestingly, these authors apply dis-
criminant analysis to the diagnosis of liver disease to illustrate
their approach. We performed the discriminant analysis under
the assumption of multivariate normality for two groups, with
Figure 2. Scatterplot of Variable 10 ($V_{10}$, r-GPT) Versus Variable 5 ($V_5$, ChE).
Figure 3. Scatterplot of Variable 14 ($V_{14}$, PL) Versus Variable 12 ($V_{12}$, TCh).
gender excluded in the analysis and a log transformation on
$V_{10}$. The resulting discriminant function did not do as well
as the MTS recommended scale, however, in separating the
patients with mild disease severity from those with medium
disease severity.
From the medical considerations discussed in more detail
later, however, it is not reasonable to simply use collectively
all of the variables in this dataset to assess the severity of liver
disease. As discussed by Bodily and Fitz (1996) and Chopra
(2001), the level of liver disease is most often measured by the
modified Child–Pugh classification score, which is based on
two clinical and three biochemical measures. The two clinical
measures are ascites (fluid in the abdomen) and encephalopathy
(mental alertness), and the three biochemical measures are
bilirubin, albumin, and prothrombin time (blood clotting factor).
Only albumin [Alb ($V_4$)] is included in the dataset used
for this case study. Thus it is not possible to accurately assess
the level of liver disease for the patients listed as “abnormal.”
4.3 Medical Considerations
In this case study we have considered using the MTS
in assessing the presence and extent of liver disease in a
limited group of Japanese patients. Taguchi and Rajesh (2000)
attempted to derive a valid diagnostic scale based on these
data. We have no information on how the patients were
selected, or on the criteria used to identify the severity of their
disease. Despite reservations concerning the data, we have
presented some statistical results regarding the performance of
the MTS. In this section we discuss some important medical
issues concerning the difficulty of treating liver disease as
a single entity, the shortcomings resulting from the use of
Figure 4. Dotplot of $V_{10}$, r-GPT, by Gender ($V_2$).
so-called “liver function tests” (LFTs), and Taguchi and
Rajesh’s (2000) lack of data from some critical, standard LFTs
used for the diagnosis of liver disease and the classification
of its severity level.
The diagnosis of liver disease is complicated for several
reasons. For one, it is attributed to a diverse number of liver
disorders with highly variable underlying pathophysiology and
clinical presentations. In addition, the only way to obtain specific
diagnostic results is often through invasive techniques
(e.g., radiologic procedures and liver biopsy) or immunologic
tests that allow specific diagnoses (e.g., hepatitis serology).
The LFTs are also often used for diagnostic purposes. They
represent a collection of tests that seldom give a specific diagnosis;
rather, they suggest a general category of liver disorders
(Pratt and Kaplan 1999). It is essential that LFTs be used
collectively, because they have a limited sensitivity and specificity.
According to Pratt and Kaplan (1999, p. 206), “when
more than one of these tests provides abnormal findings or the
findings are persistently abnormal on serial determinations, the
probability of liver disease is high. When all test results are
normal, the probability of missing occult liver disease is low.”
The LFTs are divided into three major categories: (1) tests
of the liver’s ability to transport organic anions and metabolize
drugs, such as serum bilirubin; (2) tests that detect injury
to liver cells, including aminotransferases, such as GOT ($V_6$),
transaminases, such as GPT ($V_7$), and alkaline phosphatase
Alp ($V_9$); and (3) tests of the liver’s biosynthetic capacity, including
serum albumin Alb ($V_4$), and blood clotting factors, such
as prothrombin time (Kaplan 1990). Indeed, three of these
LFTs, namely Alb ($V_4$), prothrombin time, and bilirubin, are used
in the “modified Child–Pugh classification,” the classification
standard for severity of liver disease (Bodily and Fitz 1996;
Chopra 2001). In this classification, the severity level is determined
by two physical findings (ascites and encephalopathy)
and the three aforementioned LFTs. It should be noted that
Taguchi and Rajesh (2000) made no mention of the modified
Child–Pugh classification and gave no data for the two critical
LFTs (bilirubin and prothrombin time) for the patients
in this case study. The only critical LFT reported by Taguchi
and Rajesh (2000), that of Alb ($V_4$), is consistently normal
(3.6–5.8 g/dL) in all 17 “abnormal” patients. In the modified
Child–Pugh classification, an Alb ($V_4$) level of 2.8–3.5 g/dL
is consistent with mild disease, whereas moderate or severe
disease is often found in patients with an Alb ($V_4$) level less
than 2.8 g/dL (Bodily and Fitz 1996; Chopra 2001).
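The albumin bands quoted above can be written as a small lookup, which makes it easy to see where the reported range of 3.6–5.8 g/dL falls. This sketch encodes only the albumin thresholds stated in the text; it is not the modified Child–Pugh score, which also needs bilirubin, prothrombin time, ascites, and encephalopathy. The function name and the band labels are hypothetical.

```python
def albumin_band(alb_g_dl):
    """Band an albumin level (g/dL) using only the thresholds quoted in the text:
    2.8-3.5 is consistent with mild disease, below 2.8 with moderate or severe."""
    if alb_g_dl < 2.8:
        return "moderate or severe band"
    if alb_g_dl <= 3.5:
        return "mild band"
    return "above the mild band"

# Endpoints of the range reported for the 17 "abnormal" patients.
for alb in (3.6, 5.8):
    print(alb, albumin_band(alb))
```

Both endpoints of the reported range lie above the mild band, which is the observation the paragraph relies on.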
There are two general types of liver disease, acute and
chronic. In acute liver disease, the prominently abnormal
LFTs are the aminotransferases [e.g., GOT ($V_6$)], which
often exceed 500 IU and can frequently reach levels in the
thousands while most other tests remain normal for a while
(Kaplan 1990; Pratt and Kaplan 1999). In contrast, in chronic
liver failure, the aminotransferases [e.g., GOT ($V_6$)] and
transaminases [e.g., GPT ($V_7$)] increase minimally to less than
500 IU, whereas the remaining LFTs are variable, according
to the underlying pathology.
In chronic liver disease, three major subtypes can be
identified: chronic hepatocellular disorders (e.g., cirrhosis or
alcoholic liver disease), cholestasis (e.g., obstruction), and
infiltrative disorders (e.g., tumors or tuberculosis). Each of
these subcategories has a specific pattern of presentation. In
the case of hepatocellular disorder, an Alb ($V_4$) level below
3.0 g/dL and an abnormally prolonged prothrombin time, with
only minimally increased aminotransferases [e.g., GOT ($V_6$)]
to a level below 300 IU, is the norm. A ratio of GPT/GOT
above 2.0 strongly suggests alcoholic liver disease in that
setting (Clermont and Chalmers 1967). Whereas 70% of
patients with alcoholic liver disease have GPT/GOT above
2.0, this is encountered in only 5% or less of patients with
other disorders (Cohen and Kaplan 1979).
In the cholestatic form of liver disease, the pattern is different.
There, the Alp ($V_9$) is usually elevated out of proportion
with other enzymes. Values exceeding four times the normal
level suggest cholestasis (Pratt and Kaplan 1999). Because
Alp ($V_9$) has a close linear relation with serum r-glutamyl
transpeptidase [r-GPT ($V_{10}$)], it is logical to look for similar
changes in r-GPT ($V_{10}$) (Whitfield et al. 1972). If Alp ($V_9$) is
elevated and r-GPT is not, then one would assume that Alp
($V_9$) is not of liver origin (probably of bone disease origin).
Aminotransferases [e.g., GOT ($V_6$)] are usually elevated to
levels up to 300 IU, with values exceeding 500 IU being rare.
In cases of infiltrative liver disease, the pattern is closer to
that seen with obstruction. Often, the earliest and only abnormal
test is Alp ($V_9$). Aminotransferases [e.g., GOT ($V_6$)] are
normal or minimally elevated, and so are Alb ($V_4$) and prothrombin
time (Pratt and Kaplan 1999).
Of all LFTs (variables) in the patients’ dataset used for this
case study, the most relevant ones for liver disease diagnosis
and classification are Alb (V4), GOT (V6), GPT (V7), Alp (V9),
and r-GPT (V10). The data results for the LFTs V3, V5, V8,
and V11–V17 are not directly relevant to liver disease. From
the foregoing medical discussion and the case study data, it
is quite clear that while “abnormal” patients 15–17 seem to
exhibit some chronic hepatocellular disease (e.g., cirrhosis or
alcoholic liver disease), all other patients, both “normal” and
“abnormal,” do not seem to exhibit any notable liver disease.
In fact, it is quite doubtful that any patient participating in this
case study has any significant liver disease, certainly not acute,
because no patient has an Alb (V4) level below 3.5 g/dL.
Although some of the abnormal patients 1–14 could exhibit
some extremely weak signs of chronic cholestasis (e.g.,
obstruction) or infiltrative disorders (e.g., tumors), such a
diagnosis would certainly require additional results from
the two critical LFTs (bilirubin and prothrombin time) and
would benefit from some physical findings (e.g., ascites), as
suggested in the modified Child–Pugh classification method.
However, these data are not available for the case study.
Moreover, a sample of only 17 “abnormal” patients is
extremely small for liver disease diagnosis and
classification, given the wide variety of disorders
attributed to liver disease.
Finally, it is important to note that cluster analysis, which
was applied to the five most relevant LFTs (variables)
Alb (V4), GOT (V6), GPT (V7), Alp (V9), and r-GPT (V10) (for
the combined sample of male and female patients), yielded
an optimal number of two clusters. One cluster grouped
together the “normal” patients with “abnormal” patients 1–14,
whereas the second cluster consisted of the three “abnormal”
patients, 15–17. That is, the results of cluster analysis are
in full agreement with a careful medical diagnosis based on
the data available for the case study. When the MTS analysis
was similarly applied to these same five most relevant LFTs
for both 8 and 32 runs, however, the results were consistent
with each other, but different from the results of the cluster
analysis and the medical diagnosis. This suggests that in this
case, the problem with the MTS analysis is connected with the
use of the S/N ratio measure rather than the interaction issue
from the OA. In general, however, both of these issues can
cause problems.
5. OTHER ASPECTS OF THE
MAHALANOBIS–TAGUCHI SYSTEM
Taguchi et al. (2001) presented several other methods in the
MTS framework. We summarize these in this section.
5.1 Forecasting
Taguchi et al. (2001, chap. 3) presented an application of
the MTS to evaluate the amount of credit that should be
extended to applicants. The MTS is proposed as an alternative
to credit-scoring methods. Taguchi et al. (2001, p. 25) stated
that traditional methods in this area have not been successful,
because only people who defaulted on loans were studied.
Data on good customers is routinely used to build credit-
scoring models, however, as discussed by Reichert, Cho, and
Wagner (1983).
In the MTS approach, the values of $M_j$ represent losses to
the company due to unpaid bills. The regression model in (3)
is fitted, and the loss corresponding to an applicant with an
MD value of $D^2$ is estimated to be

$$M = D/\hat{\beta} \pm \left(\sqrt{\mathrm{MSE}}/\hat{\beta}\right).$$

This practice of using a fitted regression line to estimate the
value of an unobserved independent variable corresponding
to an observed value of the dependent variable is called
“calibration” in the statistical literature. Brownlee (1965,
pp. 361–362) discussed the calibration problem specifically
for a line through the origin. As discussed by Mee and
Eberhardt (1996), the statistical approach to this problem
accounts for the error in estimating the variance and the slope
of the line. This sampling variation is ignored in the MTS.
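The calibration calculation described above can be sketched as follows. This is a minimal illustration only, assuming that the regression in (3) is a least-squares line through the origin with the distance D as the response and the loss M as the regressor, and that MSE is computed with t − 1 degrees of freedom. The function names and the numbers in the usage lines are hypothetical and are not taken from Taguchi et al. (2001).

```python
import math

def fit_through_origin(m_values, d_values):
    """Least-squares fit of d = beta * m (no intercept).

    Returns (beta_hat, mse). The divisor t - 1 for the MSE is an
    assumption of this sketch (one estimated parameter).
    """
    sxy = sum(m * d for m, d in zip(m_values, d_values))
    sxx = sum(m * m for m in m_values)
    beta_hat = sxy / sxx
    sse = sum((d - beta_hat * m) ** 2 for m, d in zip(m_values, d_values))
    mse = sse / (len(m_values) - 1)
    return beta_hat, mse

def calibrate(d_new, beta_hat, mse):
    """Invert the fitted line: M = D / beta_hat +/- sqrt(MSE) / beta_hat.

    beta_hat and mse are treated as known constants here, so the
    sampling variation in both estimates is ignored.
    """
    center = d_new / beta_hat
    half_width = math.sqrt(mse) / beta_hat
    return center - half_width, center, center + half_width

# Hypothetical losses (M) and distances (D), for illustration only.
losses = [1.0, 2.0, 3.0, 4.0]
distances = [1.1, 1.9, 3.2, 3.8]
beta_hat, mse = fit_through_origin(losses, distances)
low, estimate, high = calibrate(2.5, beta_hat, mse)
print(low, estimate, high)
```

Because the half-width depends only on the point estimates of the slope and the error variance, the limits do not widen when the line is fitted from very few items, which is the omission that the statistical calibration literature addresses.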
5.2 Use in Clinical Trials
Taguchi et al. (2001, chap. 5) pointed out that clinical trials
involve large numbers of subjects, require quite a long
time, and are very expensive. The two reasons given for this
are the large individual differences between patients and the
use of attribute data, not continuous variables. It is stated
that if continuous variables, such as the MD, could be used,
then the study could be conducted by observing only one or
two patients in a short period. Statisticians would find this
claim astounding, because clinical trials must have sample
sizes sufficiently large for investigators to measure effectiveness
relative to other treatments, determine dosage, assess the
side effects of the treatment being studied, and determine
which types of patients in a very heterogeneous population
benefit most from the treatment. Taguchi et al. (2001, p. 4)
considered use of the MTS in clinical trials to be its most
exciting potential application.
Taguchi et al. (2001, chap. 5) proposed a method for com-
paring the effectiveness of two treatments. Only one patient
is used for each treatment. The MD values of each patient
are recorded over time during treatment. The MD values of
the two patients are scaled using the corresponding initial
MD values, and regression equations are fitted to show the
changes in the transformed MD values over time. The treatments
are compared by comparing the estimated slopes of the
two lines. In statistical terminology, this corresponds to
a repeated-measurements experiment for two treatments, but
with only one subject in each treatment group. Statisticians
would never recommend this practice, however, because varia-
tion between subjects cannot be assessed. The treatment effect
is confounded with the difference between subjects.
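The following sketch illustrates the slope comparison described above and why it cannot separate the treatment effect from the subject effect. It assumes that "scaling" means dividing each MD value by the patient's initial MD value and that the regression is an ordinary least-squares line with an intercept; neither detail is specified in the description above. All names and MD values are hypothetical.

```python
def ols_slope(times, values):
    """Least-squares slope of values on times (line with intercept)."""
    n = len(times)
    t_bar = sum(times) / n
    v_bar = sum(values) / n
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx

def scaled_md_slope(times, md_values):
    """Divide an MD series by its initial value, then fit a slope."""
    scaled = [md / md_values[0] for md in md_values]
    return ols_slope(times, scaled)

def within_group_variance(slopes):
    """Sample variance of slopes within one treatment group.

    Returns None when the group has a single subject, because the
    between-subject variation cannot then be estimated.
    """
    n = len(slopes)
    if n < 2:
        return None
    mean = sum(slopes) / n
    return sum((s - mean) ** 2 for s in slopes) / (n - 1)

# Hypothetical MD series, one patient per treatment.
times = [0, 1, 2, 3]
patient_a = [8.0, 6.0, 4.0, 2.0]    # treatment A
patient_b = [10.0, 9.0, 8.0, 7.0]   # treatment B

slope_a = scaled_md_slope(times, patient_a)
slope_b = scaled_md_slope(times, patient_b)

# The difference mixes the treatment effect with the difference
# between the two subjects; there is no way to tell them apart.
print(slope_a - slope_b)
print(within_group_variance([slope_a]))  # None: no replication
```

With several patients per treatment, `within_group_variance` would return a number that could serve as a yardstick for the observed difference in slopes; with one patient per treatment, no such yardstick exists.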
5.3 Use of Principal Components
Taguchi and Rajesh (2000) pointed out that in some applica-
tions of the MTS, there are two types of abnormalities present.
For example, in the graduate student admission process there
could be very good, as well as very bad, applicants. Thus,
they noted that it is important to identify the direction of the
abnormality. They stated further that this cannot be done with
the MD values calculated using the inverse of the correlation
matrix, but it can be done using the Gram–Schmidt orthogo-
nalization process.
The Gram–Schmidt process is recommended for obtaining
a set of mutually perpendicular vectors from a set of linearly
independent standardized original vectors. It appears that this
is a recommendation for obtaining the values of the principal
components of the abnormal items based on the correlation
matrix of the normal group. The discussion is not clear, however,
for several reasons. First, the classification into the good
and bad categories is based on the signs of the principal com-
ponents. Often this would not be any more helpful than using
the signs of the standardized original variables. Second, the
threshold of the MD values shown on bivariate plots should
correspond to an ellipse, but instead linear limits are drawn.
Third, the axes, corresponding to what appear to be the prin-
cipal components in the bivariate plots, are not drawn along
the major and minor axes of the MD contour ellipse and are
not centered at the origin, as would be expected.
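A bivariate sketch may help make the geometry concrete. It assumes two standardized variables with correlation r in the normal group, omits any constant scaling of the MD, and uses hypothetical function names and values. It shows that the MD computed from the inverse correlation matrix equals the MD computed from the Gram–Schmidt components, that a fixed MD value therefore defines an ellipse in the original variables, and that only the signs of the components distinguish opposite directions.

```python
def md_bivariate(z1, z2, r):
    """Squared Mahalanobis distance of the standardized point (z1, z2)
    using the inverse of the 2x2 correlation matrix with correlation r."""
    return (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)

def gram_schmidt_bivariate(z1, z2, r):
    """Gram-Schmidt components: u1 = z1, and u2 = z2 with its
    projection on z1 removed (coefficient r for standardized data)."""
    return z1, z2 - r * z1

def md_from_components(u1, u2, r):
    """Same distance from the orthogonal components; u1 has variance 1
    and u2 has variance 1 - r**2 in the normal group."""
    return u1 * u1 + u2 * u2 / (1.0 - r * r)

r = 0.6                 # hypothetical correlation in the normal group
good = (1.5, 2.0)       # hypothetical abnormal point in one direction
bad = (-1.5, -2.0)      # its mirror image in the opposite direction

for point in (good, bad):
    u1, u2 = gram_schmidt_bivariate(point[0], point[1], r)
    # The two MD values agree, and are identical for both points;
    # only the signs of (u1, u2) differ between the two directions.
    print(md_bivariate(point[0], point[1], r),
          md_from_components(u1, u2, r), u1, u2)
```

Because the MD is a positive definite quadratic form in (z1, z2), any threshold on it traces an ellipse centered at the origin, which is why linear limits on a bivariate plot do not correspond to an MD threshold.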
6. CONCLUDING REMARKS
As statisticians, we much prefer the multivariate statistical
approaches based on underlying probability models to the
MTS. Mahalanobis (1950) also greatly valued the use of prob-
ability, stating that statistics supplies the basis for choosing a
particular course of action in practical problems by balancing
the risks of gain and loss using the calculus of probability. He
also held that the cross-examination of the data was the first
responsibility of the statistician (Mahalanobis 1965). Questioning
the validity of the data and the use of exploratory data
analysis are not mentioned as part of the MTS.
Statistical methods are better designed to account for
variation between units in the groups and to account for
sampling variation. The MTS does not adequately address
the issue of variation between items, because this variation
typically results in at least some classification errors when a
classification rule is developed from a dataset. This lack of
attention to variation between units is most evident in the MTS
clinical trials methods, in which variation between individuals
is completely ignored. In addition, sampling variation is
ignored in the decision rules involving the S/N ratios and in
the calibration problem involving prediction of an $M_j$ value
based on the value of MD.
It should be noted that some of the application areas mentioned
for the MTS have been studied extensively in the
statistics and other subject matter literature, including medical
diagnosis and credit scoring. These bodies of work are ignored
in the development of the MTS approaches.
From the case study presented in Section 4, the MTS anal-
ysis based on the S/N ratio in (5) does not necessarily lead to
a good MD scale in that separation between the classes with
different severities of abnormality can be very poor. Of course,
it is possible to use a more effective search algorithm than that
based on the OA and to modify the S/N ratio. Even with such
modifications, however, we believe that there are still important
unresolved conceptual issues with the MTS, and that with
further development of the basic approach, one would eventu-
ally need to incorporate methods based on probability. Despite
such serious shortcomings, however, we expect the MTS to
become more widely used in industry. Many practitioners will
understand the advantages of using multivariate data, but will
lack the expertise required to implement statistical approaches.
ACKNOWLEDGMENTS
The research of W. H. Woodall, R. Koudelik, K.-L. Tsui,
and S. B. Kim was partially supported by National Science
Foundation-DMI grant 9908013. K.-L. Tsui’s work was also
partially supported by The Logistic Institute–Asia Pacific,
Singapore. The work of Z. G. Stoumbos was funded in part
by the Law School Admission Council (LSAC) and by a
2001 Rutgers Faculty of Management Research Fellowship.
The opinions and conclusions contained in this publication
are those of the authors and do not necessarily reflect the
position or policy of LSAC. We thank Rajesh Jugulum and
Genichi Taguchi for providing the medical case study dataset
and allowing us to distribute it. We also thank the referees and
the associate editor for their helpful comments.
APPENDIX: DOTPLOTS FOR THE MEDICAL DATA VARIABLES
(Status: Healthy = 1, mild disease = 2, medium disease = 3)
Figure A.1. Dotplot of V1 (Age) by Patient Status.
Figure A.2. Dotplot of V2 (Gender) by Patient Status. Each dot represents up to 3 observations.
Figure A.3. Dotplot of V3 (Total Protein) by Patient Status.
Figure A.4. Dotplot of V4 (Albumin) by Patient Status.
Figure A.5. Dotplot of V5 (Cholinesterase) by Patient Status.
Figure A.6. Dotplot of V6 (Glutamate O Transaminase) by Patient Status.
Figure A.7. Dotplot of V7 (Glutamate P Transaminase) by Patient Status.
Figure A.8. Dotplot of V8 (Lactic Dehydrogenase) by Patient Status.
Figure A.9. Dotplot of V9 (Alkaline Phosphatase) by Patient Status.
Figure A.10. Dotplot of V10 (r-Glutamyl Transpeptidase) by Patient Status.
Figure A.11. Dotplot of V11 (Leucine Aminopeptidase) by Patient Status.
Figure A.12. Dotplot of V12 (Total Cholesterol) by Patient Status.
Figure A.13. Dotplot of V13 (Triglyceride) by Patient Status.
Figure A.14. Dotplot of V14 (Phospholipid) by Patient Status.
Figure A.15. Dotplot of V15 (Creatinine) by Patient Status.
Figure A.16. Dotplot of V16 (Blood Urea Nitrogen) by Patient Status.
Figure A.17. Dotplot of V17 (Uric Acid) by Patient Status.
[Received August 2001. Revised December 2001.]
REFERENCES
Adams, B. M., and Woodall, W. H. (1989), “An Analysis of Taguchi’s On-Line Process Control Method Under a Random Walk Model,” Technometrics, 31, 401–413.
Albert, A., and Harris, E. K. (1987), Multivariate Interpretation of Clinical Laboratory Data, New York: Marcel Dekker.
Begg, C. B. (1991), “Advances in Statistical Methodology for Diagnostic Medicine in the 1980’s,” Statistics in Medicine, 10, 1887–1895.
Bodily, K. O., and Fitz, J. G. (1996), “Approach to the Patient with Suspected Liver Disease,” in Current Diagnosis & Treatment in Gastroenterology, eds. J. H. Grendell, K. R. McQuaid, and S. L. Friedman, Stamford, CT: Appleton & Lange, pp. 461–474.
Box, G. E. P. (1996), “The Role of Statistics in Quality and Productivity Improvement,” Journal of Applied Statistics, 23, 3–20.
Brownlee, K. A. (1965), Statistical Theory and Methodology in Science and Engineering, New York: Wiley.
Chopra, S. (2001), “Diagnostic Approach to the Patient with Cirrhosis,” UpToDate (www.uptodate.com), 9, 1–6.
Clermont, R. J., and Chalmers, T. C. (1967), “The Transaminase Tests in Liver Disease,” Medicine, 46, 197–207.
Cohen, J. A., and Kaplan, M. M. (1979), “The SGOT/SGPT Ratio: An Indicator of Alcoholic Liver Disease,” Digestive Diseases and Sciences, 24, 835–839.
Harris, E. K. (1981), “Statistical Aspects of Reference Values in Clinical Pathology,” in Progress in Clinical Pathology VIII, eds. M. Stefanini and E. Benson, New York: Grune and Stratton, pp. 45–66.
Kanetaka, T. (1990), “Diagnosis of a Special Health Check Using Mahalanobis Distance,” ASI Journal, 3.
Kaplan, M. M. (1990), “Evaluation of Hepatobiliary Disease,” in Internal Medicine (3rd ed.), eds. J. H. Stein et al., Boston, MA: Little, Brown, p. 443.
Lunani, M., Nair, V. N., and Wasserman, G. S. (1997), “Graphical Methods for Robust Design with Dynamic Characteristics,” Journal of Quality Technology, 29, 327–338.
Mahalanobis, P. C. (1950), “Why Statistics?,” Sankhyā, 10, 195–228.
Mahalanobis, P. C. (1965), “Statistics as a Key Technology,” The American Statistician, 19, 43–46.
Mee, R. W., and Eberhardt, K. (1996), “A Comparison of Uncertainty Criteria for Calibration,” Technometrics, 38, 221–229.
Montgomery, D. C. (1992), “The Use of Statistical Process Control and Design of Experiments in Product and Process Improvement,” IIE Transactions, 24, 4–17.
Nair, V. N. (ed.) (1992), “Taguchi’s Parameter Design: A Panel Discussion,” Technometrics, 34, 127–161.
National Library of Medicine (2001), MEDLINEplus Health Information (www.nlm.nih.gov/medlineplus), May 16, 2001.
Nayebpour, M. R., and Woodall, W. H. (1993), “An Analysis of Taguchi’s On-Line Quality Monitoring Procedures for Attributes,” Technometrics, 35, 53–60.
Neuschwander-Tetri, B. A. (1995), “Common Blood Tests for Liver Disease,” Postgraduate Medicine, 98, 49–63.
Pratt, D. S., and Kaplan, M. M. (1999), “Evaluation of the Liver. A. Laboratory Tests,” in Schiff’s Diseases of the Liver (5th ed.), eds. E. R. Schiff, M. F. Sorrell, and W. C. Maddrey, Philadelphia: Lippincott-Raven, pp. 205–244.
Reichert, A. K., Cho, C.-C., and Wagner, G. M. (1983), “An Examination of the Conceptual Issues Involved in Developing Credit-Scoring Models,” Journal of Business and Economic Statistics, 1, 101–114.
Sahai, H., and Khurshid, A. (1991), “Mathematical and Statistical Models in Computer-Assisted Medical Diagnosis: An Overview and a Selected Bibliography,” Journal of Clinical Computing, 20, 33–81.
Taguchi, G. (1981), On-Line Quality Control During Production, Tokyo: Japanese Standards Association.
Taguchi, G., Chowdhury, S., and Taguchi, S. (2000), Robust Engineering, New York: McGraw-Hill.
Taguchi, G., Chowdhury, S., and Wu, Y. (2001), The Mahalanobis–Taguchi System, New York: McGraw-Hill.
Taguchi, G., Elsayed, E. A., and Hsiang, T. (1989), Quality Engineering in Production Systems, New York: McGraw-Hill.
Taguchi, G., and Rajesh, J. (2000), “New Trends in Multivariate Diagnosis,” Sankhyā, 62, 233–248.
Taguchi, G., and Wu, Y. (1980), Introduction to Off-Line Quality Control, Nagoya, Japan: Japan Quality Control Organization.
Tracy, N. D., Young, J. C., and Mason, R. L. (1992), “Multivariate Control Charts for Individual Observations,” Journal of Quality Technology, 24, 88–95.
Tsui, K.-L. (1996), “A Critical Look at Taguchi’s Modeling Approach for Robust Design,” Journal of Applied Statistics, 23, 81–95.
Tsui, K.-L. (1999), “Response Model Analysis of Dynamic Robust Design Experiments,” IIE Transactions, 31, 1113–1122.
Whitfield, J. B., Pounder, R. E., Neale, G., and Moss, D. W. (1972), “Serum γ-Glutamyl Transpeptidase Activity in Liver Disease,” Gut, 13, 702–708.
Wu, C. F. J., and Hamada, M. (2000), Experiments: Planning, Analysis, and Parameter Design Optimization, New York: Wiley.
Zielezny, M., and Dunn, O. J. (1975), “Cost Evaluation of a Two-Stage Classification Procedure,” Biometrics, 31, 37–47.