

A Review and Analysis of the Mahalanobis–Taguchi System

William H. Woodall and Rachelle Koudelik

Department of Statistics

Virginia Polytechnic Institute

and State University

Blacksburg, VA 24061


Kwok-Leung Tsui and Seoung Bum Kim

School of Industrial and Systems Engineering

Georgia Institute of Technology

Atlanta, GA 30332


Zachary G. Stoumbos

Department of Management Science

and Information Systems and Rutgers Center

for Operations Research (RUTCOR)

Rutgers, The State University of New Jersey

Piscataway, NJ 08854


Christos P. Carvounis, MD

State University of New York at Stony Brook

Nassau University Medical Center

East Meadow, NY 11554


The Mahalanobis–Taguchi system (MTS) is a relatively new collection of methods proposed for diagnosis and forecasting using multivariate data. The primary proponent of the MTS is Genichi Taguchi, who is very well known for his controversial ideas and methods for using designed experiments. The MTS results in a Mahalanobis distance scale used to measure the level of abnormality of “abnormal” items compared to a group of “normal” items. First, it must be demonstrated that a Mahalanobis distance measure based on all available variables on the items is able to separate the abnormal items from the normal items. If this is the case, then orthogonal arrays and signal-to-noise ratios are used to select an “optimal” combination of variables for calculating the Mahalanobis distances. Optimality is defined in terms of the ability of the Mahalanobis distance scale to match a prespecified or estimated scale that measures the severity of the abnormalities. In this expository article, we review the methods of the MTS and use a case study based on medical data to illustrate them. We identify some conceptual, operational, and technical issues with the MTS that lead us to advise against its use.

KEY WORDS: Classification analysis; Discriminant analysis; Medical diagnosis; Multivariate analysis; Pattern recognition; Signal-to-noise ratio; Taguchi methods.

1. INTRODUCTION

Genichi Taguchi is most well known for his work on

the design of experiments. His ideas have generated a

considerable amount of discussion and controversy and his

methods are widely used (see, e.g., Taguchi and Wu 1980;

Box 1996; Montgomery 1992; Nair 1992; Tsui 1996; Wu and

Hamada 2000; Taguchi, Chowdhury, and Taguchi 2000). The

general consensus, among statisticians at least, seems to be

that although many of Taguchi’s overall ideas on experimental

design are very important and influential, the techniques that he proposed should be replaced with simpler, more effective

statistical methods.

It is not as well known that Taguchi also proposed on-line

quality control methods (Taguchi 1981; Taguchi, Elsayed, and

Hsiang 1989). Adams and Woodall (1989) and Nayebpour

and Woodall (1993), among others, have studied these on-line

methods. Taguchi’s off-line ideas have had a much greater

impact than his ideas on on-line quality control.

We study a new set of methods proposed by Taguchi,

Chowdhury, and Wu (2001) and Taguchi and Rajesh (2000)

collectively referred to as the Mahalanobis–Taguchi system

(MTS). The MTS is proposed as a diagnosis and forecasting method using multivariate data. In this approach, multivariate data must be available on a “healthy” or “normal” group of items and a number of “abnormal” items that may sometimes be classified into groups based on the severity levels of the abnormalities. In the MTS, it must first be confirmed that the relative sizes of the Mahalanobis distances

(MDs) based on the standardized variables of the healthy

group can discriminate between normal and abnormal items.

Once this fact is established, the number of variables used

is reduced, if possible, using orthogonal arrays (OAs) and

signal-to-noise (S/N) ratios to evaluate the contribution of

each variable. Each row of the OA determines a subset of the original variables. The recommended S/N ratio measures

the ability of the MDs, corresponding to the abnormal items

and calculated using this subset of variables, to reflect a prespecified or estimated measure of the severity of the

abnormalities. Only those variables with effects that show

an increase in the average S/N ratio are retained. The MD

scale using these variables has a number of stated purposes,

including diagnosis and forecasting.

© 2003 American Statistical Association and

the American Society for Quality

TECHNOMETRICS, FEBRUARY 2003, VOL. 45, NO. 1

DOI 10.1198/004017002188618626


Taguchi et al. (2001) listed a number of areas of application

for the MTS, including inspection and sensor systems in manufacturing, patient monitoring, fire detection, earthquake forecasting, weather forecasting, credit scoring, and voice recognition. They also described case studies involving engineering

applications of the MTS in many large companies, includ-

ing Nissan Motor, Mitsubishi Space Software, Xerox, Delphi

Automotive Systems, ITT Industries, Ford Motor, Fuji Photo Film, and others.

We review the MTS by explaining the approach and calcula-

tions in Section 2. In Section 3 we discuss the MTS and iden-

tify some conceptual, operational, and technical issues asso-

ciated with the methods. We present a detailed case study in

Section 4. We discuss other aspects of the MTS in Section 5,

and present concluding remarks in Section 6. A primary con-

clusion is that the methods of the MTS are, in some respects,

not well defined conceptually or operationally.

2. DESCRIPTION OF THE

MAHALANOBIS–TAGUCHI SYSTEM

In this section we provide a detailed explanation of the MTS

and the required computations, as presented by Taguchi and

Rajesh (2000). These authors break the MTS into four stages.

In stage 1, the variables that define the “healthiness” of an item are identified. Data are collected on the healthy or normal group. As described later, the variables are standardized and the MDs calculated for the normal items. These values define

the “Mahalanobis space” used as a frame of reference for the

MTS measurement scale.

We refer to the variables collected on each item to determine its “healthiness” as $V_i$, $i = 1, 2, \ldots, p$. We denote by $V_{ij}$ the observation of the $i$th variable on the $j$th item, $i = 1, 2, \ldots, p$, $j = 1, 2, \ldots, m$. Thus the $p \times 1$ data vectors for the normal group are denoted by $v_j$, $j = 1, 2, \ldots, m$.

Each individual variable in each data vector is standardized

by subtracting the mean of the variable and dividing by its

standard deviation, with both statistics calculated using data

on the variable in the normal group. Thus we have the standardized values

\[
Z_{ij} = \frac{V_{ij} - \bar{V}_i}{S_i}, \qquad i = 1, 2, \ldots, p, \quad j = 1, 2, \ldots, m, \tag{1}
\]

where

\[
\bar{V}_i = \frac{1}{m} \sum_{j=1}^{m} V_{ij}
\]

and

\[
S_i = \sqrt{\frac{\sum_{j=1}^{m} (V_{ij} - \bar{V}_i)^2}{m - 1}}.
\]

Next, the values of the MDs, $MD_j$, $j = 1, 2, \ldots, m$, are calculated for the normal items using

\[
MD_j = \frac{1}{p}\, z_j^{T} S^{-1} z_j, \tag{2}
\]

where $z_j^{T} = [Z_{1j}, Z_{2j}, \ldots, Z_{pj}]$ and $S$ is the sample correlation matrix calculated as

\[
S = \frac{1}{m - 1} \sum_{j=1}^{m} z_j z_j^{T}.
\]

Taguchi and Rajesh (2000) stated that the MDj values in (2)

have an average value of unity. For this reason, they also refer

to the Mahalanobis space as the unit space.

In stage 2, abnormal items must be selected. There is no

uncertainty incorporated into the MTS regarding the status of

each item used for determining the MTS measurement scale.

As in discriminant analysis, it is assumed that each item is

known to be either normal or abnormal.

The MDs of the abnormals with data vectors denoted by $v_j$, $j = m+1, m+2, \ldots, m+t$, are calculated after the variables are standardized using the normal-group means and standard deviations. Thus we have $MD_j$, $j = m+1, m+2, \ldots, m+t$, with $MD_j$ defined in (2), where the $i$th element of $z_j$ in (2), $Z_{ij}$, is calculated using (1), for $i = 1, 2, \ldots, p$ and $j = m+1, m+2, \ldots, m+t$.

According to the MTS, the resulting MD scale is good if

the MDj values for the abnormal items are higher than those

for the normal items.
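The calculations of stages 1 and 2 can be sketched in a few lines of NumPy. This is only an illustrative sketch of equations (1) and (2), not code from Taguchi and Rajesh (2000); the function names `fit_mahalanobis_space` and `scaled_md` and the synthetic data are assumptions introduced here, and the data are stored with items in rows rather than as $p \times 1$ column vectors.

```python
import numpy as np


def fit_mahalanobis_space(normal):
    """Estimate the reference quantities from the normal group.

    `normal` is an (m, p) array: m normal items in rows, p variables in columns.
    Returns the normal-group means, standard deviations, and inverse
    correlation matrix.
    """
    normal = np.asarray(normal, dtype=float)
    mean = normal.mean(axis=0)
    sd = normal.std(axis=0, ddof=1)          # divisor m - 1, as in S_i
    z = (normal - mean) / sd                 # equation (1)
    corr = z.T @ z / (normal.shape[0] - 1)   # sample correlation matrix S
    return mean, sd, np.linalg.inv(corr)


def scaled_md(items, mean, sd, corr_inv):
    """Scaled Mahalanobis distances of equation (2) for the rows of `items`."""
    z = (np.asarray(items, dtype=float) - mean) / sd
    p = z.shape[1]
    # Row-wise quadratic form z_j^T S^{-1} z_j, divided by p.
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / p


# Synthetic data only, for illustration (not the case-study data).
rng = np.random.default_rng(0)
normal = rng.normal(size=(50, 4))
abnormal = rng.normal(loc=3.0, size=(5, 4))

mean, sd, corr_inv = fit_mahalanobis_space(normal)
md_normal = scaled_md(normal, mean, sd, corr_inv)      # stage 1
md_abnormal = scaled_md(abnormal, mean, sd, corr_inv)  # stage 2
print(md_normal.mean(), md_abnormal.min())
```

Note that the abnormal items are standardized with the normal-group means and standard deviations, so the same `mean`, `sd`, and `corr_inv` are reused in stage 2.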

In stage 3, OAs and S/N ratios are used to identify the most

useful set of variables. An OA is a design matrix that contains the levels of various factors in the runs of an experiment to

investigate the effects of the variables on a response of inter-

est. Each factor of the experiment is assigned to a column of

the OA, and the rows of the matrix correspond to the experi-

mental runs. The MTS has p factors in the experiment, each

with two levels. The level of a factor signifies the inclusion or exclusion of a variable in the MTS analysis. The p factors are assigned to the first p columns of the OA, with the other

columns ignored. Thus the OA selected must initially have

at least p columns. Each row of the OA determines which

variables are included in any given experimental run. For each

of these runs, the MD values are calculated for the abnormals as in stage 2, but using only the indicated variables. These

MD values are then used to calculate the value of a S/N ratio,

which becomes the response for the run.
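The way an OA row selects a subset of variables can be made concrete with a small sketch. The construction below is one standard way to obtain a two-level OA with $2^k$ runs and $2^k - 1$ columns; the function names, the column order, and the convention that level 1 means "include" are assumptions of this sketch, not details taken from the MTS descriptions.

```python
import itertools
import numpy as np


def two_level_oa(k):
    """Two-level orthogonal array with 2**k runs and 2**k - 1 columns.

    Built from a full factorial in k basis factors; every nonempty subset of
    the basis factors contributes one column (its mod-2 sum). Entries are
    1 and 2, and the column order is this sketch's choice, not a published layout.
    """
    basis = np.array(list(itertools.product([0, 1], repeat=k)))  # (2**k, k)
    columns = []
    for mask in range(1, 2 ** k):
        chosen = [i for i in range(k) if (mask >> i) & 1]
        columns.append(basis[:, chosen].sum(axis=1) % 2)
    return np.column_stack(columns) + 1


def subsets_from_oa(oa, p, include_level=1):
    """Map each run to the indices of the variables it includes.

    Only the first p columns are used; the remaining columns are ignored.
    Treating level 1 as "include" is an assumption of this sketch.
    """
    if oa.shape[1] < p:
        raise ValueError("the OA needs at least p columns")
    return [np.flatnonzero(row[:p] == include_level) for row in oa]


oa = two_level_oa(3)             # 8 runs, 7 columns
runs = subsets_from_oa(oa, p=5)  # 5 variables assigned to the first 5 columns
for run_number, variables in enumerate(runs, start=1):
    print(run_number, variables.tolist())
```

Each printed list gives the (zero-based) variables whose MDs would be computed for that run; the S/N ratio of those MDs is then the response for the run.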

Many different S/N ratios are used in Taguchi’s analysis

of designed experiments. These are defined in such a way that larger S/N ratio values are preferred. One option mentioned in the MTS is to use Taguchi’s larger-is-better S/N ratio, defined as

\[
-10 \log \left[ \frac{1}{t} \sum_{j=m+1}^{m+t} \left( \frac{1}{MD_j} \right)^{2} \right],
\]

because larger MD values further separate the abnormals from the normal group. Taguchi and Rajesh (2000) recommended

using the dynamic type S/N ratio instead. For the dynamic

S/N ratio to be calculated, the severity value of each abnormal

item must be established. These severity levels are denoted by

$M_j$, $j = m+1, m+2, \ldots, m+t$. Larger values of $M_j$ indicate a greater degree of abnormality. The goal of this stage is to select a subset of the original variables such that the resulting $MD_j$ values of the abnormals most appropriately reflect the levels of severity $M_j$. If the values of $M_j$ are unknown,

Taguchi and Rajesh (2000) recommended grouping the abnor-

mal items into classes based on a general level of severity,

perhaps obtained subjectively. The value of $M_j$ used for each member of a class is the average value of the square roots

of the MDs for the members in the class. These MDs are





not understood in the context of a meaningful sampling (and

conceptual) framework.”

In addition, in our view, the use of the MTS measurement

scale has never been clearly explained. Taguchi and Rajesh

(2000), for example, stated that the problem of the MTS is not

one of classification of a future observation into one of two populations corresponding to normal and abnormal. Taguchi et al. (2001, p. 7) stated that the MD values should be used “in continuous mode rather than discrete mode.” Nevertheless, a university admission process is given as an application of the MTS that would seem to require classification. Also, the use of a threshold for MD in the MTS seems to imply classification. It is clear, however, that the MTS results in an MD

measurement scale that should measure the degree of abnor-

mality of the items. Use of the MD scale is similar to that of a

discriminant function in discriminant analysis. This similarity

is discussed further in the case study in Section 4. Another statistical option would be to use standard model-fitting methods, such as ordinal logistic regression, with the level of severity as the dependent variable and the variables $V_i$, $i = 1, 2, 3, \ldots, p$, as the explanatory variables.

3.2 Operational Issues

In stage 2, it must be shown that the MD values of the

abnormal items are higher than those for the normal items. No

operational definition is given, however, for “higher than.” If

the criterion means that the smallest MD value for the abnor-

mal items must be higher than the largest value for the normal

items, as in the case study in Section 4, then this would appear

to limit the usefulness of the approach. If normal and abnor-

mal items are not clearly distinguishable, then it seems that

misclassification probabilities must be considered, something not possible under the MTS framework that eschews the use

of probability.
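The strictest reading of “higher than” is easy to state in code, which also makes clear how demanding it is. The sketch below checks that criterion and, as a purely descriptive alternative, reports the fraction of normal and abnormal pairs that are ordered the wrong way. The function names, the overlap measure, and the MD values are assumptions introduced for illustration; they are not part of the MTS and are not taken from the case study.

```python
import numpy as np


def strictly_separated(md_normal, md_abnormal):
    """True if every abnormal MD exceeds every normal MD."""
    return float(np.min(md_abnormal)) > float(np.max(md_normal))


def overlap_fraction(md_normal, md_abnormal):
    """Fraction of (normal, abnormal) pairs in which the normal item's MD
    is at least as large as the abnormal item's MD.

    The value is 0.0 under strict separation.
    """
    normal = np.asarray(md_normal, dtype=float)[:, None]
    abnormal = np.asarray(md_abnormal, dtype=float)[None, :]
    return float(np.mean(normal >= abnormal))


# Hypothetical MD values, not taken from the case study.
md_normal = [0.4, 0.9, 1.1, 2.3]
md_abnormal = [1.8, 7.7, 40.2]
print(strictly_separated(md_normal, md_abnormal))
print(overlap_fraction(md_normal, md_abnormal))
```

A single abnormal item falling inside the range of the normal group is enough to fail the strict criterion, which is why some treatment of misclassification seems unavoidable when the groups overlap.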

A designed fractional factorial experiment is used as a

search algorithm for optimization in the MTS. The run for

which all factors are at their low levels is not a valid run,

however, because at least one variable must be used in the

analysis. Thus an OA containing this run could not be used.

The OA and the experimental design methods are used as an

optimization technique to find the combination of variables that maximizes the S/N ratio. As illustrated in the case study

in Section 4, this optimal combination is not always obtained.

Fractional factorial designs are used in industry to reduce the number of runs, because each run is often expensive. This

goal seems much less important in an optimization application

involving only computations. Of course, the MTS approach

could be modified to include a better search algorithm for the optimal combination of variables or another S/N ratio, e.g., one based on a rank correlation coefficient that would

lead to an MTS scale that would match, to the greatest extent

possible, the order of the given severity levels of the abnormal

items.
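Because each run involves only computation, an exhaustive search over all nonempty subsets of variables is one possible alternative to the OA; with $p$ variables there are $2^p - 1$ such subsets (131,071 for $p = 17$). The sketch below pairs that search with a Spearman rank correlation between the MDs of the abnormal items and their severity levels. It is only one possible realization of the modification suggested above: the function names, the choice of Spearman's coefficient, and the synthetic data are all assumptions of this sketch.

```python
import itertools
import numpy as np


def scaled_md(normal, items, cols):
    """Scaled MDs of `items` using only the columns in `cols`."""
    cols = list(cols)
    ref = np.asarray(normal, dtype=float)[:, cols]
    x = np.asarray(items, dtype=float)[:, cols]
    mean, sd = ref.mean(axis=0), ref.std(axis=0, ddof=1)
    z_ref, z = (ref - mean) / sd, (x - mean) / sd
    corr = z_ref.T @ z_ref / (ref.shape[0] - 1)
    return np.einsum("ij,jk,ik->i", z, np.linalg.inv(corr), z) / len(cols)


def ranks(values):
    """Average ranks (tied values share the mean of their positions)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    r = np.empty(len(values))
    r[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        r[tied] = r[tied].mean()
    return r


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])


def best_subset(normal, abnormal, severity):
    """Try every nonempty subset of variables; keep the best rank agreement."""
    p = np.asarray(normal).shape[1]
    best = (-np.inf, ())
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):
            score = spearman(scaled_md(normal, abnormal, cols), severity)
            if score > best[0]:
                best = (score, cols)
    return best


# Synthetic data only: variable 0 is constructed to track severity.
rng = np.random.default_rng(1)
normal = rng.normal(size=(40, 4))
severity = np.array([1, 1, 1, 2, 2, 2, 3, 3])
abnormal = rng.normal(size=(8, 4))
abnormal[:, 0] = [5.0, 5.5, 6.0, 10.0, 10.5, 11.0, 15.0, 16.0]

score, cols = best_subset(normal, abnormal, severity)
print(round(score, 4), cols)
```

A rank-based criterion depends only on the ordering of the severity levels, so it does not require numerical severity values $M_j$ to be specified or estimated.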

3.3 Technical Issues

Taguchi and Rajesh (2000) stated that the expected value of MDj in (2) for the normal items is unity. This is an approxi-

mation, however, evidently based on a chi-squared distribution

with p degrees of freedom. This is the probability distribution

of $p \cdot MD_j$, provided that sampling is from a multivariate normal distribution and the mean vector and variance-covariance

matrix are assumed to be known and used in the calculations

instead of the estimates. Under the assumption of multivariate

normality and estimation of the mean vector and variance-

covariance matrix, Tracy, Young, and Mason (1992) reported

that the marginal distribution of $MD_j$ is related to a beta distribution and has a mean of $(m-1)/m$, not unity. The mean of $MD_j$ is also $(m-1)/m$ if the $m$ observations in the normal group represent the entire population of normal items. Finally, it can be shown using matrix algebra that the average MD value for the $m$ items in the normal group is always exactly $(m-1)/m$.

Moreover, Taguchi and Rajesh (2000) stated that $\hat{\beta}$ from (4) is 1 when working averages are used to fit the regression line

through the origin. This is true, however, only if the working

averages are calculated using the variables included in the par-

ticular run being considered. It is not reasonable to use just

the variables in each run to calculate the working averages,

because this would cause the measure of the degree of severity

of abnormal items, and their relative rankings, to vary from

run to run. Although descriptions of the MTS do not specify

explicitly the variables used to obtain the working averages,

all of the variables are used to obtain the working averages in

the medical data case study of Taguchi and Rajesh (2000).
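The matrix-algebra result for the average MD of the normal group, stated earlier in this section, can be verified in a few lines. The following sketch uses only the definitions of $MD_j$ and $S$ given with equation (2), together with the fact that a scalar equals its own trace and that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$; the trace notation is introduced here and does not appear in the original descriptions.

```latex
\[
\frac{1}{m}\sum_{j=1}^{m} MD_j
  = \frac{1}{mp}\sum_{j=1}^{m} z_j^{T} S^{-1} z_j
  = \frac{1}{mp}\operatorname{tr}\!\Big(S^{-1}\sum_{j=1}^{m} z_j z_j^{T}\Big)
  = \frac{1}{mp}\operatorname{tr}\!\big((m-1)\,S^{-1}S\big)
  = \frac{(m-1)\,p}{mp}
  = \frac{m-1}{m}.
\]
```

The third equality substitutes $\sum_{j=1}^{m} z_j z_j^{T} = (m-1)S$, and the fourth uses $\operatorname{tr}(I_p) = p$. No distributional assumption is needed, which is why the result holds exactly for any normal group with a nonsingular $S$.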

4. A MEDICAL CASE STUDY

Taguchi and Rajesh (2000) and Taguchi et al. (2001) justified their MTS approach solely through the use of case studies. In this section we consider a medical diagnosis case study of Taguchi and Rajesh (2000) involving liver disease. The study group comprised a healthy group of 200 people and

an unhealthy group of 17 people. This healthy group was

also used in a case study presented by Taguchi et al. (2001,

chap. 3).

The data variables consist of age ($V_1$), gender ($V_2$), and the 15 blood test measurements listed in Table 1. The data are available in EXCEL format from the first author.

4.1 Results of the MTS

As described by Taguchi and Rajesh (2000), the MD val-

ues were calculated in stage 1 for the healthy group, forming

the Mahalanobis space. The reported MD values ranged from

.3784 to 2.3581. The average MD value is given as .9951,

which is, apart from rounding error, equal to $(m-1)/m = 199/200 = .995$, as expected. In stage 2, the MD values calculated using the observations from the unhealthy group were

higher, ranging from 7.7274 to 135.6978, so the measurement

scale was said to be good. We note that such a wide, clear

separation between the groups of interest is often not possible

in many applications of traditional statistical methods.

Because there were 17 variables, Taguchi and Rajesh

(2000) selected an $L_{32}(2^{31})$ OA in stage 3. This fractional factorial design can accommodate up to 31 factors with 32 runs. Taguchi and Rajesh assigned the 17 variables to the first 17 columns of the array. The remaining columns are ignored.

The MD values were calculated for all 17 unhealthy patients,




Table 1. The Case Study Blood Test Variables With Normal Ranges

Variable | Symbol | Acronym | Normal range | Taguchi et al. (2001) normal range
Total protein in blood | $V_3$ | TP | 6.0–8.3 g/dL | 6.5–7.5 g/dL
Albumin in blood | $V_4$ | Alb | 3.4–5.4 g/dL | 3.5–4.5 g/dL
Cholinesterase (pseudocholinesterase) | $V_5$ | ChE | Depends on technique; 8–18 U/mL | .60–1.00 dpH
Glutamate O transaminase (aspartate aminotransferase) | $V_6$ | GOT | 10–34 IU/L | 2–25 U
Glutamate P transaminase (alanine transaminase) | $V_7$ | GPT | 6–59 U/L | 0–22 U
Lactic dehydrogenase | $V_8$ | LDH | 105–333 IU/L | 130–250 U
Alkaline phosphatase | $V_9$ | Alp | 0–250 U/L, normal; 250–750 U/L, moderate elevation | 2.0–10.0 U
r-glutamyl transpeptidase (gamma-glutamate transferase) | $V_{10}$ | r-GPT | 0–51 IU/L | 0–68 U
Leucine aminopeptidase | $V_{11}$ | LAP | Serum: Male: 80–200 U/mL; Female: 75–185 U/mL |
Total cholesterol | $V_{12}$ | TCh | |




Figure 1. Dotplot of MD Values for MTS, OA Optimal, and Optimal Combinations by Group (1 = mild; 2 = moderate). Horizontal axis: MD, from 0 to 125.

next examination or the loss increase after having subjective

symptoms followed by taking a complete examination, and

Dü is the “mid-value” of the MD of a patient group having the subjective symptoms. It is pointed out that T will vary by disease, because the costs will vary by disease. The terms used in (6) are not clearly defined, however, because the meaning of “subjective symptoms” is not clear. It is important to note that statistical approaches based on misclassification costs would

incorporate into any decision rule the probability of having the

disease, given the data on a subject (see, e.g., Zielezny and

Dunn 1975).

4.2 Results Using Standard Methods

Descriptions of the MTS do not mention graphical displays

of the raw data. Our first step in the analysis of the medical data, however, was to plot each variable by status (healthy = 1; mild disease = 2; medium disease = 3). These plots are shown in the Appendix.

A key aspect of medical diagnosis involves noting which

variables fall outside their corresponding normal ranges. Nor-

mal ranges are calculated to include 95% of the measurements

on all healthy patients. Taguchi et al. (2001, p. 3) discounted

the usefulness of these ranges based on the work of Kanetaka

(1990), stating that they are arbitrarily determined by test

chemical manufacturers or, in extreme cases, textbook values used without modification. From the discussion of Harris

(1981), however, it seems that considerable effort has gone

into the determination of normal ranges. The standard practice

of using normal ranges in medical diagnosis does have prob-

lems, however, as listed by Begg (1991), one of which is the

fact that “normalcy is an inherently multivariate concept.”
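One way to see why normalcy is a multivariate matter is a simple calculation. If each normal range covers 95% of healthy patients and the tests were statistically independent, the chance that a healthy patient falls inside every range would shrink quickly with the number of tests. Independence is an assumption made here purely for illustration; blood test results are generally correlated, so the figures below should not be read as estimates for the case study data. The function name is likewise introduced only for this sketch.

```python
def prob_all_within(p, coverage=0.95):
    """Probability that p independent tests all fall inside their normal ranges."""
    return coverage ** p


# 15 is the number of blood tests in the case study; 1 and 5 are for comparison.
for p in (1, 5, 15):
    print(p, round(prob_all_within(p), 3))
```

Under these assumptions, fewer than half of healthy patients would have all 15 blood tests inside their normal ranges, so checking variables one at a time flags many healthy patients.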

The normal ranges that we obtained from the National

Library of Medicine (2001) are given in Table 1. The normal

range for alkaline phosphatase (V 9 ) was obtained from

Neuschwander-Tetri (1995). The ranges given by Taguchi

et al. (2001, p. 36) for several of the variables are also shown

in Table 1.

Note that the pair of normal ranges for cholinesterase ($V_5$) in Table 1 do not match each other and are inconsistent with the values of this variable given in the dataset. Thus

we do not consider the normal range for this variable. In

addition, the normal range given by Taguchi et al. (2001) for

alkaline phosphatase ($V_9$) does not match the values given in

the dataset. It can be noted that the normal ranges given by

Taguchi et al. (2001) do not exactly match those given by

the National Library of Medicine for the other variables. It is

not unusual for different sources to give somewhat different

normal ranges. Also, the original study was done in Japan,

so there could be differences in the normal ranges for the

Japanese and the U.S. populations. The normal range for a

variable also depends on the measurement method used. We

have no information on the measurement methods used in

this case study.

Table 3 lists, for each unhealthy patient, the variables that are well outside the corresponding normal ranges. We use the normal ranges from the

National Library of Medicine, with the exception of alkaline

phosphatase (V 9), because we have the ranges for all variables

and, for the most part, they cover more of the corresponding

values of the healthy group. Note that subjects 2 and 3 do

not have any variables clearly outside any of the normal

ranges, but they differ considerably from the healthy group

Table 3. Variables for Unhealthy Patients Well Outside Normal Ranges

Subject number | Variable number
1 | 12, 13
2 | None
3 | None
4 | 13
5 | 10
6 | 7
7 | 7
8 | 13
9 | 12, 13
10 | 4, 12
11 | 10, 12
12 | 10
13 | 10
14 | 10, 13
15 | 6, 7, 13
16 | 3, 6, 7, 10, 12
17 | 6, 7, 8, 10, 13





with respect to V 5. The relevance of the various variables to

the diagnosis of liver disease is discussed in Section 4.3.

The following conclusions can be reached by considering

the raw data, the dotplots, and the normal ranges:

1. We note from Figure A.1 that the unhealthy patients are

on average 10 years older than the healthy patients. If the

medical variables vary naturally by age, then it would seem

important to have roughly the same range of ages in the two groups.

2. From Figure A.14, there is a large difference between

the abnormals and the healthy group for phospholipid ($V_{14}$), but all values of this variable are within the normal range.

3. It is not clear from the univariate dotplots in Figures

A.15 and A.17 why creatine (V 15 ) and uric acid (V 17 ) should

be declared to be useful variables for the MTS.

4. Some variables dropped under the MTS could be use-

ful in the diagnosis for particular patients. In particular, this

appears true for variables V 6 and V 7 for subjects numbered 15,

16, and 17 in the unhealthy group.

The scatterplot of cholinesterase (V 5 ) and r-GPT (V 10 )

shows a clear separation between the healthy and unhealthy

patients. This plot is shown in Figure 2, with healthy subjects

represented by 1, those with mild disease by 2, and those

with medium disease by 3. All outlying points correspond

to unhealthy patients with two values plotted at the point

(318, 44).

Similarly, the unhealthy patients also show up in the scat-

terplot of PL (V 14 ) versus TCh (V 12 ). This is illustrated in

Figure 3. Taguchi et al. (2001, p. 37) give the correlation

matrix for the variables for the healthy group that shows the

variables V 12 and V 14 as the most highly correlated pair.

There are some significant differences in variation by gender over all groups. This is illustrated in Figure 4 for r-GPT ($V_{10}$).

There has been an extensive amount of research on the use

of statistical modeling for medical diagnosis (see, e.g., Sahai

and Khurshid 1991). We applied the methods of discriminant

analysis to the medical data, as discussed by Albert and Harris

(1987, pp. 101–115). Interestingly, these authors apply dis-

criminant analysis to the diagnosis of liver disease to illustrate

their approach. We performed the discriminant analysis under

the assumption of multivariate normality for two groups, with

Figure 2. Scatterplot of Variable 10 (r-GPT) Versus Variable 5 (ChE), With Plotting Symbols Distinguishing Groups 1, 2 (+), and 3.

Figure 3. Scatterplot of Variable 14 (PL) Versus Variable 12 (TCh), With Plotting Symbols Distinguishing Groups 1, 2 (+), and 3.

gender excluded in the analysis and a log transformation on

V 10 . The resulting discriminant function did not do as well

as the MTS recommended scale, however, in separating the

patients with mild disease severity from those with medium

disease severity.

From the medical considerations discussed in more detail

later, however, it is not reasonable to simply use collectively

all of the variables in this dataset to assess the severity of liver

disease. As discussed by Bodily and Fitz (1996) and Chopra

(2001), the level of liver disease is most often measured by the

modified Child–Pugh classification score, which is based on two clinical and three biochemical measures. The two clinical measures are ascites (fluid in the abdomen) and encephalopathy (mental alertness), and the three biochemical measures are

bilirubin, albumin, and prothrombin time (blood clotting fac-

tor). Only albumin [Alb (V 4 )] is included in the dataset used

for this case study. Thus it is not possible to accurately assess

the level of liver disease for the patients listed as “abnormal.”

4.3 Medical Considerations

In this case study we have considered using the MTS

in assessing the presence and extent of liver disease in a

limited group of Japanese patients. Taguchi and Rajesh (2000)

attempted to derive a valid diagnostic scale based on these

data. We have no information on how the patients were

selected, or on the criteria used to identify the severity of their

disease. Despite reservations concerning the data, we have

presented some statistical results regarding the performance of

the MTS. In this section we discuss some important medical

issues concerning the difficulty of treating liver disease as

a single entity, the shortcomings resulting from the use of

Figure 4. Dotplot of $V_{10}$, r-GPT, by Gender.





so-called “liver function tests” (LFTs), and Taguchi and

Rajesh’s (2000) lack of data from some critical, standard LFTs

used for the diagnosis of liver disease and the classification

of its severity level.

The diagnosis of liver disease is complicated for several

reasons. For one, it is attributed to a diverse set of liver disorders with highly variable underlying pathophysiology and clinical presentations. In addition, the only way to obtain specific diagnostic results is often through invasive techniques (e.g., radiologic procedures and liver biopsy) or immunologic tests that allow specific diagnoses (e.g., hepatitis serology).

The LFTs are also often used for diagnostic purposes. They

represent a collection of tests that seldom give a specific diagnosis; rather, they suggest a general category of liver disorders (Pratt and Kaplan 1999). It is essential that LFTs be used collectively, because they have a limited sensitivity and specificity. According to Pratt and Kaplan (1999, p. 206) “when more than one of these tests provides abnormal findings or the findings are persistently abnormal on serial determinations, the

probability of liver disease is high. When all test results are

normal, the probability of missing occult liver disease is low.”

The LFTs are divided into three major categories: (1) tests of the liver’s ability to transport organic anions and metabolize drugs, such as serum bilirubin; (2) tests that detect injury to liver cells, including aminotransferases, such as GOT ($V_6$), transaminases, such as GPT ($V_7$), and alkaline phosphatase Alp ($V_9$); and (3) tests of the liver’s biosynthetic capacity, including serum albumin Alb ($V_4$), and blood clotting factors, such as prothrombin time (Kaplan 1990). Indeed, three of these LFTs, namely Alb ($V_4$), prothrombin time, and bilirubin, are used in the “modified Child–Pugh classification,” the classification standard for severity of liver disease (Bodily and Fitz 1996; Chopra 2001). In this classification, the severity level is determined by two physical findings (ascites and encephalopathy) and the three aforementioned LFTs. It should be noted that Taguchi and Rajesh (2000) made no mention of the modified Child–Pugh classification and gave no data for the two critical LFTs (bilirubin and prothrombin time) for the patients in this case study. The only critical LFT reported by Taguchi and Rajesh (2000), that of Alb ($V_4$), is consistently normal (3.6–5.8 g/dL) in all 17 “abnormal” patients. In the modified Child–Pugh classification, an Alb ($V_4$) level of 2.8–3.5 g/dL is consistent with mild disease, whereas moderate or severe disease is often found in patients with an Alb ($V_4$) level less than 2.8 g/dL (Bodily and Fitz 1996; Chopra 2001).

There are two general types of liver disease, acute and

chronic. In acute liver disease, the prominently abnormal

LFTs are the aminotransferases [e.g., GOT ($V_6$)], which

often exceed 500 IU and can frequently reach levels in the

thousands while most other tests remain normal for a while

(Kaplan 1990; Pratt and Kaplan 1999). In contrast, in chronic

liver failure, the aminotransferases [e.g., GOT ($V_6$)] and transaminases [e.g., GPT ($V_7$)] increase minimally to less than

500 IU, whereas the remaining LFTs are variable, according

to the underlying pathology.

In chronic liver disease, three major subtypes can be

identified: chronic hepatocellular disorders (e.g., cirrhosis or alcoholic liver disease), cholestasis (e.g., obstruction), and infiltrative disorders (e.g., tumors or tuberculosis). Each of these subcategories has a specific pattern of presentation. In the case of hepatocellular disorder, an Alb ($V_4$) level below

3.0 g/dL and an abnormally prolonged prothrombin time, with

only minimally increased aminotransferases [e.g., GOT ($V_6$)] to a level below 300 IU is the norm. A ratio of GOT/GPT above 2.0 strongly suggests alcoholic liver disease in that setting (Clermont and Chalmers 1967). Whereas 70% of patients with alcoholic liver disease have GOT/GPT above 2.0, this is encountered in only 5% or less of patients with

other disorders (Cohen and Kaplan 1979).

In the cholestatic form of liver disease, the pattern is different. There, the Alp ($V_9$) is usually elevated out of proportion with other enzymes. Values exceeding four times the normal

level suggest cholestasis (Pratt and Kaplan 1999). Because

Alp (V 9) has a close linear relation with serum r-glutamyl

transpeptidase [r-GPT (V 10 )], it is logical to look for similar

changes in r-GPT ($V_{10}$) (Whitfield et al. 1972). If Alp ($V_9$) is

elevated and r-GPT is not, then one would assume that Alp

(V 9) is not of liver origin (probably of bone disease origin).

Aminotransferases [e.g., GOT ($V_6$)] are usually elevated to

levels up to 300 IU, with values exceeding 500 IU being rare.

In cases of infiltrative liver disease, the pattern is closer to

that seen with obstruction. Often, the earliest and only abnor-

mal test is Alp (V 9). Aminotransferases [e.g., GOT (V 6 )] are

normal or minimally elevated, and so are Alb (V 4 ) and pro-

thrombin time (Pratt and Kaplan 1999).

Of all LFTs (variables) in the patients’ dataset used for this case study, the most relevant ones for liver disease diagnosis and classification are Alb (V4), GOT (V6), GPT (V7), Alp (V9), and r-GPT (V10). The data results for the LFTs V3, V5, V8, and V11–V17 are not directly relevant to liver disease. From the foregoing medical discussion and the case study data, it is quite clear that while “abnormal” patients 15–17 seem to exhibit some chronic hepatocellular disease (e.g., cirrhosis or alcoholic liver disease), all other patients, both “normal” and “abnormal,” do not seem to exhibit any notable liver disease. In fact, it is quite doubtful that any patient participating in this case study has any significant liver disease, certainly not acute, because no patient has an Alb (V4) level below 3.5 g/dL. Although some of the abnormal patients 1–14 could exhibit some extremely weak signs of chronic cholestasis (e.g., obstruction) or infiltrative disorders (e.g., tumors), such a diagnosis would certainly require additional results from the two critical LFTs (bilirubin and prothrombin time) and would benefit from some physical findings (e.g., ascites), as suggested in the modified Child–Pugh classification method. However, these data are not available for the case study. Moreover, the use of only 17 “abnormal” patients is an extremely small sample for liver disease diagnosis and classification, given the highly diverse number of disorders attributed to liver disease.

Finally, it is important to note that cluster analysis, which was applied to the five most relevant LFTs (variables) Alb (V4), GOT (V6), GPT (V7), Alp (V9), and r-GPT (V10) (for the combined sample of male and female patients), yielded an optimal number of two clusters. One cluster grouped together the “normal” patients with “abnormal” patients 1–14, whereas the second cluster consisted of the three “abnormal” patients, 15–17. That is, the results of cluster analysis are





in full agreement with a careful medical diagnosis based on the data available for the case study. When the MTS analysis was similarly applied to these same five most relevant LFTs for both 8 and 32 runs, however, the results were consistent with each other but differed from the results of the cluster analysis and the medical diagnosis. This suggests that in this case, the problem with the MTS analysis is connected with the use of the S/N ratio measure rather than the interaction issue from the OA. In general, however, both of these issues can cause problems.
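The kind of cluster analysis described above can be sketched as follows. This is only an illustration under stated assumptions: the clustering algorithm is not named in the text, so Ward hierarchical clustering from SciPy is a stand-in; the data are synthetic (not the case study dataset); and the function name `two_cluster_labels` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def two_cluster_labels(X):
    """Standardize each column, then cut a Ward dendrogram into two clusters."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    tree = linkage(Z, method="ward")
    return fcluster(tree, t=2, criterion="maxclust")

rng = np.random.default_rng(0)
# Synthetic stand-in for five variables: 40 unremarkable rows plus
# 3 rows shifted far from the rest on every variable.
bulk = rng.normal(0.0, 1.0, size=(40, 5))
extreme = rng.normal(8.0, 1.0, size=(3, 5))
X = np.vstack([bulk, extreme])

labels = two_cluster_labels(X)
print(labels)
```

With data of this shape, the three shifted rows form their own cluster and all remaining rows fall together, which mirrors the two-cluster pattern reported for the case study without reproducing it.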

5. OTHER ASPECTS OF THE MAHALANOBIS–TAGUCHI SYSTEM

Taguchi et al. (2001) presented several other methods in the

MTS framework. We summarize these in this section.

5.1 Forecasting

Taguchi et al. (2001, chap. 3) presented an application of the MTS to evaluate the amount of credit that should be extended to applicants. The MTS is proposed as an alternative to credit-scoring methods. Taguchi et al. (2001, p. 25) stated that traditional methods in this area have not been successful, because only people who defaulted on loans were studied. Data on good customers is routinely used to build credit-scoring models, however, as discussed by Reichert, Cho, and Wagner (1983).

In the MTS approach, the values of $M_j$ represent losses to the company due to unpaid bills. The regression model in (3) is fitted, and the loss corresponding to an applicant with an MD value of $D^2$ is estimated to be

$$M = D/\hat{\beta} \pm 3\left(\sqrt{MSE}/\hat{\beta}\right).$$

This practice of using a fitted regression line to estimate the value of an unobserved independent variable corresponding to an observed value of the dependent variable is called “calibration” in the statistical literature. Brownlee (1965, pp. 361–362) discussed the calibration problem specifically for a line through the origin. As discussed by Mee and Eberhardt (1996), the statistical approach to this problem accounts for the error in estimating the variance and the slope of the line. This sampling variation is ignored in the MTS.
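A minimal sketch of the calibration issue follows. It assumes that the fitted model is a line through the origin relating the square root of the MD to $M$, and that the plug-in estimate has the form $D/\hat{\beta}$ with limits of $\pm 3\sqrt{MSE}/\hat{\beta}$; the data, the function names, and the values of `true_beta` and `sigma` are hypothetical. Refitting on repeated small samples shows how much the slope estimate, and therefore the plug-in estimate, moves from sample to sample.

```python
import numpy as np

def fit_through_origin(M, y):
    """Least-squares slope and MSE for y = beta * M + error (no intercept)."""
    M = np.asarray(M, dtype=float)
    y = np.asarray(y, dtype=float)
    beta_hat = np.sum(M * y) / np.sum(M * M)
    resid = y - beta_hat * M
    mse = np.sum(resid ** 2) / (M.size - 1)  # one estimated parameter
    return beta_hat, mse

def plug_in_estimate(D, beta_hat, mse):
    """Plug-in estimate D/beta_hat with +/- 3*sqrt(MSE)/beta_hat limits."""
    center = D / beta_hat
    half_width = 3.0 * np.sqrt(mse) / beta_hat
    return center - half_width, center, center + half_width

rng = np.random.default_rng(1)
M = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical known severities
true_beta, sigma = 2.0, 0.5              # hypothetical model parameters

def simulate_slope():
    """Draw one small training sample and return its fitted slope."""
    y = true_beta * M + rng.normal(0.0, sigma, M.size)
    return fit_through_origin(M, y)[0]

slopes = np.array([simulate_slope() for _ in range(2000)])
centers = 6.0 / slopes  # plug-in point estimates for an observed D = 6
print("sd of slope estimates:", slopes.std(ddof=1))
print("range of plug-in estimates:", centers.min(), centers.max())
```

The plug-in limits treat the fitted slope and MSE as if they were known exactly; the spread of `centers` across simulated samples is the part of the uncertainty that such limits leave out.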

5.2 Use in Clinical Trials

Taguchi et al. (2001, chap. 5) pointed out that clinical trials involve large numbers of subjects, require quite a long time, and are very expensive. The two reasons given for this are the large individual differences between patients and the use of attribute data, not continuous variables. It is stated that if continuous variables, such as the MD, could be used, then the study could be conducted by observing only one or two patients in a short period. Statisticians would find this claim astounding, because clinical trials must have sample sizes sufficiently large for investigators to measure effectiveness relative to other treatments, determine dosage, assess the side effects of the treatment being studied, and to determine which types of patients in a very heterogeneous population benefit most from the treatment. Taguchi et al. (2001, p. 4) considered use of the MTS in clinical trials to be its most exciting potential application.

Taguchi et al. (2001, chap. 5) proposed a method for comparing the effectiveness of two treatments. Only one patient is used for each treatment. The MD values of each patient are recorded over time during treatment. The MD values of the two patients are scaled using the corresponding initial MD values, and regression equations are fitted to show the changes in the transformed MD values over time. The treatments are compared by comparing the estimated slopes of the two lines. In statistical terminology, this corresponds to a repeated-measurements experiment for two treatments, but with only one subject in each treatment group. Statisticians would never recommend this practice, however, because variation between subjects cannot be assessed. The treatment effect is confounded with the difference between subjects.
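The confounding can be illustrated with a small simulation. This is a sketch under assumptions, not the procedure of Taguchi et al.: the linear response model, the numerical values, and the function name `scaled_md_slope` are all hypothetical. Both simulated patients receive the same treatment, so any difference between their fitted slopes comes from between-subject variation and measurement noise alone.

```python
import numpy as np

def scaled_md_slope(times, md):
    """Slope of MD(t)/MD(0) regressed on time, with an intercept."""
    md = np.asarray(md, dtype=float)
    scaled = md / md[0]
    slope, _intercept = np.polyfit(times, scaled, 1)
    return slope

rng = np.random.default_rng(2)
times = np.arange(10)
diffs = []
for _ in range(1000):
    # Same treatment for both patients; only the subject-specific
    # response rate differs (hypothetical mean and spread).
    rates = rng.normal(-0.04, 0.01, size=2)
    slopes = []
    for r in rates:
        md = 20.0 * (1.0 + r * times) + rng.normal(0.0, 0.2, times.size)
        slopes.append(scaled_md_slope(times, md))
    diffs.append(slopes[0] - slopes[1])
diffs = np.array(diffs)

print("mean slope difference:", diffs.mean())
print("sd of slope difference:", diffs.std(ddof=1))
```

The slope differences average close to zero but vary noticeably from pair to pair, so a single pair of patients can show an apparent "treatment difference" where none exists. With one subject per group there is no way to estimate this spread from the data.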

5.3 Use of Principal Components

Taguchi and Rajesh (2000) pointed out that in some applications of the MTS, there are two types of abnormalities present. For example, in the graduate student admission process there could be very good, as well as very bad, applicants. Thus, they noted that it is important to identify the direction of the abnormality. They stated further that this cannot be done with the MD values calculated using the inverse of the correlation matrix, but it can be done using the Gram–Schmidt orthogonalization process.

The Gram–Schmidt process is recommended for obtaining a set of mutually perpendicular vectors from a set of linearly independent standardized original vectors. It appears that this is a recommendation for obtaining the values of the principal components of the abnormal items based on the correlation matrix of the normal group. The discussion is not clear, however, for several reasons. First, the classification into the good and bad categories is based on the signs of the principal components. Often this would not be any more helpful than using the signs of the standardized original variables. Second, the threshold of the MD values shown on bivariate plots should correspond to an ellipse, but instead linear limits are drawn. Third, the axes, corresponding to what appear to be the principal components in the bivariate plots, are not drawn along the major and minor axes of the MD contour ellipse and are not centered at the origin, as would be expected.
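The relation between the MD and the principal components can be checked numerically. The sketch below uses an eigendecomposition of the normal-group correlation matrix (principal components), not the Gram–Schmidt procedure itself. The division of the MD by the number of variables is assumed here as the usual MTS scaling, and the data and the function name `md_two_ways` are hypothetical.

```python
import numpy as np

def md_two_ways(normal, x):
    """Scaled MD of x via the inverse correlation matrix and via principal components."""
    mean = normal.mean(axis=0)
    sd = normal.std(axis=0, ddof=1)
    z = (x - mean) / sd                    # standardize with normal-group statistics
    R = np.corrcoef(normal, rowvar=False)  # correlation matrix of the normal group
    k = normal.shape[1]
    md_inverse = z @ np.linalg.solve(R, z) / k
    eigvals, eigvecs = np.linalg.eigh(R)
    scores = eigvecs.T @ z                 # principal component scores of x
    md_pc = np.sum(scores ** 2 / eigvals) / k
    return md_inverse, md_pc, scores

rng = np.random.default_rng(3)
normal = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=50)
x = np.array([2.0, 2.5])
mirror = 2.0 * normal.mean(axis=0) - x     # same distance, opposite direction

a = md_two_ways(normal, x)
b = md_two_ways(normal, mirror)
print(a[0], a[1])   # the two MD computations agree
print(a[2], b[2])   # component scores flip sign for the mirrored point
```

The MD is a sum of squared component scores weighted by the reciprocal eigenvalues, so its contours are ellipses centered at the normal-group mean with axes along the principal components. A point and its mirror image share the same MD; only the signs of the scores, or equally of the standardized variables, distinguish them.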

6. CONCLUDING REMARKS

As statisticians, we much prefer the multivariate statistical approaches based on underlying probability models to the MTS. Mahalanobis (1950) also greatly valued the use of probability, stating that statistics supplies the basis for choosing a particular course of action in practical problems by balancing the risks of gain and loss using the calculus of probability. He also held that the cross-examination of the data was the first responsibility of the statistician (Mahalanobis 1965). Questioning the validity of the data and the use of exploratory data analysis are not mentioned as part of the MTS.

Statistical methods are better designed to account for variation between units in the groups and to account for sampling variation. The MTS does not adequately address the issue of variation between items, because this variation





typically results in at least some classification errors, when a classification rule is developed from a dataset. This lack of attention to variation between units is most evident in the MTS clinical trials methods, in which variation between individuals is completely ignored. In addition, sampling variation is ignored in the decision rules involving the S/N ratios and in the calibration problem involving prediction of an $M_j$ value based on the value of MD.

It should be noted that some of the application areas mentioned for the MTS have been studied extensively in the statistics and other subject matter literature, including medical diagnosis and credit scoring. These bodies of work are ignored in the development of the MTS approaches.

From the case study presented in Section 4, the MTS analysis based on the S/N ratio in (5) does not necessarily lead to a good MD scale in that separation between the classes with different severities of abnormality can be very poor. Of course, it is possible to use a more effective search algorithm than that based on the OA and to modify the S/N ratio. Even with such modifications, however, we believe that there are still important unresolved conceptual issues with the MTS, and that with further development of the basic approach, one would eventually need to incorporate methods based on probability. Despite such serious shortcomings, however, we expect the MTS to become more widely used in industry. Many practitioners will understand the advantages of using multivariate data, but will lack the expertise required to implement statistical approaches.

ACKNOWLEDGMENTS

The research of W. H. Woodall, R. Koudelik, K.-L. Tsui, and S. B. Kim was partially supported by National Science Foundation-DMI grant 9908013. K.-L. Tsui’s work was also partially supported by The Logistic Institute–Asia Pacific, Singapore. The work of Z. G. Stoumbos was funded in part by the Law School Admission Council (LSAC) and by a 2001 Rutgers Faculty of Management Research Fellowship. The opinions and conclusions contained in this publication are those of the authors and do not necessarily reflect the position or policy of LSAC. We thank Rajesh Jugulum and Genichi Taguchi for providing the medical case study dataset and allowing us to distribute it. We also thank the referees and the associate editor for their helpful comments.

APPENDIX: DOTPLOTS FOR THE MEDICAL DATA VARIABLES

(Status: Healthy = 1, mild disease = 2, medium disease = 3)

Figure A.1. Dotplot of V1 (Age) by Patient Status.

Figure A.2. Dotplot of V2 (Gender) by Patient Status. Each dot represents up to 3 observations.





Figure A.3. Dotplot of V3 (Total Protein) by Patient Status.

Figure A.4. Dotplot of V4 (Albumin) by Patient Status.

Figure A.5. Dotplot of V5 (Cholinesterase) by Patient Status.





Figure A.6. Dotplot of V6 (Glutamate O Transaminase) by Patient Status.

Figure A.7. Dotplot of V7 (Glutamate P Transaminase) by Patient Status.

Figure A.8. Dotplot of V8 (Lactic Dehydrogenase) by Patient Status.

Figure A.9. Dotplot of V9 (Alkaline Phosphatase) by Patient Status.





Figure A.10. Dotplot of V10 (r-Glutamyl Transpeptidase) by Patient Status.

Figure A.11. Dotplot of V11 (Leucine Aminopeptidase) by Patient Status.

Figure A.12. Dotplot of V12 (Total Cholesterol) by Patient Status.

Figure A.13. Dotplot of V13 (Triglyceride) by Patient Status.





Figure A.14. Dotplot of V14 (Phospholipid) by Patient Status.

Figure A.15. Dotplot of V15 (Creatinine) by Patient Status.

Figure A.16. Dotplot of V16 (Blood Urea Nitrogen) by Patient Status.





Figure A.17. Dotplot of V17 (Uric Acid) by Patient Status.

[Received August 2001. Revised December 2001.]

REFERENCES

Adams, B. M., and Woodall, W. H. (1989), “An Analysis of Taguchi’s On-Line Process Control Method Under a Random Walk Model,” Technometrics, 31, 401–413.

Albert, A., and Harris, E. K. (1987), Multivariate Interpretation of Clinical Laboratory Data, New York: Marcel Dekker.

Begg, C. B. (1991), “Advances in Statistical Methodology for Diagnostic Medicine in the 1980’s,” Statistics in Medicine, 10, 1887–1895.

Bodily, K. O., and Fitz, J. G. (1996), “Approach to the Patient with Suspected Liver Disease,” in Current Diagnosis & Treatment in Gastroenterology, eds. J. H. Grendell, K. R. McQuaid, and S. L. Friedman, Stamford, CT: Appleton & Lange, pp. 461–474.

Box, G. E. P. (1996), “The Role of Statistics in Quality and Productivity Improvement,” Journal of Applied Statistics, 23, 3–20.

Brownlee, K. A. (1965), Statistical Theory and Methodology in Science and Engineering, New York: Wiley.

Chopra, S. (2001), “Diagnostic Approach to the Patient with Cirrhosis,” UpToDate (www.uptodate.com), 9, 1–6.

Clermont, R. J., and Chalmers, T. C. (1967), “The Transaminase Tests in Liver Disease,” Medicine, 46, 197–207.

Cohen, J. A., and Kaplan, M. M. (1979), “The SGOT/SGPT Ratio: An Indicator of Alcoholic Liver Disease,” Digestive Diseases and Sciences, 24, 835–839.

Harris, E. K. (1981), “Statistical Aspects of Reference Values in Clinical Pathology,” in Progress in Clinical Pathology VIII, eds. M. Stefanini and E. Benson, New York: Grune and Stratton, pp. 45–66.

Kanetaka, T. (1990), “Diagnosis of a Special Health Check Using Mahalanobis Distance,” ASI Journal, 3.

Kaplan, M. M. (1990), “Evaluation of Hepatobiliary Disease,” in Internal Medicine (3rd ed.), eds. J. H. Stein et al., Boston, MA: Little, Brown, p. 443.

Lunani, M., Nair, V. N., and Wasserman, G. S. (1997), “Graphical Methods for Robust Design with Dynamic Characteristics,” Journal of Quality Technology, 29, 327–338.

Mahalanobis, P. C. (1950), “Why Statistics?,” Sankhyā, 10, 195–228.

Mahalanobis, P. C. (1965), “Statistics as a Key Technology,” The American Statistician, 19, 43–46.

Mee, R. W., and Eberhardt, K. (1996), “A Comparison of Uncertainty Criteria for Calibration,” Technometrics, 38, 221–229.

Montgomery, D. C. (1992), “The Use of Statistical Process Control and Design of Experiments in Product and Process Improvement,” IIE Transactions, 24, 4–17.

Nair, V. N. (ed.) (1992), “Taguchi’s Parameter Design: A Panel Discussion,” Technometrics, 34, 127–161.

National Library of Medicine (2001), MEDLINEplus Health Information (www.nlm.nih.gov/medlineplus), May 16, 2001.

Nayebpour, M. R., and Woodall, W. H. (1993), “An Analysis of Taguchi’s On-Line Quality Monitoring Procedures for Attributes,” Technometrics, 35, 53–60.

Neuschwander-Tetri, B. A. (1995), “Common Blood Tests for Liver Disease,” Postgraduate Medicine, 98, 49–63.

Pratt, D. S., and Kaplan, M. M. (1999), “Evaluation of the Liver. A. Laboratory Tests,” in Schiff’s Diseases of the Liver (5th ed.), eds. E. R. Schiff, M. F. Sorrell, W. C. Maddrey, Philadelphia: Lippincott-Raven, pp. 205–244.

Reichert, A. K., Cho, C.-C., and Wagner, G. M. (1983), “An Examination of the Conceptual Issues Involved in Developing Credit-Scoring Models,” Journal of Business and Economic Statistics, 1, 101–114.

Sahai, H., and Khurshid, A. (1991), “Mathematical and Statistical Models in Computer-Assisted Medical Diagnosis: An Overview and a Selected Bibliography,” Journal of Clinical Computing, 20, 33–81.

Taguchi, G. (1981), On-Line Quality Control During Production, Tokyo: Japanese Standards Association.

Taguchi, G., Chowdhury, S., and Taguchi, S. (2000), Robust Engineering, New York: McGraw-Hill.

Taguchi, G., Chowdhury, S., and Wu, Y. (2001), The Mahalanobis–Taguchi System, New York: McGraw-Hill.

Taguchi, G., Elsayed, E. A., and Hsiang, T. (1989), Quality Engineering in Production Systems, New York: McGraw-Hill.

Taguchi, G., and Rajesh, J. (2000), “New Trends in Multivariate Diagnosis,” Sankhyā, 62, 233–248.

Taguchi, G., and Wu, Y. (1980), Introduction to Off-Line Quality Control, Nagoya, Japan: Japan Quality Control Organization.

Tracy, N. D., Young, J. C., and Mason, R. L. (1992), “Multivariate Control Charts for Individual Observations,” Journal of Quality Technology, 24, 88–95.

Tsui, K.-L. (1996), “A Critical Look at Taguchi’s Modeling Approach for Robust Design,” Journal of Applied Statistics, 23, 81–95.

Tsui, K.-L. (1999), “Response Model Analysis of Dynamic Robust Design Experiments,” IIE Transactions, 31, 1113–1122.

Whitfield, J. B., Pounder, R. E., Neale, G., and Moss, D. W. (1972), “Serum γ-Glutamyl Transpeptidase Activity in Liver Disease,” Gut, 13, 702–708.

Wu, C. F. J., and Hamada, M. (2000), Experiments: Planning, Analysis, and Parameter Design Optimization, New York: Wiley.

Zielezny, M., and Dunn, O. J. (1975), “Cost Evaluation of a Two-Stage Classification Procedure,” Biometrics, 31, 37–47.
