
J. ELECTROCARDIOLOGY 14 (3), 1981, 239-248

Selection of Optimal Features for Classification of Electrocardiograms

BY UDAY JAIN, PH.D., P.M. RAUTAHARJU, M.D. AND J. WARREN, B. Sc.

SUMMARY

The forward sequential selection and backward sequential rejection algorithms and the optimal branch and bound algorithm were evaluated in selection of features for classification of electrocardiograms of 237 patients with old myocardial infarction and 299 subjects without infarction.

The branch and bound algorithm proved suitable for small sets of ECG features. However, the computational effort required was orders of magnitude greater than that for the other two methods and became prohibitive with large feature sets. A satisfactory and consistent overall classification accuracy was achieved by using the sequential selection algorithms for selecting continuous features by maximizing the Mahalanobis distance at each step of the feature selection process. Maximization of the association index can produce better results but requires more computing effort. Feature selection based on maximizing sensitivity at each step for a fixed level of specificity is less satisfactory when a high level of specificity is required.

A vast number of features can be measured on the electrocardiogram (ECG). Typical ECG classification systems make use of approximately 50 to 70 ECG features. The selection of a good subset of 50-70 features from hundreds of possible primary or derived ECG features is one of the major problems in electrocardiography. Practical ECG classification schemes such as the Minnesota code 1 use one to about eight features for differential diagnosis between any two categories. Selecting such small subsets from 50 to 70 commonly used ECG features is another important problem in electrocardiography. This problem is studied in the present investigation.

After an initial set of D features is obtained, a subset of d features may be selected and the remaining (D-d) features discarded. Alternatively, a transformation method such as the truncated Karhunen-Loeve expansion may be used to combine all D features and to generate a new set of reduced size. However, there are computational

From the Biophysics and Bioengineering Research Laboratory, Faculty of Medicine, Dalhousie University, Halifax, Nova Scotia, Canada. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. § 1734 solely to indicate this fact. Reprint requests to: Dr. P. M. Rautaharju, 6th Floor, Gerrard Hall, 5303 Morris Street, Halifax, Nova Scotia, Canada B3J 1B6.

and conceptual difficulties in the practical use of feature transformation methods 2 and they are presently not used in diagnostic ECG programs.

Feature subset selection requires a combinatorial search procedure which chooses the potential subsets, which are then evaluated by a criterion function. Many search procedures and criteria are available.

The only certain way of obtaining an optimal subset of features is to perform an exhaustive search over all D!/((D-d)! × d!) subsets of size d using the probability of error as the criterion. As pointed out by Rautaharju et al. 2, to select an optimal subset of 66 features out of 300 initial features, 2.48 × 10^67 subsets have to be evaluated. It is not feasible to perform such an exhaustive search with the present state of computer technology. This holds true even for smaller sets of features: over 5.7 billion subsets have to be evaluated to choose an optimal subset of 8 out of 66 features.
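The scale of these counts is easy to verify; as a quick illustrative check (not part of the original article), Python's math.comb evaluates the binomial coefficient D!/((D-d)! × d!) directly:

```python
from math import comb

# Number of size-d subsets an exhaustive search must evaluate: D! / ((D-d)! * d!)
print(comb(66, 8))    # 5,743,572,120 -- the "over 5.7 billion" quoted above
print(comb(26, 5))    # 65,780
print(comb(26, 10))   # 5,311,735
```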

Because of the unsuitability of an exhaustive search, a number of suboptimal search procedures have been used in the design of ECG classification systems. The forward sequential selection or step-up algorithm is commonly used 3. It requires the evaluation of d(2D-d+1)/2 subsets. The procedure used first selects one feature which alone produces the best separation between given test populations, using certain statistical measures as a criterion for differentiation.


TABLE I

The initial set of 26 features for investigating the feature selection procedures. Leadgroup L comprises leads I, aVL and V6; leadgroup F, leads II, III and aVF; and leadgroup V, leads V1, V2, V3, V4, and V5. The ST integral is calculated over the initial 3/8 of the ST-T segment.

Feature Number Definition

1 Maximum Q duration in leadgroup L

2 Maximum Q duration in leadgroup F

3 Maximum Q duration in leadgroup V

4 Maximum Q amplitude in leadgroup L

5 Maximum Q amplitude in leadgroup F

6 Maximum Q amplitude in leadgroup V

7 Maximum Q/R amplitude ratio in leadgroup L

8 Maximum Q/R amplitude ratio in leadgroup F

9 Maximum Q/R amplitude ratio in leadgroup V

10 QRS frontal plane axis

11 Maximum QRS duration in leadgroup L

12 Maximum QRS duration in leadgroup F

13 Maximum QRS duration in leadgroup V

14 Maximum intrinsicoid deflection in leads V5 and V6

15 Maximum R amplitude in leads I, II, III, aVR, aVL, aVF

16 Sum of R amplitude in V5 and S amplitude in V1

17 Maximum ST integral in leadgroup L

18 Maximum ST integral in leadgroup F

19 Maximum ST integral in leadgroup V

20 Minimum ST integral in leadgroup L

21 Minimum ST integral in leadgroup F

22 Minimum ST integral in leadgroup V

23 Maximum T amplitude in leadgroup L

24 Maximum T amplitude in leadgroup F

25 Minimum T amplitude in leadgroup L

26 Minimum T amplitude in leadgroup F

The relative distance between group means, or the t value, could be employed as criterion in the simplest univariate situation involving two populations. In the next step, the second feature is selected so that it, in combination with the first feature, produces the best classification in comparison with all other features paired with the first one, etc. A less commonly used procedure is the backward sequential rejection or step-down algorithm 4 which requires the evaluation of (D(D+1)-d(d+1))/2 subsets. It differs from the forward step-up algorithm in that it first takes all features in combination and determines how much the classification accuracy reduces if one feature at a time is eliminated from the set. The feature which causes least deterioration is eliminated and the "stepping-down" process repeated.
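As a sketch (not the authors' original code), the two sequential procedures and their subset-evaluation counts can be written as follows, with `criterion` standing for any separability measure such as the Mahalanobis distance:

```python
def step_up(features, d, criterion):
    """Forward sequential selection: at each step add the feature that
    most improves the criterion for the growing subset."""
    selected, remaining = [], list(features)
    while len(selected) < d:
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

def step_down(features, d, criterion):
    """Backward sequential rejection: at each step drop the feature whose
    removal causes the least deterioration of the criterion."""
    selected = list(features)
    while len(selected) > d:
        drop = max(selected, key=lambda f: criterion([g for g in selected if g != f]))
        selected.remove(drop)
    return selected

def n_step_up(D, d):
    """Subsets evaluated by forward selection: d(2D - d + 1)/2."""
    return d * (2 * D - d + 1) // 2

def n_step_down(D, d):
    """Subsets evaluated by backward rejection: (D(D + 1) - d(d + 1))/2."""
    return (D * (D + 1) - d * (d + 1)) // 2
```

With D = 26 these counts reproduce the N columns of Table 4: 120 and 336 subsets for d = 5, and 215 and 296 for d = 10.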

The purpose of the present investigation was to examine the relative utility and limitations of the forward sequential selection and backward sequential rejection algorithms in comparison with the branch and bound algorithm introduced recently by Narendra and Fukunaga 5. The latter can work in an optimal and several suboptimal modes. When in the optimal mode, in theory it yields an optimal subset with substantially less computational effort than exhaustive search. The number of subsets evaluated by this algorithm depends not only on D, d and the level of optimality but also on the correlations among the features. The mathematics involved in the formulation of the branch and bound algorithm are more complex than those of the two sequential search algorithms. It is suggested that interested readers refer to the original article 5 for details which will not be described here. To our knowledge, the branch and bound algorithm has not been used before in electrocardiography.
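A minimal sketch of the underlying branch-and-bound idea follows (an illustration of the basic principle only, not the full Narendra-Fukunaga formulation with its recursive criterion updates and node ordering). The search removes D - d features one at a time; because a monotonic criterion can only decrease as features are removed, any branch whose current criterion value already falls below the best complete subset found so far can be pruned:

```python
import math

def branch_and_bound(D, d, criterion):
    """Return the size-d subset of range(D) maximizing a criterion that is
    monotonically nondecreasing under feature addition."""
    best_value, best_subset = -math.inf, None

    def search(removed, start):
        nonlocal best_value, best_subset
        subset = tuple(f for f in range(D) if f not in removed)
        value = criterion(subset)
        if value <= best_value:          # prune: more removals cannot raise the value
            return
        if len(removed) == D - d:
            best_value, best_subset = value, subset
            return
        for f in range(start, D):        # remove features in increasing index order
            if f not in removed:         # so that each removal set is visited once
                search(removed | {f}, f + 1)

    search(frozenset(), 0)
    return best_subset
```

With an additive criterion (e.g. a sum of nonnegative per-feature weights, which is monotone), the search provably returns the same subset as exhaustive enumeration while skipping pruned branches.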

MATERIALS AND METHODS

Data files containing conventional 12-lead ECGs of

three groups of subjects were used. The first group was composed of 128 clinically healthy firemen without evidence of heart disease. The second group consisted of 237 patients with old myocardial infarction which was documented in the acute phase by following strictly defined procedures and selection criteria. The third group comprised 171 subjects with documented sustained hypertension of over one year's duration but without clinical evidence of myocardial infarction. The selection criteria of the clinical groups are described in more detail in another article. 6

The initial set of 26 ECG features measured by the computer is listed in Table 1. The set is composed primarily of Q wave durations and amplitudes, Q/R ratios and ST-T measurements from three groups of ECG leads which are generally considered to reflect cardiac damage in different anatomical locations. The set also includes some features which are commonly used in the criteria for left ventricular hypertrophy. Q/R ratios were transformed by the arctangent operation to avoid the situation in which a ratio becomes infinite when the R wave is absent.
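The transformation mentioned above can be sketched as follows (illustrative code, not the authors'); using the two-argument arctangent keeps the transformed ratio finite even when the R wave, and hence the denominator, is zero:

```python
import math

def qr_ratio_feature(q_amplitude, r_amplitude):
    """Bounded substitute for the raw Q/R ratio: its arctangent.
    When R is absent (zero), the raw ratio would be infinite, but the
    transform saturates at pi/2 for a positive Q amplitude."""
    return math.atan2(q_amplitude, r_amplitude)
```

For a nonzero R wave this equals atan(Q/R); for R = 0 it returns pi/2 instead of overflowing.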

Another set of 43 features which consists of most of the features listed in Table 1 was used to study the performance of feature selection procedures in realistically large sets. This set includes ECG wave integrals


and has previously been described. 6 The leadgrouping for this set was different from that for the set of 26 features.

Since the performance of feature selection procedures may depend on the type of features, data sets composed of continuous, binary and ternary features were used in the present study. The procedure used for converting the continuous features into binary features is briefly described in a previous article 6 and a detailed account of the binary and ternary discretization procedures is given elsewhere. 7 The procedure involves a simultaneous stepwise shift of the thresholds, and the thresholds which maximize the classification accuracy are retained.
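For a single feature, such a threshold search can be sketched as follows (a simplification of the simultaneous multi-threshold procedure described above; the function name is illustrative):

```python
def best_binary_threshold(values, labels):
    """Scan candidate cut points between sorted feature values and keep the
    threshold giving the highest two-class accuracy for the rule value > t."""
    candidates = sorted(set(values))
    # midpoints between adjacent distinct values, plus one cut below the minimum
    cuts = [candidates[0] - 1.0] + [
        (a + b) / 2 for a, b in zip(candidates, candidates[1:])
    ]
    best_t, best_acc = None, -1.0
    for t in cuts:
        acc = sum((v > t) == bool(y) for v, y in zip(values, labels)) / len(values)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

Extending this to shift several thresholds simultaneously, as in the article, turns the scan into a joint search over one cut per feature.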

Forward sequential selection, backward sequential rejection and the branch and bound algorithm were used for feature subset selection. The branch and bound algorithm has a parameter whose value can be set from 0 to (D-d-1) in steps of 1. When the value is (D-d-1), the algorithm yields an optimal subset. When the value is 0, the algorithm approximates the backward sequential rejection procedure. By reducing the parameter, the speed can be increased at the expense of optimality. The Mahalanobis distance 8 was used as the criterion for feature selection. This statistical index is the multivariate counterpart of the t value familiar from the univariate situation. It is optimal for the linear discriminant function when equal prior probabilities are used for classification.

To calculate the Mahalanobis distance and the linear discriminant function, it is necessary to invert the joint covariance matrix of the two classes. The Gauss-Jordan method of matrix inversion described by Orden 9 was used with double precision (64 bit) arithmetic on a Xerox Sigma 5 computer.
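The computation can be illustrated in a few lines (a NumPy sketch, not the Gauss-Jordan routine used in the article): the pooled within-class covariance matrix is inverted, and the squared distance between the class mean vectors is taken in that metric.

```python
import numpy as np

def mahalanobis_d2(x1, x2):
    """Squared Mahalanobis distance between two classes, each given as an
    (n_samples x n_features) array, using the pooled covariance matrix."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    n1, n2 = len(x1), len(x2)
    pooled = ((n1 - 1) * np.cov(x1, rowvar=False) +
              (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    diff = m1 - m2
    # solve() applies the inverse covariance without forming it explicitly,
    # which is numerically preferable to inverting the matrix outright
    return float(diff @ np.linalg.solve(pooled, diff))
```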

To reduce the computing effort, the Mahalanobis distance for the various subsets was computed in a recursive manner 5 for the branch and bound algorithm. For the step-up and step-down algorithms, the number of subsets evaluated is relatively small and the matrix inversion was performed separately for each subset to improve the accuracy.

Best feature subsets of size five and ten were identified by each procedure. Classification accuracy for the linear discriminant function was determined by using the association index (AI), defined in terms of sensitivity (SE) and specificity (SP) as:

AI (%) = SE (%) + SP (%) - 100

Here, sensitivity is the classification accuracy of the myocardial infarction class while specificity is the accuracy of the non-infarction class. The prior probabilities were adjusted to maximize AI or to yield a desired level of specificity.
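As a trivial helper (illustrative only), the index defined above is:

```python
def association_index(sensitivity, specificity):
    """AI (%) = SE (%) + SP (%) - 100; 0 is chance-level performance, 100 is perfect."""
    return sensitivity + specificity - 100.0
```

For example, a classifier with 88% sensitivity and 90% specificity has AI = 78%.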

For continuous features, six classifiers were designed: normal versus hypertension, normal versus myocardial infarction, hypertension versus myocardial infarction, normal versus non-normal, hypertension versus non-hypertension, and myocardial infarction versus non-myocardial infarction. Primary attention was paid to the performance of the myocardial infarction versus non-infarction classifier because of its potential clinical importance and to keep the sample size as large as possible.

Most of the results are obtained by using the entire data file as the design set for the various classifiers. For final comparison of classification accuracy for classifiers designed with four different feature selection criteria (Table 6), the study population was divided into a design and a test group. Two thirds of the infarct and the pooled non-infarct groups, i.e. 155 and 198 subjects respectively, were assigned to the design set, thus leaving the remaining 79 infarcts and 100 non-infarcts in the test set. The feature selection criteria employed in these final test runs with the step-up algorithm were: a) maximization of the Mahalanobis distance at each step of the feature selection process, b) maximization of the association index, c) maximization of the sensitivity at 96% level of specificity and d) maximization of the specificity at 96% level of sensitivity. Of these selection criteria, only the Mahalanobis distance is a monotonic function of dimensionality (i.e., the number of features used) and hence suitable for use with the branch and bound algorithm.

In order to study the optimality of the step-up algorithm on the set of 43 features, sets of ten binary, ten ternary and ten continuous features were chosen by maximizing the association index at each step. Classification was performed by three sets of ten binary features with feature numbers corresponding to the above three sets.

RESULTS

Table 2 indicates that the optimal subsets of continuous features chosen by the optimal branch and bound algorithm for different classifiers are considerably different. This is an expected result and suggests that a variety of feature subsets are needed for a multigroup hierarchical decision tree classifier. Table 2 also indicates that all the features included in subsets of size 5 are also members of the subsets of size 10 for the normal versus hypertension, hypertension versus non-hypertension and myocardial infarction versus non-infarction classifiers. This is not so for the normal versus myocardial infarction, hypertension versus myocardial infarction and normal versus non-normal classifiers.

The last two columns in Table 2 identify the features selected for the myocardial infarction versus non-infarction classifier. It is noted that five of the ten features chosen are ST and T wave items. Q wave amplitudes in two leadgroups are selected while Q wave durations are not included in the optimal subset of ten features.

Table 3 lists the subsets of various sizes chosen


TABLE II

Continuous features included in the best subsets of five (F) and ten (T) chosen by the optimal branch and bound procedure for six classifiers.

D² = Mahalanobis distance, AI = association index.

[The individual feature-membership entries (F/T) marking which of the 26 features belong to each subset of five and ten are garbled in the scanned source and are not reproduced here. The summary rows of the table are recoverable:]

D² for the best subsets of five/ten features: N vs. MI 5.20/6.07; N vs. HT 4.13/5.38; HT vs. MI 2.95/4.32; N vs. REST 2.83/3.65; HT vs. REST 1.76/2.45; MI vs. REST 3.89/4.89.

AI (%) for the best subsets of five/ten features: N vs. MI 79.7/82.2; N vs. HT 75.6/79.3; HT vs. MI 67.4/75.3; N vs. REST 63.3/71.9; HT vs. REST 56.5/67.2; MI vs. REST 71.5/76.6.

N= normal, HT= hypertension, M I= myocardial infarction, REST= combined remaining two classes.

by the step-up and step-down algorithms from the initial set of 26 continuous features for the myocardial infarction versus non-infarction classifier. It is seen, for instance, that the subset of five features includes features 5, 7, 16, 24 and 25 for the step-up and features 5, 6, 15, 24 and 25 for the step-down procedure.

As evident from Tables 2 and 3, the set of ten continuous features chosen by the sequential backward rejection method is identical to that selected by the branch and bound algorithm. Two out of ten features chosen by the sequential forward selection method differ from the set selected by the other two methods. The latter set also has slightly better discriminatory power.

The results for five and ten binary, ternary and continuous features selected from the initial set of 26 features for the myocardial infarction versus non-infarction classifier are summarized in Table 4. Irrespective of the feature type, all three feature selection procedures yielded subsets with nearly equal performance. The optimal branch and bound algorithm always yields the subset with the largest Mahalanobis distance, though the difference between it and the other procedures is marginal.

Table 4 indicates that the number of subsets evaluated varies a great deal with the feature selection method used. The number of subsets evaluated by the optimal branch and bound algorithm depends on the intercorrelations among


the features used, and this number is more than an order of magnitude greater than that for the other two methods. The number of subsets evaluated increases from continuous to binary to ternary features. To select a subset of five and ten features out of 26, exhaustive search evaluates 65,780 and 5,311,735 subsets respectively. Hence the branch and bound algorithm is substantially more economical than exhaustive search for the selection of ten features but not when sets of five features are selected. For five ternary features, the exhaustive search evaluates fewer subsets than the branch and bound algorithm.

Results for five, ten and 15 binary and continuous features selected from the initial set of 43 features by the step-up and step-down algorithms for the myocardial infarction versus non-myocardial infarction classifier are shown in Table 5. It is observed that for five continuous features the step-up algorithm gives better results than the step-down algorithm, while for the other feature sets both algorithms yield nearly equal classification accuracy.

The computational effort for the branch and bound algorithm increases rapidly when the number of features in the initial set increases. Attempts to use this algorithm in the optimal mode on the set of 43 features were abandoned because the computer time required was out of proportion to the potentially small improvement in classifier performance.

An attempt was made to use the branch and bound algorithm in the suboptimal mode to select subsets of five and ten binary features out of 43 features for the myocardial infarction versus non-myocardial infarction classifier. For subsets

TABLE III

Continuous features included in subsets of various sizes and associated Mahalanobis distances (D²) for the myocardial infarction versus non-myocardial infarction classifier. Subsets chosen by the forward sequential selection and backward sequential rejection algorithms.

Subset Size | Forward Selection: Feature Kept, D² | Backward Rejection: Feature Kept, D²
1 | 25, 1.40 | 25, 1.40
2 | 24, 2.39 | 5, 2.04
3 | 16, 2.93 | 15, 2.94
4 | 5, 3.55 | 24, 3.43
5 | 7, 3.84 | 6, 3.89
6 | 6, 4.04 | 7, 4.25
7 | 18, 4.30 | 18, 4.55
8 | 15, 4.58 | 12, 4.71
9 | 12, 4.73 | 23, 4.79
10 | 21, 4.82 | 26, 4.89
11 | 23, 4.90 | 17, 4.94
12 | 4, 4.96 | 22, 5.02
13 | 14, 5.03 | 14, 5.08
14 | 11, 5.07 | 4, 5.14
15 | 17, 5.11 | 11, 5.18
16 | 22, 5.18 | 21, 5.22
17 | 26, 5.24 | 8, 5.24
18 | 8, 5.26 | 16, 5.26
19 | 20, 5.27 | 20, 5.27
20 | 3, 5.28 | 3, 5.28
21 | 1, 5.28 | 1, 5.28
22 | 13, 5.29 | 13, 5.29
23 | 10, 5.29 | 10, 5.29
24 | 19, 5.29 | 19, 5.29
25 | 2, 5.29 | 2, 5.29
26 | 9, 5.29 | 9, 5.29

TABLE IV

The performance of myocardial infarction versus non-myocardial infarction classifier for five and ten binary, ternary and continuous features chosen by the three feature selection procedures from the initial set of 26 features. The computational effort for each method is indicated by the number of subsets (N) evaluated.

D² = Mahalanobis distance, AI = association index.

Features | Step-up: D², AI, N | Step-down: D², AI, N | Branch and Bound: D², AI, N
5 Binary | 3.84, 73.41, 120 | 3.74, 73.45, 336 | 3.91, 72.63, 46,727
5 Ternary | 3.82, 70.99, 120 | 3.80, 71.11, 336 | 3.82, 70.99, 76,598
5 Continuous | 3.84, 69.90, 120 | 3.89, 71.49, 336 | 3.89, 71.49, 26,459
10 Binary | 4.87, 72.47, 215 | 5.02, 74.75, 296 | 5.02, 74.23, 11,780
10 Ternary | 5.06, 75.01, 215 | 5.08, 73.50, 296 | 5.08, 73.50, 40,187
10 Continuous | 4.82, 75.13, 215 | 4.89, 76.59, 296 | 4.89, 76.59, 5,074


TABLE V

The performance of the myocardial infarction versus non-myocardial infarction classifier for 5, 10 and 15 binary and continuous features chosen by the step-up and step-down procedures. The performance of each method is indicated by the number (N) of subsets evaluated, the Mahalanobis distance (D²) and the association index (AI).

Features | Step-up: D², AI, N | Step-down: D², AI, N
5 Binary | 5.84, 80.53, 205 | 5.79, 79.92, 931
5 Continuous | 4.29, 73.42, 205 | 3.23, 65.88, 931
10 Binary | 6.86, 81.41, 385 | 6.89, 80.84, 891
10 Continuous | 5.61, 77.97, 385 | 5.54, 79.95, 891
15 Binary | 7.31, 80.68, 540 | 7.34, 81.11, 826
15 Continuous | 6.52, 81.99, 540 | 6.47, 82.05, 826

of ten features, the value of the suboptimality parameter can be varied from 0 to 32. 5 When the value was 8, the algorithm failed to select a subset after examining 173,249 subsets in 60 minutes of CPU time on the Sigma 5 computer. For lesser values of the parameter, it was observed that when the value of the parameter was increased, the value of the Mahalanobis distance also increased but, in all cases, it was lower than the value obtained by the step-up and/or step-down algorithms.

When the parameter was 6, the Mahalanobis distance for subsets of five and ten features was 5.66 and 6.88 while the number of subsets evaluated was 46,149 and 148,352 respectively. This indicates that the branch and bound algorithm in its suboptimal mode is not suitable for the present application. It was also found that the number of subsets evaluated to select a subset of ten features exceeded the number for five features when the value of the parameter was 2, 4 or 6. This was not so when the value of the parameter was 1.

Mahalanobis distance was used as the criterion in the test runs presented above. As seen from Tables 3 and 4, the association index for the design set is not a monotonic function of the Mahalanobis distance. The association index on an independent test set is even less related to the Mahalanobis distance on the design set.

Table 6 summarizes results from evaluation of the classification accuracy for classifiers designed with the four feature selection criteria defined at the end of the methods section. The results are given separately for the design and the test sets for ten binary and ten continuous features. The

classification accuracy is expressed as the sensitivity at 90% and 96% levels of specificity and as the association index for the combination of sensitivity and specificity which gave the maximal value of the association index.

It is seen that the binary features perform as well as, or better than, the continuous features with all four feature selection criteria up to a specificity of 90%. However, the performance of binary features deteriorates considerably when the specificity is adjusted to 96%, irrespective of the feature selection criterion used.

The results in Table 6 also indicate that up to the 90% level of specificity, the features chosen by maximizing the association index tend to have the highest classification accuracy. At the 96% level of specificity, selection of continuous features based on maximizing the Mahalanobis distance tends to yield marginally better results than selection based on maximizing the association index. It is also noted that, in the test set at 96% specificity, the sets of features selected by using the 96% specificity criterion do not yield as good classification accuracy as the feature sets chosen by maximizing the association index or the Mahalanobis distance.

In some test runs best subsets of ternary and continuous features were utilized as binary features. Surprisingly, such subsets occasionally performed better than corresponding subsets chosen to optimize the performance of binary features. This observation can be interpreted to indicate that the step-up selection procedure at times fails to secure the optimal subset, particularly with large feature sets.

DISCUSSION

The results from the present investigation suggest that continuous features chosen by maximizing the Mahalanobis distance at each step of the step-up algorithm yield a satisfactory and consistent overall classification accuracy, particularly when a high level of specificity is required, as is the case in many clinical and epidemiological applications. While feature selection based on maximizing the association index tends to produce a slightly higher accuracy, this selection criterion requires a considerably greater amount of computation.

The present investigation proved that while, for small initial feature sets, the optimal branch and bound algorithm is suitable for ECG feature selection, it requires excessive computer time for realistically large feature sets and performs


TABLE VI

Classification accuracy of infarct versus non-infarct classifiers designed by using four different feature selection criteria. The accuracy is expressed as sensitivity (%) at 90% and 96% specificity and as the association index (AI) for the combination of sensitivity and specificity which gives the maximum AI. Best sets of ten binary and continuous features were chosen by maximizing in the design set the Mahalanobis distance (D²), the association index (AI) and the sensitivity at 90% and 96% specificity at each step of the feature selection process. The results are listed separately for the design and test sets.

Feature Selection Rule | Type | Set | Maximal AI | Sensitivity at 90% specificity | Sensitivity at 96% specificity
Maximize D² | Binary | Design | 81.2 | 87.7 | 81.9
Maximize D² | Binary | Test | 76.9 | 83.5 | 45.6
Maximize D² | Continuous | Design | 79.6 | 88.4 | 78.7
Maximize D² | Continuous | Test | 72.4 | 78.5 | 73.4
Maximize AI | Binary | Design | 81.5 | 91.0 | 74.2
Maximize AI | Binary | Test | 76.7 | 86.1 | 55.7
Maximize AI | Continuous | Design | 81.2 | 89.0 | 71.6
Maximize AI | Continuous | Test | 77.9 | 83.5 | 72.2
Maximize sensitivity at 90% specificity | Binary | Design | 82.9 | 91.6 | 75.5
Maximize sensitivity at 90% specificity | Binary | Test | 73.7 | 82.3 | 48.1
Maximize sensitivity at 90% specificity | Continuous | Design | 80.2 | 90.3 | 76.8
Maximize sensitivity at 90% specificity | Continuous | Test | 76.9 | 82.3 | 63.3
Maximize sensitivity at 96% specificity | Binary | Design | 78.0 | 83.9 | 81.9
Maximize sensitivity at 96% specificity | Binary | Test | 74.3 | 82.3 | 54.2
Maximize sensitivity at 96% specificity | Continuous | Design | 72.6 | 80.0 | 76.1
Maximize sensitivity at 96% specificity | Continuous | Test | 67.6 | 70.9 | 64.6

poorly when used in the suboptimal mode.

The search procedures which work in the backward direction can in theory be expected to select better subsets than the ones which work in the forward direction. 4,10 The backward procedures evaluate a feature in conjunction with other features. The occasionally better performance of the backward search procedures may also be due to the fact that they are more efficient in rejecting noisy features, and retain some of the correlated features which may improve classification but are not selected by the forward procedures.

There are two major problems with the backward procedures. Firstly, they require the computation of the inverse of the joint covariance matrix of dimension D, the number of features in the initial data set. In order to obtain a good estimate of the inverse of a large matrix, the number of samples in each group should be many times greater than D. Secondly, accurate inversion of large matrices often requires more precision than is available on many computers. The problem is more severe when some of the features are highly correlated, as is often the case with ECG features. This is particularly true for the branch and bound

algorithm, which evaluates a much greater number of subsets and where the Mahalanobis distance for different subsets is computed in a recursive manner, thus limiting the utility of the backward search procedures in electrocardiographic applications.
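The pruning idea can be made concrete for any criterion that is monotonic in the feature subset (a subset never scores higher than its superset), since that property is what allows whole branches of the search tree to be discarded. The following is a simplified sketch in the spirit of the Narendra-Fukunaga algorithm (ref. 5), not the recursive-Mahalanobis implementation used in the study; all names are our own.

```python
def branch_and_bound(features, d, criterion):
    """Optimal size-d subset under a criterion that is monotonic in the subset."""
    best = {"value": float("-inf"), "subset": None}

    def search(current, start):
        # Bound: since the criterion can only decrease as more features
        # are discarded, this whole branch can be pruned.
        if criterion(current) <= best["value"]:
            return
        if len(current) == d:
            best["value"], best["subset"] = criterion(current), tuple(current)
            return
        # Branch: discard each remaining candidate position in turn;
        # 'start' prevents the same subset being generated twice.
        for i in range(start, len(current)):
            search(current[:i] + current[i + 1:], i)

    search(tuple(features), 0)
    return best["subset"]
```

The bound spares the search from visiting every one of the D!/((D-d)! x d!) subsets, but, as the text notes, the number of nodes visited still grows rapidly with D.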

Only three combinatorial search procedures were used in the present study. A number of other procedures are available. 10-12 Eisenbeis et al. 13 found that all the search procedures they tested yielded approximately equal results.

A number of criteria which form tight bounds on the probability of error have been proposed. 3,13-16 Operationally, the non-parametric criteria are difficult to use and are unlikely to be satisfactory in electrocardiographic classification. Quadratic criteria such as the Fisher 17 criterion, the Mahalanobis 8 distance and the Bhattacharyya 18 distance are suitable because they are easy to use, they are monotonic functions of the dimensionality, and they assume the classes to have Gaussian distributions, as assumed by the linear and quadratic discriminant functions. Mucciardi and Gose 3 and Eisenbeis et al. 13 found no appreciable differences in the performance regardless of the
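For two Gaussian classes these quadratic criteria have simple closed forms. The sketch below (our own illustration, not code from the study) computes the pooled-covariance Mahalanobis distance and the Gaussian Bhattacharyya distance; when the two class covariances are equal, the Bhattacharyya distance reduces to D²/8.

```python
import numpy as np

def mahalanobis_d2(mu1, mu2, pooled_cov):
    """D^2 = (mu1 - mu2)' S^-1 (mu1 - mu2) with pooled covariance S."""
    diff = np.asarray(mu1) - np.asarray(mu2)
    return float(diff @ np.linalg.solve(pooled_cov, diff))

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Gaussian Bhattacharyya distance; equals D^2 / 8 when cov1 == cov2."""
    avg = (np.asarray(cov1) + np.asarray(cov2)) / 2
    diff = np.asarray(mu1) - np.asarray(mu2)
    # Mean-separation term
    term1 = diff @ np.linalg.solve(avg, diff) / 8
    # Covariance-mismatch term, via log-determinants for numerical stability
    logdet = lambda m: np.linalg.slogdet(np.asarray(m))[1]
    term2 = 0.5 * (logdet(avg) - 0.5 * (logdet(cov1) + logdet(cov2)))
    return float(term1 + term2)
```

Both quantities can only grow as features are added, which is the monotonicity property the branch and bound algorithm depends on.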


criteria they used. For feature subset selection for multigroup

classification, two different strategies are suggested. The minimum value of a two-group criterion, such as the Mahalanobis distance, between any pair of classes may be maximized, or the average value of the criterion between all pairs of classes may be maximized. Toussaint 10 has indicated that the second procedure yields better results. When the forward sequential selection procedure is being used, the (k + 1)th feature may be chosen as that which best separates the class pair most confused by the k-feature set. 3
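The two multigroup strategies, and the confused-pair heuristic for the next feature, can be sketched generically. This is our own illustration with assumed names; the pairwise criterion is a placeholder argument, not the criterion used in the study.

```python
import itertools

def min_pairwise(groups, subset, pair_criterion):
    """Worst-case strategy: score a subset by its most-confused class pair."""
    return min(pair_criterion(a, b, subset)
               for a, b in itertools.combinations(groups, 2))

def avg_pairwise(groups, subset, pair_criterion):
    """Average strategy: score a subset by the mean over all class pairs."""
    vals = [pair_criterion(a, b, subset)
            for a, b in itertools.combinations(groups, 2)]
    return sum(vals) / len(vals)

def next_feature(groups, chosen, remaining, pair_criterion):
    """Choose the (k+1)th feature as the one that best separates the class
    pair most confused by the current k-feature set (cf. ref. 3)."""
    i, j = min(itertools.combinations(range(len(groups)), 2),
               key=lambda p: pair_criterion(groups[p[0]], groups[p[1]], chosen))
    return max(remaining,
               key=lambda f: pair_criterion(groups[i], groups[j], [f]))
```

Any two-group criterion, such as the Mahalanobis distance over the subset, can be passed as `pair_criterion`.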

In electrocardiography, the suboptimal step-up algorithm is commonly used. However, the extent of suboptimality is not generally known. In the present investigation, the results obtained with the subsets selected by the step-up, the step-down and the branch and bound algorithms were nearly equally good for the set of 26 features. Thus, it appears that no appreciable loss is incurred as a result of the use of the step-up or step-down procedures when the initial feature set is small. This is not necessarily so for larger initial feature sets, as observed for the set of 43 features. This is especially true for discretized features.

In conclusion, for relatively small initial sets of features, the optimal branch and bound method appears feasible for the search of an optimum subset. For realistically large feature sets, such as may be required for multigroup ECG classification, this method in its optimal mode becomes too costly in terms of computer time requirements, and it performs poorly in its suboptimal mode. In practical electrocardiographic applications, it may be advisable to use both the step-up and step-down algorithms and to retain the feature subset which yields the best results on the design and test sets.

The results from the present study indicate that, while the sequential selection algorithms perform quite well with small feature sets, they do not necessarily perform satisfactorily with large feature sets. Indirect evidence to support this argument was derived from the observation that the best subsets of continuous features, when used as binary features, occasionally performed better than the best subsets chosen from the original binary features.

The present study illustrates many of the difficulties encountered in attempts to select an optimum subset of features from a large initial set of measured or derived ECG variables. The deterioration of classifier performance in independent test groups, particularly when a high level of

specificity is required, suggests instability which is probably due to the relatively small groups of subjects available for the design of the classifiers. Sample size considerations become of paramount importance particularly when optimal feature selection is needed for differential diagnosis between multiple disease categories. It thus seems plausible that the best strategy is to perform feature selection on continuous features even if the selected sets are later used as discretized variables. 6

REFERENCES

1. BLACKBURN, H W, KEYS, A, SIMONSON, E, RAUTAHARJU, P M AND PUNSAR, S: The electrocar- diogram in population studies. A classification sys- tem. Circulation 21:1160, 1960

2. RAUTAHARJU, P M, BLACKBURN, H W, WOLF, H K AND HORACEK, B M: Computers in clinical elec- trocardiology. Is vectorcardiography becoming ob- solete? Adv Cardiol 16:143, 1976

3. MUCCIARDI, A N AND GOSE, E E: A comparison of seven techniques for choosing subsets of pattern recognition properties. IEEE Trans Comput C-20:1023, 1971

4. LISSACK, T AND FU, K S: Parametric feature ex- traction through error minimization applied to medical diagnosis. IEEE Trans Syst, Man and Cybern SMC-6:605, 1976

5. NARENDRA, P M AND FUKUNAGA, K: A branch and bound algorithm for feature subset selection. IEEE Trans Comput C-26:917, 1977

6. JAIN, U AND RAUTAHARJU, P M: Diagnostic accu- racy of the conventional 12-lead and the orthogonal Frank-lead electrocardiograms in detection of myocardial infarction with classifiers using con- tinuous and Bernoulli features. J Electrocardiol 13:159, 1980

7. JAIN, U, RAUTAHARJU, P M AND HORACEK, B M: The stability of decision-theoretic electrocardio- gram classifiers based on the use of discretized fea- tures. Comp Biomed Res 13:132, 1980

8. MAHALANOBIS, P C: On the generalized distance in statistics. Proc Nat Inst Sci, India, 12:49, 1936

9. ORDEN, A: Matrix inversion and related topics by direct methods. In Mathematical Methods for Digi- tal Computers, A RALSTON AND H S WILF, eds. Wiley, New York 1967, pp 39-55

10. TOUSSAINT, G T: Recent progress in statistical methods applied to pattern recognition. In Proceedings Second International Joint Conference on Pattern Recognition, Copenhagen, Denmark, IEEE Computer Soc (Pubn #74CH0885-4-C), p 550, 1974

11. CHANG, C Y: Dynamic programming as applied to feature subset selection in a pattern recognition system. IEEE Trans Syst, Man and Cybern SMC- 3:166, 1973

12. STEARNS, S D: On selecting features for pattern


classifiers. In Proceedings Third International Joint Conference on Pattern Recognition, Coronado, California, IEEE Computer Society (Pubn #76CH1140-3-C), p 884, 1976

13. EISENBEIS, R A, GILBERT, G G AND AVERY, R B: In- vestigating the relative importance of individual variables and variable subsets in discriminant analysis. Comm Stat 2:205, 1973

14. KANAL, L: Patterns in pattern recognition: 1968- 1974. IEEE Trans Inf Theory IT-20:697, 1974

15. LACHENBRUCH, P A: Discriminant Analysis. Mac- millan, New York 1975

16. LISSACK, T AND FU, K S: Error estimation in pattern recognition via L-distance between posterior density functions. IEEE Trans Inf Theory IT-22:34, 1976

17. FISHER, R A: The use of multiple measurements in taxonomic problems. Ann Eugen 7:179, 1936

18. BHATTACHARYYA, A: On a measure of divergence between two statistical populations defined by their probability distributions. Bull Calcutta Math Soc 35:99, 1943

19. TOUSSAINT, G T: Some functional lower bounds on the expected divergence for multihypothesis pattern recognition, communication and radar systems. IEEE Trans Syst, Man and Cybern SMC-1:384, 1971
