
1894 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 18, NO. 6, NOVEMBER 2014

Risk Scoring for Prediction of Acute Cardiac Complications from Imbalanced Clinical Data

Nan Liu, Zhi Xiong Koh, Eric Chern-Pin Chua, Licia Mei-Ling Tan, Zhiping Lin, Senior Member, IEEE, Bilal Mirza, and Marcus Eng Hock Ong

Abstract—Fast and accurate risk stratification is essential in the emergency department (ED), as it allows clinicians to identify chest pain patients who are at high risk of cardiac complications and require intensive monitoring and early intervention. In this paper, we present a novel intelligent scoring system using heart rate variability, 12-lead electrocardiogram (ECG), and vital signs, in which a hybrid sampling-based ensemble learning strategy is proposed to handle data imbalance. The experiments were conducted on a dataset of 564 chest pain patients recruited at the ED of a tertiary hospital. The proposed ensemble-based scoring system was compared with established scoring methods, such as the modified early warning score and the thrombolysis in myocardial infarction score, and showed its effectiveness in predicting acute cardiac complications within 72 h in terms of receiver operating characteristic analysis.

Index Terms—Electrocardiography, ensemble learning, heart rate variability (HRV), risk stratification, scoring system.

I. INTRODUCTION

PATIENTS visiting the emergency department (ED) have varying levels of risk of cardiac complications in the acute phase of treatment (<72 h). Early risk stratification can therefore help determine suitable treatment strategies and the proper level of monitoring. Identifying high-risk patients allows timely intervention for preventable and treatable complications, while low-risk patients can be managed without excessive intervention, reducing the strain on limited resources. A fast, accurate, and objective triage tool that can risk stratify patients with suspected acute coronary syndromes (ACS) is of value in the ED [1]. Such a tool not only helps guide cost-effective patient management and reduce the number of adverse cardiac events, but also has the potential to improve the computer-aided workflow for emergency healthcare [2]. Numerous studies on risk stratification have focused on categorizing patients into low-, intermediate-, and high-risk groups for adverse outcomes so that efficient patient care can be delivered [3]. Risk scores derived from scoring systems have been widely used to predict clinical outcomes and assess the severity of illness [4]. Accurate outcome prediction allows patients in acute conditions to be considered for proper treatment as early as possible. Scoring systems are usually deployed in the ED or in the intensive care unit (ICU), where rapid and accurate decision making is essential [3]. Many scoring and risk stratification systems have been developed, including the thrombolysis in myocardial infarction (TIMI) score [5], the modified early warning score (MEWS) [6], the acute physiology and chronic health evaluation score [7], the simplified acute physiology score [8], and the mortality probability model [9].

Manuscript received October 6, 2013; revised December 15, 2013; accepted January 17, 2014. Date of publication January 29, 2014; date of current version November 3, 2014.

N. Liu, Z. X. Koh, and M. E. H. Ong are with the Department of Emergency Medicine, Singapore General Hospital, Singapore 169608.

E. C.-P. Chua is with the Program in Neuroscience and Behavioral Disorders, Duke-NUS Graduate Medical School, Singapore 169857.

L. M.-L. Tan is with MOH Holdings, Singapore 099253.

Z. Lin and B. Mirza are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JBHI.2014.2303481

Predictors in scoring systems cover a wide range of variables, such as a patient's cardiac risk factors, heart rate variability (HRV) parameters, clinical characteristics, biomarkers, and the electrocardiogram (ECG). These predictors are usually selected into risk models based on their statistical significance. However, statistically insignificant predictors do not necessarily have poor discriminatory power in clinical outcome prediction, especially when machine learning techniques are involved in the decision-making process. Nevertheless, a small number of predictors is usually preferable for scoring systems, as redundant variables contribute little to the prediction and require extra effort for data manipulation. The development of scoring systems relies heavily on the appropriate selection of variables associated with the outcomes, but traditional clinical tools have some limitations. For example, patient clinical history has not been shown to correlate well with short-term outcomes [10]. Furthermore, 12-lead ECG parameters have poor sensitivity for ACS (28–55%) [11], and cardiac biomarkers may take up to 12 h to reach detectable levels after an acute myocardial infarction [12]. Some limitations of current risk stratification systems for predicting cardiovascular complications have been discussed in [4] and [13].

We have previously developed a risk assessment tool (a Euclidean distance-based scoring system called DIST) using HRV combined with vital signs and showed its utility for identifying high-risk patients visiting the ED [14]–[16]. However, no study has yet examined the association of a combination of HRV, 12-lead ECG, and vital signs with adverse cardiac events. Moreover, most scoring systems are not readily adaptable to new input features [17], and machine learning-based scoring systems may encounter issues such as data imbalance, where high-risk patients make up a small proportion of the study cohort [14], [18]. In this study, we propose an intelligent scoring system to address these difficulties and explore the utility of combining HRV parameters, 12-lead ECG parameters, and vital signs to predict acute cardiac complications within 72 h. The proposed scoring system is built on a novel machine learning structure intended to produce accurate scores and reliable decisions. The novel intelligent scoring algorithm and the combined use of HRV and 12-lead ECG parameters distinguish this study, and the system may serve as an innovative tool in future healthcare settings.

2168-2194 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

II. METHODS

A. Study Design and Patient Recruitment

We conducted an observational cohort study of 702 chest pain patients recruited from March 2010 to April 2012 at the Department of Emergency Medicine, Singapore General Hospital, the main acute tertiary hospital in Singapore, which serves 135 000 patients annually. Ethics approval with a waiver of patient consent was obtained from the Institutional Review Board. All public hospitals in Singapore use a national Patient Acuity Category Scale (PACS) for triage at the ED. We included PACS 1 patients and some PACS 2 patients in the study; PACS 1 patients were the most critically ill and required resuscitation, while PACS 2 patients were not in danger of imminent collapse but were considered critical. The recruited patients were adult men and women at least 30 years of age. Patients in nonsinus rhythm (for example, asystole, supraventricular and ventricular arrhythmias, and complete heart block) and patients who were discharged against medical advice or transferred to another hospital within 72 h of arrival at the ED were excluded. Each eligible patient arriving at the ED was screened and recruited by trained medical personnel. The primary outcome was a composite of four severe complications within 72 h of arrival at the ED: mortality, cardiac arrest, sustained ventricular tachycardia (VT), and hypotension requiring inotropes or intra-aortic balloon pump insertion.

B. Data Acquisition and Processing

ECG tracings were acquired over a period of 5 min by computer-based software with an ECG sensor (Vernier Software & Technology, Portland, OR, USA) and a data acquisition device, NI USB-6215 (National Instruments, Austin, TX, USA). ECG signals were sampled at 125 Hz, and the raw data were processed to obtain the HRV parameters using a LabVIEW (Version 8.6, National Instruments, Austin, TX, USA) interface embedded with MATLAB (R2009a, The MathWorks, Natick, MA, USA) scripts. A detailed description of data acquisition and signal processing can be found in our previous works [15], [19], in which a threshold-plus-derivative method was used to detect the QRS complexes, and all ectopic and nonsinus beats were excluded in accordance with the guidelines of the Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology [20]. In this study, a total of 16 time domain and frequency domain HRV parameters were derived; they are listed in Table I.

TABLE I
LIST OF 16 TIME AND FREQUENCY DOMAIN HRV PARAMETERS

The 12-lead ECG was also recorded at the ED with a PageWriter TC Series Cardiograph (Philips, Amsterdam, The Netherlands), and its parameters were either derived automatically by the device or calculated manually by a doctor. Twelve 12-lead ECG parameters were selected: ST elevation, ST depression, T wave inversion, Q wave, corrected QT interval (QTc), QRS axis, left bundle branch block, right bundle branch block, intraventricular conduction delay, left atrial abnormality, left ventricular hypertrophy, and right ventricular hypertrophy.

Vital signs, including heart rate and systolic and diastolic blood pressure, were measured with a Propaq CS Vital Signs Monitor (Welch Allyn, Skaneateles Falls, NY, USA) at presentation to the ED. Respiratory rate and Glasgow Coma Scale were recorded at the time of vital sign measurement. Tympanic temperature was taken with a tympanic thermometer. The remaining vital signs were pain score and oxygen saturation (SpO2). In total, eight vital signs were recorded for this study.

In summary, 36 predictive variables, consisting of 16 HRV parameters, 12 ECG parameters, and 8 vital signs, were measured for the derivation and validation of the proposed scoring system. While patients were still in the ED, we also recorded medical history, drug history, smoking history, family history of ischemic heart disease, and the number of angina events in the past 24 h. These, along with the ECG at presentation and initial troponin T serum levels, were used to calculate the TIMI risk score [5]. Moreover, AVPU scores (A for alert, V for reacting to vocal stimuli, P for reacting to pain, U for unconscious) were recorded according to the best response during data collection at triage. The collected data were used to calculate a MEWS score [6] for each recruited patient.


C. Proposed Ensemble-Based Scoring System (ESS)

With the advancement of computational techniques, machine learning methods have been found useful in scoring systems for improving predictive performance, handling imbalanced data, and enhancing system adaptability. In this paper, we propose an intelligent scoring system and explore its ability to predict acute cardiac complications within 72 h for chest pain patients presenting to the ED. The scoring system is built on a novel machine learning structure intended to yield reliable decisions.

Inspection of the data collected from the ED shows that the database is highly imbalanced in terms of outcome distribution: there is a majority class with normal outcomes and a minority class with abnormal outcomes (acute cardiac complications within 72 h). When conventional machine learning algorithms are applied to such an imbalanced dataset, the majority class dominates the learning process, resulting in poor generalization performance on unseen testing samples. Typical solutions for imbalanced data include undersampling the majority class and oversampling the minority class [21]. However, when the prevalence rate is very low (e.g., <4% in our database), neither state-of-the-art classification techniques nor conventional imbalance handling strategies give satisfactory prediction results. To provide reliable prognosis with HRV, 12-lead ECG, and vital signs, learning frameworks tailored specifically to imbalanced data are therefore essential and largely determine system performance.
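To make the imbalance concrete, the short Python sketch below uses the cohort counts reported in Section III-A (19 positives among 564 patients) to compute the prevalence and the accuracy of a hypothetical baseline that labels every patient as negative. The baseline is illustrative only and is not a method used in this study.

```python
# Cohort counts reported in Section III-A.
n_total = 564
n_pos = 19                 # patients meeting the primary outcome
n_neg = n_total - n_pos    # patients without complications

prevalence = n_pos / n_total

# Hypothetical degenerate baseline: predict "no complication" for everyone.
tp, fn = 0, n_pos
tn, fp = n_neg, 0
accuracy = (tp + tn) / n_total
sensitivity = tp / (tp + fn)

print(f"prevalence  = {prevalence:.2%}")
print(f"accuracy    = {accuracy:.2%}")
print(f"sensitivity = {sensitivity:.2%}")
```

It prints a prevalence of 3.37%, an accuracy of 96.63%, and a sensitivity of 0.00%: the baseline looks accurate while missing every high-risk patient, which is why accuracy alone is uninformative at this prevalence.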

A number of solutions to the class-imbalance problem have been proposed at both the data and algorithm levels [21]. At the data level, these solutions include several forms of resampling. At the algorithm level, they include adjusting the costs of the various classes to counter the imbalance. We have previously proposed a geometric distance-based scoring system [14] that employed a simple undersampling method. In this paper, we propose a novel ensemble-based scoring system (ESS) for risk stratification and prediction of acute cardiac complications, in which the core technique for handling class imbalance is a hybrid-sampling strategy.
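For contrast with the data-level strategy adopted by ESS, the sketch below illustrates one common algorithm-level remedy: weighting each class inversely to its frequency, w_c = K / (n_classes * n_c), so that both classes carry the same total weight in a training loss. This weighting rule is an assumption chosen for illustration and is not part of ESS; the label counts are taken from Section III-A.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return per-class weights K / (n_classes * n_c), giving each class equal total weight."""
    counts = Counter(labels)
    k, n_classes = len(labels), len(counts)
    return {c: k / (n_classes * n_c) for c, n_c in counts.items()}

# 1 = acute cardiac complication within 72 h, 0 = no complication.
labels = [1] * 19 + [0] * 545
weights = inverse_frequency_weights(labels)
print({c: round(w, 3) for c, w in weights.items()})  # {1: 14.842, 0: 0.517}
```

Each positive patient is weighted 545/19, or about 28.7, times as heavily as each negative patient, so the 19 positives and the 545 negatives each contribute a total weight of 282.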

Given a dataset $X = [x_1, x_2, \ldots, x_K]$, where each $x$ represents a patient, min–max normalization [22] is applied to transform the original inputs into the interval $[-1, 1]$. Let $\mathrm{min}_A$ and $\mathrm{max}_A$ denote the minimum and maximum values of an attribute vector $A = [x_1(m), \ldots, x_K(m)]$ for $m = 1, 2, \ldots, M$, where $M$ is the total number of HRV parameters, 12-lead ECG parameters, and vital signs, and $K$ is the total number of patients in the dataset. Min–max normalization maps a value $v$ of $A$ to $v'$ in the range $[\mathrm{min}'_A, \mathrm{max}'_A]$ by computing

$$v' = \frac{v - \mathrm{min}_A}{\mathrm{max}_A - \mathrm{min}_A}\left(\mathrm{max}'_A - \mathrm{min}'_A\right) + \mathrm{min}'_A \qquad (1)$$

The normalization process preserves the relationships among the original data and improves learning efficiency.
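A minimal pure-Python sketch of (1), applied column-wise so that each attribute vector A is one predictor across all K patients, with the target range [-1, 1]. Equation (1) is undefined for a constant attribute (max_A = min_A); mapping such a column to the midpoint of the target range is an assumption of this sketch. The toy patient values are hypothetical.

```python
def min_max_normalize(rows, new_min=-1.0, new_max=1.0):
    """Apply Eq. (1) to each column of a K x M data matrix given as a list of rows."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        new_row = []
        for m, v in enumerate(row):
            span = maxs[m] - mins[m]
            if span == 0:
                # Assumption: a constant attribute maps to the midpoint of [new_min, new_max].
                new_row.append((new_min + new_max) / 2)
            else:
                new_row.append((v - mins[m]) / span * (new_max - new_min) + new_min)
        normalized.append(new_row)
    return normalized

# Hypothetical patients: [heart rate (bpm), systolic blood pressure (mmHg)].
patients = [[60, 120], [80, 140], [100, 180]]
print(min_max_normalize(patients))
```

The smallest value of each attribute maps to -1 and the largest to 1, with intermediate values placed linearly in between, which is how the normalization preserves the relationships among the original data.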

1) Ensemble Learning-Based Score Prediction: In many scenarios, especially in medical settings, we often seek a second or further opinion before making final decisions. By consulting several experts with various backgrounds, we can weigh their suggestions or pick the most informative one. For example, the suggestion of a senior clinician could be given a higher weight than that of a junior clinician. For critically ill cases, final decisions may be made by a committee of experts through discussion and voting. Motivated by such real-world needs, many computational intelligence methods have been investigated and validated to simulate decision making by multiple experts. These go by various names, such as ensemble learning systems, mixture of experts, and multiple classifier systems. The philosophy behind these techniques is to find an optimal way to combine the suggestions of individual experts so as to reach a reliable final decision. Fig. 1 illustrates the general structure of an ensemble learning-based system, where each individual expert is also called a classifier. Each ensemble classifier $\phi_t$ ($t = 1, 2, \ldots, T$, where $T$ is the number of individual classifiers in the decision ensemble) is given a weight $w_t$ to represent the importance of that classifier. In the proposed ESS algorithm, the weight $w_t$ is determined by the contribution of its corresponding classifier $\phi_t$ to the prediction and is derived from the training process. Ensemble learning methods [21], [23] usually generate a predictive label rather than a score as the output. However, in healthcare applications, a risk score is more informative to clinicians than a class label for decision making. Details of the proposed ESS algorithm are elaborated as follows.

Fig. 1. Framework of the ensemble learning-based score prediction.

Assume that we have a training dataset $L$ consisting of $K$ samples $(x_k, y_k)$, where $k = 1, 2, \ldots, K$ and $y_k$ is the class label. Given a testing sample $x$, its label $y$ can be predicted by a single classifier $\phi(x, L)$, where the class label is either $C_0$ or $C_1$. Label $C_0$ indicates that the patient is normal (negative outcome), while label $C_1$ indicates that the patient has acute cardiac complications within 72 h (positive outcome). As illustrated in Fig. 1, we can derive $T$ independent classifiers and their corresponding weights from the training samples. The risk score on the testing sample $x$ is calculated as

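$$RS_x = \frac{\left(\sum_{y \in C_1} \phi_t(x, L) \cdot w_t\right) \times 100}{\sum_{y \in C_1} \phi_t(x, L) \cdot w_t + \sum_{y \in C_0} \left(1 - \phi_t(x, L)\right) \cdot w_t} \qquad (2)$$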

The output of a classifier $\phi_t(x, L)$ is either 0 or 1, and its corresponding predicted label $y$ is $C_0$ or $C_1$, respectively. The risk score is based on the weighted positive prediction and the weighted negative prediction. The weighted positive prediction is the sum of the weights whose corresponding classifiers predict label $C_1$ on the testing sample $x$, while the weighted negative prediction is the sum of the weights whose corresponding classifiers predict label $C_0$. The idea is straightforward and attempts to simulate real-world decision making. Since this study deals with a binary classification problem, each classifier contributes its weight to exactly one of the two sums in the denominator of (2): $\phi_t(x, L) = 1$ for a classifier predicting $C_1$, and $1 - \phi_t(x, L) = 1$ for one predicting $C_0$. The denominator therefore equals $\sum_{t=1}^{T} w_t$, and because $\phi_t(x, L) = 0$ for every classifier predicting $C_0$, the numerator can be summed over all $T$ classifiers. The risk score calculation thus simplifies to

$$RS_x = \frac{\sum_{t=1}^{T} \phi_t(x, L) \cdot w_t}{\sum_{t=1}^{T} w_t} \times 100 \qquad (3)$$

Fig. 2. Framework of the ensemble learning-based score prediction, where a hybrid-sampling approach handles the learning from imbalanced data.
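The sketch below computes the risk score with both (2) and (3) for a hypothetical ensemble of five classifiers with binary outputs and made-up weights w_t, confirming numerically that the two forms agree in the binary case. The outputs and weights are illustrative values, not results from this study.

```python
def risk_score_eq2(outputs, weights):
    """Eq. (2): weighted positive prediction over weighted positive + weighted negative prediction."""
    positive = sum(phi * w for phi, w in zip(outputs, weights) if phi == 1)
    negative = sum((1 - phi) * w for phi, w in zip(outputs, weights) if phi == 0)
    return positive * 100 / (positive + negative)

def risk_score_eq3(outputs, weights):
    """Eq. (3): binary-class simplification, 100 * sum(phi_t * w_t) / sum(w_t)."""
    return sum(phi * w for phi, w in zip(outputs, weights)) / sum(weights) * 100

outputs = [1, 0, 1, 1, 0]            # hypothetical classifier outputs phi_t(x, L)
weights = [0.9, 0.8, 0.7, 0.6, 0.5]  # hypothetical classifier weights w_t
print(round(risk_score_eq2(outputs, weights), 2))  # 62.86
print(round(risk_score_eq3(outputs, weights), 2))  # 62.86
```

Here the classifiers voting for C1 hold 2.2 of the 3.5 total weight, giving a risk score of about 62.86 on the 0–100 scale.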

Next, we need an approach for selecting suitable individual classifiers to create the decision ensemble and a proper way to combine their decisions. Addressing these questions is difficult in many medical applications, where the collected datasets are usually imbalanced, i.e., positive samples are far fewer than negative samples. Learning from imbalanced data has been well studied in the machine learning community, and many methods have been proposed and evaluated [21]. In the next subsection, we elaborate a novel hybrid-sampling approach that enables the ESS algorithm to handle imbalanced data within the proposed score prediction structure.

2) Learning From Imbalanced Data With a Novel Hybrid-Sampling Approach: Building on the framework presented in Fig. 1, we propose a hybrid-sampling approach that extends ESS to handle imbalanced data. The hybrid-sampling-based scoring system is illustrated in Fig. 2, and its detailed algorithm is presented in Fig. 3. Given the minority set $P$ and the majority set $N$, the undersampling method randomly samples a subset $N_t$ from $N$, where $|N_t| < |N|$ and $|N_t| = |P|$. In this study, $P$ represents the set of samples with positive outcomes and $N$ the set of samples with negative outcomes. The balanced dataset $S_t$ consists of both $P$ and $N_t$ and is used for classification model derivation, whereas its corresponding synthetic dataset $S'_t$ is used for model validation. The risk score calculation is straightforward. We randomly sample $T$ subsets and train $T$ independent classifiers to create the decision ensemble. Unlike most ensemble learning methods, which combine the outputs of all classifiers into one composite prediction [24], we compute the weighted positive and weighted negative predictions and use (2) to estimate a risk score. As part of the scoring system, a linear support vector machine (SVM) [25] is chosen as the individual classifier because of its reliable performance and efficiency.

Fig. 3. The proposed ESS algorithm for risk scoring, where a hybrid-sampling approach is implemented to handle data imbalance. The synthetic datasets are generated with the synthetic minority over-sampling technique (SMOTE).
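The undersampling step can be sketched as follows: each of the T balanced training sets S_t = P ∪ N_t pairs all minority samples with a random subset N_t of the majority class of size |P|. Drawing N_t without replacement within each subset and independently across subsets, and the fixed seed, are assumptions of this sketch; the patient identifiers are placeholders, and the synthetic validation set S'_t is not built here.

```python
import random

def balanced_subsets(P, N, T, seed=0):
    """Return T balanced sets S_t = P + N_t, with N_t drawn from N without replacement and |N_t| = |P|."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(T):
        N_t = rng.sample(N, len(P))   # random undersampling of the majority class
        subsets.append(list(P) + N_t)
    return subsets

# Placeholder samples as (patient_id, label) pairs; counts follow Section III-A.
P = [(f"pos{i}", 1) for i in range(19)]
N = [(f"neg{i}", 0) for i in range(545)]
subsets = balanced_subsets(P, N, T=70)   # T = 70 is the setting reported in Section III-C
print(len(subsets), len(subsets[0]))     # 70 38
```

Each of the 70 subsets is perfectly balanced (19 positives, 19 negatives) and would train one linear SVM in the ensemble.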

In the proposed ESS algorithm, random undersampling is used to select subsets from the majority class, while oversampling is used to evaluate the resulting classifiers. As mentioned in [26], selection with an undersampling approach is an unsupervised strategy for exploring the majority class; that is, we cannot determine the performance of each individual classifier, even though some of them may contribute less to the decision ensemble. Therefore, oversampling is brought into the selection process as a supervised strategy so that a robust decision ensemble of discriminative classifiers can be built. Unlike some methods [26], [27], where oversampled data are used to enlarge the training set, we use oversampling to generate synthetic data on which individual classifiers are evaluated, yielding weights that represent their contributions to the decision ensemble. The novelty of the ESS algorithm lies in this approach to classifier selection, a supervised process that takes the performance of each individual classifier into account. The synthetic minority over-sampling technique (SMOTE) [27] is employed to build the synthetic datasets. To create a synthetic sample from $x_i \in S_t$, we randomly select one of its nearest neighbors $\hat{x}_i$ and formulate the synthetic sample as

$$x'_i = x_i + \left(\hat{x}_i - x_i\right) \times \delta \qquad (4)$$

where $\delta \in [0, 1]$ is a random number, so that the synthetic sample $x'_i$ is a point on the line segment joining $x_i$ and $\hat{x}_i$.

TABLE II
PERFORMANCE INDICATORS IN ROC ANALYSIS
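A minimal sketch of the interpolation in (4). Following standard SMOTE, the neighbor is drawn from the k nearest samples of the same class as x_i under Euclidean distance. The value k = 5 is SMOTE's usual default and an assumption here, since the text does not state it, and Python's `random()` draws delta from [0, 1) rather than [0, 1]. How many synthetic samples are generated per subset (governed by the ratio S in Section III-C) is not modeled.

```python
import math
import random

def smote_sample(i, same_class, k=5, rng=random):
    """Create one synthetic sample from same_class[i] via Eq. (4)."""
    x_i = same_class[i]
    # k nearest neighbours of x_i among the other samples of its class (Euclidean distance).
    neighbours = sorted(
        (x for j, x in enumerate(same_class) if j != i),
        key=lambda x: math.dist(x, x_i),
    )[:k]
    x_hat = rng.choice(neighbours)
    delta = rng.random()  # random number in [0, 1)
    return [a + (b - a) * delta for a, b in zip(x_i, x_hat)]

# Hypothetical two-feature minority samples.
minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.0, 1.0]]
x_new = smote_sample(0, minority, k=2, rng=random.Random(1))
print(x_new)
```

With k = 2, the two nearest neighbors of [0, 0] are [0, 1] and [1, 1], so the synthetic point always lies on one of those two segments.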

As mentioned previously, we seek an alternative use of oversampled data: classifier validation instead of classifier training. In the ESS algorithm, a classifier is trained with $S_t$ and validated with $S'_t$, and the validation accuracy $\mathrm{Acc}_t$ is recorded to indicate the importance of the classifier. A higher $\mathrm{Acc}_t$ means that classifier $\phi_t$ contributes more to the decision ensemble. We build the SVM-based ensemble classifier $\phi_t(x, S_t)$ as follows:

$$\phi_t(x, S_t) = \operatorname{sgn}\left(\sum_{x_i \in S_t} \alpha_i y_i \langle x, x_i \rangle + b\right) \qquad (5)$$

where the linear kernel is adopted for its simplicity and efficiency. Training samples with $\alpha_i > 0$ are called “support vectors” and lie closest to the decision boundary. Given the outputs of the individual classifiers in the ensemble, the risk score $RS_x$ described in the previous subsection can be computed for the testing sample. Because the SVM classifier outputs $\pm 1$ while the ESS algorithm requires an output of either 0 or 1, negative SVM outputs are converted to 0 prior to risk score calculation.
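The sketch below evaluates (5) with a linear kernel, converts the ±1 output to the 0/1 form required by ESS, and computes the validation accuracy Acc_t on a labeled validation set standing in for S'_t. The support vectors, multipliers alpha_i, labels y_i in {-1, +1}, and bias b are assumed to come from an already-trained SVM; the toy values are hypothetical. Treating a score of exactly 0 as negative is an assumption, and the exact rule turning Acc_t into w_t is specified in Fig. 3, which is not reproduced here.

```python
def svm_output(x, support_vectors, alphas, sv_labels, b):
    """Eq. (5) with a linear kernel, returned as 1 (label C1) or 0 (label C0)."""
    f = sum(
        a * y * sum(u * v for u, v in zip(x, sv))  # alpha_i * y_i * <x, x_i>
        for sv, a, y in zip(support_vectors, alphas, sv_labels)
    ) + b
    # sgn(f) = +1 -> 1; sgn(f) = -1 -> 0 (f == 0 is treated as negative by assumption).
    return 1 if f > 0 else 0

def validation_accuracy(classify, validation_set):
    """Acc_t: fraction of (x, y) pairs, y in {0, 1}, that the classifier labels correctly."""
    correct = sum(1 for x, y in validation_set if classify(x) == y)
    return correct / len(validation_set)

# Hypothetical trained SVM whose decision function reduces to f(x) = x1 + x2.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
alphas, sv_labels, b = [0.5, 0.5], [1, -1], 0.0

def phi_t(x):
    return svm_output(x, support_vectors, alphas, sv_labels, b)

# Hypothetical stand-in for the synthetic validation set S'_t.
validation = [([2.0, 1.0], 1), ([-1.0, -0.5], 0), ([0.5, -1.0], 1)]
acc_t = validation_accuracy(phi_t, validation)
print(round(acc_t, 3))  # 0.667
```

The third validation point has f(x) = -0.5 and is misclassified, so Acc_t = 2/3 for this toy classifier.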

D. Performance Measures

We evaluated the scoring systems with leave-one-out cross-validation (LOOCV). Given a dataset of $K$ samples, one sample is held out to validate a scoring model trained with the remaining $K - 1$ samples; this is repeated over $K$ iterations so that every sample is tested exactly once. Having derived the risk scores for all samples in the dataset, we conducted receiver operating characteristic (ROC) analysis to derive the area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for performance evaluation. The calculation of these performance indicators is presented in Table II and is based on four quantities. True positive (TP) indicates patients with acute cardiac complications within 72 h correctly predicted as having such complications; false positive (FP) indicates healthy patients incorrectly predicted as having acute cardiac complications within 72 h; true negative (TN) indicates healthy patients correctly predicted as healthy; and false negative (FN) indicates patients with acute cardiac complications within 72 h incorrectly predicted as healthy.

TABLE III
BASELINE CHARACTERISTICS OF STUDY PATIENTS, WHERE DATA ARE SHOWN AS NUMBERS (%) UNLESS OTHERWISE STATED
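The LOOCV protocol and the AUC can be sketched in a few lines. The `fit` and `score` callables are hypothetical placeholders for any scoring model (here a toy nearest-centroid scorer, not ESS). The AUC is computed in its rank form, as the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, with ties counted as one half; this equals the area under the empirical ROC curve.

```python
import math

def loocv_scores(X, y, fit, score):
    """Leave-one-out: train on the other K - 1 samples and score the held-out sample, K times."""
    scores = []
    for k in range(len(X)):
        model = fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:])
        scores.append(score(model, X[k]))
    return scores

def auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy nearest-centroid scorer (hypothetical, for illustration only).
def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit(X_train, y_train):
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    return centroid(pos), centroid(neg)

def score(model, x):
    c_pos, c_neg = model
    return math.dist(x, c_neg) - math.dist(x, c_pos)  # higher = closer to positives

X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
print(auc(loocv_scores(X, y, fit, score), y))  # 1.0
```

Because every held-out score is produced by a model that never saw that sample, the resulting ROC analysis reflects out-of-sample performance.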

III. RESULTS

A. Baseline Characteristics

A total of 702 patients were recruited, of whom 138 were excluded owing to unavailable 12-lead ECG recordings; 19 of the remaining 564 patients (3.4%) met the primary outcome. The baseline characteristics of these 564 patients are shown in Table III. The mean age of patients without and with complications was 60.3 and 61.1 years, respectively. Male patients made up more than 60% of the entire cohort. Chinese, Malay, and Indian were the three largest racial groups in this study. Furthermore, compared with patients without complications within 72 h, patients with complications more often had a prior history of diabetes (36.8% versus 34.7%), stroke (15.8% versus 7.7%), cancer (5.3% versus 4.0%), chronic renal failure (31.6% versus 10.6%), congestive heart failure (10.5% versus 5.7%), and myocardial infarction (15.8% versus 12.5%).

B. Performance of Scoring Systems

Fig. 4 compares the prediction of acute cardiac complications within 72 h by four scoring systems: ESS, DIST, MEWS, and TIMI. The proposed ESS algorithm using 12-lead ECG, HRV, and vital signs achieved an AUC of 0.837, higher than those of DIST (0.720), MEWS (0.672), and TIMI (0.621). For a clinical prediction model, an AUC of less than 0.6 is generally considered to have no clinical value, 0.6 to 0.7 limited value, and 0.7 to 0.8 modest value, while an AUC greater than 0.8 indicates discrimination adequate for genuine clinical utility [28]. Table IV presents the detailed performance at the optimal ROC cutoff points. The score ranges for ESS and DIST are 0–100, whereas those for both TIMI and MEWS are 0–6 in this study. We also observed small PPV values and large NPV values, which result from the data imbalance, in which patients without complications form the majority class. As Table IV shows, at the optimal cutoff score ESS correctly identified 78.9% of patients with acute cardiac complications within 72 h and correctly ruled out 76.5% of patients not meeting the primary outcome. The DIST method achieved good specificity and PPV at its optimal cutoff score, but its corresponding sensitivity (63.2%) was far from satisfactory. For a more direct comparison among systems, the cutoff score of DIST was tuned to achieve a sensitivity of 78.9%, the same as those of ESS and TIMI (MEWS could not reach this sensitivity through parameter tuning). At the resulting cutoff score of 43.9, DIST achieved a specificity of 35.4% (95% CI: 31.4% to 39.4%), PPV of 4.1% (95% CI: 2.1% to 6.1%), and NPV of 98.0% (95% CI: 96.0% to 99.9%).

Fig. 4. ROC curves generated with ESS, DIST, MEWS, and TIMI.
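As a consistency check on the DIST figures at the cutoff of 43.9, the sketch below back-calculates the confusion matrix from the reported sensitivity (78.9% of 19 positives, i.e., TP = 15) and specificity (35.4% of 545 negatives, i.e., TN = 193), then recomputes PPV and NPV with their standard definitions. The 95% CIs use the normal-approximation (Wald) interval; the paper does not name its CI method, so this choice is an assumption.

```python
import math

def metrics_with_wald_ci(tp, fp, tn, fn, z=1.96):
    """Sensitivity, specificity, PPV, and NPV, each with a normal-approximation (Wald) 95% CI."""
    def proportion(num, den):
        p = num / den
        half_width = z * math.sqrt(p * (1 - p) / den)
        return p, max(0.0, p - half_width), min(1.0, p + half_width)

    return {
        "sensitivity": proportion(tp, tp + fn),
        "specificity": proportion(tn, tn + fp),
        "PPV": proportion(tp, tp + fp),
        "NPV": proportion(tn, tn + fn),
    }

# Counts back-calculated for DIST at the cutoff score of 43.9 (19 positives, 545 negatives).
tp, fn = 15, 19 - 15     # 15 / 19 = 78.9% sensitivity
tn, fp = 193, 545 - 193  # 193 / 545 = 35.4% specificity
results = metrics_with_wald_ci(tp, fp, tn, fn)
for name, (p, low, high) in results.items():
    print(f"{name:11s} {p:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

With these counts, the sketch recovers a specificity of 35.4% (31.4% to 39.4%), a PPV of 4.1% (2.1% to 6.1%), and an NPV of 98.0% (96.0% to 99.9%), matching the values reported above. The low PPV follows directly from the 352 false positives outnumbering the 15 true positives.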

Fig. 5 further examines prediction performance by depicting the distributions of predicted risk scores by outcome category, i.e., with or without acute cardiac complications within 72 h. The dark gray bars show the score distributions for patients with complications, as percentages of all patients in that outcome category, and the light gray bars show the corresponding distributions for patients without complications. Each percentage reflects the proportion of patients in its own outcome category whose scores fall into a specific range. For example, in Fig. 5(a), 53% of patients with complications received risk scores between 80 and 100, whereas 9% of patients without complications received risk scores in the same range.

Fig. 5(a)–(d) illustrate the score distributions obtained by ESS, DIST, MEWS, and TIMI, respectively. ESS generally assigned high scores to patients with complications and low scores to patients without complications, showing good discriminatory power in distinguishing patients with different outcomes. However, 16% of patients with complications fell in the score range [0, 20], suggesting that ESS needs further refinement to handle some positive samples. In general, a good scoring method should assign low scores to patients without complications and high scores to patients with complications. The DIST method performed reasonably but concentrated the risk scores of patients without complications in the range (40, 60], making it difficult to achieve high specificity while maintaining good sensitivity; as reported above, a sensitivity of 78.9% came with a specificity of only 35.4% at a DIST cutoff score of 43.9. The performance of MEWS was less convincing, as it assigned very low risk scores to 58% of patients with complications. The TIMI method generally failed to risk stratify patients with different outcomes, which is reflected in its AUC of 0.621, only slightly better than the AUC of 0.5 expected from random guessing.

Table IV also compares the prediction performance obtained with different types of predictors. Four combinations of predictors were investigated, namely {ECG, HRV, VS}, {ECG, HRV}, {ECG, VS}, and {HRV, VS}, where ECG, HRV, and VS denote 12-lead ECG parameters, HRV parameters, and vital signs, respectively. In terms of AUC, {ECG, HRV, VS} outperformed every combination of two of these three predictor types. {ECG, HRV} and {ECG, VS} performed similarly, while {HRV, VS} yielded the lowest performance of the four combinations, suggesting that the 12-lead ECG is an important predictor in assessing acute cardiac complications.
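The four predictor combinations can be formed by selecting column groups from the 36-variable feature matrix. The sketch assumes a hypothetical column order (HRV first, then ECG, then vital signs), since the paper lists the groups but not their layout; only the group sizes (16, 12, and 8) come from Section II-B.

```python
from itertools import combinations

# Hypothetical column layout for the 36 predictors (group sizes from Section II-B).
GROUPS = {
    "HRV": list(range(0, 16)),   # 16 HRV parameters
    "ECG": list(range(16, 28)),  # 12 12-lead ECG parameters
    "VS": list(range(28, 36)),   # 8 vital signs
}

def select_columns(rows, group_names):
    """Keep only the columns belonging to the named predictor groups."""
    cols = sorted(c for g in group_names for c in GROUPS[g])
    return [[row[c] for c in cols] for row in rows]

combos = [("ECG", "HRV", "VS")] + list(combinations(["ECG", "HRV", "VS"], 2))
dummy_patient = [list(range(36))]  # placeholder feature row
for combo in combos:
    print(combo, len(select_columns(dummy_patient, combo)[0]))
```

The four feature sets contain 36, 28, 20, and 24 predictors, respectively, so {ECG, VS} matches {ECG, HRV} with eight fewer variables.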

C. Effects of Parameter Setting

Algorithmic parameters determine not only the performance but also the efficiency of an algorithm, so parameter selection plays a crucial role in algorithm design and validation. There are two important parameters in the ESS algorithm: $T$, the number of ensemble classifiers, and $S$, a ratio determining the number of synthetic SMOTE samples. Both affect the prediction performance and the training time. Because score prediction serves risk stratification in emergency situations, we seek a tradeoff between performance and efficiency. Table V presents the effects of the parameter setting, where $T$ and $S$ were evaluated individually. The setting $\{T = 70, S = 0.3\}$ achieved the highest AUC while maintaining moderate training time, and all other experiments in this study were conducted with this “optimal” parameter set. However, for a new study with different clinical data, we suggest fine-tuning these two parameters to obtain the most satisfactory performance.


TABLE IV
PREDICTION RESULTS WITH FOUR DIFFERENT SCORING METHODS AND THE IMPACTS OF VARIABLE SELECTION IN ESS-BASED RISK PREDICTION

Fig. 5. Risk score distributions according to outcomes obtained with four different scoring methods, namely ESS, DIST, MEWS, and TIMI. The optimal cutoff scores for ESS, DIST, MEWS, and TIMI were 42.3, 51.8, 1, and 1, respectively. Based on these cutoff scores and the results in Table IV, ESS achieved the lowest misclassification rate (21.1%) on positive sample prediction, tied with TIMI (21.1%) and lower than DIST (36.8%) and MEWS (57.9%).

IV. DISCUSSION

TABLE V
PREDICTION RESULTS WITH ESS WHERE THE IMPACTS OF PARAMETERS T AND S WERE INDIVIDUALLY EVALUATED

In this observational cohort study of ED patients with chest pain, the 12-lead ECG combined with HRV and vital signs was found to be strongly associated with acute cardiac complications within 72 h. A novel scoring method, ESS, has been proposed to integrate multiple sources of predictors for risk stratification; it showed superior performance compared with several existing methods such as TIMI [5], MEWS [6], and the intelligent scoring method DIST [14]. As illustrated in Fig. 5, ESS was the best performer in accurately identifying both high-risk and low-risk patients. The ROC analysis further confirmed the effectiveness of ESS in risk prediction.
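The AUC used in the ROC analysis can be computed directly from the scores without tracing the curve, because it equals the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation, with ties counted as one half). The tie handling matters for integer-valued scores such as MEWS and TIMI. The sketch below is a straightforward O(P × N) illustration; the function name is hypothetical, and this section does not say which software was used for the ROC analysis.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Each (positive, negative) pair earns 1 if the positive case has the higher
    score, 0.5 on a tie, and 0 otherwise; the AUC is the mean over all pairs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    credit = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                credit += 1.0
            elif p == n:
                credit += 0.5
    return credit / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 3 of 4 pairs ranked correctly: 0.75
```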

A wide range of variables has been considered for use in cardiac risk stratification tools, for example, the patient’s cardiac risk factors, HRV parameters, clinical characteristics, biomarkers, and the ECG [3]. Although HRV has been used to predict physiological distress for centuries [29], only in recent decades has it received increasing attention as a potential predictor of congestive heart failure [30], coronary artery disease [31], outcomes after myocardial infarction [32], and acute cardiac complications [15]. With advances in computing technology, sensor measurements, and statistical analysis techniques, deriving risk predictors has become easier, and the development of complex and accurate risk stratification tools is promising. In this study, we attempted to investigate the utility of HRV, vital signs, and the 12-lead ECG for predicting clinical outcomes in critically ill patients presenting to the ED with the help of machine learning techniques. Performance comparisons across different types of variables, shown in Table IV, suggest that the 12-lead ECG is a promising predictor, as its inclusion significantly improved the performance.
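As a concrete example of HRV parameters, the standard time-domain measures described in the Task Force guidelines [20] can be computed from a series of normal-to-normal (NN) intervals. This section does not list which HRV parameters or which preprocessing (ectopic beat removal, recording length) ESS uses, so the sketch below assumes clean NN intervals in milliseconds and is illustrative only; the function name and dictionary keys are hypothetical. SDNN is computed here as a sample standard deviation (n − 1 denominator), a choice on which implementations differ.

```python
import math
import statistics


def time_domain_hrv(nn_ms):
    """Common time-domain HRV measures from NN intervals given in milliseconds.

    Assumes artefacts and ectopic beats have already been removed.
    """
    if len(nn_ms) < 3:
        raise ValueError("need at least three NN intervals")
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]  # successive differences
    mean_nn = statistics.mean(nn_ms)
    return {
        "mean_nn": mean_nn,  # mean NN interval (ms)
        "sdnn": statistics.stdev(nn_ms),  # sample SD of NN intervals (ms)
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),  # ms
        "pnn50": 100.0 * sum(1 for d in diffs if abs(d) > 50) / len(diffs),  # %
        "mean_hr": 60000.0 / mean_nn,  # beats per minute
    }


hrv = time_domain_hrv([800, 850, 780, 900, 820])
print({name: round(value, 1) for name, value in hrv.items()})
```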

The ESS method simulates real-world medical settings in which more than one opinion is sought before a final decision is made. The weights in the ESS algorithm indicate the contributions of their corresponding classifiers, and the weights are derived from a novel hybrid approach. In the ESS algorithm, classifiers with large weights contribute more strongly to the prediction while those with small weights contribute less, a strategy refined from the observation in [33] that ensembling many of the available classifiers can be better than ensembling all of them. The novelty of the ESS algorithm lies in its hybrid-sampling-based optimization and its ability to handle imbalanced data. It is worth noting that the SMOTE-based technique is only one of many possible methods for weight derivation, and it is not perfect. Given more real data, we would be able to generate more convincing and reliable weights for the individual predictors to represent their contributions to the decision ensemble. Furthermore, the SVM classifier was originally implemented as part of the ESS algorithm for demonstration purposes; state-of-the-art pattern classification methods could therefore help improve prediction performance further. For example, a Bayesian belief network framework has been used successfully to generate a risk score that identifies the risk of heart failure [34].
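The weighted combination of ensemble members described above can be sketched as follows. This section does not give the ESS aggregation formula, so the sketch assumes that each of the T classifiers outputs a value in [0, 1] for a patient, that the final score is the weighted average rescaled to 0 to 100, and that members whose normalised weight falls below an optional threshold are dropped, in the spirit of the “many could be better than all” idea in [33]. The function name, the 0 to 100 scale, and the `min_weight` parameter are hypothetical.

```python
def ensemble_score(member_outputs, weights, min_weight=0.0, scale=100.0):
    """Weighted ensemble risk score (hypothetical aggregation rule).

    member_outputs: per-classifier outputs for one patient, each in [0, 1]
    weights: non-negative weights, one per classifier
    min_weight: members whose normalised weight is below this are dropped
    Returns a score in [0, scale].
    """
    if len(member_outputs) != len(weights):
        raise ValueError("one weight per classifier is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    kept = [(o, w / total) for o, w in zip(member_outputs, weights)
            if w / total >= min_weight]
    if not kept:
        raise ValueError("min_weight removed every classifier")
    # Renormalise over the kept members so the score stays on the same scale
    kept_total = sum(w for _, w in kept)
    return scale * sum(o * w for o, w in kept) / kept_total


# Three hypothetical classifiers; the one with normalised weight 0.1 is dropped,
# leaving a weighted mean of 1.0 and 0.5 with weights 0.6 and 0.3 (about 83.3).
print(ensemble_score([1.0, 0.0, 0.5], [6, 1, 3], min_weight=0.2))
```

Renormalising over the kept members keeps the score on a fixed scale after pruning, so a single cutoff could still be applied; whether ESS itself uses a 0 to 100 scale is not stated in this section.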

Experimental results have shown that the 12-lead ECG and HRV are significant predictors of acute cardiac complications; however, it is still unclear which specific ECG-derived parameters are the real contributors to the prediction. It was also reported in [35] that not all vital signs were useful in predicting clinical outcomes. To save time and cost, clinicians may prefer to omit variables with minor contributions. To address these questions, we intend to conduct a detailed analysis of the data using variable selection techniques [36], [37] in future work.

Because this is a single-center study at an acute tertiary hospital in Singapore, our findings may not be generalizable to other populations. Another limitation of this study is that input measurements came from different sources and were collected either automatically by machines or manually by experienced medical staff. Therefore, operational errors were possible even though quality control was carried out. Furthermore, validation of the scoring system on an independent dataset would make this work more acceptable for clinical practice.

V. CONCLUSION

In this paper, we have proposed a novel risk scoring method, ESS, that integrates HRV, 12-lead ECG, and vital signs for the prediction of acute cardiac complications within 72 h, in which a hybrid sampling approach was developed to handle data imbalance. In experimental validation on an imbalanced dataset collected from a tertiary hospital in Singapore, ESS proved effective in predicting clinical outcomes compared with several established scoring systems in terms of ROC analysis. The study also showed that the 12-lead ECG was a significant predictor and that its combined use with HRV and vital signs achieved satisfactory performance. Furthermore, machine learning techniques showed improvements over traditional statistics-derived methods, suggesting that intelligent algorithms are potential solutions for enhancing medical decision making. However, computational predictions should be weighed against clinical knowledge before decisions are reached.

REFERENCES

[1] H. V. Huikuri, A. Castellanos, and R. J. Myerburg, “Sudden death due to cardiac arrhythmias,” N. Engl. J. Med., vol. 345, no. 20, pp. 1473–1482, 2001.

[2] J. Wang, “Emergency healthcare workflow modeling and timeliness analysis,” IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 6, pp. 1323–1331, Nov. 2012.

[3] S. Marcoon, A. M. Chang, B. Lee, R. Salhi, and J. E. Hollander, “HEART score to further risk stratify patients with low TIMI scores,” Crit. Pathw. Cardiol., vol. 12, pp. 1–5, 2013.

[4] J. L. Vincent and R. Moreno, “Clinical review: Scoring systems in the critically ill,” Crit. Care, vol. 14, p. 207, 2010.

[5] E. Antman, M. Cohen, P. Bernink, C. McCabe, T. Horacek, G. Papuchis, B. Mautner, R. Corbalan, D. Radley, and E. Braunwald, “The TIMI risk score for unstable angina/non-ST elevation MI: A method for prognostication and therapeutic decision making,” JAMA, vol. 284, no. 7, pp. 835–842, 2000.

[6] C. P. Subbe, R. G. Davies, E. Williams, P. Rutherford, and L. Gemmell, “Effect of introducing the modified early warning score on clinical outcomes, cardio-pulmonary arrests and intensive care utilisation in acute medical admissions,” Anaesthesia, vol. 58, pp. 797–802, 2003.

[7] W. A. Knaus, E. A. Draper, D. P. Wagner, and J. E. Zimmerman, “APACHE II: A severity of disease classification system,” Crit. Care Med., vol. 13, pp. 818–829, 1985.

[8] J. R. Le Gall, S. Lemeshow, and F. Saulnier, “A new simplified acute physiology score (SAPS II) based on a European/North American multicenter study,” JAMA, vol. 270, pp. 2957–2963, 1993.

[9] S. Lemeshow, D. Teres, J. Klar, J. S. Avrunin, S. H. Gehlbach, and J. Rapoport, “Mortality probability models (MPM II) based on an international cohort of intensive care unit patients,” JAMA, vol. 270, pp. 2478–2486, 1993.

[10] J. Sanchis, V. Bodi, J. Nunez, X. Bosch, P. Lorna-Sorio, L. Mainar, E. Santas, G. Minana, R. Robles, and A. Llacer, “Limitations of clinical history for evaluation of patients with acute chest pain, non-diagnostic electrocardiogram, and normal troponin,” Amer. J. Cardiol., vol. 101, no. 5, pp. 613–617, 2008.

[11] F. Fesmire, R. Percy, J. Bardoner, D. Wharton, and F. Calhoun, “Usefulness of automated serial 12-lead ECG monitoring during the initial emergency department evaluation of patients with chest pain,” Ann. Emerg. Med., vol. 31, no. 1, pp. 3–11, 1998.

[12] A. Bakker, M. Koelemay, J. Gorgels, B. Vanvlies, R. Smits, J. Tijssen, and F. Haagen, “Failure of new biochemical markers to exclude acute myocardial-infarction at admission,” Lancet, vol. 342, no. 8881, pp. 1220–1222, 1993.

[13] A. F. Manini, N. Dannemann, D. F. Brown, J. Butter, F. Bamberg, J. T. Nagurney, J. H. Nichols, and U. Hoffmann, “Limitations of risk score models in patients with acute chest pain,” Amer. J. Emerg. Med., vol. 27, no. 1, pp. 43–48, 2009.

[14] N. Liu, Z. Lin, J. Cao, Z. X. Koh, T. Zhang, G.-B. Huang, W. Ser, and M. E. H. Ong, “An intelligent scoring system and its application to cardiac arrest prediction,” IEEE Trans. Inf. Technol. Biomed., vol. 16, no. 6, pp. 1324–1331, Nov. 2012.

[15] M. E. H. Ong, C. H. Ng, K. Goh, N. Liu, Z. X. Koh, N. Shahidah, T. Zhang, S. Fook-Chong, and Z. Lin, “Prediction of cardiac arrest in critically ill patients presenting to the emergency department using a machine learning score incorporating heart rate variability compared with the modified early warning score,” Crit. Care, vol. 16, p. R108, 2012.

[16] M. E. H. Ong, K. Goh, S. Fook-Chong, B. Haaland, K. L. Wai, Z. X. Koh, N. Shahidah, and Z. Lin, “Heart rate variability risk score for prediction of acute cardiac complications in ED patients with chest pain,” Amer. J. Emerg. Med., vol. 31, pp. 1201–1207, 2013.

[17] C. B. Pearce, S. R. Gunn, A. Ahmed, and C. D. Johnson, “Machine learning can improve prediction of severity in acute pancreatitis using admission values of APACHE II score and C-reactive protein,” Pancreatology, vol. 6, pp. 123–131, 2006.

[18] M. Khalilia, S. Chakraborty, and M. Popescu, “Predicting disease risks from highly imbalanced data using random forest,” BMC Med. Inform. Decis. Mak., vol. 11, pp. 51:1–51:13, 2011.

[19] N. Liu, Z. Lin, Z. X. Koh, G.-B. Huang, W. Ser, and M. E. H. Ong, “Patient outcome prediction with heart rate variability and vital signs,” J. Signal Process. Syst., vol. 64, pp. 265–278, 2011.

[20] Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, “Heart rate variability: Standards of measurement, physiological interpretation, and clinical use,” Circulation, vol. 93, pp. 1043–1065, 1996.


[21] H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1263–1284, Sep. 2009.

[22] J. Han and M. Kamber, Data Mining: Concepts and Techniques. San Mateo, CA, USA: Morgan Kaufmann, 2006.

[23] N. Liu and H. Wang, “Ensemble based extreme learning machine,” IEEE Signal Process. Lett., vol. 17, no. 8, pp. 754–757, Aug. 2010.

[24] R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits Syst. Mag., vol. 6, no. 3, pp. 21–45, Oct.–Dec. 2006.

[25] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Min. Knowl. Discov., vol. 2, pp. 121–167, 1998.

[26] X. Y. Liu, J. Wu, and Z. H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 2, pp. 539–550, Apr. 2009.

[27] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, no. 1, pp. 321–357, 2002.

[28] E. M. Ohman, C. B. Granger, R. A. Harrington, and K. L. Lee, “Risk stratification and therapeutic decision making in acute coronary syndromes,” JAMA, vol. 284, pp. 876–878, 2000.

[29] M. E. H. Ong, P. Padmanabhan, Y. H. Chan, Z. Lin, J. Overton, K. R. Ward, and D. Y. Fei, “An observational, prospective study exploring the use of heart rate variability as a predictor of clinical outcomes in pre-hospital ambulance patients,” Resuscitation, vol. 78, pp. 289–297, 2008.

[30] B. M. Szabo, D. J. van Veldhuisen, N. van der Veer, J. Brouwer, P. A. De Graeff, and H. J. Crijns, “Prognostic value of heart rate variability in chronic congestive heart failure secondary to idiopathic or ischemic dilated cardiomyopathy,” Amer. J. Cardiol., vol. 79, pp. 978–980, 1997.

[31] L. Fei, X. Copie, M. Malik, and A. J. Camm, “Short- and long-term assessment of heart rate variability for risk stratification after acute myocardial infarction,” Amer. J. Cardiol., vol. 77, pp. 681–684, 1996.

[32] C. Carpeggiani, A. L’Abbate, P. Landi, C. Michelassi, M. Raciti, A. Macerata, and M. Emdin, “Early assessment of heart rate variability is predictive of in-hospital death and major complications after acute myocardial infarction,” Int. J. Cardiol., vol. 96, pp. 361–368, 2004.

[33] Z. H. Zhou, J. Wu, and W. Tang, “Ensembling neural networks: Many could be better than all,” Artif. Intell., vol. 137, pp. 239–263, 2002.

[34] S. Sarkar and J. Koehler, “A dynamic risk score to identify increased risk for heart failure decompensation,” IEEE Trans. Biomed. Eng., vol. 60, no. 1, pp. 147–150, Jan. 2013.

[35] W. Hong, A. Earnest, P. Sultana, Z. X. Koh, N. Shahidah, and M. E. H. Ong, “How accurate are vital signs in predicting clinical outcomes in critically ill emergency department patients,” Eur. J. Emerg. Med., vol. 20, pp. 27–32, 2013.

[36] M. Llamedo and J. P. Martínez, “Heartbeat classification using feature selection driven by database generalization criteria,” IEEE Trans. Biomed. Eng., vol. 58, no. 3, pp. 616–625, Mar. 2011.

[37] H. Liu and L. Yu, “Toward integrating feature selection algorithms for classification and clustering,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491–502, Apr. 2005.

Authors’ photographs and biographies not available at the time of publication.
