ÇUKUROVA UNIVERSITY INSTITUTE OF NATURAL AND APPLIED SCIENCES
MSc THESIS
Esra MAHSERECİ KARABULUT
A RESEARCH ON PERFORMANCE OF DECISION SUPPORT SYSTEMS IN DIAGNOSIS OF CORONARY ARTERY DISEASE
DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING
ADANA, 2012
ÇUKUROVA UNIVERSITY INSTITUTE OF NATURAL AND APPLIED SCIENCES
A RESEARCH ON PERFORMANCE OF DECISION SUPPORT SYSTEMS
IN DIAGNOSIS OF CORONARY ARTERY DISEASE
Esra MAHSERECİ KARABULUT
MSc THESIS
DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING

We certify that the thesis titled above was reviewed and approved for the award of the degree of Master of Science by the board of jury on 20/06/2012.

Asst. Prof. Dr. Turgay İBRİKÇİ    Assoc. Prof. Dr. Selma Ayşe ÖZEL    Asst. Prof. Dr. Sami ARICA
SUPERVISOR                        MEMBER                              MEMBER

This MSc Thesis is written at the Institute of Natural and Applied Sciences of Çukurova University.

Registration Number:
Prof. Dr. M. Rifat ULUSOY Director Institute of Natural and Applied Sciences
This thesis was financially supported by the Ç.U. academic research fund, project MMF2011YL19.

Note: The use of the declarations, tables, figures, and photographs presented in this thesis or in any other reference without citation is subject to "The Law of Arts and Intellectual Products," number 5846, of the Turkish Republic.
ABSTRACT
MSc THESIS
A RESEARCH ON PERFORMANCE OF DECISION SUPPORT SYSTEMS IN DIAGNOSIS OF CORONARY ARTERY DISEASE
Esra MAHSERECİ KARABULUT
ÇUKUROVA UNIVERSITY
INSTITUTE OF NATURAL AND APPLIED SCIENCES DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING
Supervisor : Asst. Prof. Dr. Turgay İBRİKCİ
Year: 2012, Pages: 67
Jury : Asst. Prof. Dr. Turgay İBRİKCİ
     : Assoc. Prof. Dr. Selma Ayşe ÖZEL
     : Asst. Prof. Dr. Sami ARICA
Coronary Artery Disease (CAD) is a common heart disease associated with disorders affecting the heart and blood vessels. Since the disease is one of the leading causes of heart attacks, and thus of deaths, diagnosing it in its early stages, or in cases when patients do not show many of the symptoms, is of considerable importance. The increasing prevalence of CAD in the world has also increased work on the early diagnosis and treatment of cardiovascular diseases. Clinical decision support systems (CDSS) have become an important part of diagnosis in various medical areas over the last few decades.
In this thesis, a study of computational tools for diagnosing CAD is presented in order to support clinical decision-making processes. Real-life data is used in our research so that the experimental results are convincing. These computational tools include the decision support systems of artificial neural networks, decision trees, and Bayesian networks. Ensemble systems and the effect of feature selection on improving decision making for the diagnosis of CAD are also investigated. Furthermore, a new method is proposed that employs the Rotation Forest ensemble system with artificial neural network base classifiers trained by the Levenberg-Marquardt backpropagation algorithm. This learning algorithm was selected from among several backpropagation algorithms because of its superior performance on the CAD dataset. The proposed method reaches high accuracy and provides a good option for large-population diagnosis. The obtained accuracy rate is 91.2%, which is, to the best of our knowledge, the best rate achieved thus far in the relevant literature using the same dataset.
Key Words: Coronary artery disease, decision support systems, Rotation Forest,
artificial neural networks, backpropagation
ÖZ
YÜKSEK LİSANS TEZİ
KARAR DESTEK SİSTEMLERİNİN KORONER ARTER HASTALIĞI TEŞHİSİNDEKİ PERFORMANSI ÜZERİNE BİR ARAŞTIRMA
Esra MAHSERECİ KARABULUT
ÇUKUROVA ÜNİVERSİTESİ FEN BİLİMLERİ ENSTİTÜSÜ
ELEKTRİK ELEKTRONİK MÜHENDİSLİĞİ ANABİLİM DALI
Supervisor : Asst. Prof. Dr. Turgay İBRİKCİ
Year: 2012, Pages: 67
Jury : Asst. Prof. Dr. Turgay İBRİKCİ
     : Assoc. Prof. Dr. Selma Ayşe ÖZEL
     : Asst. Prof. Dr. Sami ARICA
Coronary Artery Disease (CAD) is a common heart disease associated with disorders affecting the heart and blood vessels. Since the disease is a leading cause of heart attacks and deaths, diagnosing it at an early stage, or when the patient does not show many of the symptoms, is important. The increasing prevalence of CAD in the world has also increased work on the early diagnosis and treatment of cardiovascular diseases. Over the last few decades, clinical decision support systems have become an important part of diagnosis in various medical fields. In this thesis, a study of computational tools for diagnosing CAD is presented in order to support the clinical decision-making process. Real-life data is used in our research so that the experimental results are convincing. These computational tools include the decision support systems of artificial neural networks, decision trees, and Bayesian networks. The effect of ensemble systems and feature selection on improving decision making in CAD diagnosis is also investigated in this thesis study. Furthermore, a new method is proposed that uses the Rotation Forest ensemble system with artificial neural networks as base classifiers, trained with the Levenberg-Marquardt backpropagation algorithm. This learning algorithm was selected from among several backpropagation algorithms because of its superior performance on the CAD data. The proposed method reaches high accuracy and provides a good option for large-population diagnosis. The obtained accuracy rate is 91.2%, which, to the best of our knowledge, is the highest rate achieved in the literature using the same dataset.
Key Words: Coronary artery disease, decision support systems, Rotation Forest,
artificial neural networks, backpropagation
ACKNOWLEDGEMENTS
I am grateful to my supervisor, Asst. Prof. Dr. Turgay İBRİKÇİ, whose encouragement, guidance, and support motivated me throughout the research and writing of this thesis. Furthermore, he was always accessible and willing to help at every stage of the study.
It is a pleasure to thank my thesis committee members and advisors, Assoc. Prof. Dr. Selma Ayşe ÖZEL and Asst. Prof. Dr. Sami ARICA, for the valuable insight they shared and their guiding advice.
I also thank my mother, Nuran MAHSERECİ, and my mother-in-law, Meral KARABULUT, for their support in looking after my children, Erva and Cengiz, during the completion of this thesis.
I would like to show my deepest gratitude to my husband, Mustafa KARABULUT. His support and patience have taught me much about discipline, and his experience broadened my perspective during this thesis study.
CONTENTS PAGE
ABSTRACT .................................................................................................................. I
ÖZ ................................................................................................................................. II
ACKNOWLEDGEMENTS ....................................................................................... III
CONTENTS ................................................................................................................ IV
LIST OF TABLES ...................................................................................................... V
LIST OF FIGURES ................................................................................................... VI
LIST OF ABBREVIATIONS ................................................................................... VII
1. INTRODUCTION ................................................................................................... 1
1.1. Coronary Artery Disease and Risk Factors ....................................................... 1
1.2. General Characteristics of a CAD Patient ......................................................... 5
1.3. Aim and Scope of the Thesis ............................................................................ 6
2. RELATED WORKS ................................................................................................ 9
3. MATERIAL AND METHODS ............................................................................. 13
3.1. Dataset Descriptions........................................................................................ 13
3.2. Methods ........................................................................................................... 14
3.2.1. Artificial Neural Networks .................................................................... 14
3.2.2. Bayesian Classification ......................................................................... 20
3.2.3. Decision Trees ....................................................................................... 24
3.2.4. Ensemble Systems ................................................................................. 28
3.2.5. Feature Selection ................................................................................... 31
4. RESULTS AND DISCUSSION ............................................................................ 35
4.1. Evaluation Metrics .......................................................................................... 35
4.2. Employing Artificial Neural Networks for Diagnosis of CAD ...................... 37
4.3. Using Bayesian Networks and Decision Trees for Diagnosis of CAD ........... 39
4.4. Effect of Feature Selection on Diagnosis of CAD .......................................... 43
4.5. Employing Ensemble Methods for Diagnosis of CAD ................................... 50
4.6. More on Rotation Forest Ensemble and a New Method Proposal for CAD
Diagnosis ......................................................................................................... 51
5. CONCLUSIONS .................................................................................................... 57
REFERENCES ............................................................................................................. 59
CURRICULUM VITAE .............................................................................................. 67
LIST OF TABLES PAGE
Table 1.1. Coronary artery disease risk factors (Onat et al., 2002)............................ 3
Table 3.1. CAD dataset summary ............................................................................ 13
Table 4.1. Accuracy values of backpropagation algorithms on CAD data .............. 38
Table 4.2. Performance values of BNs constructed by three different optimization
techniques on CAD data. ......................................................................... 40
Table 4.3. Confusion matrix of 303 data of CAD using HillClimber ...................... 41
Table 4.4. Confusion matrix of 303 data of CAD using Simulated Annealing ....... 41
Table 4.5. Confusion matrix of 303 data of CAD using TAN ................................. 42
Table 4.6. Evaluation results of five decision trees according to CAD data ........... 42
Table 4.7. Features selected by Relief-F filter ......................................................... 44
Table 4.8. Effect of Relief-F filter on classification performance ........................... 44
Table 4.9. Features selected by Gain Ratio filter ..................................................... 46
Table 4.10. Effect of Gain Ratio filter on classification performance ....................... 46
Table 4.11. Features selected by Symmetrical Uncertainty filter .............................. 48
Table 4.12. Effect of Symmetrical Uncertainty on classification performance ......... 48
Table 4.13. Accuracy values of ensemble classifiers using different base
classifiers ................................................................................................. 50
Table 4.14. Sensitivity values of ensemble classifiers using different base
classifiers ................................................................................................. 50
Table 4.15. Specificity values of ensemble classifiers using different base
classifiers ................................................................................................. 50
Table 4.16. Classification results of CAD dataset applied in different classifiers. .... 53
Table 4.17. Classification results of RF algorithm with different base classifiers .... 54
Table 4.18. Classification accuracy results of literature methods that utilize the
same dataset ............................................................................................ 53
LIST OF FIGURES PAGE
Figure 1.1. Diagram of the Coronary Arteries (Texas Heart Institute, 2011) ....... 2
Figure 1.2. Normal artery and narrowing of artery (NHLBI, 2011) ..................... 2
Figure 1.3. How a heart attack happens (Healthwise, 2011) ................................ 5
Figure 3.1. A neuron with single input and bias ................................................. 15
Figure 3.2. A neuron with vector input and bias ................................................. 15
Figure 3.3. The tansig transfer function .............................................................. 16
Figure 3.4. logsig transfer function ..................................................................... 17
Figure 3.5. Representation of a multilayer perceptron with one hidden layer .... 18
Figure 3.6. A training process flowchart using backpropagation algorithm
(Moghadassi et al., 2009) ................................................................. 19
Figure 3.7. Representation of Naïve Bayes DAG as a BN ................................. 22
Figure 3.8. A Simple Bayesian Network made up of a DAG and probability
tables ................................................................................................. 23
Figure 3.9. A simple decision tree for Heart Disease (HD) diagnosis ................ 25
Figure 3.10. An ensemble system with three base classifiers ............................... 28
Figure 4.1. Experimental setup of the artificial neural network used for CAD
diagnosing ........................................................................................ 38
Figure 4.2. Comparison of performances of BP algorithms on CAD data. ........ 39
Figure 4.3. ROCs of classification of BayesNetwork using HillClimber,
Simulated Annealing and TAN respectively .................................... 40
Figure 4.4. Accuracies of classifiers with and without Relief-F filter ................ 45
Figure 4.5. AUCs of classifiers with and without Relief-F filter ........................ 45
Figure 4.6. Accuracies of classifiers with and without Gain Ratio filter ............ 47
Figure 4.7. AUCs of classifiers with and without Gain Ratio filter .................... 47
Figure 4.8. Accuracies of classifiers with and without Symmetrical Uncertainy
filter .................................................................................................. 49
Figure 4.9. AUCs of classifiers with and without Symmetrical Uncertainty
filter ................................................................................................... 49
Figure 4.10. Comparison of performances of ensemble algorithms using different
base classifiers with respect to accuracy. ......................................... 51
Figure 4.11. Effect of RF algorithm on different classifiers ................................. 55
Figure 4.12. ROC analysis of Levenberg-Marquardt algorithm with and without
RF ..................................................................................................... 55
LIST OF ABBREVIATIONS
Acc : Accuracy
AUC : Area Under Curve
ANN : Artificial Neural Network
BN : Bayesian Network
CAD : Coronary Artery Disease
CBR : Case Based Reasoning
CDSS : Clinical Decision Support Systems
CP : Chest Pain
DAG : Directed Acyclic Graph
DM : Diabetes Mellitus
DSS : Decision Support Systems
FN : False Negatives
FP : False Positives
FT : Functional Tree
GR : Gain Ratio
HD : Heart Disease
HDL : High Density Lipoprotein
IB1 : Instance Based Learning
IG : Information Gain
LDL : Low Density Lipoprotein
MAE : Mean Absolute Error
MI : Myocardial Infarction
MLP : Multi Layer Perceptron
PCA : Principal Component Analysis
RBF : Radial Basis Function
RF : Rotation Forest
ROC : Receiver Operating Characteristics
SCG : Scaled Conjugate Gradient
Sn : Sensitivity
Sp : Specificity
SU : Symmetrical Uncertainty
TAN : Tree Augmented Bayesian Network
TN : True Negatives
TP : True Positives
UCI : University of California Irvine
WNN : Wavelet Neural Network
1. INTRODUCTION Esra MAHSERECİ KARABULUT
1. INTRODUCTION
Coronary Artery Disease (CAD), which refers to a wide variety of diseases and disorders affecting the heart and the blood vessels, is the most common type of heart disease. According to 2006 statistics, heart disease caused 26% of deaths in the United States, more than one in every four (Heron et al., 2009); similar rates are seen in other countries such as Russia, New Zealand, Australia, and in Europe. It is a common cause of heart attacks and thus the most deadly disease in the world (Setiawan et al., 2009), and the most common cause of sudden death in people over 20 years old. By 2030, almost 23.6 million people are expected to die from cardiovascular diseases, mainly from heart disease and stroke (WHO, 2011). This situation also causes labor shortages and a financial burden. It is necessary to bring the risk factors under control to prevent cardiovascular diseases.
The increasing prevalence of CAD in the world has also increased work on the early diagnosis and treatment of cardiovascular diseases. Clinical decision support systems (CDSS) have become an important part of diagnosis in various medical fields in the last few decades. Not only is diagnosis accuracy improved this way, but clinical complexity, details, and cost control are managed, and duplicate or unnecessary tests are avoided (Perreault and Metzger, 1999). CDSS complement the experience of physicians and are a component of medical technology. Motivated by these facts, we conducted research on improving CAD diagnosis with various decision support systems.
1.1. Coronary Artery Disease and Risk Factors
Coronary arteries are two major vessels that provide blood, oxygen, and nutrients to the heart, as represented in Figure 1.1. The narrowing and blockage of these arteries, called atherosclerosis, causes CAD. Atherosclerosis is the accumulation of cholesterol and fatty material (called plaques) on the inner walls of the arteries. These plaques reduce or block blood flow to the heart, depriving it of the oxygen and vital nutrients it needs to work properly. Figure 1.2 represents a normal artery and a narrowed artery of this kind. This shortage of blood flow causes chest pain, or angina. If plaque completely blocks the artery, it may cause a heart attack.
Figure 1.1. Diagram of the Coronary Arteries (Texas Heart Institute, 2011)
Figure 1.2. Normal artery and narrowing of artery (NHLBI, 2011)
Many factors can cause a higher risk of CAD; they may or may not be related to the lifestyle of the patient. Therefore, risk factors may be divided into changeable and unchangeable.
Table 1.1. Coronary artery disease risk factors (Onat et al., 2002)

Risk Factor                           Description                                                     Status
Age                                   Greater than 45 for men, and 55 for women or early menopause    Unchangeable
Sex                                   More often in men                                               Unchangeable
Family History                        First-degree relative with CAD before 55 for men, 65 for women  Unchangeable
Smoking                               A pack of cigarettes a day doubles CAD risk                     Changeable
High Blood Pressure (Hypertension)    ≥140/90 mmHg or antihypertensive usage                          Changeable
Total Cholesterol                     ≥200 mg/dl                                                      Changeable
High LDL                              ≥130 mg/dl                                                      Changeable
Low HDL                               <40 mg/dl                                                       Changeable
Diabetes Mellitus (DM)                Carries a risk equivalent to the existence of CAD               Changeable
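The thresholds in Table 1.1 lend themselves to a simple rule-based screen. The sketch below is purely illustrative and not part of the thesis methodology; the dictionary keys and the sample patient profile are assumptions made for the example.

```python
def cad_risk_flags(profile):
    """Return the Table 1.1 risk factors present in a patient profile.

    `profile` keys are illustrative; thresholds follow Onat et al. (2002)
    as summarized in Table 1.1.
    """
    flags = []
    if profile["systolic"] >= 140 or profile["diastolic"] >= 90:
        flags.append("hypertension")            # >=140/90 mmHg
    if profile["total_cholesterol"] >= 200:
        flags.append("high total cholesterol")  # >=200 mg/dl
    if profile["ldl"] >= 130:
        flags.append("high LDL")                # >=130 mg/dl
    if profile["hdl"] < 40:
        flags.append("low HDL")                 # <40 mg/dl
    age_limit = 45 if profile["sex"] == "male" else 55
    if profile["age"] > age_limit:
        flags.append("age")                     # >45 for men, >55 for women
    return flags

# A hypothetical 50-year-old male patient
print(cad_risk_flags({"systolic": 150, "diastolic": 85,
                      "total_cholesterol": 210, "ldl": 120,
                      "hdl": 35, "sex": "male", "age": 50}))
```

A real clinical screen would of course also weigh the unchangeable factors (family history) and smoking status; the point here is only that the table's cut-offs are directly machine-checkable.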
On average, CAD appears in women 10 years later than in men; MI (myocardial infarction) and other complications are also seen later. Men aged between 40 and 65 are affected 7 times more often than women (Işık, 1986). In cardiovascular studies, the rates of vascular disease in men and women between the ages of 65 and 70 are 33% and 22%, respectively. For people over 85 years, these rates are 45% and 43% for men and women, respectively (Kuller et al., 1998).
Many studies have determined that there is a relationship between CAD risk and early-onset CAD when a first-degree relative has CAD (Hopkins and Williams, 1989). This risk persists even if other risk factors are eliminated. If a male family member younger than 55 years old or a female family member younger than 65 years old has CAD, then the risk is considered present in the family history. The risk also increases if the diseased family member is younger or if the number of diseased family members increases (Rissanen, 1979; Bassuk, 2008; Dursun, 2010).
Smoking is as serious a risk factor as high blood pressure, and it is important because it is changeable. Smoking causes unhealthy cholesterol levels and tightens blood vessels. CAD risk decreases dramatically in people who quit smoking (Korkmaz, 1997). Cardiovascular diseases can occur even in passive smokers. Cardiac deaths related to smoking increase 2.7 times for men and 4.7 times for women (Onat et al., 2002).
High blood pressure is a major risk factor that speeds up the formation of atherosclerosis. People with high blood pressure are at 2-3 times greater risk than people with normal blood pressure (Dressler, 2010). An important property of high blood pressure is that it can be brought under control with 90% probability by proper drug treatment (Korkmaz, 1997).
Malnutrition can lead to obesity, diabetes, and abnormal cholesterol, which are also causes of CAD. Obesity can also be caused by genetics and hormonal disorders. Abnormal cholesterol results in an increase in LDL (low-density lipoprotein, "bad" cholesterol) and a decrease in HDL (high-density lipoprotein, "good" cholesterol). LDL cholesterol accumulates on the inner walls of arteries and increases the chance of heart disease; therefore, LDL values should be low. HDL cholesterol protects the arteries by preventing LDL cholesterol from building up in them; therefore, HDL values should be high.
Diabetes mellitus (DM) is a metabolic disease caused by insufficient production of insulin or the body's inability to respond to the insulin it produces. As a result, a high level of sugar (glucose) exists in the blood. Diabetes damages a membrane called the endothelium in the inner wall of the arteries and causes atherosclerosis; the arteries harden and normal blood flow is obstructed.
Another risk factor, physical inactivity, can cause obesity, hypertension, and a decrease in cardiovascular capacity (Dressler, 2010). Walking 30 minutes a day is recommended. Regular physical exercise reduces obesity and the risk of diabetes, but those suffering from CAD must avoid sudden or irregular exercise, because it increases the risk of MI. Emotional factors such as depression, stress, and social isolation also increase cardiovascular risk. Such factors can lead to high blood pressure, arterial damage, or irregular heart rhythms. People with such emotional problems often have a tendency toward smoking, drug use, excessive drinking, or overeating, thereby linking to other risk factors. They report half as many social visits as other patients; isolated patients were usually unmarried and had no intimate companion (Beverley et al., 2001).
1.2. General Characteristics of a CAD Patient
The heart muscle works continuously and always needs a blood supply. When a patient exerts strenuously, this need increases. This situation leads the patient to feel pain or discomfort such as tightness, pressure, burning, or squeezing; shortness of breath may also occur. This is the most common symptom of CAD, called angina. Angina can be felt not only in the chest but also in the left shoulder, arms, neck, back, or jaw. Angina is more frequent in cold weather because vessels may contract, increasing the work of the heart and decreasing its blood supply at the same time. In stable angina, symptoms recur when the patient repeats the same strenuous activity and disappear when the patient rests. Unstable angina, however, lasts longer, starts suddenly, and can occur while the patient is resting; it is a warning of heart attack and requires treatment.
Figure 1.3. How a heart attack happens (Healthwise, 2011)
Some of the plaque formed as atherosclerosis progresses may take clot form and temporarily block the artery. This situation leads to sudden angina until the clot resolves; Figure 1.3 represents such a blockage. This sudden blockage is called acute coronary syndrome, which is a medical emergency. If the blood supply to the heart cannot be restored within about half an hour, the heart muscle starts to die from the shortage of oxygen; this is a heart attack (i.e., myocardial infarction, MI). Blockage of a coronary artery can also cause a serious heartbeat irregularity (arrhythmia), a disorder of the heart's electrical activity; damaged heart muscle causes electrical instability in patients suffering from CAD or MI. Dizziness, nausea, and sweating are some other symptoms of CAD. Sometimes no symptoms are present, which makes diagnosis of the heart disease difficult.
1.3. Aim and Scope of the Thesis
Decision Support Systems (DSSs), computer technologies for decision making and problem solving, have recently attracted growing interest from researchers in medical decision making. DSSs include tools proven to have considerable success in disease diagnosis. They improve the quality of clinical decisions, prevent many human-caused errors, and provide a better service to patients.
The aim of this thesis is to develop DSSs to improve the diagnosis of CAD in support of clinical decision making. CAD is one of the world's leading causes of death. Sometimes no symptoms are present, which makes diagnosis of the heart disease difficult. Diagnosing the disease in its early stages is therefore of great importance, and several methods are utilized to diagnose CAD. Our aim is to develop computer-based solutions to the difficulties of clinical decision making. Artificial neural networks, decision trees, and Bayesian networks are included in this thesis. Real-life data is used in our research so that the experimental results provide convincing evidence. Ensemble systems and the effect of feature selection are also part of the research. As a result of the study, we aim to effectively categorize patients as having CAD or not. This categorization is made from cheap and widely available data such as patient age, sex, and the results of some laboratory tests. To achieve this aim, not only individual decision support systems but also ensembles of these systems are studied within the scope of this thesis. Although ensemble systems are an active research field in machine learning and pattern recognition (Opitz and Maclin, 1999), only a few studies (Das et al., 2009; Detrano et al., 1989) in the literature diagnose CAD by computer-based methods using noninvasive and widely available data. In short, we aim to reach high accuracy and provide a good option for large-population diagnosis.
In the ensemble-systems part of the thesis, a new method is proposed; it is the first study in the literature to utilize RF (Rotation Forest) to diagnose CAD. In this method, ANNs are used as the base classifiers of the Rotation Forest algorithm, each trained with the Levenberg-Marquardt backpropagation algorithm. This learning algorithm was selected from among several backpropagation algorithms because of its superior performance on the CAD dataset. An ensemble system with three neural network base classifiers is proposed, and the final decision is determined by evaluating each of their individual decisions. In this way, the proposed method reaches high accuracy and provides a good option for large-population diagnosis.
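The idea can be sketched in miniature. The code below is a simplified, illustrative reconstruction, not the thesis's implementation: it substitutes scikit-learn's MLPClassifier (which does not offer Levenberg-Marquardt training) for the Levenberg-Marquardt networks, and uses synthetic data of the same shape as the CAD dataset. Each Rotation Forest member rotates disjoint feature subsets with PCA fitted on a bootstrap sample, trains a network on the rotated data, and the ensemble decides by majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Stand-in data with the same shape as the CAD dataset (303 patients, 13 features)
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

def fit_member(X, y, n_subsets=3):
    """Fit one Rotation Forest member: split the features into subsets,
    fit a PCA rotation per subset on a bootstrap sample, then train an
    MLP on the rotated data."""
    cols_split = np.array_split(rng.permutation(X.shape[1]), n_subsets)
    rotations = []
    for cols in cols_split:
        boot = rng.integers(0, len(X), len(X))   # bootstrap rows for the PCA fit
        rotations.append((cols, PCA().fit(X[boot][:, cols])))
    Xr = np.hstack([p.transform(X[:, c]) for c, p in rotations])
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=3000,
                        random_state=0).fit(Xr, y)
    return rotations, clf

def predict(ensemble, X):
    """Majority vote of the base classifiers over their rotated views."""
    votes = np.array([clf.predict(np.hstack([p.transform(X[:, c])
                                             for c, p in rots]))
                      for rots, clf in ensemble])
    return (votes.sum(axis=0) > len(ensemble) / 2).astype(int)

ensemble = [fit_member(X, y) for _ in range(3)]   # three base classifiers
print("training accuracy:", (predict(ensemble, X) == y).mean())
```

The rotations give each base network a different view of the same features, which encourages diversity among the members while keeping all of the original information, the property that motivates Rotation Forest over plain bagging.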
2. RELATED WORKS Esra MAHSERECİ KARABULUT
2. RELATED WORKS
In the literature, computer aided methods are proposed as automatic
diagnostics systems of CAD. Since such tools have been proven to have a
considerable success in disease diagnosis and hence improve the quality of clinical
decision-making processes, they are called Clinical Decision Support Systems
(CDSS). CDSS also decrease the human error rate and provide better-informed
service to patients. In this context, an early computerized method (Fujita et al., 1992)
attempts to diagnose the disease from SPECT Bull’s-eye images. This method
utilizes artificial neural networks (ANNs) and achieved a diagnosis accuracy of 77%
on average. Scott et al. (2004) used myocardial perfusion imaging data with clinical
data for CAD prediction. Artificial neural networks were employed in that study;
88% sensitivity and 65% specificity results are obtained.
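Sensitivity and specificity, quoted throughout this chapter, are defined over the confusion matrix as Sn = TP/(TP+FN) and Sp = TN/(TN+FP). A minimal sketch, with hypothetical counts chosen only to reproduce the 88%/65% figures above:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sn: fraction of diseased patients correctly detected;
    Sp: fraction of healthy patients correctly cleared."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts that yield the figures reported by Scott et al. (2004)
sn, sp = sensitivity_specificity(tp=88, fn=12, tn=65, fp=35)
print(f"sensitivity = {sn:.2f}, specificity = {sp:.2f}")
```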
In another study (Tsipouras et al., 2008), a decision support system based on fuzzy modeling was developed. The proposed method works by evaluating patient history, demographics, and some basic laboratory examinations. A diagnosis accuracy of 73% is reported, alongside other literature methods with higher accuracy rates; those more accurate methods, however, performed automatic diagnosis by means of expensive and not widely available data, such as SPECT images, stress ECHO, and Doppler ultrasound, and were thus less preferable.
Haddad et al. (1997) utilized case-based reasoning (CBR) to develop an automatic image-interpretation system to determine the presence of CAD. Interpretation is performed on a scintigraphic image dataset. Sensitivity and specificity for detection of CAD were 98% and 70%, respectively. CBR systems may thus be suitable for clinical use, since they can achieve considerable diagnostic accuracy.
Yan et al. (2006) developed a multilayer perceptron (MLP)-based decision support system for the diagnosis of heart diseases. Three assessment methods, namely cross-validation, holdout, and bootstrapping, were applied to evaluate the generalization of the system. They concluded that an MLP-based decision support system can achieve very high diagnosis accuracy (>90%).
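The three assessment methods just mentioned can be contrasted in a short sketch. This is illustrative scikit-learn code on synthetic data, not the setup of Yan et al.; the classifier and dataset are stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

X, y = make_classification(n_samples=300, n_features=13, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=3000, random_state=0)

# 1. 10-fold cross-validation: average accuracy over ten held-out folds
cv_acc = cross_val_score(clf, X, y, cv=10).mean()

# 2. Holdout: one fixed train/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_acc = clf.fit(X_tr, y_tr).score(X_te, y_te)

# 3. Bootstrap: train on a resample with replacement, test on out-of-bag rows
boot = resample(np.arange(len(X)), random_state=0)
oob = np.setdiff1d(np.arange(len(X)), boot)
boot_acc = clf.fit(X[boot], y[boot]).score(X[oob], y[oob])

print(cv_acc, holdout_acc, boot_acc)
```

Cross-validation uses every sample for both training and testing across folds, holdout trades some data for a single fast estimate, and the bootstrap estimate comes from the roughly one-third of rows left out of each resample.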
Tkacz and Kostka (2000) presented how to use wavelet neural networks (WNN) for the classification of patients with CAD. The WNN is trained with half of the heart rate variability data, while the other half is used for testing. They investigated the effect of the choice of the basic wavelet function and reported that the highest sensitivity and specificity values are obtained when tansigmoidal and linear activation functions are used in a double-layer WNN. Turkoglu et al. (2003) also used a WNN model, based on pattern recognition, for the evaluation of the Doppler signals of heart valve diseases. This model consists of two layers; the first is a wavelet layer and the second is a multilayer perceptron (MLP). Doppler heart sounds were correctly classified for an average of 91% of 123 test samples.
An approach using a radial basis function neural network for CAD diagnosis is presented by Lewenstein (2001), based on the results of traditional ECG exercise tests. The best network correctly recognized over 97% of cases from a 400-element test set; the results concern the condition of the patient (a simple "sane-sick" diagnosis) and the diseased or stenosed vessels.
In a more recent study, Das et al. (2009) utilized data that can be collected noninvasively, easily, and cheaply from the patient to develop an expert system capable of diagnosing patients with an accuracy of about 89%. They used a neural network ensemble method and obtained sensitivity and specificity values of 80.95% and 95.91%, respectively, in CAD diagnosis. Their experiments also indicate that increasing the number of nodes in the neural networks does not improve the performance of the network.
In another study (Mobley et al., 2000), neural networks were used to identify
patients who do not need coronary angiography. Coronary angiography is a
procedure that uses dye and special X-rays to show the inside of the coronary arteries.
Mobley et al. developed a neural network to predict the existence or nonexistence of
coronary artery stenosis. Patients' records were used for training, cross-validation and
testing; as a result, some patients could be spared coronary angiography without any
coronary stenosis being left undetected. An AUC value of 0.89 was obtained.
Comak et al. (2007) proposed a decision support system for recognizing heart
valve disorders using Doppler heart sounds. First, the redundancy of the dataset is
reduced by feature selection, and normalization is applied as preprocessing. A least-squares
support vector machine and an artificial neural network are used to classify the
extracted features; 90.0% and 94.0% specificity values are obtained respectively.
Support vector machines thus outperformed neural networks when Doppler
heart sounds are evaluated.
3. MATERIAL AND METHODS
3.1. Dataset Descriptions
The CAD dataset consists of records of patients with and without coronary
artery disease, and in this study these patients are evaluated by decision support
systems. We utilized the medical records of 303 patients. Each record includes 13
features belonging to the patient, including age, sex, and measurements obtained as a
result of medical examination (see Table 3.1).
Table 3.1. CAD dataset summary
• Age
• Sex: 1 = male, 0 = female
• Chest pain type: 0 = typical angina, 1 = atypical angina, 2 = non-anginal pain, 3 = asymptomatic
• Resting systolic blood pressure (mmHg)
• Serum cholesterol (mg/dl)
• Fasting blood sugar: 1 = if fbs is over 120 mg/dl, 0 = if fbs is below 120 mg/dl
• Resting electrocardiographic results: 0 = normal, 1 = having ST-T wave abnormality, 2 = LV hypertrophy
• Maximum heart rate achieved
• Exercise induced angina: 1 = yes, 0 = no
• ST depression induced by exercise relative to rest
• The slope of the peak exercise ST segment: 0 = up sloping, 1 = flat, 2 = down sloping
• Number of major vessels colored by fluoroscopy
• Exercise thallium scintigraphic defects: 3 = normal, 6 = fixed defect, 7 = reversible defect
The dataset is publicly available at “The Data Mining Repository of
University of California Irvine (UCI)” (Newman et al., 1998) and was first
considered by Detrano et al. (1989). By using 13 given attributes, each sample is
classified into one of two groups of patients - those whose vessels are narrowed by
less than 50% or those whose vessels are narrowed by more than 50%. If diameter
narrowing in any major vessel is over 50% then the patient is considered to have the
disease. Otherwise the patient is classified as healthy.
3.2. Methods
3.2.1. Artificial Neural Networks
Artificial Neural Networks (ANNs) were developed with inspiration from the
human brain; they are massively parallel computing systems constructed from many
neurons connected to each other. An ANN can learn from examples as human
beings do. An ANN is created for a specific application, such as pattern recognition,
data classification or regression, through a learning process, during which data is
obtained from the environment.
ANNs have the ability to solve some problems that cannot be solved by linear
programming methods. The learned information is stored not in a database or a file,
but directly in the weights of the neurons; in effect, a summary of the data is encoded
in these weights. When the network has to decide about a new example, it generalizes
from what it has learned in this way. With this property of ANNs a very large variety
of problems can be solved. Even in a situation where some neurons stop working, the
network can still continue to work, since it has fault tolerance (Jain et al., 1996).
Moreover, incomplete input data does not prevent the network from producing output.
A disadvantage of ANNs is that they are data dependent. An ANN works for
specific data, and if it is to be used with different data, it must be constructed
again (Haykin, 1999). Learning is achieved in either a supervised or an unsupervised
way. In the supervised way, output values must be given to the network, whereas in the
unsupervised way the given input data is categorized into groups without defining the
desired output (Dayhoff and Deleo, 2001).
1) Mathematical Model
Figure 3.1. A neuron with single input and bias
a = f(wp + b) (1)
The scalar input p is multiplied by the scalar weight w to produce the scalar
value wp. Then the bias b is added to wp, and the transfer function f is applied to the
sum n = wp + b (Beale et al., 2010).
Figure 3.2. A neuron with vector input and bias
The input vector p = [p1, p2, ..., pR] is multiplied by the weight vector
w = [w11, w21, ..., wR1] and sent to the summing unit. The bias value, b, is added to
the weighted sum:

n = w11p1 + w21p2 + ... + wR1pR + b (2)
This value can be written in a matrix form:
n = wp + b (3)
Then, this value is used by a non-linear transfer function, f, to produce the
neuron output:
a = f (wp + b) (4)
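As a concrete illustration of equations 3 and 4, the output of a single neuron with a vector input can be sketched in a few lines of Python. The weights, inputs and bias below are arbitrary illustrative values (not taken from the thesis experiments), and logsig, introduced in the next subsection, is chosen as the transfer function f:

```python
import math

def neuron_output(w, p, b):
    """a = f(wp + b), equations 3 and 4, with f chosen here as logsig."""
    n = sum(wi * pi for wi, pi in zip(w, p)) + b  # n = wp + b (equation 3)
    return 1.0 / (1.0 + math.exp(-n))             # a = f(n) (equation 4)

# A neuron with three inputs; the numbers are made up for illustration.
a = neuron_output(w=[0.5, -0.3, 0.8], p=[1.0, 2.0, 0.5], b=0.1)
print(round(a, 4))  # 0.5987
```

Here n = 0.5·1.0 − 0.3·2.0 + 0.8·0.5 + 0.1 = 0.4, and logsig(0.4) ≈ 0.5987.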
2) Transfer Functions
Transfer functions are generally a sigmoid function, a hard limit function or a
function defined by the researcher. The selected transfer function produces the
output. If the transfer function is selected to be the tansig function shown in equation
5, its range becomes [-1, 1] and it changes non-linearly with the input values:

a = 2 / (1 + e^(-2n)) - 1 (5)
Figure 3.3. The tansig transfer function
If the transfer function is the logsig function shown in equation 6, its range is
[0, 1], and this function also changes non-linearly in this range:

a = 1 / (1 + e^(-n)) (6)
Figure 3.4. logsig transfer function
If the transfer function is hardlim as shown in equation 7, its output is 0 or 1:

f(n) = 0 if n < 0;  f(n) = 1 if n ≥ 0 (7)
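The three transfer functions of equations 5 to 7 can be written directly as Python functions; this is only a sketch for checking values, not the software used in the thesis:

```python
import math

def tansig(n):
    # Equation 5: a = 2 / (1 + e^(-2n)) - 1, range (-1, 1)
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

def logsig(n):
    # Equation 6: a = 1 / (1 + e^(-n)), range (0, 1)
    return 1.0 / (1.0 + math.exp(-n))

def hardlim(n):
    # Equation 7: 0 if n < 0, otherwise 1
    return 0 if n < 0 else 1

print(tansig(0.0), logsig(0.0), hardlim(-0.5))  # 0.0 0.5 0
```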
3) Single Layer and Multilayer Networks
The power of ANNs emerges when more than one neuron is interconnected.
When the training data set includes two linearly separable classes, the perceptron
learns after a number of training iterations (Jain et al., 1996). A single layer network
is made up of only an input and an output layer; there are no other layers of neurons,
namely no hidden layers. Learning is achieved by changing the weights in each
epoch, where an epoch is one iteration of presenting the training data set to the
network (Haykin, 1999).
A neural network may have more layers than just an input and an output
layer. If there is an extra layer that supplies input to the output layer of neurons, this
layer is called a hidden layer. Each layer has a weight matrix W, a bias vector b, and
an output vector. There are many substantial points to be decided carefully while
designing multilayer networks (Haykin, 1999):
• Number of hidden layers in the network
• Number of neurons in each hidden layer
• Finding a global optimum solution to prevent local minimum
• Finding an optimal solution in an acceptable time
• Testing the validity of network
Figure 3.5. Representation of a multilayer perceptron with one hidden layer
It is generally enough to use one hidden layer in networks for a large variety
of problems. Using two hidden layers can model a problem better, but training is
more likely to get trapped in a local minimum. The number of neurons is also an
important parameter of a network; sometimes a large number of neurons can prevent
the network from deciding properly, since it increases the complexity of the network
and overfitting occurs (Dayhoff et al., 2001). In an overfitting situation the network
is so sensitive that it decides wrongly when the pattern contains a bit of noise.
A very common approach used by learning algorithms in training multilayer
networks is backpropagation. The backpropagation algorithm tries to optimize the
weights according to the network error. At each epoch it updates the weight values,
beginning from the weights of the last layer and continuing towards the input layer.
Figure 3.6. A training process flowchart using backpropagation algorithm
(Moghadassi et al., 2009)
There are some variants of the backpropagation algorithm, such as Scaled
Conjugate Gradient (SCG) (Moller, 1993), Levenberg-Marquardt (Marquardt, 1963),
Resilient Backpropagation (Riedmiller and Braun, 1993) and Powell-Beale
Conjugate Gradient (Powell, 1977). Levenberg-Marquardt is the fastest training
algorithm for networks of small and medium size and provides proper training, but
when the network gets larger and the number of weights exceeds a hundred, its
performance decreases, especially in pattern recognition problems (Beale et al.,
2010). SCG is also a general purpose backpropagation algorithm and is faster than
Levenberg-Marquardt in large networks. The performance of all backpropagation
algorithms is problem dependent.
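The training loop of Figure 3.6 can be sketched for the simplest possible case, a single linear neuron trained by gradient descent. The learning rate, stopping thresholds and the toy data below are illustrative assumptions, not values from the thesis:

```python
import random

def train(patterns, targets, lr=0.1, mse_min=1e-3, epoch_max=1000):
    """Training loop of Figure 3.6 for a single linear neuron a = w*p + b."""
    random.seed(0)                           # initialize weights/biases randomly
    w, b = random.random(), random.random()
    epoch = 1                                # initialize training, Epoch = 1
    while True:
        # present each input pattern and calculate the outputs and the mse
        errors = [t - (w * p + b) for p, t in zip(patterns, targets)]
        mse = sum(e * e for e in errors) / len(errors)
        if mse < mse_min or epoch >= epoch_max:
            break                            # stop training the network
        for p, e in zip(patterns, errors):   # update weights and biases
            w += lr * e * p
            b += lr * e
        epoch += 1                           # Epoch = Epoch + 1
    return w, b, mse, epoch

# Learn the toy relation t = 2p + 1 from three patterns.
w, b, mse, epochs = train([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

The loop stops when the mean squared error falls below mse_min or the maximum number of epochs is reached, exactly as in the flowchart.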
3.2.2. Bayesian Classification
Bayesian classifiers are statistical classifiers based on Bayes' theorem.
They can predict the probabilities of class membership of a sample for each class.
1) Bayes’ Theorem
Assume A and B are two events; equation 8 gives the formula of the
conditional probability of A given that B has already occurred:

P(A|B) = P(A∩B) / P(B) (8)
Therefore,

P(A∩B) = P(A|B)P(B) (9)

P(A∩B) = P(B|A)P(A) (10)
When the right sides of equations 9 and 10 are equated, Bayes' theorem is obtained:

P(B|A) = P(A|B)P(B) / P(A) (11)
2) Naïve Bayes’ Classification
Despite its simple nature, Naïve Bayes is one of the most efficient and
well-known algorithms (Minsky, 1961). It estimates the class probabilities of a given
sample and selects the class with the maximum probability value as the decision. The
Naïve Bayes algorithm assumes that each attribute value of a sample is independent of
other attributes, while the class value of the sample affects all the attributes. This
assumption, also known as "conditional independence", simplifies the probability
calculation. The independence assumption works very efficiently for problems in
medical fields, probably because the chosen symptoms are independent to some
degree (Sierra et al., 2001).
The Naïve Bayes classifier, also called the simple Bayesian classifier, works as
follows:
1. Assume that X is a data sample with attribute values X = {x1, x2, ..., xn}, and that
there are m classes C1, C2, ..., Cm in the data set. Using Bayes' theorem, the
following equation gives the probability of each class for sample X (the class with
the maximum value will be selected):

P(Ci|X) = P(X|Ci)P(Ci) / P(X) (12)
2. In Naïve Bayes classification it is assumed that the xi values are conditionally
independent, which simplifies the computation. P(X|Ci) can then be calculated with
the following equation (Han and Kamber, 2000):

P(X|Ci) = ∏(k=1 to n) P(xk|Ci) (13)
3. The value of P(X) is the same in all class probability calculations, and since the
class with the maximum probability will be selected, P(X) can be discarded from the
calculations:

argmax{ P(X|Ci)P(Ci) } (14)
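The three steps above can be condensed into a short Python sketch. The toy symptom data and attribute names are invented for illustration, and no smoothing is applied for zero counts:

```python
def naive_bayes_predict(train_rows, train_labels, x):
    """Pick the class maximizing P(C) * product of P(x_k | C), equations 12-14."""
    n = len(train_labels)
    scores = {}
    for c in set(train_labels):
        rows_c = [r for r, l in zip(train_rows, train_labels) if l == c]
        score = len(rows_c) / n                    # P(C)
        for k, value in enumerate(x):              # product of P(x_k | C)
            matches = sum(1 for r in rows_c if r[k] == value)
            score *= matches / len(rows_c)
        scores[c] = score
    return max(scores, key=scores.get)             # equation 14: the argmax

# Hypothetical data: (chest_pain, exercise) -> class label
rows = [("yes", "no"), ("yes", "yes"), ("no", "yes"), ("no", "yes"), ("yes", "no")]
labels = ["sick", "sick", "healthy", "healthy", "sick"]
print(naive_bayes_predict(rows, labels, ("yes", "no")))  # sick
```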
3) Bayesian Networks
A Bayesian network (BN), also known as Belief network, is a probabilistic
graphical model representing dependence relationships between variables (Cowell et
al., 1999). A Bayesian network structure is a directed acyclic graph (DAG) and
DAGs are widely used in statistics and machine learning. The set of nodes and set of
directed edges construct a DAG, but no cycle among directed edges is allowed.
Unlike Naïve Bayes, a BN allows an attribute value of a sample to depend on
other attribute values. From this viewpoint, Naïve Bayes can be viewed
as a simple BN without conditional dependencies, as shown in Figure 3.7.
Figure 3.7. Representation of Naïve Bayes DAG as a BN
The nodes in a DAG represent random variables. An edge from one node to
another represents statistical dependence between the represented variables. Thus the
elements of a BN are a DAG and a probability table for each node. A node with
parents has a conditional probability table, and a node without parents has an
unconditional probability table. The conditional probability table for a node must
have entries for each possible combination of the values of its parents.
When specifying the probability tables, prior probabilities must be given for the
nodes without parents, and conditional probabilities for the other nodes. A BN is
constructed once the DAG and the probability tables are given, as in the following
sample in Figure 3.8:
Figure 3.8. A Simple Bayesian Network made up of a DAG and probability tables
Joint probability from a Bayesian Network is computed as:

P(x1, x2, ..., xn) = ∏(i=1 to n) P(xi | parents(xi)) (15)
To use a BN as a classifier, equation 15 is evaluated for all class values. In Bayesian
classification there are no learning rules; instead there are estimated probabilities.
As a classification example for the Bayesian Network in Figure 3.8, assume that a new
sample has the attribute values Exercise(E)=No, Smoking(S)=Yes, Shortness of
Breath(SB)=No and Chest Pain(CP)=Yes. The class attribute is Heart Disease(HD);
one probability value is calculated for the 'Yes' hypothesis and one for the 'No'
hypothesis, and the larger value indicates the class label of this sample. The Exercise
and Smoking attributes have no parents, so P(E=Yes)=0.7, hence
P(E=No)=1-P(E=Yes)=0.3, and P(S=Yes)=0.25. Now, according to the first hypothesis, the results of
HD=Yes, P(HD=Yes|E=No,S=Yes)=0.75, P(SB=No|HD=Yes)=0.85 and
P(CP=Yes|HD=Yes)=0.7 are obtained. Depending on equation 15, the first
hypothesis probability is (0.3)(0.25)(0.75)(0.85)(0.7)=0.0335. According to the second
hypothesis, the results of HD=No, P(HD=No|E=No,S=Yes)=0.25,
P(SB=No|HD=No)=0.15 and P(CP=Yes|HD=No)=0.3 are obtained. Depending on
equation 15 again, the second hypothesis probability is
(0.3)(0.25)(0.25)(0.15)(0.3)=0.0008. The result of the first hypothesis, 0.0335, is
larger than this value, so the class value of this sample is decided as HD=Yes.
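The arithmetic of this example can be checked with a few lines of Python; the probabilities are the ones quoted above from Figure 3.8:

```python
p_e_no, p_s_yes = 0.3, 0.25        # P(E=No) and P(S=Yes), nodes without parents

def hypothesis_score(p_hd, p_sb, p_cp):
    # Equation 15: product of each node's probability given its parents
    return p_e_no * p_s_yes * p_hd * p_sb * p_cp

yes_score = hypothesis_score(0.75, 0.85, 0.7)   # HD=Yes hypothesis
no_score = hypothesis_score(0.25, 0.15, 0.3)    # HD=No hypothesis
decision = "HD=Yes" if yes_score > no_score else "HD=No"
print(round(yes_score, 4), round(no_score, 4), decision)  # 0.0335 0.0008 HD=Yes
```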
An expert on a particular subject may prepare the DAG and the probability tables
needed to use a Bayesian Network for classification. Usually, however, this is not the
case, and learning algorithms must be used to construct Bayesian Networks. This
subject has attracted much research, and many approaches have been presented.
Some widely used optimization algorithms, such as hill climbing, simulated
annealing and tabu search, can be used to find the DAG of the BN heuristically.
Once the DAG is constructed, the probability tables can be produced directly from
the data.
BNs are promising classifiers and are useful in medical areas and diagnosis
(John and Langley, 1995). Beyond machine learning, they have been used for text
mining, natural language processing, speech recognition, signal processing,
bioinformatics and weather forecasting.
3.2.3. Decision Trees
The decision tree is one of the most commonly used algorithms in the recent
classification and pattern recognition literature. The most important reason for this is
the comprehensible and clear rules used in the construction of a decision tree. A
decision tree is a prediction method that can easily be integrated with information
technologies and can be used in clinical decision making; for example, C4.5, a type
of decision tree, can be used to yield clinically useful predictive values (Tanner et al.,
2008).
A decision tree is made up of nodes, branches and leaves (Quinlan, 1993). A
node is the testing unit of the tree; the result of its test causes the tree to branch
without losing data, each branch depending on the branching at the upper level. If there is a
specific class in a node, then this node becomes a leaf, and no branching
continues from it. Figure 3.9 shows a sample tree in which each leaf represents a
class, while the root and each internal node correspond to an attribute of the data set.
Figure 3.9. A simple decision tree for Heart Disease (HD) diagnosis
The decision process in a tree runs from the root node until a leaf is reached,
following consecutive nodes. A path from the root node to a leaf produces a decision
rule of the tree. Decision rules resemble conditional rules in programming languages
(Quinlan, 1993). There are four rules in the sample tree of Figure 3.9:
Rule 1:
If Chest Pain=No Then HD=No
Rule 2:
If Chest Pain=Yes and
If Shortness of Breath=Yes Then HD=Yes
Rule 3:
If Chest Pain=Yes and
If Shortness of Breath=No and
If Exercise=No Then HD=No
Rule 4:
If Chest Pain=Yes and
If Shortness of Breath=No and
If Exercise=Yes Then HD=Yes
In section 3.2.2, a data sample with the attribute values Exercise(E)=No,
Smoking(S)=Yes, Shortness of Breath(SB)=No and Chest Pain(CP)=Yes was classified
according to Figure 3.8. When this sample is to be classified according to the
decision tree represented in Figure 3.9, classification starts at the root node 'Chest
Pain'. Since its value is 'Yes', the tree branches to the 'Shortness of Breath' node.
Since Shortness of Breath is 'No', the tree branches to the 'Exercise' node. Exercise
has the value 'No', so the decision is HD=No.
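The four rules of Figure 3.9 translate directly into code. A minimal sketch, with the attribute values encoded as lowercase strings (an illustrative choice):

```python
def classify(chest_pain, shortness_of_breath, exercise):
    """The four decision rules of the tree in Figure 3.9, written as code."""
    if chest_pain == "no":
        return "HD=No"                      # Rule 1
    if shortness_of_breath == "yes":
        return "HD=Yes"                     # Rule 2
    if exercise == "no":
        return "HD=No"                      # Rule 3
    return "HD=Yes"                         # Rule 4

# The sample from section 3.2.2: CP=Yes, SB=No, E=No
print(classify("yes", "no", "no"))  # HD=No
```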
Data classification is a two-phase operation in a decision tree. The first phase is
the training phase, and the second is the classification phase. In the training phase,
training data is used for the construction of the tree, and the rules of the tree are
determined according to this data. In the classification phase, test data is used for
validation of the constructed tree. If the accuracy of the tree is at an acceptable level,
the tree is used for new data samples. To classify a new sample, the process starts
from the root and follows a top-down path of queries until a leaf is reached; the leaf
determines the class of that sample.
For the construction of the tree it is important to decide at which attribute the
branching starts. Constructing all possible trees for a dataset and selecting the best
one is an NP-hard problem (Kantardzic, 2002), so heuristic methods are needed.
According to the relevant literature these methods may be classified as entropy-based
methods, classification and regression trees, and memory-based classification
algorithms. In this study the entropy-based algorithm C4.5 (Quinlan, 1993) is used,
which is a more advanced version of ID3 (Quinlan, 1993) and the most popular in a
series of classification tree methods (Duda et al., 2006). J48 is the Java
implementation of the C4.5 algorithm in the WEKA environment (Hall et al., 2009),
and J48 is used for the decision tree experiments in this study.
The C4.5 algorithm selects the attributes according to their entropy quantities
while constructing a tree. Entropy is a measure of uncertainty in a system (Shannon,
1948) and is used in many areas; an entropy of 0 is the desired situation in a system.
While constructing the decision tree, the Information Gain (Cover and Thomas, 2006)
and the Gain Ratio (Mitchell, 1997) algorithms are used for ranking the data set
attributes, and the decision tree is constructed according to this ranking. These two
algorithms are also feature selection algorithms, in which features are selected
according to a threshold value on this ranking. When they are not used for feature
selection, they are used for ranking the attributes, to determine at which attribute to
branch in the construction of the tree. For Gain Ratio, see section 3.2.5.
1) Information Gain
Assume Y is the class attribute of a data set, and X is a given feature, both are
discrete. The information gain of X is the reduction of uncertainty of Y values, when
X values are known. This uncertainty is measured as H(Y), the entropy of Y.
Information Gain (IG) of X is the difference between entropy of Y and entropy of Y
after X values are observed, and is calculated as equation 16.
IG(Y; X) = H(Y) - H(Y|X) (16)
Entropy of Y is calculated with equation 17:

H(Y) = -Σ(y∈Y) p(y) log2(p(y)) (17)
where y is a value of the class attribute Y, and p(y) is the probability that Y = y. The
entropy of Y after the X values are observed is calculated as:
H(Y|X) = -Σ(x∈X) p(x) Σ(y∈Y) p(y|x) log2(p(y|x)) (18)
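Equations 16 to 18, with the probabilities estimated by simple counting, can be sketched as follows; the toy attribute and class values are invented for illustration:

```python
import math
from collections import Counter

def entropy(values):
    """H(Y) from equation 17, with probabilities estimated by counting."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain(xs, ys):
    """IG(Y; X) = H(Y) - H(Y|X), equations 16 and 18."""
    n = len(ys)
    h_cond = 0.0
    for x in set(xs):
        subset = [y for xi, y in zip(xs, ys) if xi == x]
        h_cond += (len(subset) / n) * entropy(subset)   # p(x) * H(Y | X=x)
    return entropy(ys) - h_cond

# A toy attribute that perfectly predicts the class: IG equals H(Y) = 1 bit
print(info_gain(["a", "a", "b", "b"], ["sick", "sick", "healthy", "healthy"]))
```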
3.2.4. Ensemble Systems
Ensemble systems are an active research field in machine learning and pattern
recognition. In an ensemble system more than one classifier is trained, and each
classifier contributes to the final decision of the system (Kuncheva, 2004). This
contribution is provided by voting on the class labels in order to select a winner,
which becomes the decision of the ensemble system; the voting may or may not be
weighted. Each classifier in the ensemble system is called a base classifier, and an
efficient ensemble system consists of accurate base classifiers. In this way, a sample
misclassified by one base classifier can be corrected by the others, so the outputs are
more accurate than those of a good individual classifier (Opitz and Maclin, 1999).
The success of the ensemble system depends on several factors, such as the
performance of the base classifier algorithm, the number of features used, the size of
the ensemble, and the decision combining algorithm (Amasyali and Ersoy, 2008).
Usually the diversity of the base classifiers conflicts with their accuracy: if the
base classifiers are accurate, the diversity among them is low (Chandra et al., 2006).
If there is no diversity among the base classifiers, their combination will not produce
an effective output. Thus, optimum results can be reached only by an ensemble
consisting of highly accurate classifiers that disagree as much as possible.
Figure 3.10. An ensemble system with three base classifiers
Bagging (Breiman, 1996) and Boosting (Schapire, 1990) are the two main
ensemble methods frequently used in the literature. In the Bagging algorithm, t
subsets are randomly taken from the dataset with replacement (bootstrap). Each
subset is used to train a classifier, so there are t classifiers in the ensemble. This
property tolerates unstable base classifiers that are too sensitive to changes in the
training data set. When a new sample is to be classified, each classifier predicts a
decision, and the final decision of the ensemble is the most frequent one. Unlike
boosting, bagging can use base classifiers of the same or of different types. Boosting
produces the base classifiers one after another. Each base classifier depends on the
previous classifier, such that the training set chosen for a base classifier includes the
instances incorrectly classified by the previous base classifier. Thus, the ensemble is
strengthened by each new base classifier that fixes the previous errors. The effect of
both bagging and boosting is clearer when weak classifiers are used.
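A bagging round can be sketched in a few lines of Python; the base learner below, which simply predicts the majority class of its bootstrap sample, is deliberately weak and purely illustrative:

```python
import random
from collections import Counter

def bagging_predict(train, t, learner, x):
    """Bagging sketch: t bootstrap samples drawn with replacement, one base
    classifier trained per sample, final decision by majority vote."""
    votes = []
    for _ in range(t):
        bootstrap = [random.choice(train) for _ in train]  # sample w/ replacement
        predict = learner(bootstrap)                       # train a base classifier
        votes.append(predict(x))
    return Counter(votes).most_common(1)[0][0]             # most frequent decision

def majority_learner(sample):
    """A weak base learner: always predict the sample's majority class."""
    label = Counter(lbl for _, lbl in sample).most_common(1)[0][0]
    return lambda x: label

random.seed(1)
train = [(0, "healthy"), (1, "healthy"), (2, "sick")]
result = bagging_predict(train, t=11, learner=majority_learner, x=1)
print(result)
```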
The Adaboost algorithm (Freund and Schapire, 1996) is the most popular variant
of boosting and takes its name from 'adaptive boosting'. Classifiers are added until a
low error ratio is reached. Adaboost assigns a weight value to each candidate
training sample, and the candidates are selected according to their weights for the
actual training set of a base classifier. A candidate training sample that is incorrectly
classified by the previous classifiers gets a greater weight value (Duda et al., 2006),
so Adaboost concentrates on samples that are difficult to classify correctly.
The Random Forest algorithm (Breiman, 2001) is also known as a successful
ensemble in the literature; its base classifiers are trees. In the random forest
algorithm, t bootstrap samples are taken from the training data for the construction of
t trees. At each node of each tree, m features are selected randomly and the one
giving the best split is chosen. Many trees may be used, justifying the name forest,
but the trees are not pruned, for performance reasons. Not evaluating all features for
the best split at each tree is also an advantage over the boosting algorithm when it
uses trees as base classifiers.
As a new ensemble method, Rodriguez et al. (2006) proposed the Rotation
Forest (RF) and used decision trees as base classifiers in their study. RF can avoid
accuracy-diversity trade-off problem efficiently (Rodriguez et al., 2006; Liu and
Huang, 2008). Rodriguez and Kuncheva (2007) reported RF to be more
accurate than the bagging, Adaboost and Random Forest ensembles on a collection of
data sets. RF applies feature extraction to the training data of each classifier to
improve diversity. This extraction is not for reducing the dimensions of the data but
for presenting the data to the base classifier in another form.
1) Rotation Forest Algorithm
Let X be the training sample set. There are L base classifiers D1, . . . , DL in a
Rotation Forest. The following steps are processed for each base classifier Di:

Step 1: Splitting the feature set into subsets. Assume there are n features in the
dataset X. The feature set F is separated randomly into K disjoint subsets, so each
feature subset has M = n / K features. It is not necessary to choose K as a factor of n.
Step 2: Generating the coefficient matrix. Here i denotes the index of the base
classifier to be trained, Di, and Fij is the jth subset of features used to train this
classifier. Let Xij be the part of X containing the data that corresponds to the features
in Fij. From each Xij some subset of the class labels is selected randomly, then 75% of
the remaining Xij is again selected randomly in order to generate another dataset X'ij.
The coefficient matrix Cij is then generated by applying a linear transformation to
X'ij; the coefficients of this matrix are aij(1), . . . , aij(Mj).
Step 3: Constructing a rotation matrix. The coefficients generated in the previous
step are used to obtain the block-diagonal matrix Ri:

Ri = [ ai1(1), ..., ai1(M1)    [0]                    ...   [0]
       [0]                     ai2(1), ..., ai2(M2)   ...   [0]
       ...                     ...                    ...   ...
       [0]                     [0]                    ...   aiK(1), ..., aiK(MK) ]   (19)

Step 4: Generating the rearranged matrix. Ri is rearranged to match the feature
sequence of the original dataset X to generate Rai, so the actual rotation matrix is
obtained. When the
dataset is rotated by this rotation matrix, that is, X is multiplied by Rai, the training
set for classifier Di is obtained as XRai.
As for classifying an instance x, the confidence of each class label is calculated,
and x is assigned to the class label having the largest confidence. First x' = xRai is
generated. Assume w = {w1, ..., wc} are the class labels and dij(x') is the probability,
as determined by classifier Di, that the class label of x is wj. The confidence of each
class label is calculated as:
μj(x) = (1/L) Σ(i=1 to L) dij(x'),   j = 1, ..., c   (20)
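The block-diagonal arrangement of equation 19 and the confidence averaging of equation 20 can be sketched as follows; the coefficient blocks and probability vectors are invented illustrative numbers:

```python
def block_diagonal(blocks):
    """Step 3 sketch: arrange per-subset coefficient blocks (lists of rows)
    into the block-diagonal rotation matrix R_i of equation 19."""
    size = sum(len(b[0]) for b in blocks)   # total number of columns
    matrix, col = [], 0
    for b in blocks:
        for row in b:
            matrix.append([0.0] * col + list(row) + [0.0] * (size - col - len(row)))
        col += len(b[0])
    return matrix

def confidence(classifier_probs):
    """Equation 20: average the class-probability vectors d_i(x') over the
    L base classifiers; the class with the largest mean is chosen."""
    L = len(classifier_probs)
    c = len(classifier_probs[0])
    return [sum(p[j] for p in classifier_probs) / L for j in range(c)]

R = block_diagonal([[[1.0, 2.0], [3.0, 4.0]], [[5.0]]])
mu = confidence([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]])
```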
3.2.5. Feature Selection
Feature selection is the process of removing redundant or irrelevant features
from the original data set. The execution time of the classifier that will process the
data is reduced, and the accuracy increases, because irrelevant features can include
noisy data that affects the classification accuracy negatively (Doraisamy, 2008). With
feature selection, understandability is also improved and the cost of data handling is
lowered (Arauzo et al., 2011).
A classification algorithm classifies instances to a category according to a
given set of features. When classification is performed on the output of a feature
selection, the prediction will be more certain and clear.
Feature selection algorithms are divided into three categories: filters,
wrappers and embedded methods. Filters evaluate each feature independently of the
classifier, rank the features after evaluation, and take the superior ones (Guyon and
Elisseeff, 2003). This evaluation may, for example, be done using entropy; when a
decision tree is to be used, this can guide the choice of the feature to start with.
Wrappers take a subset of the feature set and evaluate the classifier's performance on
this subset, then another subset is evaluated on the classifier. The subset on which the
classifier has the maximum performance is selected, so wrappers depend on the
selected classifier. This approach is in fact more reliable because the classification
method affects the accuracy, but deciding which subset to select is an NP-hard
problem (Novakovic, 2010).
It can take considerable processing time and memory. Some heuristic algorithms
can be used for subset selection, such as genetic algorithms, greedy stepwise, best
first or random search. The filters are therefore more efficient, but they do not take
into account the fact that the choice of the better features may depend on the
classification algorithm. Embedded techniques perform feature selection during the
learning process, as artificial neural networks do. In this study three filters, Relief-F,
Gain Ratio and Symmetrical Uncertainty, are used.
1) Relief-F
The Relief-F algorithm gives a weight to each feature of the dataset. To achieve
this, an instance is selected randomly from the dataset; then its nearest neighbors
from the same class and from the different classes are found, and the difference in a
feature's value for the same and different classes is calculated. The ability of a feature
f to discriminate the instance between the same and different classes determines the
weight wf of this feature. The following formula can be used to calculate this weight
for each feature (Wang and Makedon, 2004):

wf = P(different value of f | different class) - P(different value of f | same class)

This difference is desired to be maximal.
2) Gain Ratio
Gain Ratio (GR) applies a kind of normalization to information gain using the
entropy of X (Han and Kamber, 2000), such that:

GR = IG / H(X) (21)
Gain Ratio avoids selecting features having more distinct values, which is a
disadvantage of Information Gain: features with greater numbers of values have more
information gain than those with fewer values, even if they are actually no more
informative (Hall and Smith, 1999).
3) Symmetrical Uncertainty
Symmetrical Uncertainty (SU) also avoids the bias of Information Gain towards
features having more values; normalization to the range [0, 1] is provided by
dividing the information gain by the sum of the entropies of the class attribute Y and
the given feature X:

SU = 2·IG / (H(Y) + H(X)) (22)
As the value of symmetrical uncertainty or gain ratio approaches 1, the feature X can
predict the class Y more completely.
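Both normalizations are one-line formulas. A sketch with a perfectly informative feature (IG = H(Y) = H(X) = 1 bit), for which both measures reach their maximum value of 1:

```python
def gain_ratio(ig, h_x):
    # Equation 21: GR = IG / H(X)
    return ig / h_x

def symmetrical_uncertainty(ig, h_y, h_x):
    # Equation 22: SU = 2 * IG / (H(Y) + H(X))
    return 2.0 * ig / (h_y + h_x)

print(gain_ratio(1.0, 1.0), symmetrical_uncertainty(1.0, 1.0, 1.0))  # 1.0 1.0
```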
4. RESULTS AND DISCUSSION
4.1. Evaluation Metrics
Some metrics are required in order to measure and validate the performance
of a DSS that classifies samples. These metrics generally provide quantitative
results to assess and compare the performance of a classification algorithm. In this
study, each method is evaluated in terms of Sensitivity, Specificity and Accuracy. In
addition, the Receiver Operating Characteristics (ROC) curve is used as a graphical
tool to visualize the classifier's tendency between over-prediction and under-prediction.
A related quantitative metric, the Area Under Curve (AUC), can then be calculated as
an additional means to assess the classification performance.
The calculation of the metrics requires the outcomes of the classifier system
to be labeled with four possible states: true positives (TP), true negatives (TN),
false positives (FP) and false negatives (FN). Once the output for each sample is
labeled, the quantitative metrics, as well as the ROC curve, can be calculated for
the evaluated algorithm.
True Positives (TP) refers to the number of samples for which the classifier
decided that the patient has the disease, and the patient actually has the
disease. False Positives (FP) refers to the number of samples for which the
classifier decided that the patient has the disease, but the patient actually
does not. False Negatives (FN) refers to the number of samples for which the
classifier decided that the patient does not have the disease, but the patient
actually does. True Negatives (TN) refers to the number of samples for which the
classifier decided that the patient does not have the disease, and the patient
actually does not.
Sensitivity (Sn) measures how well the classifier identifies positive results.
Sn = TP / (TP + FN) (23)
Specificity (Sp) measures how well the classifier identifies negative results.
Sp = TN / (TN + FP) (24)
Accuracy (Acc) measures the binary classifier's ability to correctly predict
given samples. While Sn and Sp each measure only one side of the classifier, Acc
measures the overall performance of a classification algorithm.
Acc = (TP + TN) / (TP + FP + FN + TN) (25)
Mean Absolute Error (MAE) is another quantity for measuring classification
performance; it measures how far the predicted values are from the actual values.
MAE = (1/n) Σi=1..n |fi − yi| (26)
In Equation 26, fi denotes the predicted values and yi the actual values; MAE is
the average of the absolute errors.
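Equations 23 to 26 translate directly into code. The following Python sketch computes the four metrics from binary predictions and raw scores; the toy inputs are hypothetical:

```python
def evaluate(predicted, actual):
    """Compute Sn, Sp and Acc (Eqs. 23-25) from binary predictions (0/1)."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    tn = sum(p == 0 and a == 0 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return sn, sp, acc

def mae(scores, actual):
    """Eq. (26): mean absolute difference between predictions and targets."""
    return sum(abs(f - y) for f, y in zip(scores, actual)) / len(actual)
```

For example, with predictions [1, 1, 0, 0, 1] against actual labels [1, 0, 0, 0, 1], there are 2 TP, 2 TN, 1 FP and 0 FN, giving Sn = 1.0, Sp = 2/3 and Acc = 0.8.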
1) ROC Curve Analysis
ROC analysis is useful to measure and visualize the overall performance of
the classifier. It is also functional for evaluating and comparing classification
algorithms. Thus, it is used for medical diagnosis very commonly (Swets, 1979).
In the optimal case, both Sensitivity and Specificity reach the value 1. When
Sensitivity is 1, all diseased patients are classified as diseased. When
Specificity is 1, no healthy person is classified as diseased. This is the case
when the ROC curve tends toward the upper-left corner.
Area Under Curve (AUC) is the total area under the ROC curve and a measure of the
performance of the diagnostic test, since it reflects the test performance at all
threshold values. The area lies in the interval [0.5, 1], and the larger the
area, the better the performance.
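The AUC can be computed directly from classifier scores with the trapezoidal rule, without plotting the curve. The Python sketch below sweeps the decision threshold over the sorted scores and accumulates (FPR, TPR) steps; the scores are hypothetical and this is not a WEKA or MATLAB routine:

```python
def auc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule.
    `labels` are 0/1; `scores` are the classifier's decimal outputs."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Lower the threshold through the scores in descending order,
    # counting true and false positives as each sample turns positive.
    area = tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    for s, y in sorted(zip(scores, labels), reverse=True):
        tp += y
        fp += 1 - y
        tpr, fpr = tp / pos, fp / neg
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # trapezoid slice
        prev_tpr, prev_fpr = tpr, fpr
    return area
```

A classifier that ranks every diseased sample above every healthy one reaches the maximum AUC of 1.0.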
2) Cross Validation
Cross validation is utilized to validate the performance of the classifier. In
cross validation, the dataset is separated into two parts. The classifier is
trained with the first part and tested with the second part, and its accuracy is
calculated. Then the classifier is trained with the second part and tested with
the first part, and the accuracy is calculated again. The average of these two
accuracies gives the overall accuracy. This kind of cross validation is known as
the "hold out" method.
In this study, the k-fold cross validation method is utilized, which is an
improved derivative of the hold out method. The data is separated into k subsets,
and the hold out operation is performed k times; in each run one subset is used
for testing and the remaining subsets are used for training. The eventual
accuracy is calculated by averaging the k accumulated accuracies.
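The fold construction described above can be sketched in a few lines of Python. The splitting scheme and the placeholder accuracy below are illustrative only (WEKA and MATLAB provide their own cross-validation routines):

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists: each of the k folds serves once as
    the test set while the remaining folds form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for test in folds:
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        yield train, test

# Sketch of the validation loop for the 303-sample CAD dataset:
# train and evaluate a classifier per fold (placeholder accuracy here),
# then average the k fold accuracies into the reported value.
fold_accs = [1.0 for train, test in k_fold_indices(303, 10)]
overall_acc = sum(fold_accs) / len(fold_accs)
```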
4.2. Employing Artificial Neural Networks for Diagnosis of CAD
A two-layer feed-forward neural network is able to learn most input-output
relations, but when complex relationships exist between input and output,
networks with more layers can be used for faster learning. The neural network
architecture is constructed according to the CAD dataset and depicted in Figure
4.1, which shows a two-layer neural network. There are 13 neurons in the input
layer, 10 neurons in the hidden layer and 1 in the output layer. The target
values in the dataset are categorical; therefore a classification, not a
regression, is to be performed.
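As a concrete illustration of this 13-10-1 architecture, the Python sketch below builds the forward pass of such a network from scratch. It is not the MATLAB toolbox implementation used in this study; the weight ranges, seed and function names are hypothetical:

```python
import random
from math import tanh, exp

def init_network(n_in=13, n_hidden=10, n_out=1, seed=1):
    """Random initial weights (plus one bias weight per neuron)
    for a 13-10-1 feed-forward network."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
             for _ in range(n_out)]
    return w_hidden, w_out

def forward(x, w_hidden, w_out):
    """One forward pass: tanh hidden layer, sigmoid output neuron whose
    value can be read as the probability of the positive (disease) class."""
    xb = x + [1.0]                                    # append bias input
    h = [tanh(sum(w[i] * v for i, v in enumerate(xb))) for w in w_hidden]
    hb = h + [1.0]
    return [1 / (1 + exp(-sum(w[i] * v for i, v in enumerate(hb))))
            for w in w_out]
```

The output neuron stays strictly between 0 and 1, so a 0.5 threshold turns the network into the binary classifier required by the categorical targets.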
Figure 4.1. Experimental setup of the artificial neural network used for CAD
diagnosis
Some variants of the standard backpropagation algorithm are evaluated on the CAD
data using the neural network toolbox of MATLAB. The training data is fed to the
created network object; the network is trained and then simulated to respond to
new test inputs. Performance values of these algorithms using 10-fold
cross-validation on test data are given in Table 4.1:
Table 4.1. Accuracy values of backpropagation algorithms on CAD data
Id  Algorithm                                                            Acc (%)
1   Powell-Beale conjugate gradient backpropagation                      83.78
2   Fletcher-Powell conjugate gradient backpropagation                   84.46
3   Polak-Ribiere conjugate gradient backpropagation                     82.77
4   Gradient descent backpropagation                                     83.78
5   Gradient descent with momentum backpropagation                       84.12
6   Gradient descent with adaptive learning backpropagation              81.76
7   Gradient descent with momentum & adaptive learning backpropagation   82.43
8   Levenberg-Marquardt backpropagation                                  85.14
9   One step secant backpropagation                                      81.77
10  Resilient backpropagation                                            84.46
11  Scaled conjugate gradient backpropagation                            84.12
The standard backpropagation algorithm (Rumelhart et al., 1986) (with id 4) is a
gradient descent method in which network weights are adjusted in the direction of
the negative of the gradient of the performance function. Gradient descent has
various convergence problems and can therefore be very slow in practice, so other
variants of BP are evaluated. The Levenberg-Marquardt algorithm (Marquardt, 1963)
is one of these BP algorithms and can be thought of as a combination of gradient
descent and the Gauss-Newton method. It has the stability of gradient descent and
the speed of Gauss-Newton, converges more reliably, and outperforms other BP
algorithms in efficiency and classification accuracy in many applications (Paulin
and Santhakumaran, 2011; Vongkunghae and Chumthong, 2007; Kisi and Uncuoglu,
2005). According to Table 4.1, the Levenberg-Marquardt algorithm outperformed the
other BP algorithms on the CAD data with the highest test accuracy, 85.14%.
Figure 4.2 presents a visual comparison of results.
Figure 4.2. Comparison of performances of BP algorithms on CAD data.
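The core of the Levenberg-Marquardt update can be illustrated on a one-parameter least-squares problem. The Python sketch below, with hypothetical data, shows how the damping term mu blends the two methods; it is a toy illustration, not the MATLAB toolbox implementation:

```python
def levenberg_marquardt_fit(xs, ys, a=0.0, mu=0.01, steps=50):
    """Fit y = a*x by the Levenberg-Marquardt scheme: the update
    a <- a + sum(x*e) / (sum(x*x) + mu) interpolates between a
    gradient-descent-like step (large mu) and Gauss-Newton (small mu)."""
    for _ in range(steps):
        errors = [y - a * x for x, y in zip(xs, ys)]
        sse = sum(e * e for e in errors)
        jtj = sum(x * x for x in xs)                  # J'J for residual y - a*x
        jte = sum(x * e for x, e in zip(xs, errors))  # driving term
        trial = a + jte / (jtj + mu)
        new_sse = sum((y - trial * x) ** 2 for x, y in zip(xs, ys))
        if new_sse < sse:
            a, mu = trial, mu / 10   # step accepted: behave more like Gauss-Newton
        else:
            mu *= 10                 # step rejected: behave more like gradient descent
    return a
```

On the noiseless data y = 2x the parameter converges to 2 within a few iterations, showing the fast Gauss-Newton-like behaviour once mu shrinks.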
4.3. Using Bayesian Networks and Decision Trees for Diagnosis of CAD
Performance of a Bayesian Network is primarily related to the optimization
technique that constructs the network. Three optimization techniques HillClimber,
Simulated Annealing and Tree Augmented Bayesian Network (TAN) algorithms are
utilized and analyzed in the WEKA environment to observe the success of BN in
classification of CAD patients, as summarized in Table 4.2. The evaluation
metrics Sensitivity, Specificity, area under curve (AUC) and mean absolute error
(MAE) are used.
Table 4.2. Performance values of BNs constructed by three different optimization techniques on CAD data.
                      Sensitivity  Specificity  AUC    Acc (%)  MAE
HillClimber           0.804        0.861        0.908  83.50    0.195
Simulated Annealing   0.804        0.855        0.899  83.17    0.221
TAN                   0.768        0.855        0.912  81.52    0.206
According to Table 4.2, HillClimber seems to be the most successful of the three
algorithms with respect to specificity, accuracy and mean absolute error. In most
applications it is difficult for HillClimber to outperform Simulated Annealing,
because HillClimber is more likely to end up in a local optimum. In fact, all
three classification results are not very different from each other. A Bayesian
Network classification depends strongly on its structure; when, as here, the
results are similar for different structuring algorithms, we understand that the
dataset is stable.
Figure 4.3. ROCs of classification of BayesNetwork using HillClimber, Simulated
Annealing and TAN respectively
It can be seen that the largest AUC, 0.912, belongs to the TAN algorithm.
Generally, classifiers produce a decimal output value and make the final
classification decision according to a threshold value. Selecting the proper
threshold is important for the prediction. Different threshold values are evaluated
while the ROC is produced (Karabulut and İbrikçi, 2010). In a ROC plot, the
region closer to the upper-left corner is the region where the classifier is
successful; when the most proper threshold value is used, the curve plotted is
closest to the upper-left corner. The larger AUC means that TAN is the least
dependent on the classification threshold. Of course this is a positive property
for a classifier, but when evaluating the three classifiers the other metrics
must also be regarded.
Table 4.3. Confusion matrix of the 303 CAD samples using HillClimber
                     Classified as Num=0   Classified as Num=1
Actual Num=0         142                   23
Actual Num=1         27                    111
Table 4.4. Confusion matrix of the 303 CAD samples using Simulated Annealing
                     Classified as Num=0   Classified as Num=1
Actual Num=0         141                   24
Actual Num=1         27                    111
Table 4.5. Confusion matrix of the 303 CAD samples using TAN
                     Classified as Num=0   Classified as Num=1
Actual Num=0         141                   24
Actual Num=1         32                    106
Sensitivity and Specificity values range between 0 and 1 and are desired to be
close to 1. When sensitivity is 1, the number of false negatives (FN) is 0: no
patient with the disease is diagnosed as healthy. When specificity is 1, the
number of false positives (FP) is 0: no healthy person is diagnosed as diseased.
For example, in Table 4.3, FN=23 and FP=27, giving a total of 50 wrong diagnoses.
Similarly, Table 4.4 and Table 4.5 show 24+27=51 and 24+32=56 wrong diagnoses
respectively.
When decision trees are used to implement classification in the form of a
diagnostic procedure, each node of the tree corresponds to an observable; each
node is a comparison unit for one feature of the database. A comparison of
several decision trees is carried out in terms of the evaluation metrics
sensitivity, specificity, area under curve (AUC), accuracy (Acc) and mean
absolute error (MAE). The employed decision trees are ADTree (Freund and Mason,
1999), BFTree (Friedman et al., 2000), J48 (the WEKA implementation of C4.5),
Functional Tree (FT) (Landwehr et al., 2005) and SimpleCart (Breiman et al.,
1984).
Table 4.6. Evaluation results of five decision trees according to CAD data
            Sensitivity  Specificity  AUC    Acc (%)  MAE
ADTree      0.761        0.842        0.896  80.53    0.278
BFTree      0.717        0.830        0.783  77.89    0.256
J48         0.710        0.836        0.804  77.89    0.259
FT          0.812        0.836        0.884  82.51    0.196
SimpleCart  0.746        0.861        0.818  80.86    0.277
According to Table 4.6, FT outperforms all other decision trees in terms of
accuracy, sensitivity and AUC. It also has the minimum classification error. The
SimpleCart and ADTree algorithms are close to FT in accuracy in the
classification of the CAD data.
4.4. Effect of Feature Selection on Diagnosis of CAD
In this section, the effect of three feature selection algorithms on the
performance of classifiers is analyzed. These feature selection algorithms are
Relief-F, Gain Ratio and Symmetrical Uncertainty. The performance values are
calculated before and after feature selection and compared. To achieve this, 9
classification algorithms are used and evaluated in WEKA (Hall et al., 2009):
BayesNet, Multilayer Perceptron (MLP), Radial Basis Function (RBF), Instance
Based Learning (IB1) (Aha et al., 1991), KStar (Cleary and Trigg, 1995), PART
(Frank and Witten, 1998), ADTree, BFTree and SimpleCart. The algorithms are
evaluated in terms of Acc, AUC and Mean Squared Error (MSE).
Each of the Relief-F, Gain Ratio and Symmetrical Uncertainty algorithms assigns
an evaluation score to each feature of the dataset, ranks all features in
descending order and takes a pre-determined number of the most successful
features. The original dataset has 13 features, of which 8 are selected by the
feature selection algorithms; a different number of features can be chosen by the
user. In this way the number of dimensions of the dataset is reduced, and the
data becomes more comprehensible and easier to work with.
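The ranking-and-selection step itself is simple. The following Python sketch, using the Relief-F scores reported in Table 4.7, keeps the top-k features; the function name is hypothetical and this is not the WEKA filter:

```python
def select_top_features(scores, k=8):
    """Rank features by filter score (descending) and keep the top k names."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Relief-F evaluation scores as reported in Table 4.7
relief_scores = {"cp": 0.1729, "thal": 0.1259, "sex": 0.1106, "ca": 0.0943,
                 "slope": 0.0776, "exang": 0.0683, "restecg": 0.0644,
                 "oldpeak": 0.0237}
top3 = select_top_features(relief_scores, k=3)  # the three highest-scoring features
```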
Table 4.7. Features selected by Relief-F filter
No  Attribute  Values                                             Evaluation Score
1   cp         {typ_angina, asympt, non_anginal, atyp_angina}     0.1729
2   thal       {fixed_defect, normal, reversable_defect}          0.1259
3   sex        {male, female}                                     0.1106
4   ca         numeric                                            0.0943
5   slope      {down, flat, up}                                   0.0776
6   exang      {no, yes}                                          0.0683
7   restecg    {left_vent_hyper, normal, st_t_wave_abnormality}   0.0644
8   oldpeak    numeric                                            0.0237
According to Table 4.7, the chest pain (cp) feature of the dataset has the
highest score in the Relief-F evaluation. This means that the cp feature values
have the greatest ability to discriminate diseased patients from healthy ones,
because Relief-F scores features according to how well they discriminate between
members of different classes.
Table 4.8. Effect of Relief-F filter on classification performance
Id  Method      Acc (%)  eAcc (%)  AUC    eAUC   MSE    eMSE
1   BayesNet    83.50    83.83     0.908  0.912  0.134  0.121
2   MLP         80.20    84.82     0.878  0.891  0.171  0.133
3   RBF         84.16    83.50     0.895  0.906  0.120  0.118
4   IB1         76.23    76.56     0.760  0.764  0.238  0.234
5   KStar       74.59    81.19     0.814  0.888  0.207  0.135
6   PART        81.85    82.18     0.846  0.869  0.154  0.139
7   ADTree      80.52    84.82     0.896  0.903  0.130  0.124
8   BFTree      77.89    80.20     0.783  0.833  0.188  0.158
9   SimpleCart  80.82    81.85     0.818  0.839  0.158  0.147
The classification performances of the nine algorithms with the Relief-F filter
are evaluated in Table 4.8 in terms of accuracy, AUC and MSE, where the 'e'
prefix denotes the value after feature selection. The CAD dataset is first
preprocessed by the Relief-F filter, five of the features are eliminated, and the
new dataset is fed to each classifier; the resulting Acc, AUC and MSE values are
named eAcc, eAUC and eMSE respectively.
The bold values in Table 4.8 indicate that the classifier is positively affected
by the Relief-F feature selector. Almost all of the classifiers, except RBF, have
increased classification accuracy; all of them have increased AUC and decreased
error as measured by MSE. The most affected classifier is KStar, with an accuracy
difference of 6.60%; the next two are MLP and ADTree, with accuracy differences
of 4.62% and 4.30% respectively. For an overall comparison of the classifiers'
performance with Relief-F, see Figure 4.4 and Figure 4.5.
Figure 4.4. Accuracies of classifiers with and without Relief-F filter
Figure 4.5. AUCs of classifiers with and without Relief-F filter
The Gain Ratio filter has given the highest score to the 'ca' feature, according
to Table 4.9. It ranks and selects features according to their entropy
quantities; to obtain a high score a feature must have low entropy. Decision tree
classifiers can be thought of as having this ranking within their own algorithms,
but there the features are only ranked, not selected: all features are used while
constructing the tree.
Table 4.9. Features selected by Gain Ratio filter
No  Attribute  Values                                           Evaluation Score
1   ca         numeric                                          0.1741
2   thal       {fixed_defect, normal, reversable_defect}        0.1698
3   exang      {no, yes}                                        0.1560
4   thalac     numeric                                          0.1322
5   cp         {typ_angina, asympt, non_anginal, atyp_angina}   0.1176
6   oldpeak    numeric                                          0.1053
7   slope      {down, flat, up}                                 0.0903
8   sex        {male, female}                                   0.0656
Gain Ratio is not as successful as Relief-F: instead of improving the accuracy of
8 algorithms as Relief-F does, it has positively affected 6 algorithms in
accuracy, 5 in AUC and 4 in MSE.
Table 4.10. Effect of Gain Ratio filter on classification performance
Id  Method      Acc (%)  eAcc (%)  AUC    eAUC   MSE    eMSE
1   BayesNet    83.50    83.83     0.908  0.898  0.134  0.137
2   MLP         80.20    81.52     0.878  0.894  0.171  0.150
3   RBF         84.16    85.48     0.895  0.892  0.120  0.120
4   IB1         76.23    77.23     0.760  0.770  0.238  0.228
5   KStar       74.59    78.55     0.814  0.859  0.207  0.162
6   PART        81.85    80.53     0.846  0.820  0.154  0.169
7   ADTree      80.52    80.52     0.896  0.881  0.130  0.138
8   BFTree      77.89    78.88     0.783  0.785  0.188  0.178
9   SimpleCart  80.82    80.53     0.818  0.824  0.158  0.159
Gain Ratio most positively affected the KStar algorithm, which increased its
accuracy by 3.96%; the other positively affected classifiers show only slight
improvement. The second and third most affected algorithms are MLP and RBF, each
with an accuracy increase of 1.32%. Figure 4.6 and Figure 4.7 present a visual
comparison of Acc and eAcc.
Figure 4.6. Accuracies of classifiers with and without Gain Ratio filter
Figure 4.7. AUCs of classifiers with and without Gain Ratio filter
According to Table 4.11, the Symmetrical Uncertainty filter has selected the
'thal' feature first; 'ca', Gain Ratio's first feature, is second here, and 'cp',
Relief-F's first feature, is third. This means that although the feature
selection algorithms follow very different ways of selecting features, their
decisions are not far from each other.
Table 4.11. Features selected by Symmetrical Uncertainty filter
No  Attribute  Values                                           Evaluation Score
1   thal       {fixed_defect, normal, reversable_defect}        0.1889
2   ca         numeric                                          0.1727
3   cp         {typ_angina, asympt, non_anginal, atyp_angina}   0.1497
4   exang      {no, yes}                                        0.1492
5   thalac     numeric                                          0.1313
6   oldpeak    numeric                                          0.1268
7   slope      {down, flat, up}                                 0.1021
8   sex        {male, female}                                   0.0624
Table 4.12. Effect of Symmetrical Uncertainty on classification performance
Id  Method      Acc (%)  eAcc (%)  AUC    eAUC   MSE    eMSE
1   BayesNet    83.50    83.83     0.908  0.897  0.134  0.137
2   MLP         80.20    80.53     0.878  0.885  0.171  0.164
3   RBF         84.16    85.48     0.895  0.892  0.120  0.122
4   IB1         76.23    77.56     0.760  0.773  0.238  0.225
5   KStar       74.59    78.55     0.814  0.854  0.207  0.166
6   PART        81.85    80.20     0.846  0.833  0.154  0.166
7   ADTree      80.52    80.52     0.896  0.882  0.130  0.138
8   BFTree      77.89    78.88     0.783  0.788  0.188  0.177
9   SimpleCart  80.82    80.86     0.818  0.832  0.158  0.154
Symmetrical Uncertainty has most affected KStar, MLP, IB1 and RBF, with accuracy
increases of 3.94%, 1.33%, 1.33% and 1.32% respectively. Again, this filter is
not as successful as Relief-F; it has positively affected 7 algorithms in
accuracy, 5 in AUC and 5 in MSE (see Figure 4.8 and Figure 4.9).
Figure 4.8. Accuracies of classifiers with and without Symmetrical Uncertainty filter
Figure 4.9. AUCs of classifiers with and without Symmetrical Uncertainty filter
It can be concluded that the most successful of the three filters on the CAD data
is the Relief-F feature selector. Naturally, a different dataset may change which
filter is most successful. Another conclusion is that KStar and MLP are the
classifiers most positively affected by all three filters.
4.5. Employing Ensemble Methods for Diagnosis of CAD
In this section, four ensemble methods are evaluated and compared in terms of
accuracy, sensitivity, specificity and MAE. These evaluations are implemented in
WEKA. The base classifiers for the ensembles are MLP from neural networks, IBk
from lazy classifiers, PART from rule-based classifiers and FT from decision
trees.
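Of these ensemble strategies, bagging is the simplest to sketch: each base model is trained on a bootstrap resample of the training set and the predictions are combined by majority vote. The toy Python fragment below (hypothetical data and a hypothetical 1-nearest-neighbour base learner, not the WEKA implementations) illustrates the idea:

```python
import random

def bagging_predict(train, x, base_learner, n_models=10, seed=0):
    """Train each base model on a bootstrap resample (sampling with
    replacement) of the training set; combine predictions by majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]  # bootstrap resample
        model = base_learner(sample)
        votes.append(model(x))
    return max(set(votes), key=votes.count)          # majority vote

def one_nn(sample):
    """Toy base learner: 1-nearest neighbour over (value, label) pairs."""
    return lambda x: min(sample, key=lambda p: abs(p[0] - x))[1]

train = [(0.1, 0), (0.2, 0), (0.9, 1), (1.0, 1)]
prediction = bagging_predict(train, 0.95, one_nn)
```

Boosting, Decorate and Rotation Forest differ in how they create diversity among the base models, but all of them combine base-classifier outputs in a comparable fashion.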
The boosting algorithm is named AdaBoostM1 in WEKA, and the 'boosting' results
given here are the results of AdaBoostM1. The values in Table 4.13, Table 4.14
and Table 4.15 are desired to be as high as possible; the bold values represent
the highest value for the corresponding base classifier (column).
Table 4.13. Accuracy (%) values of ensemble classifiers using different base classifiers
Ensemble Algorithm    MLP    NaiveBayes  IBk    PART   FT
Bagging               83.50  83.83       75.91  81.19  83.50
Boosting              80.53  83.83       76.24  79.87  81.52
Decorate              80.20  83.17       74.26  80.86  80.53
Rotation Forest (RF)  84.16  79.21       77.23  83.50  84.16

Table 4.14. Sensitivity values of ensemble classifiers using different base classifiers
Ensemble Algorithm    MLP    NaiveBayes  IBk    PART   FT
Bagging               0.797  0.797       0.717  0.790  0.812
Boosting              0.761  0.790       0.739  0.775  0.790
Decorate              0.797  0.797       0.717  0.754  0.812
RF                    0.812  0.790       0.725  0.804  0.812

Table 4.15. Specificity values of ensemble classifiers using different base classifiers
Ensemble Algorithm    MLP    NaiveBayes  IBk    PART   FT
Bagging               0.867  0.873       0.794  0.830  0.855
Boosting              0.842  0.879       0.782  0.818  0.836
Decorate              0.806  0.861       0.764  0.855  0.800
RF                    0.867  0.794       0.812  0.861  0.867
Table 4.13 shows that the RF ensemble outperforms the other ensembles for four of
the five base classifiers, with the highest accuracy of 84.16%. This accuracy is
reached using MLP and FT as base classifiers; these two classifiers are more
successful in accuracy than the NaiveBayes, IBk and PART algorithms when used in
an RF ensemble. The results in Table 4.14 and Table 4.15 are parallel with Table
4.13 in that the highest values belong to almost the same base classifiers and to
the RF ensemble.
Figure 4.10. Comparison of performances of ensemble algorithms using different
base classifiers with respect to accuracy.
RF shows considerable success over the other ensembles, especially in terms of
accuracy, as seen in Figure 4.10. In the next section, we investigate this
ensemble method in more depth and from different aspects in order to improve
diagnosis accuracy on the CAD data.
4.6. More on Rotation Forest Ensemble and a New Method Proposal for CAD
Diagnosis
In this section, Rotation Forest (RF) ensemble of three separate ANNs based
on Levenberg-Marquardt back-propagation algorithm is proposed and is
implemented in the MATLAB environment. Each ANN uses a different set of axes:
each feature is considered as an axis, and the RF algorithm randomly selects
which axes to rotate according to the algorithm parameter K. Axes are rotated by
principal component analysis (PCA), a statistical method for reducing dimensions
through a covariance analysis between features. PCA rotates the data set into a
different configuration which is easier to classify: the data become simpler, and
the relationships between features more discriminative. Using PCA, it is possible
to rotate the axes of the multi-dimensional space to new positions (the principal
axes), so that the data are defined differently than before. Here the aim of PCA
is not to reduce dimensions but to rotate the axes in order to define each
example in the data set in a different way. For each neural network, this
rotation is performed with a different subset of features; in other words, each
classifier is trained on the whole data set with different extracted features.
Each base classifier also takes a different subset of instances having the
selected features, so that diversity, an important property of ensemble methods,
is achieved. Another contribution to diversity is that each neural network is
created independently of the others, with randomly chosen initial weights. All
principal components are retained, so that the accuracy of the system is not
sacrificed while achieving diversity.
In this study, diversity is provided by three separate techniques in order to
create an ensemble of classifiers that disagree in their predictions. First, the
data set is rotated by the transformation matrix obtained by PCA. Second, the
base neural network classifiers are constructed with different initial weights.
Finally, each network is trained on a different portion of the training set, as a
rule of the RF algorithm.
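The axis rotation at the heart of this scheme can be illustrated in two dimensions. The Python sketch below diagonalises a 2x2 covariance matrix and projects centred points onto the principal axes. The actual RF algorithm applies such PCA rotations per random feature subset; this two-feature toy function only shows the rotation itself:

```python
from math import atan2, cos, sin

def pca_rotate(points):
    """Rotate 2-D data onto its principal axes: centre the points, compute
    the 2x2 covariance matrix, and project onto its eigenvectors."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form angle of the first principal axis for a 2x2 covariance
    theta = 0.5 * atan2(2 * cxy, cxx - cyy)
    c, s = cos(theta), sin(theta)
    return [((p[0] - mx) * c + (p[1] - my) * s,
             -(p[0] - mx) * s + (p[1] - my) * c) for p in points]
```

Points lying on a line are mapped so that all variance falls on the first rotated axis and the second coordinate vanishes, which is exactly the "different configuration" that makes the rotated data easier for a base classifier to separate.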
For the implementation of the compared classification algorithms other than the
Levenberg-Marquardt based RF, the WEKA data mining and machine learning
environment (Hall et al., 2009) is utilized. In all experiments, validation is
done via the 10-fold cross validation method.
First of all, we compare the performance of the base classifiers alone in
diagnosing the disease. This comparison is vital in order for us to decide on two
things: 1) what the utmost performance of an arbitrary classifier without the Rotation
Forest algorithm is, and 2) whether the RF algorithm actually improves the
performance of an arbitrary classifier. Table 4.16 presents the classification
performances of several classifiers in diagnosing the disease in terms of
accuracy, AUC, sensitivity and specificity, where the best values for each
performance measure are marked in bold.
Table 4.16. Classification results of the CAD dataset with different classifiers.
Algorithm            Accuracy  AUC    Sensitivity  Specificity
J48                  77.89     0.804  0.810        0.836
RBF Network          84.16     0.895  0.812        0.867
Levenberg-Marquardt  85.14     0.903  0.850        0.852
Naïve Bayes          83.83     0.902  0.803        0.867
OneR                 71.62     0.716  0.717        0.715
Random Forest        80.20     0.883  0.790        0.812
KStar                74.59     0.814  0.659        0.818

As presented in Table 4.16, the ANN structure based on the Levenberg-Marquardt
backpropagation algorithm appeared to be superior to the other methods in terms
of three metrics: Acc, AUC and Sn. Another ANN derivative, the RBF Network, and
Naïve Bayes were the closest matches to Levenberg-Marquardt in these
classification performance measures. Generally, in our experiment, rule-based
classifiers (OneR) and decision trees (J48, Random Forest) performed relatively
worse as sole classification tools. The performances of the ANN derivatives (RBF
Network, Levenberg-Marquardt) and the Naïve Bayes classifier were comparable to
each other but clearly superior to the others in diagnosing CAD.
Table 4.17. Classification results of the RF algorithm with different base classifiers
RF with base classifier  Accuracy  AUC    Sensitivity  Specificity
J48                      81.85     0.889  0.775        0.855
RBF Network              84.82     0.899  0.783        0.903
Levenberg-Marquardt      91.20     0.915  0.956        0.867
Naïve Bayes              79.21     0.876  0.790        0.794
OneR                     80.53     0.887  0.739        0.861
Random Forest            82.84     0.902  0.797        0.855
KStar                    74.59     0.814  0.659        0.818
When the results in Table 4.17 are compared with those in Table 4.16, it is
clearly seen that the Rotation Forest ensemble algorithm improves the
classification accuracy of almost all the classifiers, even though the
improvement is not significant for some of them, such as the RBF Network (see
Figure 4.11). Most notably, the performance of Levenberg-Marquardt is increased
to a classification accuracy of 91.20%, the best value of all the results.
Although Levenberg-Marquardt appears clearly superior in terms of Accuracy and
Sensitivity, its AUC value remains comparable to the others. The ROC curve, and
consequently the AUC value, depends strongly on how well the two class
distributions (patients with and without the disease) can be distinguished via a
threshold value. Therefore, comparable AUC values alongside different Accuracy
measures mean that some of the classifiers are very sensitive to threshold
changes while the others are not.
Figure 4.11. Effect of RF algorithm on different classifiers
As a result of the two experiments, it was observed that Levenberg-Marquardt was
the best classifier with or without RF. However, when it is utilized as a base
classifier with RF, its Accuracy is improved to 91.2%, an improvement of about 6
percentage points over the original classification accuracy. The ROC curves in
Figure 4.12 clearly depict the classification performance of Levenberg-Marquardt
with and without the Rotation Forest algorithm; the classification is better when
the curve tends toward the upper-left corner of the ROC area.
Figure 4.12. ROC analysis of Levenberg-Marquardt algorithm with and without RF
Finally, in order to demonstrate the efficiency of the proposed method (the
Rotation Forest ensemble with Levenberg-Marquardt based ANNs), we compare its
performance to that of methods in the literature that utilize the same dataset
for CAD diagnosis. Table 4.18 presents the reported accuracies of these methods
and their methodologies.
Table 4.18. Classification accuracy results of literature methods that utilize the same dataset
Author (Year)          Method                                Acc (%)
Detrano et al. (1989)  Logistic Regression                   77.0
Cheung (2001)          C4.5                                  81.4
Polat et al. (2005)    Artificial Immune System              84.5
Das et al. (2009)      ANN Ensemble (SAS Miner)              89.0
Proposed Method        Rotation Forest, Levenberg-Marquardt  91.2
The best classification accuracy to date, 89.01%, was obtained by Das et al.
(2009) with an averaging ensemble of ANNs. This study proposes a method that
outperforms the highest accuracy achieved thus far, reaching 91.2% accuracy with
the Rotation Forest ensemble of Levenberg-Marquardt based ANNs. Notably, the two
best performances in the literature are ensemble algorithms that utilize some
kind of ANN as base classifier, which supports the efficiency of the ensemble
approach.
5. CONCLUSIONS
In this thesis, we aimed to improve the accuracy and reliability of the diagnosis
of heart failure and to present a computer-based approach to diagnosis. Several
decision support systems that are frequently used in the medical field are
investigated with regard to their performance on coronary artery disease
diagnosis. Various performance criteria are used, but especially the accuracy
rate of diagnosis is considered. The experimental results obtained in Section 4
lead to the conclusion that artificial neural networks are the most successful of
the evaluated techniques for CAD diagnosis, both as a standalone model and in an
ensemble form. Of course the performance of these techniques depends on many
factors, especially on the dataset used; but when the literature on medical
decision support systems is examined, neural networks are observed to have
considerable power on test data that the model sees for the first time. The
choice of backpropagation algorithm also has an important effect on the result.
Feature selection is another important issue in classification, because it may
have a considerable effect on the accuracy of the classifier. By reducing the
number of dimensions of the dataset, it reduces processor and memory usage, and
the data becomes more comprehensible and easier to work with. In this thesis, we
have investigated the influence of the Relief-F, Gain Ratio and Symmetrical
Uncertainty feature selectors on nine different classifiers using CAD data. We
observed that KStar and MLP are the most affected classifiers; classification
accuracy is improved by up to 6.60% and 4.62% for KStar and MLP respectively
using the Relief-F filter.
When ensemble systems are evaluated, the most notable is the rotation forest
(RF), which outperformed the boosting, bagging and DECORATE ensembles. Our
experiments then turned to selecting the best components for a combined model.
With properly tuned parameters, neural networks, BayesNet and decision trees
achieve accuracies of 87.50%, 83.50% and 82.51%, respectively. Levenberg-Marquardt
was chosen as the backpropagation algorithm for the neural network, since it
outperformed the alternatives. Our model is therefore assembled from the
best-performing components.
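The update rule that distinguishes Levenberg-Marquardt from plain gradient descent can be sketched in a few lines. The following is a minimal illustration on a hypothetical curve-fitting problem, not the neural network training routine used in the experiments; the damping parameter mu blends gradient descent (large mu) with Gauss-Newton (small mu).

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, theta, n_iter=100, mu=1e-2):
    """Minimal LM loop: theta_new = theta - (J^T J + mu*I)^-1 J^T e.
    mu is halved after a successful step and doubled after a failed one."""
    for _ in range(n_iter):
        e = residual(theta)
        J = jacobian(theta)
        step = np.linalg.solve(J.T @ J + mu * np.eye(len(theta)), J.T @ e)
        trial = theta - step
        if np.sum(residual(trial) ** 2) < np.sum(e ** 2):
            theta, mu = trial, mu * 0.5   # accept: trust Gauss-Newton more
        else:
            mu *= 2.0                      # reject: lean toward gradient descent
    return theta

# Hypothetical toy problem: fit y = a * exp(b * x) with true a=2, b=0.5
x = np.linspace(0, 1, 20)
y = 2.0 * np.exp(0.5 * x)
residual = lambda th: th[0] * np.exp(th[1] * x) - y
jacobian = lambda th: np.stack([np.exp(th[1] * x),
                                th[0] * x * np.exp(th[1] * x)], axis=1)
theta = levenberg_marquardt(residual, jacobian, np.array([1.0, 0.0]))
print(theta)  # converges toward [2.0, 0.5]
```

In network training the residual vector is the per-sample output error and the Jacobian is taken with respect to the weights, which is why the method is fast on small and medium networks but memory-hungry on large ones.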
In this thesis, the performance of the RF ensemble method, whose base
classifiers are ANNs trained with the Levenberg-Marquardt backpropagation
algorithm, is also evaluated for the effective diagnosis of CAD. The proposed
method is able to determine the existence of the disease from data collected
noninvasively, easily, and cheaply from the patient, such as demographic
information and blood measurements. In this scheme, the obtained accuracy rate is
91.2%, which is, to the best of our knowledge, the best rate achieved thus far in
the relevant literature. Notably, the study of Das et al. (2009), which utilized a
simple ANN-based ensemble system and obtained 89.01% accuracy, supports the
conclusion that the Rotation Forest algorithm is efficient and superior to other
ensemble systems (Karabulut and İbrikçi, 2011).
Our experiments also show that RF not only improves the performance of
ANNs but also enhances the classification accuracy of almost all classifiers. This
is demonstrated by experiments covering different types of classifiers, including
decision trees (Random Forest, J48), rule-based learners (OneR), instance-based
learners (KStar), ANNs (RBF Network, Levenberg-Marquardt) and Naïve Bayes.
Therefore, in future studies the proposed scheme, i.e., an ensemble of ANNs or
other classifiers with Rotation Forest, may be utilized to develop efficient expert
systems for the diagnosis of several other diseases.
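The rotation mechanism behind RF can be sketched compactly. The sketch below is a simplified illustration, not the WEKA implementation used in the experiments: it pairs PCA rotations of random feature subsets with a one-level decision stump (rather than the J48 or ANN base classifiers of the thesis), and the two-class toy data is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_components(X):
    """Principal axes (right singular vectors) of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt.T  # columns are components

class DecisionStump:
    """One-level decision tree; axis-aligned, so rotations genuinely change it."""
    def fit(self, X, y):
        self.best = (0, 0.0, 1, np.inf)  # (feature, threshold, polarity, error)
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for pol in (0, 1):
                    err = (np.where(X[:, f] > t, pol, 1 - pol) != y).mean()
                    if err < self.best[3]:
                        self.best = (f, t, pol, err)
        return self
    def predict(self, X):
        f, t, pol, _ = self.best
        return np.where(X[:, f] > t, pol, 1 - pol)

class RotationEnsemble:
    """Each member sees the data through a block-diagonal rotation matrix
    built from PCA on random feature subsets; prediction is a majority vote."""
    def __init__(self, n_members=10, n_subsets=2):
        self.n_members, self.n_subsets = n_members, n_subsets
    def fit(self, X, y):
        n_features = X.shape[1]
        self.members = []
        for _ in range(self.n_members):
            subsets = np.array_split(rng.permutation(n_features), self.n_subsets)
            R = np.zeros((n_features, n_features))
            for idx in subsets:
                # PCA on a random 75% row sample, restricted to this subset
                rows = rng.choice(len(X), size=int(0.75 * len(X)), replace=False)
                R[np.ix_(idx, idx)] = pca_components(X[np.ix_(rows, idx)])
            self.members.append((R, DecisionStump().fit(X @ R, y)))
        return self
    def predict(self, X):
        votes = np.stack([clf.predict(X @ R) for R, clf in self.members])
        return np.array([np.bincount(col).argmax() for col in votes.T])

# Hypothetical toy problem: two Gaussian blobs in four dimensions
X = np.vstack([rng.normal(0.0, 1.0, (40, 4)), rng.normal(2.0, 1.0, (40, 4))])
y = np.array([0] * 40 + [1] * 40)
preds = RotationEnsemble().fit(X, y).predict(X)
```

The per-member rotation is what sets RF apart from bagging: every base learner works in a different rotated feature space, which increases diversity without discarding any information, since each block of R is an orthogonal transform.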
REFERENCES
AHA, D.W., KIBLER, D., ALBERT, M.K., 1991. Instance-based learning
algorithms. Machine Learning, 6: 37-66.
AMASYALI, M.F., ERSOY, O., 2008. The Performance Factors of Clustering Ensembles. Signal Processing, Communication and Applications Conference, 1-4.
ARAUZO, A., AZNARTE, J. L., BENITEZ J. M., 2011. Empirical study of feature
selection methods based on individual feature evaluation for classification
problems. Expert Systems with Applications, 38: 8170-8177.
BASSUK, S.S., MANSON, J.E., 2008. Lifestyle and risk of cardiovascular disease
and diabetes in women: a review of the epidemiologic evidence. Am J
Lifestyle Med, 2: 191-213.
BEALE, M.H., HAGAN, M.T., DEMUTH, H.B, 2010. Neural Network Toolbox TM
7: User’s Guide, The MathWorks Inc, 7th Edition.
BREIMAN, L., FRIEDMAN, J., OLSHEN, R. A., STONE, J., 1984. Classification
and Regression Trees. Wadsworth International Group, Belmont, California.
BREIMAN, L., 1996. Bagging predictors. Machine Learning, 24(2): 123-140.
BREIMAN, L., 2001. Random forests. Machine Learning, 45(1): 5-32.
BRUMMETT, B.H., BAREFOOT, J.C., SIEGLER, I.C., CLAPP-CHANNING, N.E.,
LYTLE, B.L., BOSWORTH, H.B., WILLIAMS, R.B., MARK, D.B., 2001.
Characteristics of socially isolated patients with coronary artery disease who
are at elevated risk for mortality. Psychosomatic Medicine, 63: 267-272.
CHANDRA, A., CHEN, H., YAO, X., 2006. Trade-off between diversity and
accuracy in ensemble generation. Multi-objective Machine Learning,
Springer Verlag, Heidelberg, pp.429–464.
CHEUNG, N., 2001. Machine learning techniques for medical analysis. School of
Information Technology and Electrical Engineering, B.Sc. Thesis, University
of Queensland.
CLEARY, J.G., TRIGG, L. E., 1995. An Instance-based learner using an entropic
distance measure. In: 12th International Conference on Machine Learning,
108-114.
COMAK, E., ARSLAN, A., TURKOGLU, İ., 2007. A decision support system based
on support vector machines for diagnosis of the heart valve diseases.
Computers in Biology and Medicine, 37:21-27.
COVER, T.M., THOMAS, J.A., 2006. Elements of Information Theory, 2nd edition.
Wiley-Interscience, Hoboken, 776p.
COWELL, R.G., DAWID, A.P., LAURITZEN, S.L., SPIEGELHALTER, D.J., 1999.
Probabilistic Networks and Expert Systems. Springer, Berlin, 324p.
DAS, R., TÜRKOĞLU, İ., SENGÜR, A., 2009. Effective diagnosis of heart disease
through neural network ensembles. Expert Syst Appl, 36: 7675-7680.
DAYHOFF, J.E, DELEO, J.M., 2001. Artificial Neural Networks: Opening the black
box. Factors and Staging in Cancer Management, 91: 1615-1635.
DETRANO, R., JANOSI, A., STEINBRUNN, W., PFISTERER, M., SCHMID, J.,
SANDHU, S., GUPPY, K., LEE, S., FROELICHER, V., 1989. International
application of a new probability algorithm for the diagnosis of coronary artery
disease. Am J Cardiol, 64: 304-310.
DORAISAMY, S., GOLZARI, S., NOROWI, N.M., SULAIMAN, M.N., UDZIR,
N.I., 2008. A study on feature selection and classification techniques for
automatic genre classification of traditional Malay music. In Proceedings of
ISMIR, 331-336.
DRESSLER, D.K., 2010. Management of patients with coronary vascular disorders.
In, Smeltzer S.C., Cheever K.H., Hinkle J..L, Bare B. G. (Eds.). Brunner and
Suddarth's Textbook of Medical-Surgical Nursing. 12th edition, p:775–779.
Philadelphia: USA, Wolters Kluwer Health.
DUDA, R.O., HART, P.E., STORK, D.G., 2006. Pattern Classification. John Wiley
& Sons Inc., U.K. 654 p.
DURSUN, R., 2010. Kadın hastalarda koroner risk faktörleri ve koroner arter
hastalığı varlığı ve ciddiyeti arasındaki ilişki. Kardiyoloji Uzmanlık Tezi,
İstanbul, 55 p.
FRANK, E., WITTEN, I.H, 1998. Generating accurate rule sets without global
optimization. In Shavlik, J., ed., Machine Learning: Proceedings of the
Fifteenth International Conference, Morgan Kaufmann Publishers, San
Francisco, CA.
FREUND, Y., MASON, L., 1999. The alternating decision tree learning algorithm.
Proceedings of the Sixteenth International Conference on Machine Learning,
Bled, Slovenia, 124-133.
FREUND, Y., SCHAPIRE, R., 1996. Experiments with a new boosting algorithm. In
Machine Learning: Proceedings Of The Thirteenth International Conference,
148-156.
FRIEDMAN, J., HASTIE, T., TIBSHIRANI, R., 2000. Additive logistic regression:
A statistical view of boosting. Annals of Statistics, 28(2): 337-407.
FUJITA, H., KATAFUCHI, T., UEHARA, T., NISHIMURA, T., 1992. Application
of artificial neural network to computer aided diagnosis of coronary artery
disease in myocardial SPECT bull's-eye images. J Nucl Med, 33: 272-276.
GUYON, I., ELISSEEFF, A., 2003. An introduction to variable and feature
selection. Journal of Machine Learning Research, 3: 1157-1182.
HADDAD, M., ADLASSNIG, K.P., PORENTA, G., 1997. Feasibility analysis of a
case-based reasoning system for automated detection of coronary heart
disease from myocardial scintigrams. Artif Intell Med, 9(1): 61–78.
HALL, M., FRANK, E., HOLMES, G., PFAHRINGER, B., REUTEMANN, P.,
WITTEN, I.H., 2009. The WEKA Data Mining Software: An Update;
SIGKDD Explorations, 11(1).
HALL, M. A., SMITH, L. A., 1999. Feature selection for machine learning:
Comparing a correlation-based filter approach to the wrapper. Proceedings of
the Twelfth International Florida Artificial Intelligence Research Society
Conference, AAAI Press, pp.235-239.
HAN, J., KAMBER, M., 2000. Data Mining Concepts and Techniques. Morgan
Kaufmann Publishers, 1st Ed., San Francisco, USA.
HAYKIN, S., 1999. Neural Networks: A Comprehensive Foundation, Prentice Hall,
USA, 842 p.
HEALTHWISE STAFF, 2011. How a heart attack happens (E. Gregory Thompson,
Primary Medical Reviewer).
http://www.webmd.com/heart-disease/how-a-heart-attack-happens
HERON, M., HOYERT, D.L., MURPHY, S.L., KOCHANEK, K.D., TEJADA-
VERA, B., 2009. Deaths: Final data for 2006. National Vital Statistics
Reports, 57(14). Hyattsville, MD: National Center for Health Statistics.
HOPKINS, P.N., WILLIAMS, R.R., 1989. Human Genetics and Coronary Heart
Disease: A Public Health Perspective. Annu Rev Nutr., 9:303-306.
IŞIK, K., 1986. Acil Kalp Hastalıklarında Teşhis ve Tedavi, Beta Basım Yayım
Dağıtım, İstanbul, 459 p.
JAIN, K.A., MAO, J., MOHUIDDIN, K.M., 1996. Artificial Neural Networks: A
Tutorial. Theme Feature, 29: 31-44.
JOHN, G.H., LANGLEY, P., 1995. Estimating continuous distributions in Bayesian
classifiers. Proceedings of the Eleventh Conference on Uncertainty in
Artificial Intelligence, Morgan Kaufmann, San Mateo, pp. 338-345.
KANTARDZIC, M., 2002. Data Mining: Concepts, Models, Methods and
Algorithms. John Wiley & Sons Inc., New York. 360 p.
KARABULUT, E., İBRİKÇİ, T., 2010. Birleştirilmiş Yapay Sinir Ağlarıyla
Parkinson Hastalığı Teşhisi, ELECO, Bursa.
KARABULUT, E., İBRİKÇİ, T., 2011. Effective Diagnosis of Coronary Artery
Disease Using The Rotation Forest Algorithm. Journal of Medical Systems
36(3):1831-1840.
KISI, O., UNCUOGLU, E., 2005. Comparison of three backpropagation
training algorithms for two case studies. Indian J Eng Mat Sci 12:434–442.
KORKMAZ, E., 1997. Kardiyovasküler risk faktörlerinin değiştirilmesi yönünde
yapılacak girişimler ve bunların etkinliği. İlaç ve Tedavi, 10(6): 331-341.
KULLER, L., FISHER, L., MCCLELLAND, R., FRIED, L., CUSHMAN, M.,
JACKSON, S., MANOLIO, T., 1998. Differences in prevalence of and risk
factors for subclinical vascular disease among black and white participants in
the Cardiovascular Health Study. Arterioscler Thromb Vasc Biol,
18(2): 283-293.
KUNCHEVA, L., 2004. Combining Pattern Classifiers Methods and Algorithms,
Wiley-Interscience, 360 p.
LANDWEHR, N., HALL, M., FRANK, E., 2005. Logistic model trees. Machine
Learning, 59(1-2): 161-205.
LEWENSTEIN, K., 2001. Radial basis function neural network approach for the
diagnosis of coronary artery disease based on the standard electrocardiogram
exercise test. Med Biol Eng Comput. 39(3):362-369.
LIU, K., HUANG, D., 2008. Cancer classification using rotation forest. Computers
in Biology and Medicine, 38: 601-610.
MARQUARDT, D., 1963. An algorithm for least-squares estimation of nonlinear
parameters. J Soc Ind Appl Math, 11(2): 431-441.
MINSKY, M., 1961. Steps toward artificial intelligence. Proceedings of the Institute
of Radio Engineers, 49:8-30.
MITCHELL, T.M., 1997. Machine Learning. WCB/McGrawHill, Boston, 414 p.
MOLLER, M.F., 1993. A scaled conjugate gradient algorithm for fast supervised
learning. Neural Networks 6(4): 525-533.
MOBLEY, B.A., SCHECHTER, E., MOORE, W., MCKEE, P.A., EICHNER, J.E.,
1999. Predictions of coronary artery stenosis by artificial neural network.
Artificial Intelligence in Medicine, 18:187-203.
NEWMAN, D.J., HETTICH, S., BLAKE, C.L., MERZ, C.J., 1998. UCI Repository
of machine learning databases. University of California, Irvine, Department
of Information and Computer Science.
NHLBI National Heart Lung and Blood Institute, 2011.
http://www.nhlbi.nih.gov/health/health-topics/topics/cad/
NOVAKOVIC, J., 2010. The Impact of Feature Selection on the Accuracy of Naive
Bayes Classifier. 18th Telecommunications forum TELFOR.
ONAT, A., BÜYÜKÖZTÜRK K., SANSOY, V., AVCI, Ş.G., ÇAM, N., AKGÜN,
G., TOKGÖZOĞLU, L., ÇAĞLAR, N., ŞAN, M., NIŞANCI, Y., OTO, A.,
ERGENE, O., 2002. Türk Kardiyoloji Derneği Koroner Kalp Hastalığı,
Korunma ve Tedavi Kılavuzu. http://www.tkd.org.tr/kilavuz/k11/4e423.htm
OPITZ, D., MACLIN, R., 1999. Popular Ensemble Methods: An Empirical Study.
Journal of Artificial Intelligence Research, 11: 169-198.
PAULIN, F., SANTHAKUMARAN, A., 2011. Classification of breast cancer
by comparing back propagation training algorithms. Int J Comput Sci Eng
(IJCSE), 3(1):327–332.
POLAT, K., SAHAN, S., KODAZ, H., GÜNES, S., 2005. A new classification
method to diagnosis heart disease: Supervised artificial immune system
(AIRS). Proceedings of the Turkish Symposium on Artificial Intelligence and
Neural Networks (TAINN).
POWELL, M.J.D., 1977. Restart Procedures for the conjugate gradient method.
Mathematical Programming, 12: 241-254.
PERREAULT, L., METZGER, J., 1999. A pragmatic framework for understanding
clinical decision support. Journal of Healthcare Information Management,
13(2):5-21.
QUINLAN, J. R., 1993. C4.5: Programs For Machine Learning. Morgan Kaufmann,
Los Altos, 299p.
RIEDMILLER, M., BRAUN, H., 1993. A Direct Adaptive Method for Faster
Backpropagation Learning: The RPROP Algorithm. IEEE International
Conference On Neural Networks, 586-591.
RISSANEN A.M., 1979. Familial aggregation of coronary heart disease in a high
incidence area. Br Heart J., 42(3):294-303.
RODRIGUEZ, J.J., KUNCHEVA, L.I., 2007. An Experimental Study on Rotation
Forest. Proceedings of the 7th international conference on Multiple classifier
systems (MCS'07), Berlin, Heidelberg, 459-468.
RODRIGUEZ, J. J., KUNCHEVA L. I., ALONSO, C.J., 2006. Rotation Forest: A
New Classifier Ensemble Method. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 28(10): 1619-1630.
RUMELHART, D. E., HINTON, G. E., WILLIAMS, R. J., 1986. Learning internal
representations by error propagation. In: Parallel distributed processing:
explorations in the microstructure of cognition, vol. 1., MIT Press,
Cambridge, pp 318–362.
SCHAPIRE, R., 1990. The Strength of Weak Learnability. Machine Learning
5(2):197-227.
SCOTT, J.A., AZIZ, K., YASUDA, T., GEWIRTZ, H., 2004. Integration of clinical
and imaging data to predict the presence of coronary artery disease with the
use of neural networks. Coron. Artery Dis., 15(7): 427–434.
SETIAWAN, N.A., VENKATACHALAM, P.A., HANI, A.F.M., 2009. Diagnosis of
Coronary Artery Disease Using Artificial Intelligence Based Decision
Support System. ICoMMS, Penang, Malaysia.
SHANNON, C.E., 1948. A Mathematical Theory of Communication. The Bell
System Technical Journal, 27: 379-423, 623-656.
SIERRA, B., SERRANO, N., LARRANAGA, P., PLASENCIA, E.J., INZA, I.,
JIMENEZ, J.J., REVUELTA, P., MORA, M.L., 2001. Using Bayesian
networks in the construction of a bi-level multi-classifier. A case study using
intensive care unit patients data. Artif Intell Med, 22: 233-248.
SWETS, J.A., 1979. ROC Analysis Applied To The Evaluation Of Medical Imaging
Techniques. Investigation Radiology, 14:109-121.
TANNER, L., SCHREIBER, M., LOW, J., ONG, A., TOLFVENSTAM, T., et al.,
2008. Decision tree algorithms predict the diagnosis and outcome of dengue
fever in the early phase of illness. PLoS Negl Trop Dis, 2(3): e196.
doi:10.1371/journal.pntd.0000196
TEXAS Heart Institute, 2011.
http://texasheart.org/HIC/Topics/Cond/CoronaryArteryDisease.cfm
TKACZ, E. J., KOSTKA, P., 2000. An application of wavelet neural network for
classification patients with coronary artery disease based on HRV analysis.
Proceedings of the Annual International Conference on IEEE Engineering in
Medicine and Biology, 1391–1393.
TSIPOURAS, M. G., EXARCHOS T. P., FOTIADIS D. I., KOTSIA A. P.,
VAKALIS K. V., NAKA K.K., MICHALIS L. K., 2008. Automated
diagnosis of coronary artery disease based on data mining and fuzzy
modeling. IEEE Trans. Information Technology in Biology, 12(4): 447–457.
TURKOGLU, I., ARSLAN, A., ILKAY, E., 2003. A wavelet neural network for the
detection of heart valve diseases. Expert Systems, 20(1): 1-7.
VONGKUNGHAE, A., CHUMTHONG, A., 2007. The performance
comparisons of backpropagation algorithm’s family on a set of logical
functions. ECTI Transactions on Electrical Eng Electronics and
Communications (ECTEEC), 5(2):114–118.
WANG, Y., MAKEDON, F., 2004. Application of Relief-F feature filtering
algorithm to selecting informative genes for cancer classification using
microarray data. In Proc. IEEE Computational Systems Bioinformatics
Conference, Stanford, California, 497-498.
WHO World Health Organization, 2011.
http://www.who.int/mediacentre/factsheets/fs317/en/index.html
YAN, H., JIANG, Y., ZHENG, J., PENG, C., LI, Q., 2006. A multilayer perceptron-
based medical decision support system for heart disease diagnosis. Expert
Syst. Appl., 30(2): 272-281.
CURRICULUM VITAE
She was born on June 20th, 1980, in Gaziantep, Türkiye. She received her BSc
degree from the Computer Engineering Department of Karadeniz Technical
University, Trabzon, in 2002. She has been working as an instructor at the
Vocational School of Higher Education of Gaziantep University since 2003.