ÇUKUROVA UNIVERSITY INSTITUTE OF NATURAL AND APPLIED SCIENCES
MSc THESIS
Esra MAHSERECİ KARABULUT
A RESEARCH ON PERFORMANCE OF DECISION SUPPORT SYSTEMS IN DIAGNOSIS OF CORONARY ARTERY DISEASE
DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING
ADANA, 2012
ÇUKUROVA UNIVERSITY INSTITUTE OF NATURAL AND APPLIED SCIENCES
A RESEARCH ON PERFORMANCE OF DECISION SUPPORT SYSTEMS
IN DIAGNOSIS OF CORONARY ARTERY DISEASE
Esra MAHSERECİ KARABULUT
MSc THESIS
DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING

We certify that the thesis titled above was reviewed and approved for the award of the degree of Master of Science by the board of jury on 20/06/2012.

Asst. Prof. Dr. Turgay İBRİKÇİ    Assoc. Prof. Dr. Selma Ayşe ÖZEL    Asst. Prof. Dr. Sami ARICA
SUPERVISOR                        MEMBER                              MEMBER

This MSc Thesis is written at the Institute of Natural and Applied Sciences of Çukurova University.

Registration Number:
Prof. Dr. M. Rifat ULUSOY Director Institute of Natural and Applied Sciences
This thesis was financially supported by the Ç.U. academic research fund, project MMF2011YL19.

Note: The use of the declarations, tables, figures, and photographs presented in this thesis or in any other reference without citation is subject to "The Law of Arts and Intellectual Products," number 5846, of the Turkish Republic.
ABSTRACT
MSc THESIS
A RESEARCH ON PERFORMANCE OF DECISION SUPPORT SYSTEMS IN DIAGNOSIS OF CORONARY ARTERY DISEASE
Esra MAHSERECİ KARABULUT
ÇUKUROVA UNIVERSITY
INSTITUTE OF NATURAL AND APPLIED SCIENCES DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING
Supervisor : Asst. Prof. Dr. Turgay İBRİKCİ
Year: 2012, Pages: 67
Jury : Asst. Prof. Dr. Turgay İBRİKCİ
     : Assoc. Prof. Dr. Selma Ayşe ÖZEL
     : Asst. Prof. Dr. Sami ARICA
Coronary Artery Disease (CAD) is a common heart disease associated with disorders affecting the heart and blood vessels. Since the disease is one of the leading causes of heart attacks, and thus of deaths, diagnosing it in its early stages, or in cases when patients do not show many of the symptoms, is of considerable importance. The increasing prevalence of CAD in the world has also increased work on the early diagnosis and treatment of cardiovascular diseases. Clinical decision support systems (CDSS) have become an important part of diagnosis in various medical areas over the last few decades.
In this thesis, a study of computational tools for diagnosing CAD is presented in order to support clinical decision-making processes. Real-life data is used in our research so that the experimental results are convincing. These computational tools include the decision support systems of artificial neural networks, decision trees, and Bayesian networks. Ensemble systems and the effect of feature selection on improving decision making for the diagnosis of CAD are also investigated. Furthermore, a new method is proposed that employs the Rotation Forest ensemble system with artificial neural network base classifiers trained by the Levenberg-Marquardt backpropagation algorithm. This learning algorithm was selected from among several backpropagation algorithms because of its superior performance on the CAD dataset. The proposed method reaches high accuracy and provides a good option for large-population diagnosis. The obtained accuracy rate is 91.2%, which is, to the best of our knowledge, the best rate achieved thus far in the relevant literature using the same dataset.
Key Words: Coronary artery disease, decision support systems, Rotation Forest,
artificial neural networks, backpropagation
ÖZ
YÜKSEK LİSANS TEZİ
KARAR DESTEK SİSTEMLERİNİN KORONER ARTER HASTALIĞI TEŞHİSİNDEKİ PERFORMANSI ÜZERİNE BİR ARAŞTIRMA
Esra MAHSERECİ KARABULUT
ÇUKUROVA ÜNİVERSİTESİ FEN BİLİMLERİ ENSTİTÜSÜ
ELEKTRİK ELEKTRONİK MÜHENDİSLİĞİ ANABİLİM DALI
Supervisor : Asst. Prof. Dr. Turgay İBRİKCİ
Year: 2012, Pages: 67
Jury : Asst. Prof. Dr. Turgay İBRİKCİ
     : Assoc. Prof. Dr. Selma Ayşe ÖZEL
     : Asst. Prof. Dr. Sami ARICA
Coronary Artery Disease (CAD) is a common heart disease associated with disorders affecting the heart and blood vessels. Since the disease is a leading cause of heart attacks and deaths, diagnosing it at an early stage, or when the patient does not show many of the symptoms, is important. The increasing prevalence of CAD in the world has also increased work on the early diagnosis and treatment of cardiovascular diseases. Over the last few decades, clinical decision support systems have become an important part of diagnosis in various medical fields. In this thesis, a study of computational tools for diagnosing CAD is presented in order to support the clinical decision-making process. Real-life data is used in our research so that the experimental results are convincing. These computational tools include the decision support systems of artificial neural networks, decision trees, and Bayesian networks. The effect of ensemble systems and feature selection on improving decision making in CAD diagnosis is also investigated in this thesis study. Furthermore, a new method is proposed that uses the Rotation Forest ensemble system with artificial neural networks as base classifiers, trained with the Levenberg-Marquardt backpropagation algorithm. This learning algorithm was selected from among several backpropagation algorithms because of its superior performance on the CAD data. The proposed method reaches high accuracy and provides a good option for large-population diagnosis. The obtained accuracy rate is 91.2%, which, to the best of our knowledge, is the highest rate achieved in the literature using the same dataset.
Key Words: Coronary artery disease, decision support systems, Rotation Forest,
artificial neural networks, backpropagation
ACKNOWLEDGEMENTS
I am grateful to my supervisor, Asst. Prof. Dr. Turgay İBRİKÇİ, whose encouragement, guidance, and support motivated me throughout the research and writing of this thesis. Furthermore, he was always accessible and willing to help at every stage of the study.
It is a pleasure to thank my thesis committee members and advisors, Assoc. Prof. Dr. Selma Ayşe ÖZEL and Asst. Prof. Dr. Sami ARICA, for the valuable insight they shared and their guiding advice.
I also thank my mother, Nuran MAHSERECİ, and my mother-in-law, Meral KARABULUT, for their support in looking after my children, Erva and Cengiz, during the completion of this thesis.
I would like to show my deepest gratitude to my husband, Mustafa KARABULUT. His support and patience have taught me much about discipline, and his experience broadened my perspective during this thesis study.
CONTENTS PAGE
ABSTRACT .................................................................................................................. I
ÖZ ................................................................................................................................. II
ACKNOWLEDGEMENTS ....................................................................................... III
CONTENTS ................................................................................................................ IV
LIST OF TABLES ...................................................................................................... V
LIST OF FIGURES ................................................................................................... VI
LIST OF ABBREVIATIONS ................................................................................... VII
1. INTRODUCTION ................................................................................................... 1
1.1. Coronary Artery Disease and Risk Factors ....................................................... 1
1.2. General Characteristics of a CAD Patient ......................................................... 5
1.3. Aim and Scope of the Thesis ............................................................................ 6
2. RELATED WORKS ................................................................................................ 9
3. MATERIAL AND METHODS ............................................................................. 13
3.1. Dataset Descriptions........................................................................................ 13
3.2. Methods ........................................................................................................... 14
3.2.1. Artificial Neural Networks .................................................................... 14
3.2.2. Bayesian Classification ......................................................................... 20
3.2.3. Decision Trees ....................................................................................... 24
3.2.4. Ensemble Systems ................................................................................. 28
3.2.5. Feature Selection ................................................................................... 31
4. RESULTS AND DISCUSSION ............................................................................ 35
4.1. Evaluation Metrics .......................................................................................... 35
4.2. Employing Artificial Neural Networks for Diagnosis of CAD ...................... 37
4.3. Using Bayesian Networks and Decision Trees for Diagnosis of CAD ........... 39
4.4. Effect of Feature Selection on Diagnosis of CAD .......................................... 43
4.5. Employing Ensemble Methods for Diagnosis of CAD ................................... 50
4.6. More on Rotation Forest Ensemble and a New Method Proposal for CAD
Diagnosis ......................................................................................................... 51
5. CONCLUSIONS .................................................................................................... 57
REFERENCES ............................................................................................................. 59
CURRICULUM VITAE .............................................................................................. 67
LIST OF TABLES PAGE
Table 1.1. Coronary artery disease risk factors (Onat et al., 2002)............................ 3
Table 3.1. CAD dataset summary ............................................................................ 13
Table 4.1. Accuracy values of backpropagation algorithms on CAD data .............. 38
Table 4.2. Performance values of BNs constructed by three different optimization
techniques on CAD data. ......................................................................... 40
Table 4.3. Confusion matrix of 303 data of CAD using HillClimber ...................... 41
Table 4.4. Confusion matrix of 303 data of CAD using Simulated Annealing ....... 41
Table 4.5. Confusion matrix of 303 data of CAD using TAN ................................. 42
Table 4.6. Evaluation results of five decision trees according to CAD data ........... 42
Table 4.7. Features selected by Relief-F filter ......................................................... 44
Table 4.8. Effect of Relief-F filter on classification performance ........................... 44
Table 4.9. Features selected by Gain Ratio filter ..................................................... 46
Table 4.10. Effect of Gain Ratio filter on classification performance ....................... 46
Table 4.11. Features selected by Symmetrical Uncertainty filter .............................. 48
Table 4.12. Effect of Symmetrical Uncertainty on classification performance ......... 48
Table 4.13. Accuracy values of ensemble classifiers using different base
classifiers ................................................................................................. 50
Table 4.14. Sensitivity values of ensemble classifiers using different base
classifiers ................................................................................................. 50
Table 4.15. Specificity values of ensemble classifiers using different base
classifiers ................................................................................................. 50
Table 4.16. Classification results of CAD dataset applied in different classifiers. .... 53
Table 4.17. Classification results of RF algorithm with different base classifiers .... 54
Table 4.18. Classification accuracy results of literature methods that utilize the
same dataset ............................................................................................ 53
LIST OF FIGURES PAGE
Figure 1.1. Diagram of the Coronary Arteries (Texas Heart Institute, 2011) ....... 2
Figure 1.2. Normal artery and narrowing of artery (NHLBI, 2011) ..................... 2
Figure 1.3. How a heart attack happens (Healthwise, 2011) ................................ 5
Figure 3.1. A neuron with single input and bias ................................................. 15
Figure 3.2. A neuron with vector input and bias ................................................. 15
Figure 3.3. The tansig transfer function .............................................................. 16
Figure 3.4. logsig transfer function ..................................................................... 17
Figure 3.5. Representation of a multilayer perceptron with one hidden layer .... 18
Figure 3.6. A training process flowchart using backpropagation algorithm
(Moghadassi et al., 2009) ................................................................. 19
Figure 3.7. Representation of Naïve Bayes DAG as a BN ................................. 22
Figure 3.8. A Simple Bayesian Network made up of a DAG and probability
tables ................................................................................................. 23
Figure 3.9. A simple decision tree for Heart Disease (HD) diagnosis ................ 25
Figure 3.10. An ensemble system with three base classifiers ............................... 28
Figure 4.1. Experimental setup of the artificial neural network used for CAD
diagnosing ........................................................................................ 38
Figure 4.2. Comparison of performances of BP algorithms on CAD data. ........ 39
Figure 4.3. ROCs of classification of BayesNetwork using HillClimber,
Simulated Annealing and TAN respectively .................................... 40
Figure 4.4. Accuracies of classifiers with and without Relief-F filter ................ 45
Figure 4.5. AUCs of classifiers with and without Relief-F filter ........................ 45
Figure 4.6. Accuracies of classifiers with and without Gain Ratio filter ............ 47
Figure 4.7. AUCs of classifiers with and without Gain Ratio filter .................... 47
Figure 4.8. Accuracies of classifiers with and without Symmetrical Uncertainy
filter .................................................................................................. 49
Figure 4.9. AUCs of classifiers with and without Symmetrical Uncertainty
filter ................................................................................................... 49
Figure 4.10. Comparison of performances of ensemble algorithms using different
base classifiers with respect to accuracy. ......................................... 51
Figure 4.11. Effect of RF algorithm on different classifiers ................................. 55
Figure 4.12. ROC analysis of Levenberg-Marquardt algorithm with and without
RF ..................................................................................................... 55
LIST OF ABBREVIATIONS
Acc : Accuracy
AUC : Area Under Curve
ANN : Artificial Neural Network
BN : Bayesian Network
CAD : Coronary Artery Disease
CBR : Case Based Reasoning
CDSS : Clinical Decision Support Systems
CP : Chest Pain
DAG : Directed Acyclic Graph
DM : Diabetes Mellitus
DSS : Decision Support Systems
FN : False Negatives
FP : False Positives
FT : Functional Tree
GR : Gain Ratio
HD : Heart Disease
HDL : High Density Lipoprotein
IB1 : Instance Based Learning
IG : Information Gain
LDL : Low Density Lipoprotein
MAE : Mean Absolute Error
MI : Myocardial Infarction
MLP : Multi Layer Perceptron
PCA : Principal Component Analysis
RBF : Radial Basis Function
RF : Rotation Forest
ROC : Receiver Operating Characteristics
SCG : Scaled Conjugate Gradient
Sn : Sensitivity
Sp : Specificity
SU : Symmetrical Uncertainty
TAN : Tree Augmented Bayesian Network
TN : True Negatives
TP : True Positives
UCI : University of California Irvine
WNN : Wavelet Neural Network
1. INTRODUCTION Esra MAHSERECİ KARABULUT
1. INTRODUCTION
Coronary Artery Disease (CAD), which refers to a wide variety of diseases and disorders affecting the heart and the blood vessels, is the most common type of heart disease. According to 2006 statistics, heart disease caused 26% of deaths in the United States, more than one in every four (Heron et al., 2009); similar rates are seen in other countries such as Russia, New Zealand, Australia, and in Europe. It is a common cause of heart attacks and thus the most deadly disease in the world (Setiawan et al., 2009), and the most common cause of sudden death in people over 20 years old. By 2030, almost 23.6 million people are expected to die from cardiovascular diseases, mainly from heart disease and stroke (WHO, 2011). This situation also causes labor shortages and a financial burden. It is necessary to bring the risk factors under control to prevent cardiovascular diseases.
The increasing prevalence of CAD in the world has also increased work on the early diagnosis and treatment of cardiovascular diseases. Clinical decision support systems (CDSS) have become an important part of diagnosis in various medical fields in the last few decades. Not only is diagnosis accuracy improved this way, but clinical complexity, details, and cost control are managed, and duplicate or unnecessary tests are avoided (Perreault and Metzger, 1999). CDSS complement the experience of physicians and are a component of medical technology. Motivated by these facts, we conducted research on improving CAD diagnosis with various decision support systems.
1.1. Coronary Artery Disease and Risk Factors
Coronary arteries are two major vessels that provide blood, oxygen, and nutrients to the heart, as represented in Figure 1.1. The narrowing and blockage of these arteries, called atherosclerosis, causes CAD. Atherosclerosis is the accumulation of cholesterol and fatty material (called plaques) on the inner walls of the arteries. These plaques reduce or block blood flow to the heart, depriving it of the oxygen and vital nutrients it needs to work properly. Figure 1.2 represents a normal artery and a narrowed artery of this kind. This shortage of blood flow causes chest pain, or angina. If plaque completely blocks the artery, it may cause a heart attack.
Figure 1.1. Diagram of the Coronary Arteries (Texas Heart Institute, 2011)
Figure 1.2. Normal artery and narrowing of artery (NHLBI, 2011)
Many factors can cause a higher risk of CAD; they may or may not be related to the lifestyle of the patient. Therefore, risk factors may be divided into changeable and unchangeable.
Table 1.1. Coronary artery disease risk factors (Onat et al., 2002)

Risk Factor                           Description                                                     Status
Age                                   Greater than 45 for men, and 55 for women or early menopause    Unchangeable
Sex                                   More often in men                                               Unchangeable
Family History                        First-degree relative with CAD before 55 for men, 65 for women  Unchangeable
Smoking                               A pack of cigarettes a day doubles CAD risk                     Changeable
High Blood Pressure (Hypertension)    ≥140/90 mmHg or antihypertensive usage                          Changeable
Total Cholesterol                     ≥200 mg/dl                                                      Changeable
High LDL                              ≥130 mg/dl                                                      Changeable
Low HDL                               <40 mg/dl                                                       Changeable
Diabetes Mellitus (DM)                Carries a risk equivalent to the existence of CAD               Changeable
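The thresholds in Table 1.1 lend themselves to a simple rule-based screen. The sketch below is purely illustrative and not part of the thesis methodology; the dictionary keys and the sample patient profile are assumptions made for the example.

```python
def cad_risk_flags(profile):
    """Return the Table 1.1 risk factors present in a patient profile.

    `profile` keys are illustrative; thresholds follow Onat et al. (2002)
    as summarized in Table 1.1.
    """
    flags = []
    if profile["systolic"] >= 140 or profile["diastolic"] >= 90:
        flags.append("hypertension")            # >=140/90 mmHg
    if profile["total_cholesterol"] >= 200:
        flags.append("high total cholesterol")  # >=200 mg/dl
    if profile["ldl"] >= 130:
        flags.append("high LDL")                # >=130 mg/dl
    if profile["hdl"] < 40:
        flags.append("low HDL")                 # <40 mg/dl
    age_limit = 45 if profile["sex"] == "male" else 55
    if profile["age"] > age_limit:
        flags.append("age")                     # >45 for men, >55 for women
    return flags

# A hypothetical 50-year-old male patient
print(cad_risk_flags({"systolic": 150, "diastolic": 85,
                      "total_cholesterol": 210, "ldl": 120,
                      "hdl": 35, "sex": "male", "age": 50}))
```

A real clinical screen would of course also weigh the unchangeable factors (family history) and smoking status; the point here is only that the table's cut-offs are directly machine-checkable.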
On average, CAD appears in women 10 years later than in men; MI (myocardial infarction) and other complications are also seen later. Men aged between 40 and 65 are affected 7 times more often than women (Işık, 1986). In cardiovascular studies, the rates of vascular disease in men and women between the ages of 65 and 70 are 33% and 22%, respectively. For people over 85 years, these rates are 45% and 43% for men and women, respectively (Kuller et al., 1998).
Many studies have determined that there is a relationship between CAD risk and early-onset CAD when a first-degree relative has CAD (Hopkins and Williams, 1989). This risk persists even if other risk factors are eliminated. If a male family member younger than 55 years old or a female family member younger than 65 years old has CAD, then the risk is considered present in the family history. The risk also increases if the diseased family member is younger or if the number of diseased family members increases (Rissanen, 1979; Bassuk, 2008; Dursun, 2010).
Smoking is as serious a risk factor as high blood pressure, and it is important because it is changeable. Smoking causes unhealthy cholesterol levels and tightens blood vessels. CAD risk decreases dramatically in people who quit smoking (Korkmaz, 1997). Cardiovascular diseases can occur even in passive smokers. Cardiac deaths related to smoking increase 2.7 times for men and 4.7 times for women (Onat et al., 2002).
High blood pressure is a major risk factor that speeds up the formation of atherosclerosis. People with high blood pressure are at 2-3 times greater risk than people with normal blood pressure (Dressler, 2010). An important property of high blood pressure is that it can be brought under control with 90% probability by proper drug treatment (Korkmaz, 1997).
Malnutrition can lead to obesity, diabetes, and abnormal cholesterol, which are also causes of CAD. Obesity can also be caused by genetics and hormonal disorders. Abnormal cholesterol results in an increase in LDL (low-density lipoprotein, "bad" cholesterol) and a decrease in HDL (high-density lipoprotein, "good" cholesterol). LDL cholesterol accumulates on the inner walls of arteries and increases the chance of heart disease; therefore, LDL values should be low. HDL cholesterol protects the arteries by preventing LDL cholesterol from building up in them; therefore, HDL values should be high.
Diabetes mellitus (DM) is a metabolic disease caused by insufficient production of insulin or the body's inability to respond to the insulin it produces. As a result, a high level of sugar (glucose) exists in the blood. Diabetes damages a membrane called the endothelium in the inner wall of the arteries and causes atherosclerosis; the arteries harden and normal blood flow is obstructed.
Another risk factor, physical inactivity, can cause obesity, hypertension, and a decrease in cardiovascular capacity (Dressler, 2010). Walking 30 minutes a day is recommended. Regular physical exercise reduces obesity and the risk of diabetes, but those suffering from CAD must avoid sudden or irregular exercise, because it increases the risk of MI. Emotional factors such as depression, stress, and social isolation also increase cardiovascular risk. Such factors can lead to high blood pressure, arterial damage, or irregular heart rhythms. People with such emotional problems often have a tendency toward smoking, drug use, excessive drinking, or overeating, thereby linking to other risk factors. They report half as many social visits as other patients; isolated patients were usually unmarried and had no intimate companion (Beverley et al., 2001).
1.2. General Characteristics of a CAD Patient
The heart muscle works continuously and always needs a blood supply. When a patient exerts strenuously, this need increases. This situation leads the patient to feel pain or discomfort such as tightness, pressure, burning, or squeezing; shortness of breath may also occur. This is the most common symptom of CAD, called angina. Angina can be felt not only in the chest but also in the left shoulder, arms, neck, back, or jaw. Angina is more frequent in cold weather because vessels may contract, increasing the work of the heart and decreasing its blood supply at the same time. In stable angina, symptoms recur when the patient repeats the same strenuous activity and disappear when the patient rests. Unstable angina, however, lasts longer, starts suddenly, and can occur while the patient is resting; it is a warning of heart attack and requires treatment.
Figure 1.3. How a heart attack happens (Healthwise, 2011)
Some of the plaque formed as atherosclerosis progresses may take clot form and temporarily block the artery. This situation leads to sudden angina until the clot resolves; Figure 1.3 represents such a blockage. This sudden blockage is called acute coronary syndrome, which is a medical emergency. If the blood supply to the heart cannot be restored within about half an hour, the heart muscle starts to die from the shortage of oxygen; this is a heart attack (i.e., myocardial infarction, MI). Blockage of a coronary artery can also cause a serious heartbeat irregularity (arrhythmia), a disorder of the heart's electrical activity; damaged heart muscle causes electrical instability in patients suffering from CAD or MI. Dizziness, nausea, and sweating are some other symptoms of CAD. Sometimes no symptoms are present, which makes diagnosis of the heart disease difficult.
1.3. Aim and Scope of the Thesis
Decision Support Systems (DSSs), computer technologies for decision making and problem solving, have recently attracted growing interest from researchers in medical decision making. DSSs include tools proven to have considerable success in disease diagnosis. They improve the quality of clinical decisions, prevent many human-caused errors, and provide a better service to patients.
The aim of this thesis is to develop DSSs to improve the diagnosis of CAD in support of clinical decision making. CAD is one of the world's leading causes of death. Sometimes no symptoms are present, which makes diagnosis of the heart disease difficult. Diagnosing the disease in its early stages is therefore of great importance, and several methods are utilized to diagnose CAD. Our aim is to develop computer-based solutions to the difficulties of clinical decision making. Artificial neural networks, decision trees, and Bayesian networks are included in this thesis. Real-life data is used in our research so that the experimental results provide convincing evidence. Ensemble systems and the effect of feature selection are also part of the research. As a result of the study, we aim to effectively categorize patients as having CAD or not. This categorization is made from cheap and widely available data such as patient age, sex, and the results of some laboratory tests. To achieve this aim, not only individual decision support systems but also ensembles of these systems are studied within the scope of this thesis. Although ensemble systems are an active research field in machine learning and pattern recognition (Opitz and Maclin, 1999), only a few studies (Das et al., 2009; Detrano et al., 1989) in the literature diagnose CAD by computer-based methods using noninvasive and widely available data. In short, we aim to reach high accuracy and provide a good option for large-population diagnosis.
In the ensemble-systems part of the thesis, a new method is proposed; it is the first study in the literature to utilize RF (Rotation Forest) to diagnose CAD. In this method, ANNs are used as the base classifiers of the Rotation Forest algorithm, each trained with the Levenberg-Marquardt backpropagation algorithm. This learning algorithm was selected from among several backpropagation algorithms because of its superior performance on the CAD dataset. An ensemble system with three neural network base classifiers is proposed, and the final decision is determined by evaluating each of their individual decisions. In this way, the proposed method reaches high accuracy and provides a good option for large-population diagnosis.
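The idea can be sketched in miniature. The code below is a simplified, illustrative reconstruction, not the thesis's implementation: it substitutes scikit-learn's MLPClassifier (which does not offer Levenberg-Marquardt training) for the Levenberg-Marquardt networks, and uses synthetic data of the same shape as the CAD dataset. Each Rotation Forest member rotates disjoint feature subsets with PCA fitted on a bootstrap sample, trains a network on the rotated data, and the ensemble decides by majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Stand-in data with the same shape as the CAD dataset (303 patients, 13 features)
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

def fit_member(X, y, n_subsets=3):
    """Fit one Rotation Forest member: split the features into subsets,
    fit a PCA rotation per subset on a bootstrap sample, then train an
    MLP on the rotated data."""
    cols_split = np.array_split(rng.permutation(X.shape[1]), n_subsets)
    rotations = []
    for cols in cols_split:
        boot = rng.integers(0, len(X), len(X))   # bootstrap rows for the PCA fit
        rotations.append((cols, PCA().fit(X[boot][:, cols])))
    Xr = np.hstack([p.transform(X[:, c]) for c, p in rotations])
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=3000,
                        random_state=0).fit(Xr, y)
    return rotations, clf

def predict(ensemble, X):
    """Majority vote of the base classifiers over their rotated views."""
    votes = np.array([clf.predict(np.hstack([p.transform(X[:, c])
                                             for c, p in rots]))
                      for rots, clf in ensemble])
    return (votes.sum(axis=0) > len(ensemble) / 2).astype(int)

ensemble = [fit_member(X, y) for _ in range(3)]   # three base classifiers
print("training accuracy:", (predict(ensemble, X) == y).mean())
```

The rotations give each base network a different view of the same features, which encourages diversity among the members while keeping all of the original information, the property that motivates Rotation Forest over plain bagging.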
2. RELATED WORKS Esra MAHSERECİ KARABULUT
2. RELATED WORKS
In the literature, computer aided methods are proposed as automatic
diagnostics systems of CAD. Since such tools have been proven to have a
considerable success in disease diagnosis and hence improve the quality of clinical
decision-making processes, they are called Clinical Decision Support Systems
(CDSS). CDSS also decrease the human error rate and provide better-informed
service to patients. In this context, an early computerized method (Fujita et al., 1992)
attempts to diagnose the disease from SPECT Bull’s-eye images. This method
utilizes artificial neural networks (ANNs) and achieved a diagnosis accuracy of 77%
on average. Scott et al. (2004) used myocardial perfusion imaging data with clinical
data for CAD prediction. Artificial neural networks were employed in that study;
88% sensitivity and 65% specificity results are obtained.
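Sensitivity and specificity, quoted throughout this chapter, are defined over the confusion matrix as Sn = TP/(TP+FN) and Sp = TN/(TN+FP). A minimal sketch, with hypothetical counts chosen only to reproduce the 88%/65% figures above:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sn: fraction of diseased patients correctly detected;
    Sp: fraction of healthy patients correctly cleared."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts that yield the figures reported by Scott et al. (2004)
sn, sp = sensitivity_specificity(tp=88, fn=12, tn=65, fp=35)
print(f"sensitivity = {sn:.2f}, specificity = {sp:.2f}")
```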
In another study (Tsipouras et al., 2008), a decision support system based on fuzzy modeling was developed. The proposed method works by evaluating patient history, demographics, and some basic laboratory examinations. A diagnosis accuracy of 73% is reported, alongside other literature methods with higher accuracy rates; those more accurate methods, however, performed automatic diagnosis by means of expensive and not widely available data, such as SPECT images, stress ECHO, and Doppler ultrasound, and were thus less preferable.
Haddad et al. (1997) utilized case-based reasoning (CBR) to develop an automatic image-interpretation system to determine the presence of CAD. Interpretation is performed on a scintigraphic image dataset. Sensitivity and specificity for detection of CAD were 98% and 70%, respectively. CBR systems may thus be suitable for clinical use, since they can achieve considerable diagnostic accuracy.
Yan et al. (2006) developed a multilayer perceptron (MLP)-based decision support system for the diagnosis of heart diseases. Three assessment methods, namely cross-validation, holdout, and bootstrapping, were applied to evaluate the generalization of the system. They concluded that an MLP-based decision support system can achieve very high diagnosis accuracy (>90%).
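The three assessment methods just mentioned can be contrasted in a short sketch. This is illustrative scikit-learn code on synthetic data, not the setup of Yan et al.; the classifier and dataset are stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

X, y = make_classification(n_samples=300, n_features=13, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=3000, random_state=0)

# 1. 10-fold cross-validation: average accuracy over ten held-out folds
cv_acc = cross_val_score(clf, X, y, cv=10).mean()

# 2. Holdout: one fixed train/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_acc = clf.fit(X_tr, y_tr).score(X_te, y_te)

# 3. Bootstrap: train on a resample with replacement, test on out-of-bag rows
boot = resample(np.arange(len(X)), random_state=0)
oob = np.setdiff1d(np.arange(len(X)), boot)
boot_acc = clf.fit(X[boot], y[boot]).score(X[oob], y[oob])

print(cv_acc, holdout_acc, boot_acc)
```

Cross-validation uses every sample for both training and testing across folds, holdout trades some data for a single fast estimate, and the bootstrap estimate comes from the roughly one-third of rows left out of each resample.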
Tkacz and Kostka (2000) presented how to use wavelet neural networks (WNN) for the classification of patients with CAD. The WNN is trained with half of the heart rate variability data, while the other half is used for testing. They investigated the effect of the choice of the basic wavelet function and reported that the highest sensitivity and specificity values are obtained when tansigmoidal and linear activation functions are used in a double-layer WNN. Turkoglu et al. (2003) also used a WNN model, based on pattern recognition, for the evaluation of the Doppler signals of heart valve diseases. This model consists of two layers; the first is a wavelet layer and the second is a multilayer perceptron (MLP). Doppler heart sounds were correctly classified for an average of 91% of 123 test samples.
An approach using a radial basis function neural network for CAD diagnosis is presented by Lewenstein (2001), based on the results of traditional ECG exercise tests. The best network correctly recognized over 97% of cases from a 400-element test set; the results concern the condition of the patient (a simple "sane-sick" diagnosis) and the diseased or stenosed vessels.
In a more recent study, Das et al. (2009) utilized data that can be collected noninvasively, easily, and cheaply from the patient to develop an expert system capable of diagnosing patients with an accuracy of about 89%. They used a neural network ensemble method and obtained sensitivity and specificity values of 80.95% and 95.91%, respectively, in CAD diagnosis. Their experiments also indicate that increasing the number of nodes in the neural networks does not improve the performance of the network.
In another study (Mobley et al., 2000), neural networks were used to identify
patients who do not need coronary angiography. Coronary angiography is a
procedure that uses dye and special X-rays to show the inside of the coronary arteries.
Mobley et al. developed a neural network to predict the existence or nonexistence of
coronary artery stenosis. Patients' records were used for training, cross-validation and
testing; as a result, some patients could be spared coronary angiography without any
coronary stenosis being left undetected. An AUC value of 0.89 was obtained.
Comak et al. (2007) proposed a decision support system for recognizing heart
valve disorders using Doppler heart sounds. First, the redundancy of the dataset is
reduced by feature selection, and normalization is applied as preprocessing. A least-squares
support vector machine and an artificial neural network are used to classify the
extracted features; 90.0% and 94.0% specificity values are obtained respectively.
Support vector machines thus outperformed neural networks when Doppler
heart sounds are evaluated.
3. MATERIAL AND METHODS
3.1. Dataset Descriptions
The CAD dataset consists of records of patients with and without coronary
artery disease, and in this study these patients are evaluated by decision support
systems. We utilized the medical records of 303 patients. Each record includes 13
features belonging to the patient, including age, sex, and measurements obtained as a
result of medical examination (see Table 3.1).
Table 3.1. CAD dataset summary
• Age
• Sex: 1 = male, 0 = female
• Chest pain type: 0 = typical angina, 1 = atypical angina, 2 = non-anginal pain, 3 = asymptomatic
• Resting systolic blood pressure (mmHg)
• Serum cholesterol (mg/dl)
• Fasting blood sugar: 1 = if fbs is over 120 mg/dl, 0 = if fbs is below 120 mg/dl
• Resting electrocardiographic results: 0 = normal, 1 = having ST-T wave abnormality, 2 = LV hypertrophy
• Maximum heart rate achieved
• Exercise induced angina: 1 = yes, 0 = no
• ST depression induced by exercise relative to rest
• The slope of the peak exercise ST segment: 0 = up sloping, 1 = flat, 2 = down sloping
• Number of major vessels colored by fluoroscopy
• Exercise thallium scintigraphic defects: 3 = normal, 6 = fixed defect, 7 = reversible defect
The dataset is publicly available at “The Data Mining Repository of
University of California Irvine (UCI)” (Newman et al., 1998) and was first
considered by Detrano et al. (1989). By using 13 given attributes, each sample is
classified into one of two groups of patients - those whose vessels are narrowed by
less than 50% or those whose vessels are narrowed by more than 50%. If diameter
narrowing in any major vessel is over 50% then the patient is considered to have the
disease. Otherwise the patient is classified as healthy.
3.2. Methods
3.2.1. Artificial Neural Networks
Artificial Neural Networks (ANNs) were developed with inspiration from the
human brain; they are massively parallel computing systems constructed from many
neurons connected to each other. An ANN can learn from examples as human
beings do. An ANN is created for a specific application, such as pattern recognition,
data classification or regression, through a learning process, during which data is
obtained from the environment.
ANNs have the ability to solve some problems that cannot be solved by linear
programming methods. The learned information is stored not in a database or a file,
but directly in the weights of the neurons; in effect, a summary of the data is encoded
in these weights. When the network has to decide about a new example, it generalizes
from what it has learned in this way. With this property of ANNs a very large variety
of problems can be solved. Even in a situation where some neurons stop working, the
network can still continue to work, since it has fault tolerance (Jain et al., 1996).
Moreover, incomplete input data does not prevent the network from producing output.
A disadvantage of ANNs is that they are data dependent. An ANN works for
specific data, and if it is to be used with different data, it must be constructed
again (Haykin, 1999). Learning is achieved in either a supervised or an unsupervised
way. In the supervised way, output values must be given to the network, whereas in the
unsupervised way the given input data is categorized into groups without defining the
desired output (Dayhoff and Deleo, 2001).
1) Mathematical Model
Figure 3.1. A neuron with single input and bias
a = f(wp + b) (1)
The scalar input p is multiplied by the scalar weight w to produce the scalar
value wp. Then the bias b is added to wp, and the transfer function f is applied to the
sum n = wp + b (Beale et al., 2010).
Figure 3.2. A neuron with vector input and bias
The input vector p = [p1, p2, ..., pR] is multiplied by the weight vector
w = [w11, w21, ..., wR1] and sent to the summing unit. The bias value, b, is added to
the weighted sum:

n = w11p1 + w21p2 + ... + wR1pR + b (2)
This value can be written in a matrix form:
n = wp + b (3)
Then, this value is used by a non-linear transfer function, f, to produce the
neuron output:
a = f (wp + b) (4)
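As a concrete illustration of equations 3 and 4, the output of a single neuron with a vector input can be sketched in a few lines of Python. The weights, inputs and bias below are arbitrary illustrative values (not taken from the thesis experiments), and logsig, introduced in the next subsection, is chosen as the transfer function f:

```python
import math

def neuron_output(w, p, b):
    """a = f(wp + b), equations 3 and 4, with f chosen here as logsig."""
    n = sum(wi * pi for wi, pi in zip(w, p)) + b  # n = wp + b (equation 3)
    return 1.0 / (1.0 + math.exp(-n))             # a = f(n) (equation 4)

# A neuron with three inputs; the numbers are made up for illustration.
a = neuron_output(w=[0.5, -0.3, 0.8], p=[1.0, 2.0, 0.5], b=0.1)
print(round(a, 4))  # 0.5987
```

Here n = 0.5·1.0 − 0.3·2.0 + 0.8·0.5 + 0.1 = 0.4, and logsig(0.4) ≈ 0.5987.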
2) Transfer Functions
Transfer functions are generally a sigmoid function, a hard limit function or a
function defined by the researcher. The selected transfer function produces the
output. If the transfer function is selected to be the tansig function shown in equation
5, its range becomes [-1, 1] and it changes non-linearly with the input values:

a = 2 / (1 + e^(-2n)) - 1 (5)
Figure 3.3. The tansig transfer function
If the transfer function is the logsig function shown in equation 6, its range is
[0, 1], and this function also changes non-linearly in this range:

a = 1 / (1 + e^(-n)) (6)
Figure 3.4. logsig transfer function
If the transfer function is hardlim as shown in equation 7, its output is 0 or 1:

f(n) = 0 if n < 0;  f(n) = 1 if n ≥ 0 (7)
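The three transfer functions of equations 5 to 7 can be written directly as Python functions; this is only a sketch for checking values, not the software used in the thesis:

```python
import math

def tansig(n):
    # Equation 5: a = 2 / (1 + e^(-2n)) - 1, range (-1, 1)
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

def logsig(n):
    # Equation 6: a = 1 / (1 + e^(-n)), range (0, 1)
    return 1.0 / (1.0 + math.exp(-n))

def hardlim(n):
    # Equation 7: 0 if n < 0, otherwise 1
    return 0 if n < 0 else 1

print(tansig(0.0), logsig(0.0), hardlim(-0.5))  # 0.0 0.5 0
```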
3) Single Layer and Multilayer Networks
The power of ANNs emerges when more than one neuron is interconnected.
When the training data set includes two linearly separable classes, the perceptron
learns after a number of training iterations (Jain et al., 1996). A single layer network
is made up of only an input and an output layer; there are no other layers of neurons,
namely no hidden layers. Learning is achieved by changing the weights in each
epoch, where an epoch is one iteration of presenting the training data set to the
network (Haykin, 1999).
A neural network may have more layers than just an input and an output
layer. If there is an extra layer that supplies input to the output layer of neurons, this
layer is called a hidden layer. Each layer has a weight matrix W, a bias vector b, and
an output vector. There are many substantial points to be decided carefully while
designing multilayer networks (Haykin, 1999):
• Number of hidden layers in the network
• Number of neurons in each hidden layer
• Finding a global optimum solution to prevent local minimum
• Finding an optimal solution in an acceptable time
• Testing the validity of network
Figure 3.5. Representation of a multilayer perceptron with one hidden layer
It is generally enough to use one hidden layer in networks for a large variety
of problems. Using two hidden layers can model a problem better, but training is
more likely to get trapped in a local minimum. The number of neurons is also an
important parameter of a network; sometimes a large number of neurons can prevent
the network from deciding properly, since it increases the complexity of the network
and overfitting occurs (Dayhoff et al., 2001). In an overfitting situation the network
is so sensitive that it decides wrongly when the pattern contains a bit of noise.
A very common approach used by learning algorithms in training multilayer
networks is backpropagation. The backpropagation algorithm tries to optimize the
weights according to the network error. At each epoch it updates the weight values,
beginning from the weights of the last layer and continuing towards the input layer.
Figure 3.6. A training process flowchart using backpropagation algorithm
(Moghadassi et al., 2009)
There are some variants of the backpropagation algorithm, such as Scaled
Conjugate Gradient (SCG) (Moller, 1993), Levenberg-Marquardt (Marquardt, 1963),
Resilient Backpropagation (Riedmiller and Braun, 1993) and Powell-Beale
Conjugate Gradient (Powell, 1977). Levenberg-Marquardt is the fastest training
algorithm for networks of small and medium size and provides proper training, but
when the network gets larger and the number of weights exceeds a hundred, its
performance decreases, especially in pattern recognition problems (Beale et al.,
2010). SCG is also a general purpose backpropagation algorithm and is faster than
Levenberg-Marquardt in large networks. The performance of all backpropagation
algorithms is problem dependent.
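The training loop of Figure 3.6 can be sketched for the simplest possible case, a single linear neuron trained by gradient descent. The learning rate, stopping thresholds and the toy data below are illustrative assumptions, not values from the thesis:

```python
import random

def train(patterns, targets, lr=0.1, mse_min=1e-3, epoch_max=1000):
    """Training loop of Figure 3.6 for a single linear neuron a = w*p + b."""
    random.seed(0)                           # initialize weights/biases randomly
    w, b = random.random(), random.random()
    epoch = 1                                # initialize training, Epoch = 1
    while True:
        # present each input pattern and calculate the outputs and the mse
        errors = [t - (w * p + b) for p, t in zip(patterns, targets)]
        mse = sum(e * e for e in errors) / len(errors)
        if mse < mse_min or epoch >= epoch_max:
            break                            # stop training the network
        for p, e in zip(patterns, errors):   # update weights and biases
            w += lr * e * p
            b += lr * e
        epoch += 1                           # Epoch = Epoch + 1
    return w, b, mse, epoch

# Learn the toy relation t = 2p + 1 from three patterns.
w, b, mse, epochs = train([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

The loop stops when the mean squared error falls below mse_min or the maximum number of epochs is reached, exactly as in the flowchart.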
3.2.2. Bayesian Classification
Bayesian classifiers are statistical classifiers based on Bayes' theorem.
They can predict the probabilities of class membership of a sample for each class.
1) Bayes’ Theorem
Assume A and B are two events; equation 8 gives the formula of the
conditional probability of A given that B has already occurred:

P(A|B) = P(A∩B) / P(B) (8)
Therefore,

P(A∩B) = P(A|B)P(B) (9)

P(A∩B) = P(B|A)P(A) (10)
When the right sides of equations 9 and 10 are equated, Bayes' theorem is obtained:

P(B|A) = P(A|B)P(B) / P(A) (11)
2) Naïve Bayes’ Classification
Despite its simple nature, Naïve Bayes is one of the most efficient and
well-known algorithms (Minsky, 1961). It estimates the class probabilities of a given
sample and selects the class with the maximum probability value as the decision. The
Naïve Bayes algorithm assumes that each attribute value of a sample is independent of
other attributes, while the class value of the sample affects all the attributes. This
assumption, also known as "conditional independence", simplifies the probability
calculation. The independence assumption works very efficiently for problems in
medical fields, probably because the chosen symptoms are independent to some
degree (Sierra et al., 2001).
The Naïve Bayes classifier, also called the simple Bayesian classifier, works as
follows:
1. Assume that X is a data sample with attribute values X = {x1, x2, ..., xn}, and that
there are m classes C1, C2, ..., Cm in the data set. Using Bayes' theorem, the
following equation gives the probability of each class for sample X (the class with
the maximum value will be selected):

P(Ci|X) = P(X|Ci)P(Ci) / P(X) (12)
2. In Naïve Bayes classification it is assumed that the xi values are conditionally
independent, which simplifies the computation. P(X|Ci) can then be calculated with
the following equation (Han and Kamber, 2000):

P(X|Ci) = ∏(k=1 to n) P(xk|Ci) (13)
3. The value of P(X) is the same in all class probability calculations, and since the
class with the maximum probability will be selected, P(X) can be discarded from the
calculations:

argmax{ P(X|Ci)P(Ci) } (14)
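The three steps above can be condensed into a short Python sketch. The toy symptom data and attribute names are invented for illustration, and no smoothing is applied for zero counts:

```python
def naive_bayes_predict(train_rows, train_labels, x):
    """Pick the class maximizing P(C) * product of P(x_k | C), equations 12-14."""
    n = len(train_labels)
    scores = {}
    for c in set(train_labels):
        rows_c = [r for r, l in zip(train_rows, train_labels) if l == c]
        score = len(rows_c) / n                    # P(C)
        for k, value in enumerate(x):              # product of P(x_k | C)
            matches = sum(1 for r in rows_c if r[k] == value)
            score *= matches / len(rows_c)
        scores[c] = score
    return max(scores, key=scores.get)             # equation 14: the argmax

# Hypothetical data: (chest_pain, exercise) -> class label
rows = [("yes", "no"), ("yes", "yes"), ("no", "yes"), ("no", "yes"), ("yes", "no")]
labels = ["sick", "sick", "healthy", "healthy", "sick"]
print(naive_bayes_predict(rows, labels, ("yes", "no")))  # sick
```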
3) Bayesian Networks
A Bayesian network (BN), also known as Belief network, is a probabilistic
graphical model representing dependence relationships between variables (Cowell et
al., 1999). A Bayesian network structure is a directed acyclic graph (DAG) and
DAGs are widely used in statistics and machine learning. The set of nodes and set of
directed edges construct a DAG, but no cycle among directed edges is allowed.
Unlike Naïve Bayes, a BN allows an attribute value of a sample to depend on
other attribute values. From this viewpoint, Naïve Bayes can be viewed
as a simple BN without conditional dependencies, as shown in Figure 3.7.
Figure 3.7. Representation of Naïve Bayes DAG as a BN
The nodes in a DAG represent random variables. An edge from one node to
another represents statistical dependence between the represented variables. Thus the
elements of a BN are a DAG and a probability table for each node. A node with
parents has a conditional probability table, and a node without parents has an
unconditional probability table. The conditional probability table for a node must
have entries for each possible combination of the values of its parents.
When specifying the probability tables, prior probabilities must be given for the
nodes without parents, and conditional probabilities for the other nodes. A BN is
constructed once the DAG and the probability tables are given, as in the following
sample in Figure 3.8:
Figure 3.8. A Simple Bayesian Network made up of a DAG and probability tables
Joint probability from a Bayesian Network is computed as:

P(x1, x2, ..., xn) = ∏(i=1 to n) P(xi | parents(xi)) (15)
To use a BN as a classifier, equation 15 is evaluated for all class values. In Bayesian
classification there are no learning rules; instead there are estimated probabilities.
As a classification example for the Bayesian Network in Figure 3.8, assume that a new
sample has the attribute values Exercise(E)=No, Smoking(S)=Yes, Shortness of
Breath(SB)=No and Chest Pain(CP)=Yes. The class attribute is Heart Disease(HD);
one probability value is calculated for the 'Yes' hypothesis and one for the 'No'
hypothesis, and the larger value indicates the class label of this sample. The Exercise
and Smoking attributes have no parents, so P(E=Yes)=0.7, hence
P(E=No)=1-P(E=Yes)=0.3, and P(S=Yes)=0.25. Now, according to the first hypothesis, the results of
HD=Yes, P(HD=Yes|E=No,S=Yes)=0.75, P(SB=No|HD=Yes)=0.85 and
P(CP=Yes|HD=Yes)=0.7 are obtained. Depending on equation 15, the first
hypothesis probability is (0.3)(0.25)(0.75)(0.85)(0.7)=0.0335. According to the second
hypothesis, the results of HD=No, P(HD=No|E=No,S=Yes)=0.25,
P(SB=No|HD=No)=0.15 and P(CP=Yes|HD=No)=0.3 are obtained. Depending on
equation 15 again, the second hypothesis probability is
(0.3)(0.25)(0.25)(0.15)(0.3)=0.0008. The result of the first hypothesis, 0.0335, is
larger than this value, so the class value of this sample is decided as HD=Yes.
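The arithmetic of this example can be checked with a few lines of Python; the probabilities are the ones quoted above from Figure 3.8:

```python
p_e_no, p_s_yes = 0.3, 0.25        # P(E=No) and P(S=Yes), nodes without parents

def hypothesis_score(p_hd, p_sb, p_cp):
    # Equation 15: product of each node's probability given its parents
    return p_e_no * p_s_yes * p_hd * p_sb * p_cp

yes_score = hypothesis_score(0.75, 0.85, 0.7)   # HD=Yes hypothesis
no_score = hypothesis_score(0.25, 0.15, 0.3)    # HD=No hypothesis
decision = "HD=Yes" if yes_score > no_score else "HD=No"
print(round(yes_score, 4), round(no_score, 4), decision)  # 0.0335 0.0008 HD=Yes
```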
An expert on a particular subject may prepare the DAG and the probability tables
needed to use a Bayesian Network for classification. Usually, however, this is not the
case, and learning algorithms must be used to construct Bayesian Networks. This
subject has attracted much research, and many approaches have been presented.
Some widely used optimization algorithms, such as hill climbing, simulated
annealing and tabu search, can be used to find the DAG of the BN heuristically.
Once the DAG is constructed, the probability tables can be produced directly from
the data.
BNs are promising classifiers and are useful in medical areas and diagnosis
(John and Langley, 1995). Beyond machine learning, they have been used for text
mining, natural language processing, speech recognition, signal processing,
bioinformatics and weather forecasting.
3.2.3. Decision Trees
The decision tree is one of the most commonly used algorithms in the recent
classification and pattern recognition literature. The most important reason for this is
the comprehensible and clear rules used in the construction of a decision tree. A
decision tree is a prediction method that can easily be integrated with information
technologies and can be used in clinical decision making; for example, C4.5, a type
of decision tree, can be used to yield clinically useful predictive values (Tanner et al.,
2008).
A decision tree is made up of nodes, branches and leaves (Quinlan, 1993). A
node is the testing unit of the tree; the result of its test causes the tree to branch
without losing data, each branch depending on the branching at the upper level. If there is a
specific class in a node, then this node becomes a leaf, and no branching
continues from it. Figure 3.9 shows a sample tree in which each leaf represents a
class, while the root and each internal node correspond to an attribute of the data set.
Figure 3.9. A simple decision tree for Heart Disease (HD) diagnosis
The decision process in a tree runs from the root node until a leaf is reached,
following consecutive nodes. A path from the root node to a leaf produces a decision
rule of the tree. Decision rules resemble conditional rules in programming languages
(Quinlan, 1993). There are four rules in the sample tree of Figure 3.9:
Rule 1:
If Chest Pain=No Then HD=No
Rule 2:
If Chest Pain=Yes and
If Shortness of Breath=Yes Then HD=Yes
Rule 3:
If Chest Pain=Yes and
If Shortness of Breath=No and
If Exercise=No Then HD=No
Rule 4:
If Chest Pain=Yes and
If Shortness of Breath=No and
If Exercise=Yes Then HD=Yes
In section 3.2.2, a data sample with the attribute values Exercise(E)=No,
Smoking(S)=Yes, Shortness of Breath(SB)=No and Chest Pain(CP)=Yes was classified
according to Figure 3.8. When this sample is to be classified according to the
decision tree represented in Figure 3.9, classification starts at the root node 'Chest
Pain'. Since its value is 'Yes', the tree branches to the 'Shortness of Breath' node.
Since Shortness of Breath is 'No', the tree branches to the 'Exercise' node. Exercise
has the value 'No', so the decision is HD=No.
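The four rules of Figure 3.9 translate directly into code. A minimal sketch, with the attribute values encoded as lowercase strings (an illustrative choice):

```python
def classify(chest_pain, shortness_of_breath, exercise):
    """The four decision rules of the tree in Figure 3.9, written as code."""
    if chest_pain == "no":
        return "HD=No"                      # Rule 1
    if shortness_of_breath == "yes":
        return "HD=Yes"                     # Rule 2
    if exercise == "no":
        return "HD=No"                      # Rule 3
    return "HD=Yes"                         # Rule 4

# The sample from section 3.2.2: CP=Yes, SB=No, E=No
print(classify("yes", "no", "no"))  # HD=No
```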
Data classification is a two-phase operation in a decision tree. The first phase is
the training phase, and the second is the classification phase. In the training phase,
training data is used for the construction of the tree, and the rules of the tree are
determined according to this data. In the classification phase, test data is used for
validation of the constructed tree. If the accuracy of the tree is at an acceptable level,
the tree is used for new data samples. To classify a new sample, the process starts
from the root and follows a top-down path of queries until a leaf is reached; the leaf
determines the class of that sample.
For the construction of the tree it is important to decide at which attribute the
branching starts. Constructing all possible trees for a dataset and selecting the best
one is an NP-hard problem (Kantardzic, 2002), so heuristic methods are needed.
According to the relevant literature these methods may be classified as entropy-based
methods, classification and regression trees, and memory-based classification
algorithms. In this study the entropy-based algorithm C4.5 (Quinlan, 1993) is used,
which is a more advanced version of ID3 (Quinlan, 1993) and the most popular in a
series of classification tree methods (Duda et al., 2006). J48 is the Java
implementation of the C4.5 algorithm in the WEKA environment (Hall et al., 2009),
and J48 is used for the decision tree experiments in this study.
The C4.5 algorithm selects the attributes according to their entropy quantities
while constructing a tree. Entropy is a measure of uncertainty in a system (Shannon,
1948) and is used in many areas; an entropy of 0 is the desired situation in a system.
While constructing the decision tree, the Information Gain (Cover and Thomas, 2006)
and the Gain Ratio (Mitchell, 1997) algorithms are used for ranking the data set
attributes, and the decision tree is constructed according to this ranking. These two
algorithms are also feature selection algorithms, in which features are selected
according to a threshold value on this ranking. When they are not used for feature
selection, they are used for ranking the attributes, to determine at which attribute to
branch in the construction of the tree. For Gain Ratio, see section 3.2.5.
1) Information Gain
Assume Y is the class attribute of a data set, and X is a given feature, both are
discrete. The information gain of X is the reduction of uncertainty of Y values, when
X values are known. This uncertainty is measured as H(Y), the entropy of Y.
Information Gain (IG) of X is the difference between entropy of Y and entropy of Y
after X values are observed, and is calculated as equation 16.
IG(Y; X) = H(Y) - H(Y|X) (16)
Entropy of Y is calculated with equation 17:

H(Y) = -Σ(y∈Y) p(y) log2(p(y)) (17)
where y is a value of the class attribute Y, and p(y) is the probability that Y = y. The
entropy of Y after the X values are observed is calculated as:
H(Y|X) = -Σ(x∈X) p(x) Σ(y∈Y) p(y|x) log2(p(y|x)) (18)
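Equations 16 to 18, with the probabilities estimated by simple counting, can be sketched as follows; the toy attribute and class values are invented for illustration:

```python
import math
from collections import Counter

def entropy(values):
    """H(Y) from equation 17, with probabilities estimated by counting."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain(xs, ys):
    """IG(Y; X) = H(Y) - H(Y|X), equations 16 and 18."""
    n = len(ys)
    h_cond = 0.0
    for x in set(xs):
        subset = [y for xi, y in zip(xs, ys) if xi == x]
        h_cond += (len(subset) / n) * entropy(subset)   # p(x) * H(Y | X=x)
    return entropy(ys) - h_cond

# A toy attribute that perfectly predicts the class: IG equals H(Y) = 1 bit
print(info_gain(["a", "a", "b", "b"], ["sick", "sick", "healthy", "healthy"]))
```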
3.2.4. Ensemble Systems
Ensemble systems are an active research field in machine learning and pattern
recognition. In an ensemble system more than one classifier is trained, and each
classifier contributes to the final decision of the system (Kuncheva, 2004). This
contribution is provided by voting on the class labels in order to select a winner,
which becomes the decision of the ensemble system; the voting may or may not be
weighted. Each classifier in the ensemble system is called a base classifier, and an
efficient ensemble system consists of accurate base classifiers. In this way, a sample
misclassified by one base classifier can be corrected by the others, so the outputs are
more accurate than those of a good individual classifier (Opitz and Maclin, 1999).
The success of the ensemble system depends on several factors, such as the
performance of the base classifier algorithm, the number of features used, the size of
the ensemble, and the decision combining algorithm (Amasyali and Ersoy, 2008).
Usually the diversity of the base classifiers conflicts with their accuracy: if the
base classifiers are accurate, the diversity among them is low (Chandra et al., 2006).
If there is no diversity among the base classifiers, their combination will not produce
an effective output. Thus, optimum results can be reached only by an ensemble
consisting of highly accurate classifiers that disagree as much as possible.
Figure 3.10. An ensemble system with three base classifiers
Bagging (Breiman, 1996) and Boosting (Schapire, 1990) are the two main
ensemble methods frequently used in the literature. In the Bagging algorithm, t
subsets are randomly taken from the dataset with replacement (bootstrap). Each
subset is used to train a classifier, so there are t classifiers in the ensemble. This
property tolerates unstable base classifiers that are too sensitive to changes in the
training data set. When a new sample is to be classified, each classifier predicts a
decision, and the final decision of the ensemble is the most frequent one. Unlike
boosting, bagging can use base classifiers of the same or of different types. Boosting
produces the base classifiers one after another. Each base classifier depends on the
previous classifier, such that the training set chosen for a base classifier includes the
instances incorrectly classified by the previous base classifier. Thus, the ensemble is
strengthened by each new base classifier that fixes the previous errors. The effect of
both bagging and boosting is clearer when weak classifiers are used.
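A bagging round can be sketched in a few lines of Python; the base learner below, which simply predicts the majority class of its bootstrap sample, is deliberately weak and purely illustrative:

```python
import random
from collections import Counter

def bagging_predict(train, t, learner, x):
    """Bagging sketch: t bootstrap samples drawn with replacement, one base
    classifier trained per sample, final decision by majority vote."""
    votes = []
    for _ in range(t):
        bootstrap = [random.choice(train) for _ in train]  # sample w/ replacement
        predict = learner(bootstrap)                       # train a base classifier
        votes.append(predict(x))
    return Counter(votes).most_common(1)[0][0]             # most frequent decision

def majority_learner(sample):
    """A weak base learner: always predict the sample's majority class."""
    label = Counter(lbl for _, lbl in sample).most_common(1)[0][0]
    return lambda x: label

random.seed(1)
train = [(0, "healthy"), (1, "healthy"), (2, "sick")]
result = bagging_predict(train, t=11, learner=majority_learner, x=1)
print(result)
```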
The Adaboost algorithm (Freund and Schapire, 1996) is the most popular variant
of boosting and takes its name from 'adaptive boosting'. Classifiers are added until a
low error ratio is reached. Adaboost assigns a weight value to each candidate
training sample, and the candidates are selected according to their weights for the
actual training set of a base classifier. A candidate training sample that is incorrectly
classified by the previous classifiers gets a greater weight value (Duda et al., 2006),
so Adaboost concentrates on samples that are difficult to classify correctly.
The Random Forest algorithm (Breiman, 2001) is also known as a successful
ensemble in the literature; its base classifiers are trees. In the random forest
algorithm, t bootstrap samples are taken from the training data for the construction of
t trees. At each node of each tree, m features are selected randomly and the one
giving the best split is chosen. Many trees may be used, justifying the name forest,
but the trees are not pruned, for performance reasons. Not evaluating all features for
the best split at each tree is also an advantage over the boosting algorithm when it
uses trees as base classifiers.
As a new ensemble method, Rodriguez et al. (2006) proposed the Rotation
Forest (RF) and used decision trees as base classifiers in their study. RF can avoid
accuracy-diversity trade-off problem efficiently (Rodriguez et al., 2006; Liu and
Huang, 2008). Rodriguez and Kuncheva (2007) reported RF to be more
accurate than the bagging, Adaboost and Random Forest ensembles on a collection of
data sets. RF applies feature extraction to the training data of each classifier to
improve diversity. This extraction is not for reducing the dimensions of the data but
for presenting the data to the base classifier in another form.
1) Rotation Forest Algorithm
Let X be the training sample set. There are L base classifiers D1, . . . , DL in a
Rotation Forest. The following steps are processed for each base classifier Di:

Step 1: Splitting the feature set into subsets. Assume there are n features in the
dataset X. The feature set F is separated randomly into K disjoint subsets, so each
feature subset has M = n / K features. It is not necessary to choose K as a factor of n.
Step 2: Generating the coefficient matrix. Here i denotes the index of the base
classifier to be trained, Di, and Fij is the jth subset of features used to train this
classifier. Let Xij be the part of X containing the data that corresponds to the features
in Fij. From each Xij some subset of the class labels is selected randomly, then 75% of
the remaining Xij is again selected randomly in order to generate another dataset X'ij.
The coefficient matrix Cij is then generated by applying a linear transformation to
X'ij; the coefficients of this matrix are aij(1), . . . , aij(Mj).
Step 3: Constructing a rotation matrix. The coefficients generated in the previous
step are used to obtain the block-diagonal matrix Ri:

Ri = [ ai1(1), ..., ai1(M1)    [0]                    ...   [0]
       [0]                     ai2(1), ..., ai2(M2)   ...   [0]
       ...                     ...                    ...   ...
       [0]                     [0]                    ...   aiK(1), ..., aiK(MK) ]   (19)

Step 4: Generating the rearranged matrix. Ri is rearranged to match the feature
sequence of the original dataset X to generate Rai, so the actual rotation matrix is
obtained. When the
dataset is rotated by this rotation matrix, that is, X is multiplied by Rai, the training
set for classifier Di is obtained as XRai.
As for classifying an instance x, the confidence of each class label is calculated,
and x is assigned to the class label having the largest confidence. First x' = xRai is
generated. Assume w = {w1, ..., wc} are the class labels and dij(x') is the probability,
as determined by classifier Di, that the class label of x is wj. The confidence of each
class label is calculated as:
μj(x) = (1/L) Σ(i=1 to L) dij(x'),   j = 1, ..., c   (20)
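The block-diagonal arrangement of equation 19 and the confidence averaging of equation 20 can be sketched as follows; the coefficient blocks and probability vectors are invented illustrative numbers:

```python
def block_diagonal(blocks):
    """Step 3 sketch: arrange per-subset coefficient blocks (lists of rows)
    into the block-diagonal rotation matrix R_i of equation 19."""
    size = sum(len(b[0]) for b in blocks)   # total number of columns
    matrix, col = [], 0
    for b in blocks:
        for row in b:
            matrix.append([0.0] * col + list(row) + [0.0] * (size - col - len(row)))
        col += len(b[0])
    return matrix

def confidence(classifier_probs):
    """Equation 20: average the class-probability vectors d_i(x') over the
    L base classifiers; the class with the largest mean is chosen."""
    L = len(classifier_probs)
    c = len(classifier_probs[0])
    return [sum(p[j] for p in classifier_probs) / L for j in range(c)]

R = block_diagonal([[[1.0, 2.0], [3.0, 4.0]], [[5.0]]])
mu = confidence([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]])
```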
3.2.5. Feature Selection
Feature selection is the process of removing redundant or irrelevant features
from the original data set. The execution time of the classifier that will process the
data is reduced, and the accuracy increases, because irrelevant features can include
noisy data that affects the classification accuracy negatively (Doraisamy, 2008). With
feature selection, understandability is also improved and the cost of data handling is
lowered (Arauzo et al., 2011).
A classification algorithm classifies instances to a category according to a
given set of features. When classification is performed on the output of a feature
selection, the prediction will be more certain and clear.
Feature selection algorithms are divided into three categories: filters,
wrappers and embedded methods. Filters evaluate each feature independently of the
classifier, rank the features after evaluation, and take the superior ones (Guyon and
Elisseeff, 2003). This evaluation may, for example, be done using entropy; when a
decision tree is to be used, this can guide the choice of the feature to start with.
Wrappers take a subset of the feature set and evaluate the classifier's performance on
this subset, then another subset is evaluated on the classifier. The subset on which the
classifier has the maximum performance is selected, so wrappers depend on the
selected classifier. This approach is in fact more reliable because the classification
method affects the accuracy, but deciding which subset to select is an NP-hard
problem (Novakovic, 2010).
It can take considerable processing time and memory. Some heuristic algorithms
can be used for subset selection, such as genetic algorithms, greedy stepwise, best
first or random search. The filters are therefore more efficient, but they do not take
into account the fact that the choice of the better features may depend on the
classification algorithm. Embedded techniques perform feature selection during the
learning process, as artificial neural networks do. In this study three filters, Relief-F,
Gain Ratio and Symmetrical Uncertainty, are used.
1) Relief-F
The Relief-F algorithm gives a weight to each feature of the dataset. To achieve
this, an instance is selected randomly from the dataset; then its nearest neighbors
from the same class and from the different classes are found, and the difference in a
feature's value for the same and different classes is calculated. The ability of a feature
f to discriminate the instance between the same and different classes determines the
weight wf of this feature. The following formula can be used to calculate this weight
for each feature (Wang and Makedon, 2004):

wf = P(different value of f | different class) - P(different value of f | same class)

This difference is desired to be maximal.
2) Gain Ratio
Gain Ratio (GR) applies a kind of normalization to information gain using the
entropy of X (Han and Kamber, 2000), such that:

GR = IG / H(X) (21)
Gain Ratio avoids selecting features having more distinct values, which is a
disadvantage of Information Gain: features with greater numbers of values have more
information gain than those with fewer values, even if they are actually no more
informative (Hall and Smith, 1999).
3) Symmetrical Uncertainty
Symmetrical Uncertainty (SU) also avoids the bias of Information Gain towards
features having more values; normalization to the range [0, 1] is provided by
dividing the information gain by the sum of the entropies of the class attribute Y and
the given feature X:

SU = 2·IG / (H(Y) + H(X)) (22)
As the value of symmetrical uncertainty or gain ratio approaches 1, the feature X can
predict the class Y more completely.
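Both normalizations are one-line formulas. A sketch with a perfectly informative feature (IG = H(Y) = H(X) = 1 bit), for which both measures reach their maximum value of 1:

```python
def gain_ratio(ig, h_x):
    # Equation 21: GR = IG / H(X)
    return ig / h_x

def symmetrical_uncertainty(ig, h_y, h_x):
    # Equation 22: SU = 2 * IG / (H(Y) + H(X))
    return 2.0 * ig / (h_y + h_x)

print(gain_ratio(1.0, 1.0), symmetrical_uncertainty(1.0, 1.0, 1.0))  # 1.0 1.0
```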
4. RESULTS AND DISCUSSION
4.1. Evaluation Metrics
Some metrics are required in order to measure and validate the performance
of a DSS that classifies samples. These metrics generally provide quantitative
results to assess and compare the performance of a classification algorithm. In this
study, each method is evaluated in terms of Sensitivity, Specificity and Accuracy. In
addition, the Receiver Operating Characteristics (ROC) curve is used as a graphical
tool to visualize the classifier's tendency between over-prediction and under-prediction.
A related quantitative metric, the Area Under Curve (AUC), can then be calculated as
an additional means to assess the classification performance.
The calculation of the metrics requires the outcomes of the classifier system
to be labeled with four possible states: true positives (TP), true negatives (TN),
false positives (FP) and false negatives (FN). Once the output for each sample is
labeled, the quantitative metrics, as well as the ROC curve, can be calculated for
the evaluated algorithm.
True Positives (TP) refers to the number of samples for which the classifier
decided that the patient has the disease, and the patient actually has the
disease. False Positives (FP) refers to the number of samples for which the
classifier decided that the patient has the disease, but the patient actually
does not. False Negatives (FN) refers to the number of samples for which the
classifier decided that the patient does not have the disease, but the patient
actually does. True Negatives (TN) refers to the number of samples for which the
classifier decided that the patient does not have the disease, and the patient
actually does not.
Sensitivity (Sn) measures how well the classifier identifies positive results.
Sn = TP / (TP + FN) (23)
Specificity (Sp) measures how well the classifier identifies negative results.
Sp = TN / (TN + FP) (24)
Accuracy (Acc) measures the binary classifier's ability to correctly predict
given samples. While Sn and Sp each measure only one side of the classifier, Acc
measures the overall performance of a classification algorithm.
Acc = (TP + TN) / (TP + FP + FN + TN) (25)
Mean Absolute Error (MAE) is another quantity for measuring classification
performance; it measures how far the predicted values are from the actual values.
MAE = (1/n) Σi=1..n |fi − yi| (26)
In Equation 26, fi denotes the predicted values and yi the actual values; MAE is
the average of the absolute errors.
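Equations 23 to 26 translate directly into code. The following Python sketch computes the four metrics from binary predictions and raw scores; the toy inputs are hypothetical:

```python
def evaluate(predicted, actual):
    """Compute Sn, Sp and Acc (Eqs. 23-25) from binary predictions (0/1)."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    tn = sum(p == 0 and a == 0 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return sn, sp, acc

def mae(scores, actual):
    """Eq. (26): mean absolute difference between predictions and targets."""
    return sum(abs(f - y) for f, y in zip(scores, actual)) / len(actual)
```

For example, with predictions [1, 1, 0, 0, 1] against actual labels [1, 0, 0, 0, 1], there are 2 TP, 2 TN, 1 FP and 0 FN, giving Sn = 1.0, Sp = 2/3 and Acc = 0.8.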
1) ROC Curve Analysis
ROC analysis is useful to measure and visualize the overall performance of
the classifier. It is also functional for evaluating and comparing classification
algorithms. Thus, it is used for medical diagnosis very commonly (Swets, 1979).
In the optimal case, both Sensitivity and Specificity reach the value 1. When
Sensitivity is 1, all diseased patients are classified as diseased. When
Specificity is 1, no healthy person is classified as diseased. This is the case
when the ROC curve tends toward the upper-left corner.
Area Under Curve (AUC) is the total area under the ROC curve and a measure of the
performance of the diagnostic test, since it reflects the test performance at all
threshold values. The area lies in the interval [0.5, 1], and the larger the
area, the better the performance.
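The AUC can be computed directly from classifier scores with the trapezoidal rule, without plotting the curve. The Python sketch below sweeps the decision threshold over the sorted scores and accumulates (FPR, TPR) steps; the scores are hypothetical and this is not a WEKA or MATLAB routine:

```python
def auc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule.
    `labels` are 0/1; `scores` are the classifier's decimal outputs."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Lower the threshold through the scores in descending order,
    # counting true and false positives as each sample turns positive.
    area = tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    for s, y in sorted(zip(scores, labels), reverse=True):
        tp += y
        fp += 1 - y
        tpr, fpr = tp / pos, fp / neg
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # trapezoid slice
        prev_tpr, prev_fpr = tpr, fpr
    return area
```

A classifier that ranks every diseased sample above every healthy one reaches the maximum AUC of 1.0.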
2) Cross Validation
Cross validation is utilized to validate the performance of the classifier. In
cross validation, the dataset is separated into two parts. The classifier is
trained with the first part and tested with the second part, and its accuracy is
calculated. Then the classifier is trained with the second part and tested with
the first part, and the accuracy is calculated again. The average of these two
accuracies gives the overall accuracy. This kind of cross validation is known as
the "hold out" method.
In this study, the k-fold cross validation method is utilized, which is an
improved derivative of the hold out method. The data is separated into k subsets,
and the hold out operation is performed k times; in each run one subset is used
for testing and the remaining subsets are used for training. The eventual
accuracy is calculated by averaging the k accumulated accuracies.
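The fold construction described above can be sketched in a few lines of Python. The splitting scheme and the placeholder accuracy below are illustrative only (WEKA and MATLAB provide their own cross-validation routines):

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists: each of the k folds serves once as
    the test set while the remaining folds form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for test in folds:
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        yield train, test

# Sketch of the validation loop for the 303-sample CAD dataset:
# train and evaluate a classifier per fold (placeholder accuracy here),
# then average the k fold accuracies into the reported value.
fold_accs = [1.0 for train, test in k_fold_indices(303, 10)]
overall_acc = sum(fold_accs) / len(fold_accs)
```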
4.2. Employing Artificial Neural Networks for Diagnosis of CAD
A two-layer feed-forward neural network is able to learn most input-output
relations, but when complex relationships exist between input and output,
networks with more layers can be used for faster learning. The neural network
architecture is constructed according to the CAD dataset and depicted in Figure
4.1, which shows a two-layer neural network. There are 13 neurons in the input
layer, 10 neurons in the hidden layer and 1 in the output layer. The target
values in the dataset are categorical; therefore a classification, not a
regression, is to be performed.
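As a concrete illustration of this 13-10-1 architecture, the Python sketch below builds the forward pass of such a network from scratch. It is not the MATLAB toolbox implementation used in this study; the weight ranges, seed and function names are hypothetical:

```python
import random
from math import tanh, exp

def init_network(n_in=13, n_hidden=10, n_out=1, seed=1):
    """Random initial weights (plus one bias weight per neuron)
    for a 13-10-1 feed-forward network."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
             for _ in range(n_out)]
    return w_hidden, w_out

def forward(x, w_hidden, w_out):
    """One forward pass: tanh hidden layer, sigmoid output neuron whose
    value can be read as the probability of the positive (disease) class."""
    xb = x + [1.0]                                    # append bias input
    h = [tanh(sum(w[i] * v for i, v in enumerate(xb))) for w in w_hidden]
    hb = h + [1.0]
    return [1 / (1 + exp(-sum(w[i] * v for i, v in enumerate(hb))))
            for w in w_out]
```

The output neuron stays strictly between 0 and 1, so a 0.5 threshold turns the network into the binary classifier required by the categorical targets.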
Figure 4.1. Experimental setup of the artificial neural network used for CAD
diagnosis
Some variants of the standard backpropagation algorithm are evaluated on the CAD
data using the neural network toolbox of MATLAB. The training data is fed to the
created network object; the network is trained and then simulated to respond to
new test inputs. Performance values of these algorithms using 10-fold
cross-validation on test data are given in Table 4.1:
Table 4.1. Accuracy values of backpropagation algorithms on CAD data
Id  Algorithm                                                            Acc (%)
1   Powell-Beale conjugate gradient backpropagation                      83.78
2   Fletcher-Powell conjugate gradient backpropagation                   84.46
3   Polak-Ribiere conjugate gradient backpropagation                     82.77
4   Gradient descent backpropagation                                     83.78
5   Gradient descent with momentum backpropagation                       84.12
6   Gradient descent with adaptive learning backpropagation              81.76
7   Gradient descent with momentum & adaptive learning backpropagation   82.43
8   Levenberg-Marquardt backpropagation                                  85.14
9   One step secant backpropagation                                      81.77
10  Resilient backpropagation                                            84.46
11  Scaled conjugate gradient backpropagation                            84.12
The standard backpropagation algorithm (Rumelhart et al., 1986) (with id 4) is a
gradient descent method in which network weights are adjusted in the direction of
the negative of the gradient of the performance function. Gradient descent has
various convergence problems and can therefore be very slow in practice, so other
variants of BP are evaluated. The Levenberg-Marquardt algorithm (Marquardt, 1963)
is one of these BP algorithms and can be thought of as a combination of gradient
descent and the Gauss-Newton method. It has the stability of gradient descent and
the speed of Gauss-Newton, converges more reliably, and outperforms other BP
algorithms in efficiency and classification accuracy in many applications (Paulin
and Santhakumaran, 2011; Vongkunghae and Chumthong, 2007; Kisi and Uncuoglu,
2005). According to Table 4.1, the Levenberg-Marquardt algorithm outperformed the
other BP algorithms on the CAD data with the highest test accuracy, 85.14%.
Figure 4.2 presents a visual comparison of results.
Figure 4.2. Comparison of performances of BP algorithms on CAD data.
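The core of the Levenberg-Marquardt update can be illustrated on a one-parameter least-squares problem. The Python sketch below, with hypothetical data, shows how the damping term mu blends the two methods; it is a toy illustration, not the MATLAB toolbox implementation:

```python
def levenberg_marquardt_fit(xs, ys, a=0.0, mu=0.01, steps=50):
    """Fit y = a*x by the Levenberg-Marquardt scheme: the update
    a <- a + sum(x*e) / (sum(x*x) + mu) interpolates between a
    gradient-descent-like step (large mu) and Gauss-Newton (small mu)."""
    for _ in range(steps):
        errors = [y - a * x for x, y in zip(xs, ys)]
        sse = sum(e * e for e in errors)
        jtj = sum(x * x for x in xs)                  # J'J for residual y - a*x
        jte = sum(x * e for x, e in zip(xs, errors))  # driving term
        trial = a + jte / (jtj + mu)
        new_sse = sum((y - trial * x) ** 2 for x, y in zip(xs, ys))
        if new_sse < sse:
            a, mu = trial, mu / 10   # step accepted: behave more like Gauss-Newton
        else:
            mu *= 10                 # step rejected: behave more like gradient descent
    return a
```

On the noiseless data y = 2x the parameter converges to 2 within a few iterations, showing the fast Gauss-Newton-like behaviour once mu shrinks.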
4.3. Using Bayesian Networks and Decision Trees for Diagnosis of CAD
Performance of a Bayesian Network is primarily related to the optimization
technique that constructs the network. Three optimization techniques HillClimber,
Simulated Annealing and Tree Augmented Bayesian Network (TAN) algorithms are
utilized and analyzed in the WEKA environment to observe the success of BN in
classification of CAD patients, as summarized in Table 4.2. The evaluation
metrics Sensitivity, Specificity, area under curve (AUC) and mean absolute error
(MAE) are used.
Table 4.2. Performance values of BNs constructed by three different optimization techniques on CAD data.
                      Sensitivity  Specificity  AUC    Acc (%)  MAE
HillClimber           0.804        0.861        0.908  83.50    0.195
Simulated Annealing   0.804        0.855        0.899  83.17    0.221
TAN                   0.768        0.855        0.912  81.52    0.206
According to Table 4.2, HillClimber seems to be the most successful of the three
algorithms with respect to specificity, accuracy and mean absolute error. In most
applications it is difficult for HillClimber to outperform Simulated Annealing,
because HillClimber is more likely to end up in a local optimum. In fact, all
three classification results are not very different from each other. A Bayesian
Network classification depends strongly on its structure; when, as here, the
results are similar for different structuring algorithms, we understand that the
dataset is stable.
Figure 4.3. ROCs of classification of BayesNetwork using HillClimber, Simulated
Annealing and TAN respectively
It can be seen that the largest AUC, 0.912, belongs to the TAN algorithm.
Generally, classifiers produce a decimal output value and make the final
classification decision according to a threshold value. Selecting the proper
threshold is important for the prediction. Different threshold values are evaluated
while the ROC is produced (Karabulut and İbrikçi, 2010). In a ROC plot, the
region closer to the upper-left corner is the region where the classifier is
successful; when the most proper threshold value is used, the curve plotted is
closest to the upper-left corner. The larger AUC means that TAN is the least
dependent on the classification threshold. Of course this is a positive property
for a classifier, but when evaluating the three classifiers the other metrics
must also be regarded.
Table 4.3. Confusion matrix of the 303 CAD samples using HillClimber
                     Classified as Num=0   Classified as Num=1
Actual Num=0         142                   23
Actual Num=1         27                    111
Table 4.4. Confusion matrix of the 303 CAD samples using Simulated Annealing
                     Classified as Num=0   Classified as Num=1
Actual Num=0         141                   24
Actual Num=1         27                    111
Table 4.5. Confusion matrix of the 303 CAD samples using TAN
                     Classified as Num=0   Classified as Num=1
Actual Num=0         141                   24
Actual Num=1         32                    106
Sensitivity and Specificity values range between 0 and 1 and are desired to be
close to 1. When sensitivity is 1, the number of false negatives (FN) is 0: no
patient with the disease is diagnosed as healthy. When specificity is 1, the
number of false positives (FP) is 0: no healthy person is diagnosed as diseased.
For example, in Table 4.3, FN=23 and FP=27, giving a total of 50 wrong diagnoses.
Similarly, Table 4.4 and Table 4.5 show 24+27=51 and 24+32=56 wrong diagnoses
respectively.
When decision trees are used to implement classification in the form of a
diagnostic procedure, each node of the tree corresponds to an observable; each
node is a comparison unit for one feature of the database. A comparison of
several decision trees is carried out in terms of the evaluation metrics
sensitivity, specificity, area under curve (AUC), accuracy (Acc) and mean
absolute error (MAE). The employed decision trees are ADTree (Freund and Mason,
1999), BFTree (Friedman et al., 2000), J48 (the WEKA implementation of C4.5),
Functional Tree (FT) (Landwehr et al., 2005) and SimpleCart (Breiman et al.,
1984).
Table 4.6. Evaluation results of five decision trees according to CAD data
            Sensitivity  Specificity  AUC    Acc (%)  MAE
ADTree      0.761        0.842        0.896  80.53    0.278
BFTree      0.717        0.830        0.783  77.89    0.256
J48         0.710        0.836        0.804  77.89    0.259
FT          0.812        0.836        0.884  82.51    0.196
SimpleCart  0.746        0.861        0.818  80.86    0.277
According to Table 4.6, FT outperforms all other decision trees in terms of
accuracy, sensitivity and AUC. It also has the minimum classification error. The
SimpleCart and ADTree algorithms are close to FT in accuracy in the
classification of the CAD data.
4.4. Effect of Feature Selection on Diagnosis of CAD
In this section, the effect of three feature selection algorithms on the
performance of classifiers is analyzed. These feature selection algorithms are
Relief-F, Gain Ratio and Symmetrical Uncertainty. The performance values are
calculated before and after feature selection and compared. To achieve this, 9
classification algorithms are used and evaluated in WEKA (Hall et al., 2009):
BayesNet, Multilayer Perceptron (MLP), Radial Basis Function (RBF), Instance
Based Learning (IB1) (Aha et al., 1991), KStar (Cleary and Trigg, 1995), PART
(Frank and Witten, 1998), ADTree, BFTree and SimpleCart. The algorithms are
evaluated in terms of Acc, AUC and Mean Squared Error (MSE).
Each of the Relief-F, Gain Ratio and Symmetrical Uncertainty algorithms assigns
an evaluation score to each feature of the dataset, ranks all features in
descending order and takes a pre-determined number of the most successful
features. The original dataset has 13 features, of which 8 are selected by the
feature selection algorithms; a different number of features can be chosen by the
user. In this way the number of dimensions of the dataset is reduced, and the
data becomes more comprehensible and easier to work with.
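The ranking-and-selection step itself is simple. The following Python sketch, using the Relief-F scores reported in Table 4.7, keeps the top-k features; the function name is hypothetical and this is not the WEKA filter:

```python
def select_top_features(scores, k=8):
    """Rank features by filter score (descending) and keep the top k names."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Relief-F evaluation scores as reported in Table 4.7
relief_scores = {"cp": 0.1729, "thal": 0.1259, "sex": 0.1106, "ca": 0.0943,
                 "slope": 0.0776, "exang": 0.0683, "restecg": 0.0644,
                 "oldpeak": 0.0237}
top3 = select_top_features(relief_scores, k=3)  # the three highest-scoring features
```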
Table 4.7. Features selected by Relief-F filter
No  Attribute  Values                                             Evaluation Score
1   cp         {typ_angina, asympt, non_anginal, atyp_angina}     0.1729
2   thal       {fixed_defect, normal, reversable_defect}          0.1259
3   sex        {male, female}                                     0.1106
4   ca         numeric                                            0.0943
5   slope      {down, flat, up}                                   0.0776
6   exang      {no, yes}                                          0.0683
7   restecg    {left_vent_hyper, normal, st_t_wave_abnormality}   0.0644
8   oldpeak    numeric                                            0.0237
According to Table 4.7, the chest pain (cp) feature of the dataset has the
highest score in the Relief-F evaluation. This means that the cp feature values
have the greatest ability to discriminate diseased patients from healthy ones,
because Relief-F scores features according to how well they discriminate between
members of different classes.
Table 4.8. Effect of Relief-F filter on classification performance
Id  Method      Acc (%)  eAcc (%)  AUC    eAUC   MSE    eMSE
1   BayesNet    83.50    83.83     0.908  0.912  0.134  0.121
2   MLP         80.20    84.82     0.878  0.891  0.171  0.133
3   RBF         84.16    83.50     0.895  0.906  0.120  0.118
4   IB1         76.23    76.56     0.760  0.764  0.238  0.234
5   KStar       74.59    81.19     0.814  0.888  0.207  0.135
6   PART        81.85    82.18     0.846  0.869  0.154  0.139
7   ADTree      80.52    84.82     0.896  0.903  0.130  0.124
8   BFTree      77.89    80.20     0.783  0.833  0.188  0.158
9   SimpleCart  80.82    81.85     0.818  0.839  0.158  0.147
The classification performances of the nine algorithms with the Relief-F filter
are evaluated in Table 4.8 in terms of accuracy, AUC and MSE, where the 'e'
prefix denotes the value after feature selection. The CAD dataset is first
preprocessed by the Relief-F filter, five of the features are eliminated, and the
new dataset is fed to each classifier; the resulting Acc, AUC and MSE values are
named eAcc, eAUC and eMSE respectively.
The bold values in Table 4.8 indicate that the classifier is positively affected
by the Relief-F feature selector. Almost all of the classifiers, except RBF, have
increased classification accuracy; all of them have increased AUC and decreased
error as measured by MSE. The most affected classifier is KStar, with an accuracy
difference of 6.60%; the next two are MLP and ADTree, with accuracy differences
of 4.62% and 4.30% respectively. For an overall comparison of the classifiers'
performance with Relief-F, see Figure 4.4 and Figure 4.5.
Figure 4.4. Accuracies of classifiers with and without Relief-F filter
Figure 4.5. AUCs of classifiers with and without Relief-F filter
The Gain Ratio filter has given the highest score to the 'ca' feature, according
to Table 4.9. It ranks and selects features according to their entropy
quantities; to obtain a high score a feature must have low entropy. Decision tree
classifiers can be thought of as having this ranking within their own algorithms,
but there the features are only ranked, not selected: all features are used while
constructing the tree.
Table 4.9. Features selected by Gain Ratio filter
No  Attribute  Values                                           Evaluation Score
1   ca         numeric                                          0.1741
2   thal       {fixed_defect, normal, reversable_defect}        0.1698
3   exang      {no, yes}                                        0.1560
4   thalac     numeric                                          0.1322
5   cp         {typ_angina, asympt, non_anginal, atyp_angina}   0.1176
6   oldpeak    numeric                                          0.1053
7   slope      {down, flat, up}                                 0.0903
8   sex        {male, female}                                   0.0656
Gain Ratio is not as successful as Relief-F: instead of improving the accuracy of
8 algorithms as Relief-F does, it has positively affected 6 algorithms in
accuracy, 5 in AUC and 4 in MSE.
Table 4.10. Effect of Gain Ratio filter on classification performance
Id  Method      Acc (%)  eAcc (%)  AUC    eAUC   MSE    eMSE
1   BayesNet    83.50    83.83     0.908  0.898  0.134  0.137
2   MLP         80.20    81.52     0.878  0.894  0.171  0.150
3   RBF         84.16    85.48     0.895  0.892  0.120  0.120
4   IB1         76.23    77.23     0.760  0.770  0.238  0.228
5   KStar       74.59    78.55     0.814  0.859  0.207  0.162
6   PART        81.85    80.53     0.846  0.820  0.154  0.169
7   ADTree      80.52    80.52     0.896  0.881  0.130  0.138
8   BFTree      77.89    78.88     0.783  0.785  0.188  0.178
9   SimpleCart  80.82    80.53     0.818  0.824  0.158  0.159
Gain Ratio most positively affected the KStar algorithm, which increased its
accuracy by 3.96%; the other positively affected classifiers show only slight
improvement. The second and third most affected algorithms are MLP and RBF, each
with an accuracy increase of 1.32%. Figure 4.6 and Figure 4.7 present a visual
comparison of Acc and eAcc.
Figure 4.6. Accuracies of classifiers with and without Gain Ratio filter
Figure 4.7. AUCs of classifiers with and without Gain Ratio filter
According to Table 4.11, the Symmetrical Uncertainty filter has selected the
'thal' feature first; 'ca', Gain Ratio's first feature, is second here, and 'cp',
Relief-F's first feature, is third. This means that although the feature
selection algorithms follow very different ways of selecting features, their
decisions are not far from each other.
Table 4.11. Features selected by Symmetrical Uncertainty filter
No  Attribute  Values                                           Evaluation Score
1   thal       {fixed_defect, normal, reversable_defect}        0.1889
2   ca         numeric                                          0.1727
3   cp         {typ_angina, asympt, non_anginal, atyp_angina}   0.1497
4   exang      {no, yes}                                        0.1492
5   thalac     numeric                                          0.1313
6   oldpeak    numeric                                          0.1268
7   slope      {down, flat, up}                                 0.1021
8   sex        {male, female}                                   0.0624
Table 4.12. Effect of Symmetrical Uncertainty on classification performance
Id  Method      Acc (%)  eAcc (%)  AUC    eAUC   MSE    eMSE
1   BayesNet    83.50    83.83     0.908  0.897  0.134  0.137
2   MLP         80.20    80.53     0.878  0.885  0.171  0.164
3   RBF         84.16    85.48     0.895  0.892  0.120  0.122
4   IB1         76.23    77.56     0.760  0.773  0.238  0.225
5   KStar       74.59    78.55     0.814  0.854  0.207  0.166
6   PART        81.85    80.20     0.846  0.833  0.154  0.166
7   ADTree      80.52    80.52     0.896  0.882  0.130  0.138
8   BFTree      77.89    78.88     0.783  0.788  0.188  0.177
9   SimpleCart  80.82    80.86     0.818  0.832  0.158  0.154
Symmetrical Uncertainty has most affected KStar, MLP, IB1 and RBF, with accuracy
increases of 3.94%, 1.33%, 1.33% and 1.32% respectively. Again, this filter is
not as successful as Relief-F; it has positively affected 7 algorithms in
accuracy, 5 in AUC and 5 in MSE (see Figure 4.8 and Figure 4.9).
Figure 4.8. Accuracies of classifiers with and without Symmetrical Uncertainty filter
Figure 4.9. AUCs of classifiers with and without Symmetrical Uncertainty filter
It can be concluded that the most successful of the three filters on the CAD data
is the Relief-F feature selector. Naturally, a different dataset may change which
filter is most successful. Another conclusion is that KStar and MLP are the
classifiers most positively affected by all three filters.
4.5. Employing Ensemble Methods for Diagnosis of CAD
In this section, four ensemble methods are evaluated and compared in terms of
accuracy, sensitivity, specificity and MAE. These evaluations are implemented in
WEKA. The base classifiers for the ensembles are MLP from neural networks, IBk
from lazy classifiers, PART from rule-based classifiers and FT from decision
trees.
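Of these ensemble strategies, bagging is the simplest to sketch: each base model is trained on a bootstrap resample of the training set and the predictions are combined by majority vote. The toy Python fragment below (hypothetical data and a hypothetical 1-nearest-neighbour base learner, not the WEKA implementations) illustrates the idea:

```python
import random

def bagging_predict(train, x, base_learner, n_models=10, seed=0):
    """Train each base model on a bootstrap resample (sampling with
    replacement) of the training set; combine predictions by majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]  # bootstrap resample
        model = base_learner(sample)
        votes.append(model(x))
    return max(set(votes), key=votes.count)          # majority vote

def one_nn(sample):
    """Toy base learner: 1-nearest neighbour over (value, label) pairs."""
    return lambda x: min(sample, key=lambda p: abs(p[0] - x))[1]

train = [(0.1, 0), (0.2, 0), (0.9, 1), (1.0, 1)]
prediction = bagging_predict(train, 0.95, one_nn)
```

Boosting, Decorate and Rotation Forest differ in how they create diversity among the base models, but all of them combine base-classifier outputs in a comparable fashion.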
The boosting algorithm is named AdaBoostM1 in WEKA, and the 'boosting' results
given here are the results of AdaBoostM1. The values in Table 4.13, Table 4.14
and Table 4.15 are desired to be as high as possible; the bold values represent
the highest value for the corresponding base classifier (column).
Table 4.13. Accuracy (%) values of ensemble classifiers using different base classifiers
Ensemble Algorithm    MLP    NaiveBayes  IBk    PART   FT
Bagging               83.50  83.83       75.91  81.19  83.50
Boosting              80.53  83.83       76.24  79.87  81.52
Decorate              80.20  83.17       74.26  80.86  80.53
Rotation Forest (RF)  84.16  79.21       77.23  83.50  84.16

Table 4.14. Sensitivity values of ensemble classifiers using different base classifiers
Ensemble Algorithm    MLP    NaiveBayes  IBk    PART   FT
Bagging               0.797  0.797       0.717  0.790  0.812
Boosting              0.761  0.790       0.739  0.775  0.790
Decorate              0.797  0.797       0.717  0.754  0.812
RF                    0.812  0.790       0.725  0.804  0.812

Table 4.15. Specificity values of ensemble classifiers using different base classifiers
Ensemble Algorithm    MLP    NaiveBayes  IBk    PART   FT
Bagging               0.867  0.873       0.794  0.830  0.855
Boosting              0.842  0.879       0.782  0.818  0.836
Decorate              0.806  0.861       0.764  0.855  0.800
RF                    0.867  0.794       0.812  0.861  0.867
Table 4.13 shows that the RF ensemble outperforms the other ensembles for four of
the five base classifiers, with the highest accuracy of 84.16%. This accuracy is
reached using MLP and FT as base classifiers; these two classifiers are more
successful in accuracy than the NaiveBayes, IBk and PART algorithms when used in
an RF ensemble. The results in Table 4.14 and Table 4.15 are parallel with Table
4.13 in that the highest values belong to almost the same base classifiers and to
the RF ensemble.
Figure 4.10. Comparison of performances of ensemble algorithms using different
base classifiers with respect to accuracy.
RF shows considerable success over the other ensembles, especially in terms of
accuracy, as seen in Figure 4.10. In the next section, we investigate this
ensemble method in more depth and from different aspects in order to improve
diagnosis accuracy on the CAD data.
4.6. More on Rotation Forest Ensemble and a New Method Proposal for CAD
Diagnosis
In this section, Rotation Forest (RF) ensemble of three separate ANNs based
on Levenberg-Marquardt back-propagation algorithm is proposed and is
implemented in the MATLAB environment. Each ANN uses a different set of axes:
each feature is considered as an axis, and the RF algorithm randomly selects
which axes to rotate according to the algorithm parameter K. Axes are rotated by
principal component analysis (PCA), a statistical method for reducing dimensions
through a covariance analysis between features. PCA rotates the data set into a
different configuration which is easier to classify: the data become simpler, and
the relationships between features more discriminative. Using PCA, it is possible
to rotate the axes of the multi-dimensional space to new positions (the principal
axes), so that the data are defined differently than before. Here the aim of PCA
is not to reduce dimensions but to rotate the axes in order to define each
example in the data set in a different way. For each neural network, this
rotation is performed with a different subset of features; in other words, each
classifier is trained on the whole data set with different extracted features.
Each base classifier also takes a different subset of instances having the
selected features, so that diversity, an important property of ensemble methods,
is achieved. Another contribution to diversity is that each neural network is
created independently of the others, with randomly chosen initial weights. All
principal components are retained, so that the accuracy of the system is not
sacrificed while achieving diversity.
In this study, diversity is provided by three separate techniques in order to
create an ensemble of classifiers that disagree in their predictions. First, the
data set is rotated by the transformation matrix obtained by PCA. Second, the
base neural network classifiers are constructed with different initial weights.
Finally, each network is trained on a different portion of the training set, as a
rule of the RF algorithm.
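The axis rotation at the heart of this scheme can be illustrated in two dimensions. The Python sketch below diagonalises a 2x2 covariance matrix and projects centred points onto the principal axes. The actual RF algorithm applies such PCA rotations per random feature subset; this two-feature toy function only shows the rotation itself:

```python
from math import atan2, cos, sin

def pca_rotate(points):
    """Rotate 2-D data onto its principal axes: centre the points, compute
    the 2x2 covariance matrix, and project onto its eigenvectors."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form angle of the first principal axis for a 2x2 covariance
    theta = 0.5 * atan2(2 * cxy, cxx - cyy)
    c, s = cos(theta), sin(theta)
    return [((p[0] - mx) * c + (p[1] - my) * s,
             -(p[0] - mx) * s + (p[1] - my) * c) for p in points]
```

Points lying on a line are mapped so that all variance falls on the first rotated axis and the second coordinate vanishes, which is exactly the "different configuration" that makes the rotated data easier for a base classifier to separate.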
For the implementation of the compared classification algorithms other than the
Levenberg-Marquardt based RF, the WEKA data mining and machine learning
environment (Hall et al., 2009) is utilized. In all experiments, validation is
done via the 10-fold cross validation method.
First of all, we compare the performance of the base classifiers alone in
diagnosing the disease. This comparison is vital in order for us to decide on two
things: 1) what the utmost performance of an arbitrary classifier without the Rotation
Forest algorithm is, and 2) whether the RF algorithm actually improves the
performance of an arbitrary classifier. Table 4.16 presents the classification
performances of several classifiers in diagnosing the disease in terms of
accuracy, AUC, sensitivity and specificity, where the best values for each
performance measure are marked in bold.
Table 4.16. Classification results of the CAD dataset with different classifiers.
Algorithm            Accuracy  AUC    Sensitivity  Specificity
J48                  77.89     0.804  0.810        0.836
RBF Network          84.16     0.895  0.812        0.867
Levenberg-Marquardt  85.14     0.903  0.850        0.852
Naïve Bayes          83.83     0.902  0.803        0.867
OneR                 71.62     0.716  0.717        0.715
Random Forest        80.20     0.883  0.790        0.812
KStar                74.59     0.814  0.659        0.818

As presented in Table 4.16, the ANN structure based on the Levenberg-Marquardt
backpropagation algorithm appeared to be superior to the other methods in terms
of three metrics: Acc, AUC and Sn. Another ANN derivative, the RBF Network, and
Naïve Bayes were the closest matches to Levenberg-Marquardt in these
classification performance measures. Generally, in our experiment, rule-based
classifiers (OneR) and decision trees (J48, Random Forest) performed relatively
worse as sole classification tools. The performances of the ANN derivatives (RBF
Network, Levenberg-Marquardt) and the Naïve Bayes classifier were comparable to
each other but clearly superior to the others in diagnosing CAD.
Table 4.17. Classification results of the RF algorithm with different base classifiers
RF with base classifier  Accuracy  AUC    Sensitivity  Specificity
J48                      81.85     0.889  0.775        0.855
RBF Network              84.82     0.899  0.783        0.903
Levenberg-Marquardt      91.20     0.915  0.956        0.867
Naïve Bayes              79.21     0.876  0.790        0.794
OneR                     80.53     0.887  0.739        0.861
Random Forest            82.84     0.902  0.797        0.855
KStar                    74.59     0.814  0.659        0.818
When the results in Table 4.17 are compared with those in Table 4.16, it is
clearly seen that the Rotation Forest ensemble algorithm improves the
classification accuracy of almost all the classifiers, even though the
improvement is not significant for some of them, such as the RBF Network (see
Figure 4.11). Most notably, the performance of Levenberg-Marquardt is increased
to a classification accuracy of 91.20%, the best value of all the results.
Although Levenberg-Marquardt appears clearly superior in terms of Accuracy and
Sensitivity, its AUC value remains comparable to the others. The ROC curve, and
consequently the AUC value, depends strongly on how well the two class
distributions (patients with and without the disease) can be distinguished via a
threshold value. Therefore, comparable AUC values alongside different Accuracy
measures mean that some of the classifiers are very sensitive to threshold
changes while the others are not.
Figure 4.11. Effect of RF algorithm on different classifiers
As a result of the two experiments, it was observed that Levenberg-Marquardt was
the best classifier with or without RF. However, when it is utilized as a base
classifier with RF, its Accuracy is improved to 91.2%, an improvement of about 6
percentage points over the original classification accuracy. The ROC curves in
Figure 4.12 clearly depict the classification performance of Levenberg-Marquardt
with and without the Rotation Forest algorithm; the classification is better when
the curve tends toward the upper-left corner of the ROC area.
Figure 4.12. ROC analysis of Levenberg-Marquardt algorithm with and without RF
Finally, in order to demonstrate the efficiency of the proposed method (the
Rotation Forest ensemble with Levenberg-Marquardt based ANNs), we compare its
performance to that of methods in the literature that utilize the same dataset
for CAD diagnosis. Table 4.18 presents the reported accuracies of these methods
and their methodologies.
Table 4.18. Classification accuracy results of literature methods that utilize the same dataset
Author (Year)          Method                                Acc (%)
Detrano et al. (1989)  Logistic Regression                   77.0
Cheung (2001)          C4.5                                  81.4
Polat et al. (2005)    Artificial Immune System              84.5
Das et al. (2009)      ANN Ensemble (SAS Miner)              89.0
Proposed Method        Rotation Forest, Levenberg-Marquardt  91.2
The best classification accuracy to date, 89.01%, was obtained by Das et al.
(2009) with an averaging ensemble of ANNs. This study proposes a method that
outperforms the highest accuracy achieved thus far, reaching 91.2% accuracy with
the Rotation Forest ensemble of Levenberg-Marquardt based ANNs. Notably, the two
best performances in the literature are ensemble algorithms that utilize some
kind of ANN as base classifier, which supports the efficiency of the ensemble
approach.
5. CONCLUSIONS
In this thesis, we aimed to improve the accuracy and reliability of the diagnosis
of heart failure and to present a computer-based approach to diagnosis. Several
decision support systems that are frequently used in the medical field are
investigated with regard to their performance on coronary artery disease
diagnosis. Various performance criteria are used, but especially the accuracy
rate of diagnosis is considered. The experimental results obtained in Section 4
lead to the conclusion that artificial neural networks are the most successful of
the evaluated techniques for CAD diagnosis, both as a standalone model and in an
ensemble form. Of course the performance of these techniques depends on many
factors, especially on the dataset used; but when the literature on medical
decision support systems is examined, neural networks are observed to have
considerable power on test data that the model sees for the first time. The
choice of backpropagation algorithm also has an important effect on the result.
Feature selection is another important issue in classification, because it may
have a considerable effect on the accuracy of the classifier. By reducing the
number of dimensions of the dataset, it reduces processor and memory usage, and
the data becomes more comprehensible and easier to work with. In this thesis, we
have investigated the influence of the Relief-F, Gain Ratio and Symmetrical
Uncertainty feature selectors on nine different classifiers using CAD data. We
observed that KStar and MLP are the most affected classifiers; classification
accuracy is improved by up to 6.60% and 4.62% for KStar and MLP respectively
using the Relief-F filter.
When ensemble systems are evaluated, the most notable is the rotation forest
(RF), which outperformed the boosting, bagging and DECORATE ensembles. Our
experiments then turned to selecting the best components for a combined model.
With properly tuned parameters, neural networks, BayesNet and decision trees
achieve accuracies of 87.50%, 83.50% and 82.51%, respectively. Levenberg-Marquardt
was chosen as the backpropagation algorithm for the neural network, since it
outperformed the alternatives. Our model is therefore assembled from the
best-performing components.
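The update rule that distinguishes Levenberg-Marquardt from plain gradient descent can be sketched in a few lines. The following is a minimal illustration on a hypothetical curve-fitting problem, not the neural network training routine used in the experiments; the damping parameter mu blends gradient descent (large mu) with Gauss-Newton (small mu).

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, theta, n_iter=100, mu=1e-2):
    """Minimal LM loop: theta_new = theta - (J^T J + mu*I)^-1 J^T e.
    mu is halved after a successful step and doubled after a failed one."""
    for _ in range(n_iter):
        e = residual(theta)
        J = jacobian(theta)
        step = np.linalg.solve(J.T @ J + mu * np.eye(len(theta)), J.T @ e)
        trial = theta - step
        if np.sum(residual(trial) ** 2) < np.sum(e ** 2):
            theta, mu = trial, mu * 0.5   # accept: trust Gauss-Newton more
        else:
            mu *= 2.0                      # reject: lean toward gradient descent
    return theta

# Hypothetical toy problem: fit y = a * exp(b * x) with true a=2, b=0.5
x = np.linspace(0, 1, 20)
y = 2.0 * np.exp(0.5 * x)
residual = lambda th: th[0] * np.exp(th[1] * x) - y
jacobian = lambda th: np.stack([np.exp(th[1] * x),
                                th[0] * x * np.exp(th[1] * x)], axis=1)
theta = levenberg_marquardt(residual, jacobian, np.array([1.0, 0.0]))
print(theta)  # converges toward [2.0, 0.5]
```

In network training the residual vector is the per-sample output error and the Jacobian is taken with respect to the weights, which is why the method is fast on small and medium networks but memory-hungry on large ones.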
In this thesis, the performance of the RF ensemble method, whose base
classifiers are ANNs trained with the Levenberg-Marquardt backpropagation
algorithm, is also evaluated for the effective diagnosis of CAD. The proposed
method is able to determine the existence of the disease from data collected
noninvasively, easily, and cheaply from the patient, such as demographic
information and blood measurements. In this scheme, the obtained accuracy rate is
91.2%, which is, to the best of our knowledge, the best rate achieved thus far in
the relevant literature. Notably, the study of Das et al. (2009), which utilized a
simple ANN-based ensemble system and obtained 89.01% accuracy, supports the
conclusion that the Rotation Forest algorithm is efficient and superior to other
ensemble systems (Karabulut and İbrikçi, 2011).
Our experiments also show that RF not only improves the performance of
ANNs but also enhances the classification accuracy of almost all classifiers. This
is demonstrated by experiments covering different types of classifiers, including
decision trees (Random Forest, J48), rule-based learners (OneR), instance-based
learners (KStar), ANNs (RBF Network, Levenberg-Marquardt) and Naïve Bayes.
Therefore, in future studies the proposed scheme, i.e., an ensemble of ANNs or
other classifiers with Rotation Forest, may be utilized to develop efficient expert
systems for the diagnosis of several other diseases.
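The rotation mechanism behind RF can be sketched compactly. The sketch below is a simplified illustration, not the WEKA implementation used in the experiments: it pairs PCA rotations of random feature subsets with a one-level decision stump (rather than the J48 or ANN base classifiers of the thesis), and the two-class toy data is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_components(X):
    """Principal axes (right singular vectors) of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt.T  # columns are components

class DecisionStump:
    """One-level decision tree; axis-aligned, so rotations genuinely change it."""
    def fit(self, X, y):
        self.best = (0, 0.0, 1, np.inf)  # (feature, threshold, polarity, error)
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for pol in (0, 1):
                    err = (np.where(X[:, f] > t, pol, 1 - pol) != y).mean()
                    if err < self.best[3]:
                        self.best = (f, t, pol, err)
        return self
    def predict(self, X):
        f, t, pol, _ = self.best
        return np.where(X[:, f] > t, pol, 1 - pol)

class RotationEnsemble:
    """Each member sees the data through a block-diagonal rotation matrix
    built from PCA on random feature subsets; prediction is a majority vote."""
    def __init__(self, n_members=10, n_subsets=2):
        self.n_members, self.n_subsets = n_members, n_subsets
    def fit(self, X, y):
        n_features = X.shape[1]
        self.members = []
        for _ in range(self.n_members):
            subsets = np.array_split(rng.permutation(n_features), self.n_subsets)
            R = np.zeros((n_features, n_features))
            for idx in subsets:
                # PCA on a random 75% row sample, restricted to this subset
                rows = rng.choice(len(X), size=int(0.75 * len(X)), replace=False)
                R[np.ix_(idx, idx)] = pca_components(X[np.ix_(rows, idx)])
            self.members.append((R, DecisionStump().fit(X @ R, y)))
        return self
    def predict(self, X):
        votes = np.stack([clf.predict(X @ R) for R, clf in self.members])
        return np.array([np.bincount(col).argmax() for col in votes.T])

# Hypothetical toy problem: two Gaussian blobs in four dimensions
X = np.vstack([rng.normal(0.0, 1.0, (40, 4)), rng.normal(2.0, 1.0, (40, 4))])
y = np.array([0] * 40 + [1] * 40)
preds = RotationEnsemble().fit(X, y).predict(X)
```

The per-member rotation is what sets RF apart from bagging: every base learner works in a different rotated feature space, which increases diversity without discarding any information, since each block of R is an orthogonal transform.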
REFERENCES
AHA, D.W., KIBLER, D., ALBERT, M.K., 1991. Instance-based learning
algorithms. Machine Learning, 6: 37-66.
AMASYALI, M.F., ERSOY, O., 2008. The Performance Factors of Clustering Ensembles. Signal Processing, Communication and Applications Conference, 1-4.
ARAUZO, A., AZNARTE, J. L., BENITEZ J. M., 2011. Empirical study of feature
selection methods based on individual feature evaluation for classification
problems. Expert Systems with Applications, 38: 8170-8177.
BASSUK, S.S., MANSON, J.E., 2008. Lifestyle and risk of cardiovascular disease
and diabetes in women: a review of the epidemiologic evidence. Am J
Lifestyle Med, 2: 191-213.
BEALE, M.H., HAGAN, M.T., DEMUTH, H.B, 2010. Neural Network Toolbox TM
7: User’s Guide, The MathWorks Inc, 7th Edition.
BREIMAN, L., FRIEDMAN, J., OLSHEN, R. A., STONE, J., 1984. Classification
and Regression Trees. Wadsworth International Group, Belmont, California.
BREIMAN, L., 1996. Bagging predictors. Machine Learning, 24(2): 123-140.
BREIMAN, L., 2001. Random forests. Machine Learning, 45(1): 5-32.
BRUMMETT, B.H., BAREFOOT, J.C., SIEGLER, I.C., CLAPP-CHANNING, N.E.,
LYTLE, B.L., BOSWORTH, H.B., WILLIAMS, R.B., MARK, D.B., 2001.
Characteristics of socially isolated patients with coronary artery disease who
are at elevated risk for mortality. Psychosomatic Medicine, 63: 267-272.
CHANDRA, A., CHEN, H., YAO, X., 2006. Trade-off between diversity and
accuracy in ensemble generation. Multi-objective Machine Learning,
Springer Verlag, Heidelberg, pp.429–464.
CHEUNG, N., 2001. Machine learning techniques for medical analysis. School of
Information Technology and Electrical Engineering, B.Sc. Thesis, University
of Queensland.
CLEARY, J.G., TRIGG, L. E., 1995. An Instance-based learner using an entropic
distance measure. In: 12th International Conference on Machine Learning,
108-114.
COMAK, E., ARSLAN, A., TURKOGLU, İ., 2007. A decision support system based
on support vector machines for diagnosis of the heart valve diseases.
Computers in Biology and Medicine, 37:21-27.
COVER, T.M., THOMAS, J.A., 2006. Elements of Information Theory, 2nd edition.
Wiley-Interscience, Hoboken, 776p.
COWELL, R.G., DAWID, A.P., LAURITZEN, S.L., SPIEGELHALTER, D.J., 1999.
Probabilistic Networks and Expert Systems. Springer, Berlin, 324p.
DAS, R., TÜRKOĞLU, İ., SENGÜR, A., 2009. Effective diagnosis of heart disease
through neural network ensembles. Expert Syst Appl, 36: 7675-7680.
DAYHOFF, J.E, DELEO, J.M., 2001. Artificial Neural Networks: Opening the black
box. Factors and Staging in Cancer Management, 91: 1615-1635.
DETRANO, R., JANOSI, A., STEINBRUNN, W., PFISTERER, M., SCHMID, J.,
SANDHU, S., GUPPY, K., LEE, S., FROELICHER, V., 1989. International
application of a new probability algorithm for the diagnosis of coronary artery
disease. Am J Cardiol, 64: 304-310.
DORAISAMY, S., GOLZARI, S., NOROWI, N.M., SULAIMAN, M.N., UDZIR,
N.I., 2008. A study on feature selection and classification techniques for
automatic genre classification of traditional Malay music. In Proceedings of
ISMIR, 331-336.
DRESSLER, D.K., 2010. Management of patients with coronary vascular disorders.
In, Smeltzer S.C., Cheever K.H., Hinkle J..L, Bare B. G. (Eds.). Brunner and
Suddarth's Textbook of Medical-Surgical Nursing. 12th edition, p:775–779.
Philadelphia: USA, Wolters Kluwer Health.
DUDA, R.O., HART, P.E., STORK, D.G., 2006. Pattern Classification. John Wiley
& Sons Inc., U.K. 654 p.
DURSUN, R., 2010. Kadın hastalarda koroner risk faktörleri ve koroner arter
hastalığı varlığı ve ciddiyeti arasındaki ilişki. Kardiyoloji Uzmanlık Tezi,
İstanbul, 55 p.
FRANK, E., WITTEN, I.H, 1998. Generating accurate rule sets without global
optimization. In Shavlik, J., ed., Machine Learning: Proceedings of the
Fifteenth International Conference, Morgan Kaufmann Publishers, San
Francisco, CA.
FREUND, Y., MASON, L., 1999. The alternating decision tree learning algorithm.
Proceedings of the Sixteenth International Conference on Machine Learning,
Bled, Slovenia, 124-133.
FREUND, Y., SCHAPIRE, R., 1996. Experiments with a new boosting algorithm. In
Machine Learning: Proceedings Of The Thirteenth International Conference,
148-156.
FRIEDMAN, J., HASTIE, T., TIBSHIRANI, R., 2000. Additive logistic regression:
A statistical view of boosting. Annals of Statistics, 28(2): 337-407.
FUJITA, H., KATAFUCHI, T., UEHARA, T., NISHIMURA, T., 1992. Application
of artificial neural network to computer aided diagnosis of coronary artery
disease in myocardial SPECT bull's-eye images. J Nucl Med, 33: 272-276.
GUYON, I., ELISSEEFF, A., 2003. An introduction to variable and feature
selection. Journal of Machine Learning Research, 3: 1157-1182.
HADDAD, M., ADLASSNIG, K.P., PORENTA, G., 1997. Feasibility analysis of a
case-based reasoning system for automated detection of coronary heart
disease from myocardial scintigrams. Artif Intell Med, 9(1): 61–78.
HALL, M., FRANK, E., HOLMES, G., PFAHRINGER, B., REUTEMANN, P.,
WITTEN, I.H., 2009. The WEKA Data Mining Software: An Update;
SIGKDD Explorations, 11(1).
HALL, M. A., SMITH, L. A., 1999. Feature selection for machine learning:
Comparing a correlation-based filter approach to the wrapper. Proceedings of
the Twelfth International Florida Artificial Intelligence Research Society
Conference, AAAI Press, pp.235-239.
HAN, J., KAMBER, M., 2000. Data Mining Concepts and Techniques. Morgan
Kaufmann Publishers, 1st Ed., San Francisco, USA.
HAYKIN, S., 1999. Neural Networks: A Comprehensive Foundation, Prentice Hall,
USA, 842 p.
HEALTHWISE STAFF, 2011. How a heart attack happens (E. Gregory Thompson,
Primary Medical Reviewer).
http://www.webmd.com/heart-disease/how-a-heart-attack-happens
HERON, M., HOYERT, D.L., MURPHY, S.L., KOCHANEK, K.D., TEJADA-
VERA, B., 2009. Deaths: Final data for 2006. National Vital Statistics
Reports, 57(14). Hyattsville, MD: National Center for Health Statistics.
HOPKINS, P.N., WILLIAMS, R.R., 1989. Human Genetics and Coronary Heart
Disease: A Public Health Perspective. Annu Rev Nutr., 9:303-306.
IŞIK, K., 1986. Acil Kalp Hastalıklarında Teşhis ve Tedavi, Beta Basım Yayım
Dağıtım, İstanbul, 459 p.
JAIN, K.A., MAO, J., MOHUIDDIN, K.M., 1996. Artificial Neural Networks: A
Tutorial. Theme Feature, 29: 31-44.
JOHN, G.H., LANGLEY, P., 1995. Estimating continuous distributions in Bayesian
classifiers. Proceedings of the Eleventh Conference on Uncertainty in
Artificial Intelligence, Morgan Kaufmann, San Mateo, pp. 338-345.
KANTARDZIC, M., 2002. Data Mining: Concepts, Models, Methods and
Algorithms. John Wiley & Sons Inc., New York. 360 p.
KARABULUT, E., İBRİKÇİ, T., 2010. Birleştirilmiş Yapay Sinir Ağlarıyla
Parkinson Hastalığı Teşhisi, ELECO, Bursa.
KARABULUT, E., İBRİKÇİ, T., 2011. Effective Diagnosis of Coronary Artery
Disease Using The Rotation Forest Algorithm. Journal of Medical Systems
36(3):1831-1840.
KISI, O., UNCUOGLU, E., 2005. Comparison of three backpropagation
training algorithms for two case studies. Indian J Eng Mat Sci 12:434–442.
KORKMAZ, E., 1997. Kardiyovasküler risk faktörlerinin değiştirilmesi yönünde
yapılacak girişimler ve bunların etkinliği. İlaç ve Tedavi, 10(6): 331-341.
KULLER, L., FISHER, L., MCCLELLAND, R., FRIED, L., CUSHMAN, M.,
JACKSON, S., MANOLIO, T., 1998. Differences in prevalence of and risk
factors for subclinical vascular disease among black and white participants in
the Cardiovascular Health Study. Arterioscler Thromb Vasc Biol,
18(2): 283-293.
KUNCHEVA, L., 2004. Combining Pattern Classifiers Methods and Algorithms,
Wiley-Interscience, 360 p.
LANDWEHR, N., HALL, M., FRANK, E., 2005. Logistic model trees. Machine
Learning, 59(1-2): 161-205.
LEWENSTEIN, K., 2001. Radial basis function neural network approach for the
diagnosis of coronary artery disease based on the standard electrocardiogram
exercise test. Med Biol Eng Comput. 39(3):362-369.
LIU, K., HUANG, D., 2008. Cancer classification using rotation forest. Computers
in Biology and Medicine, 38: 601-610.
MARQUARDT, D., 1963. An algorithm for least-squares estimation of nonlinear
parameters. J Soc Ind Appl Math, 11(2): 431-441.
MINSKY, M., 1961. Steps toward artificial intelligence. Proceedings of the Institute
of Radio Engineers, 49:8-30.
MITCHELL, T.M., 1997. Machine Learning. WCB/McGrawHill, Boston, 414 p.
MOLLER, M.F., 1993. A scaled conjugate gradient algorithm for fast supervised
learning. Neural Networks 6(4): 525-533.
MOBLEY, B.A., SCHECHTER, E., MOORE, W., MCKEE, P.A., EICHNER, J.E.,
1999. Predictions of coronary artery stenosis by artificial neural network.
Artificial Intelligence in Medicine, 18:187-203.
NEWMAN, D.J., HETTICH, S., BLAKE, C.L., MERZ, C.J., 1998. UCI Repository
of machine learning databases. University of California, Irvine, Department
of Information and Computer Science.
NHLBI National Heart Lung and Blood Institute, 2011.
http://www.nhlbi.nih.gov/health/health-topics/topics/cad/
NOVAKOVIC, J., 2010. The Impact of Feature Selection on the Accuracy of Naive
Bayes Classifier. 18th Telecommunications forum TELFOR.
ONAT, A., BÜYÜKÖZTÜRK K., SANSOY, V., AVCI, Ş.G., ÇAM, N., AKGÜN,
G., TOKGÖZOĞLU, L., ÇAĞLAR, N., ŞAN, M., NIŞANCI, Y., OTO, A.,
ERGENE, O., 2002. Türk Kardiyoloji Derneği Koroner Kalp Hastalığı,
Korunma ve Tedavi Kılavuzu. http://www.tkd.org.tr/kilavuz/k11/4e423.htm
OPITZ, D., MACLIN, R., 1999. Popular Ensemble Methods: An Empirical Study.
Journal of Artificial Intelligence Research, 11: 169-198.
PAULIN, F., SANTHAKUMARAN, A., 2011. Classification of breast cancer
by comparing back propagation training algorithms. Int J Comput Sci Eng
(IJCSE), 3(1):327–332.
POLAT, K., SAHAN, S., KODAZ, H., GÜNES, S., 2005. A new classification
method to diagnosis heart disease: Supervised artificial immune system
(AIRS). Proceedings of the Turkish Symposium on Artificial Intelligence and
Neural Networks (TAINN).
POWELL, M.J.D., 1977. Restart Procedures for the conjugate gradient method.
Mathematical Programming, 12: 241-254.
PERREAULT, L., METZGER, J., 1999. A pragmatic framework for understanding
clinical decision support. Journal of Healthcare Information Management,
13(2):5-21.
QUINLAN, J. R., 1993. C4.5: Programs For Machine Learning. Morgan Kaufmann,
Los Altos, 299p.
RIEDMILLER, M., BRAUN, H., 1993. A Direct Adaptive Method for Faster
Backpropagation Learning: The RPROP Algorithm. IEEE International
Conference On Neural Networks, 586-591.
RISSANEN A.M., 1979. Familial aggregation of coronary heart disease in a high
incidence area. Br Heart J., 42(3):294-303.
RODRIGUEZ, J.J., KUNCHEVA, L.I., 2007. An Experimental Study on Rotation
Forest. Proceedings of the 7th international conference on Multiple classifier
systems (MCS'07), Berlin, Heidelberg, 459-468.
RODRIGUEZ, J. J., KUNCHEVA L. I., ALONSO, C.J., 2006. Rotation Forest: A
New Classifier Ensemble Method. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 28(10): 1619-1630.
RUMELHART, D. E., HINTON, G. E., WILLIAMS, R. J., 1986. Learning internal
representations by error propagation. In: Parallel distributed processing:
explorations in the microstructure of cognition, vol. 1., MIT Press,
Cambridge, pp 318–362.
SCHAPIRE, R., 1990. The Strength of Weak Learnability. Machine Learning
5(2):197-227.
SCOTT, J.A., AZIZ, K., YASUDA, T., GEWIRTZ, H., 2004. Integration of clinical
and imaging data to predict the presence of coronary artery disease with the
use of neural networks. Coron. Artery Dis., 15(7): 427–434.
SETIAWAN, N.A., VENKATACHALAM, P.A., HANI, A.F.M., 2009. Diagnosis of
Coronary Artery Disease Using Artificial Intelligence Based Decision
Support System. ICoMMS, Penang, Malaysia.
SHANNON, C.E., 1948. A Mathematical Theory of Communication. The Bell
System Technical Journal, 27: 379-423, 623-656.
SIERRA, B., SERRANO, N., LARRANAGA, P., PLASENCIA, E.J., INZA, I.,
JIMENEZ, J.J., REVUELTA, P., MORA, M.L., 2001. Using Bayesian
networks in the construction of a bi-level multi-classifier. A case study using
intensive care unit patients data. Artif Intell Med, 22: 233-248.
SWETS, J.A., 1979. ROC Analysis Applied To The Evaluation Of Medical Imaging
Techniques. Investigation Radiology, 14:109-121.
TANNER, L., SCHREIBER, M., LOW, J., ONG, A., TOLFVENSTAM, T., et al.,
2008. Decision tree algorithms predict the diagnosis and outcome of dengue
fever in the early phase of illness. PLoS Negl Trop Dis, 2(3): e196.
doi:10.1371/journal.pntd.0000196
TEXAS Heart Institute, 2011.
http://texasheart.org/HIC/Topics/Cond/CoronaryArteryDisease.cfm
TKACZ, E. J., KOSTKA, P., 2000. An application of wavelet neural network for
classification patients with coronary artery disease based on HRV analysis.
Proceedings of the Annual International Conference on IEEE Engineering in
Medicine and Biology, 1391–1393.
TSIPOURAS, M. G., EXARCHOS T. P., FOTIADIS D. I., KOTSIA A. P.,
VAKALIS K. V., NAKA K.K., MICHALIS L. K., 2008. Automated
diagnosis of coronary artery disease based on data mining and fuzzy
modeling. IEEE Trans. Information Technology in Biology, 12(4): 447–457.
TURKOGLU, I., ARSLAN, A., ILKAY, E., 2003. A wavelet neural network for the
detection of heart valve diseases. Expert Systems, 20(1): 1-7.
VONGKUNGHAE, A., CHUMTHONG, A., 2007. The performance
comparisons of backpropagation algorithm’s family on a set of logical
functions. ECTI Transactions on Electrical Eng Electronics and
Communications (ECTEEC), 5(2):114–118.
WANG, Y., MAKEDON, F., 2004. Application of Relief-F feature filtering
algorithm to selecting informative genes for cancer classification using
microarray data. In Proc. IEEE Computational Systems Bioinformatics
Conference, Stanford, California, 497-498.
WHO World Health Organization, 2011.
http://www.who.int/mediacentre/factsheets/fs317/en/index.html
YAN, H., JIANG, Y., ZHENG, J., PENG, C., LI, Q., 2006. A multilayer perceptron-
based medical decision support system for heart disease diagnosis. Expert
Syst. Appl., 30(2): 272-281.
CURRICULUM VITAE
She was born on June 20th, 1980, in Gaziantep, Türkiye. She received her BSc
degree from the Computer Engineering Department of Karadeniz Technical
University, Trabzon, in 2002. She has been working as an instructor at the
Vocational School of Higher Education of Gaziantep University since 2003.