Prediction of Sickle Cell Anemia Patient’s Response to Hydroxyurea Treatment Using ARTMAP Network
Hongyu Xu, Faramarz Valafar, Marko Vuskovic
Department of Computer Science, San Diego State University, San Diego, CA 92182-7720
METMBS 2003
Las Vegas, June 24, 2003
The full paper and these slides are available at:
http://medusa.sdsu.edu/Robotics/Neuromuscular Control/Neuromuscular.htm
Contents
• Sickle cell anemia
• Data and data preprocessing
• Linear dependency of features
• Feature selection
• Data labeling
• MART clustering algorithm
• MART classification algorithm
• Results
• Conclusion
Sickle Cell Anemia
• Sickle cell anemia is a genetic disorder caused by a single point mutation in the beta globin gene, which changes CCTGAGG to CCTGTGG.
• The molecules of sickle cell hemoglobin adhere to each other and distort red blood cells (RBCs) into a sickle shape. These cells stick in narrow blood vessels, blocking the flow of blood.
• Sickle cell patients experience severe painful crises. Many sickle cell patients die before the age of 20.
• In the United States, about 1 in 500 African Americans develops sickle cell anemia [5]. In Africa, about 1 in 100 individuals develops the disease.
• In 1983, a drug called hydroxyurea (HU) was first used on sickle cell patients.
• Patients who responded positively to HU treatment experienced less pain and their life spans were prolonged, but HU can also be quite toxic.
Patient Features

 1  Age      Age of patient (in days)
 2  Sex      Male/Female
 3  NAGG     Globin gene number
 4  SBAN     # of BAN haplotypes
 5  SBEN     # of BEN haplotypes
 6  SCAM     # of CAM haplotypes
 7  SSEN     # of SEN haplotypes
 8  TotalTx
 9  WGT      Weight of patient
10  %HbF     Fetal hemoglobin, as % of total hemoglobin
11  HbF      Fetal hemoglobin, absolute value
12  Hb       Total hemoglobin concentration
13  RBC      Red blood cell count
14  RDW      % variation in the size of red cells
15  PCV      Packed cell volume
16  Retic    Reticulocytes
17  MCV      Mean cell volume
18  MCH      Mean cell hemoglobin
19  WBC      White cell count
20  Polys    Polymorphonuclear leukocytes
21  Plats    Platelet count
22  Bili     Bilirubin concentration
23  SNRBC    Nucleated RBC seen in peripheral blood
24  %fHbF    Maximum percentage of HbF
25  fHbF     Maximum value of HbF

Note: The data used in this research were obtained from the Structural Genomics Group at the University of Georgia. Dr. Homayoun Valafar was responsible for the data collection and preprocessing.
Excerpt from patient data (values scaled by 1.0e+004):

Age    1.1940 1.2966 0.9639 0.7705 1.0031 1.0165 1.2163 0.9955 . . .
Sex    0.0001 0.0001 0.0002 0.0002 0.0002 0.0002 0.0002 0.0001 . . .
NAGG   0.0003 0.0003 0.0004 0.0003 0.0004 0.0004 0.0004 0.0004 . . .
SBAN   0.0002 0.0001 0.0002 0.0001 0.0001 0.0003 0.0002 0.0002 . . .
SBEN   0.0002 0.0003 0.0002 0.0003 0.0003 0.0001 0.0002 0.0001 . . .
SCAM   0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 . . .
SSEN   0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0002 . . .
TTx    0.0093 0.0014 0.0024 0.0009 0.0029 0.0019 0.0075 0.0024 . . .
WGT    0.0049 0.0079 0.0059 0.0060 0.0060 0.0061 0.0060 0.0059 . . .
HbF    0.0001 0.0006 0.0004 0.0004 0.0007 0.0002 0.0010 0.0004 . . .
Hb     0.0007 0.0008 0.0009 0.0008 0.0010 0.0009 0.0010 0.0009 . . .
RBC    0.0002 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 . . .
RDW    0.0022 0.0019 0.0021 0.0022 0.0025 0.0025 0.0022 0.0017 . . .
PCV    0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 . . .
Retic  0.0288 0.0326 0.0271 0.0121 0.0285 0.0367 0.0590 0.0877 . . .
MCV    0.0094 0.0086 0.0091 0.0081 0.0089 0.0091 0.0099 0.0093 . . .
MCH    0.0031 0.0030 0.0030 0.0027 0.0029 0.0031 0.0034 0.0032 . . .
WBC    0.0018 0.0010 0.0011 0.0011 0.0010 0.0018 0.0012 0.0015 . . .
Polys  0.0010 0.0006 0.0006 0.0008 0.0006 0.0012 0.0006 0.0011 . . .
Plats  0.0576 0.0842 0.0432 0.0454 0.0590 0.0478 0.0270 0.0247 . . .
Bili   0.0004 0.0006 0.0003 0.0002 0.0002 0.0004 0.0003 0.0004 . . .
SNRBC  0.0001 0.0002 0.0001 0.0009 0.0001 0.0001 0.0003 0.0006 . . .
Class  0.0002 0.0002 0.0002 0.0002 0.0001 0.0001 0.0002 0.0001 . . .
Data Preprocessing
• Normalization
• Log transformation
• Treatment of incomplete features
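A minimal Python/NumPy sketch of these steps; the mean imputation of incomplete features and the zero-mean/unit-variance scaling are assumed choices, since the slides do not spell out the exact schemes:

    import numpy as np

    def preprocess(X):
        """Sketch: impute incomplete features, log-transform, normalize."""
        X = X.astype(float)
        # Treatment of incomplete features: fill missing values (NaN)
        # with the per-feature mean (one common choice).
        col_means = np.nanmean(X, axis=0)
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = col_means[cols]
        # Log transformation (the data excerpts suggest the natural log
        # of the positive raw values).
        X = np.log(X)
        # Normalization to zero mean and unit variance per feature (assumed).
        return (X - X.mean(axis=0)) / X.std(axis=0)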
Patient Data (after log transform)

Age    9.3877 9.4702 9.1737 8.9498 9.2135 9.2268 9.4062 9.2059 . . .
Sex    0.6931 0.6931 1.0986 1.0986 1.0986 1.0986 1.0986 0.6931 . . .
NAGG   1.3863 1.3863 1.6094 1.3863 1.6094 1.6094 1.6094 1.6094 . . .
SBAN   1.0986 0.6931 1.0986 0.6931 0.6931 1.3863 1.0986 1.0986 . . .
SBEN   1.0986 1.3863 1.0986 1.3863 1.3863 0.6931 1.0986 0.6931 . . .
SCAM   0.6931 0.6931 0.6931 0.6931 0.6931 0.6931 0.6931 0.6931 . . .
SSEN   0.6931 0.6931 0.6931 0.6931 0.6931 0.6931 0.6931 1.0986 . . .
TTx    4.5433 2.7081 3.2189 2.3026 3.4012 2.9957 4.3307 3.2189 . . .
WGT    3.9080 4.3820 4.0983 4.1076 4.1109 4.1239 4.1109 4.0882 . . .
HbF    0.7419 1.8871 1.5476 1.6292 2.0919 0.9555 2.3514 1.5261 . . .
Hb     2.0669 2.1633 2.3224 2.2192 2.3795 2.3321 2.3609 2.2824 . . .
RBC    1.1725 1.2782 1.4061 1.4110 1.4793 1.3813 1.3376 1.3191 . . .
RDW    3.1369 2.9857 3.1046 3.1224 3.2542 3.2619 3.1369 2.9014 . . .
PCV    0.1906 0.2013 0.2476 0.2231 0.2631 0.2414 0.2453 0.2263 . . .
Retic  5.6650 5.7909 5.6058 4.8032 5.6549 5.9067 6.3820 6.7774 . . .
MCV    4.5570 4.4648 4.5250 4.4018 4.4976 4.5250 4.6052 4.5433 . . .
MCH    3.4626 3.4243 3.4275 3.3142 3.3945 3.4689 3.5610 3.4995 . . .
WBC    2.9444 2.3702 2.5096 2.5014 2.4423 2.9178 2.5337 2.8034 . . .
Polys  2.3721 1.8764 1.9459 2.1668 1.9125 2.5703 1.9502 2.4723 . . .
Plats  6.3578 6.7370 6.0707 6.1203 6.3818 6.1717 5.6021 5.5134 . . .
Bili   1.6094 1.9459 1.4110 1.0986 0.9555 1.6487 1.2809 1.6292 . . .
SNRBC  0.6931 1.0986 0.6931 2.3026 0.6931 0.6931 1.3863 1.9459 . . .
Class  2.0000 2.0000 2.0000 2.0000 1.0000 1.0000 2.0000 1.0000 . . .
Linear Dependency of Features

The patient data form an N x D matrix whose columns are the feature vectors:

$$\mathbf{X} = \begin{bmatrix} x_1^{(1)} & x_2^{(1)} & \cdots & x_D^{(1)} \\ x_1^{(2)} & x_2^{(2)} & \cdots & x_D^{(2)} \\ \vdots & \vdots & & \vdots \\ x_1^{(N)} & x_2^{(N)} & \cdots & x_D^{(N)} \end{bmatrix} = [\boldsymbol{\xi}_1 \;\; \boldsymbol{\xi}_2 \;\; \cdots \;\; \boldsymbol{\xi}_D] \qquad (1)$$

Columns $\boldsymbol{\xi}_i$, $\boldsymbol{\xi}_j$ and $\boldsymbol{\xi}_k$ are linearly dependent if

$$a_i \boldsymbol{\xi}_i + a_j \boldsymbol{\xi}_j + a_k \boldsymbol{\xi}_k = \mathbf{0} \qquad (2)$$

or

$$\sum_{i=1}^{D} a_i \boldsymbol{\xi}_i = \mathbf{X}\mathbf{a} = \mathbf{0} \qquad (3)$$

where $\mathbf{a} = [a_1, a_2, \ldots, a_D]^T$ and $a_m = 0$ for $m \neq i, j, k$.

Linear Dependency of Features (Cont.)

The equation $\mathbf{X}\mathbf{a} = \mathbf{0}$ is equivalent to

$$\frac{1}{N}\mathbf{X}^T\mathbf{X}\,\mathbf{a} = \mathbf{0}, \quad \text{or} \quad \mathbf{S}\mathbf{a} = \mathbf{0} \qquad (4)$$

where $\mathbf{S} = \mathrm{cov}(\mathbf{X})$ (assuming, without loss of generality, that $\mathbf{X}$ has zero mean). Condition (4) is fulfilled if $\mathbf{a}$ is an eigenvector of $\mathbf{S}$ for which the corresponding eigenvalue of $\mathbf{S}$ is 0.
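In code, this test amounts to looking for numerically zero eigenvalues of S. A Python/NumPy sketch, with an assumed tolerance:

    import numpy as np

    def dependent_feature_directions(X, tol=1e-10):
        """Return the numerically zero eigenvalues of S = cov(X) and their
        eigenvectors; nonzero components of such an eigenvector mark a set
        of linearly dependent features."""
        S = np.cov(X, rowvar=False)            # features in columns
        eigvals, eigvecs = np.linalg.eigh(S)   # symmetric eigen-decomposition
        zero = eigvals < tol                   # numerically vanishing eigenvalues
        return eigvals[zero], eigvecs[:, zero]

On the patient data this test singles out the four haplotype counts, as the next two slides show.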
For the patient data, the eigenvalues of S and the eigenvector belonging to the zero eigenvalue:

eigenvalue(S) =
  0.00000000000000  0.00256150340499  0.01333767987431  0.02706030811732
  0.24557311804605  0.35250509014528  0.35928764793261  0.42623565749994
  0.50573961196044  0.58261174912046  0.64043036275371  0.71088964216252
  0.94270671457188  1.01103697465426  1.05724628013530  1.19139596505584
  1.30377846956195  1.60639276715028  1.85530156051614  2.01212049592360
  2.63302188860452  4.52076651280861

The eigenvector of S belonging to the zero eigenvalue has (numerically) nonzero components only at the four haplotype-count features:

  SBAN  0.62565943239141
  SBEN  0.62048464459037
  SCAM  0.23181406284457
  SSEN  0.41208169184607

All other components are zero to within about 1e-14, which exposes a linear dependency among SBAN, SBEN, SCAM and SSEN.
Eigenvalues of S before and after removal of one of the dependent haplotype features:

Before removal: eigenvalue(S) =
  0.00000000000000  0.00256150340499  0.01333767987431  0.02706030811732
  0.24557311804605  0.35250509014528  0.35928764793261  0.42623565749994
  0.50573961196044  0.58261174912046  0.64043036275371  0.71088964216252
  0.94270671457188  1.01103697465426  1.05724628013530  1.19139596505584
  1.30377846956195  1.60639276715028  1.85530156051614  2.01212049592360
  2.63302188860452  4.52076651280861

After removal: eigenvalue(S) =
  0.00256126489588  0.01332903873712  0.02701549513325  0.16384841398602
  0.25413974178433  0.35323111291620  0.37258676969163  0.42642124431095
  0.51885189369743  0.58642282297788  0.70543828114615  0.71850896886954
  0.94469163213133  1.02193628486286  1.14660867183184  1.27633481113233
  1.59635080902170  1.82624383380192  1.98186706263660  2.54312053824721
  4.52049130818781

The zero eigenvalue disappears after the removal.
Feature Selection
Not all features carry enough information. We need to select the more important ones in order to reduce the dimension of the feature space.
The patterns are projected onto the eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_M$:

$$z_1 = \mathbf{v}_1^T(\mathbf{x} - \bar{\mathbf{x}}), \quad \ldots, \quad z_M = \mathbf{v}_M^T(\mathbf{x} - \bar{\mathbf{x}})$$

Observation: If the elements corresponding to a certain index $\ell$ of all $\mathbf{v}_i$ ($1 \le i \le M$) are very small, then $x_\ell - \bar{x}_\ell$ will contribute very little to the new pattern $\mathbf{z}$. In that case the feature value $x_\ell$ carries very little information, and that feature type can be dropped from the analysis.
Feature Selection (Cont.)
Steps for feature selection (a sketch follows below):

1. Calculate the eigenvalues and eigenvectors of S = cov(X).
2. Compute the relative eigenvalues.
3. Select the m most important eigenvectors.
4. Scale the selected m eigenvectors with their relative eigenvalues.
5. Sum up the magnitudes of the corresponding elements of the scaled eigenvectors.
6. Sort and select features.

Note: Like PCA, this method works well in some cases, but not always.
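A Python/NumPy sketch of steps 1-6. The function name and details such as the sort order are illustrative assumptions:

    import numpy as np

    def rank_features(X, m):
        """Rank features by their contribution to the m leading
        eigenvectors of the data covariance (steps 1-6 above)."""
        S = np.cov(X, rowvar=False)          # step 1: eigen-decomposition of S
        eigvals, eigvecs = np.linalg.eigh(S)
        order = np.argsort(eigvals)[::-1]    # sort by decreasing eigenvalue
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        rel = eigvals / eigvals.sum()        # step 2: relative eigenvalues
        scaled = eigvecs[:, :m] * rel[:m]    # steps 3-4: scale m leading eigenvectors
        scores = np.abs(scaled).sum(axis=1)  # step 5: sum magnitudes per feature
        return np.argsort(scores)[::-1]      # step 6: features sorted by importance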
Data Labeling
For supervised training and classification, the data have to be labeled. Two approaches are used here:

Double rule: If the final HbF is at least twice the initial value of HbF, the patient is labeled as a responder.
Data Labeling (Cont.)
15 percent rule: If the final %HbF is over 15% while the initial value of %HbF is under 15%, the patient is labeled as a responder. (Both rules are sketched below.)
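A minimal Python sketch of the two labeling rules; the function names are illustrative:

    def label_double_rule(hbf_initial, hbf_final):
        """Double rule: responder if final HbF is at least twice the initial HbF."""
        return hbf_final >= 2.0 * hbf_initial

    def label_15_percent_rule(pct_hbf_initial, pct_hbf_final):
        """15 percent rule: responder if %HbF rises from below 15% to above 15%."""
        return pct_hbf_initial < 15.0 and pct_hbf_final > 15.0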
Representation of Patient’s Data in Reduced Feature Space
Plot along the three most significant dimensions
(o : non-responders, + : responders)
Double rule
Representation of Patient’s Data in Reduced Feature Space (Cont.)
(o : non-responders, + : responders)
15% rule
Approaches in Pattern Recognition

[Taxonomy diagram.] Pattern recognition splits into Bayes' classifiers and neural networks. The Bayes' branch covers probability density estimation (Parzen window, K-nearest neighbor, mixture models fitted by maximum likelihood or Bayesian inference) and basis-function methods (radial basis functions, K-means, SOM, mixture models). The neural-network branch covers single layer and multilayer perceptrons, feed-forward and recurrent architectures, and ART/MART networks.
ART Networks

The ART family originates with Grossberg, 1976.

Unsupervised ART learning:
• ART1, ART2 (Carpenter & Grossberg, 1987)
• Fuzzy ART (Carpenter, Grossberg, et al., 1991)
• Simplified ART (Baraldi & Alpaydin, 1998)

Supervised ART learning:
• ARTMAP (Carpenter, Grossberg, et al., 1991)
• Fuzzy ARTMAP (Carpenter, Grossberg, et al., 1991)
• Gaussian ARTMAP (Williamson, 1992)
• Simplified ARTMAP (Kasuba, 1993)
• Mahalanobis distance based ARTMAP (MART) (Vuskovic & Du, 2001; Vuskovic, Xu & Du, 2002)
MART Clustering Algorithm

C = {}; L = {}; W = {}; m = 0; N = {};        % Initialize network resources
while (X not empty)                           % Learning loop
{
    get x;                                    % Get a labeled pattern from X
    new = true;                               % Set flag "new node needed"
    if (label(x) not in C)                    % If the category hasn't been seen so far
        C := C + label(x);                    % add the new category to C
    else {
        loop j = 1, m {
            if (label(w_j) == label(x))       % For templates with the same label as x
                t_j = T(j, x, w, Q);          % compute the activation function
        }
        J = argmin_{j <= m} t_j;              % Find the closest template for x
        if (t_J < rho) {                      % Resonance: t_J below the vigilance
            (w_J, Q_J, N_J) := U(w_J, Q_J, N_J);  % update the template
            new = false;                      % Set flag "no new node needed"
        }
    }
    NEWNODE(new);
}
MART Clustering Algorithm (Cont.)

macro routine: NEWNODE(new)
if (new == true)                              % If the "new node needed" flag is set
{
    m := m + 1;                               % Increment the count of templates
    N_m := 1;                                 % Initialize the size of the new template
    w_m = x;                                  % Initialize the new template with the pattern
    W := W + w_m;                             % Add the new template to the template set
    Q_m := Q_0;                               % Add a new Q_0 to the Q set
    L_m = label(x);                           % Record the label of the new template in L
}
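The same loop can be sketched compactly in Python. Here `T` and `U` are the activation and update functions defined on the next slide, and the scaled-identity initial inverse covariance `Q0` is an assumed choice:

    import numpy as np

    def mart_cluster(patterns, labels, rho, T, U, D, q0=1.0):
        """Sketch of the MART clustering loop: templates with the same
        label compete; the winner is updated on resonance, otherwise a
        new template is committed."""
        Q0 = q0 * np.eye(D)   # assumed initial inverse covariance
        templates = []        # entries: (w, Q, N, label)
        for x, lab in zip(patterns, labels):
            # Activation: only templates with the same label as x compete.
            candidates = [(T(x, w, Q), j)
                          for j, (w, Q, N, l) in enumerate(templates) if l == lab]
            if candidates:
                t_J, J = min(candidates)        # closest template (argmin of t_j)
                if t_J < rho:                   # resonance: below the vigilance
                    w, Q, N, _ = templates[J]
                    templates[J] = (*U(x, w, Q, N), lab)  # update the winner
                    continue
            # Otherwise commit a new node: w_m = x, Q_m = Q0, N_m = 1 (NEWNODE above).
            templates.append((x.copy(), Q0.copy(), 1, lab))
        return templates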
MART Functions

Activation function:
$$t_j = T(j, \mathbf{x}, \mathbf{w}_j, \mathbf{Q}_j) = (\mathbf{x} - \mathbf{w}_j)^T \mathbf{Q}_j (\mathbf{x} - \mathbf{w}_j)$$
the Mahalanobis distance from $\mathbf{x}$ to $\mathbf{w}_j$.

Match function:
$$m_j = t_j$$
the match function is the same as the activation function.

Resonance:
$$m_j < \rho, \qquad \rho = d^2(\cdot\,, p)$$
resonance happens when $t_j$ is less than the vigilance $\rho$, a squared-distance threshold chosen for a probability $p$.

Update function:
$$\mathbf{w}_j := (1 - b)\,\mathbf{w}_j + b\,\mathbf{x}$$
the learning rule for template $j$, and
$$\mathbf{Q}_j := \frac{1}{1 - b}\left(\mathbf{Q}_j - \frac{b\,\mathbf{g}_j \mathbf{g}_j^T}{1 - b + b\,t_j}\right)$$
the learning rule for the inverse covariance matrix of template $j$,

where
$$b = 1, \tfrac{1}{2}, \ldots, \tfrac{1}{N_j} \;\text{(decreasing with the template count)}, \qquad \mathbf{g}_j = \mathbf{Q}_j(\mathbf{x} - \mathbf{w}_j)$$
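A Python/NumPy sketch of the two functions. The Mahalanobis activation and the w update follow the formulas above directly; the rank-one (Sherman-Morrison) form of the Q update is one consistent reading of the rule above and should be taken as an assumption:

    import numpy as np

    def T(x, w, Q):
        """Activation/match function: squared Mahalanobis distance from x to w."""
        d = x - w
        return d @ Q @ d

    def U(x, w, Q, N):
        """Update rule: move the template toward x and update the inverse
        covariance Q with a rank-one (Sherman-Morrison) correction."""
        N = N + 1
        b = 1.0 / N                 # decreasing learning rate b = 1/N_j
        d = x - w
        g = Q @ d                   # g_j = Q_j (x - w_j)
        t = d @ g                   # t_j = (x - w_j)^T Q_j (x - w_j)
        Q = (Q - np.outer(g, g) * (b / (1.0 - b + b * t))) / (1.0 - b)
        w = (1.0 - b) * w + b * x   # w_j := (1 - b) w_j + b x
        return w, Q, N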
MART Classification Algorithm

The trained network is a Gaussian mixture model. Each class maps to one or more clusters. The class probability is proportional to the sum of the posterior probabilities of the individual clusters of the same class. The prediction is the class that yields the maximum class probability.

Class-conditional pdf of $\mathbf{x}$ given cluster $j$:
$$p(\mathbf{x}\,|\,j) = \frac{|\mathbf{Q}_j|^{1/2}}{(2\pi)^{D/2}}\, e^{-t_j/2}$$

Prior probability of cluster $j$:
$$P(L_j) = \frac{N_j}{\sum_{i=1}^{M} N_i}$$

Posterior probability (prediction):
$$\mathrm{class}(\mathbf{x}) = \arg\max_k P(k\,|\,\mathbf{x}) = \arg\max_k \sum_{j \in (k)} p(\mathbf{x}\,|\,j)\, P(j)$$
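A sketch of the prediction step in Python/NumPy, combining the three formulas above; the template storage as (w, Q, N, label) tuples matches the earlier clustering sketches:

    import numpy as np

    def mart_classify(x, templates):
        """Predict the class of x: sum p(x|j) P(j) over the clusters of
        each class and return the class with the maximum probability."""
        D = x.shape[0]
        N_total = sum(N for _, _, N, _ in templates)
        scores = {}
        for w, Q, N, label in templates:
            d = x - w
            t = d @ Q @ d                          # Mahalanobis distance t_j
            p_x_j = (np.sqrt(np.linalg.det(Q))     # class-conditional pdf p(x|j)
                     / (2.0 * np.pi) ** (D / 2.0)) * np.exp(-t / 2.0)
            P_j = N / N_total                      # cluster prior P(j)
            scores[label] = scores.get(label, 0.0) + p_x_j * P_j
        return max(scores, key=scores.get)         # argmax over classes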
Results
                                      15% Rule             Double Rule
# of Features                        19         8         19         3
Accuracy of Predicting             84.4%    68.89%       100%    96.82%
Responders                       (38/45)   (31/45)    (63/63)   (61/63)
Accuracy of Predicting            45.16%    67.74%      5.56%    77.77%
Non-Responders                   (14/31)   (21/31)     (1/18)   (14/18)
Global Hit Rate                   68.42%    68.42%     79.01%    92.59%
# of Output Nodes                   3.50      4.20       3.32      2.06
(These results were obtained using the leave-one-out approach: each time, one pattern is left out for testing and the rest are used for training; the procedure is repeated until all patterns have been tested, and the results are averaged over the entire data set.)
Conclusion
• MART has shown superior performance on various benchmarks, which inspired us to apply MART to sickle cell anemia patient data.
• MART achieved 96.82% accuracy in predicting responders to HU treatment and gave 92.59% global accuracy.
• Removal of linear dependency among features improved the numerical stability of the algorithms.
• Reduction of the feature space from 23 to only 3 features considerably improved the performance (it decreased the numerical complexity and even increased the accuracy).
• In the future we plan to explore other labeling methods.
• We also plan to investigate more data preprocessing methods, including both linear and nonlinear transformations.