Breast Cancer Diagnosis via Neural Network Classification
Jing Jiang
May 10, 2000
Outline
• Introduction and Motivation
• K-means, k-nearest neighbor, and maximum likelihood classification
• Back-propagation multi-layer perceptron (BP-MLP)
• Support vector machine (SVM)
• Learning vector quantization (LVQ)
• Linear programming
Introduction and Motivation
• The data file contains the 30 attributes of both benign and malignant fine needle aspirates (FNAs).
• Our goals are to find a discriminating function that determines whether an unknown sample is benign or malignant, and to choose a pair of the 30 attributes to be used in diagnosis.
• Linear programming has done a good job of solving this problem.
• We expect that neural network classification algorithms can also be useful for this problem.
K-means
• First we use the k-means algorithm to find the clusters of the training data set.
• The k-means algorithm doesn’t give us a discriminating function.
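The clustering step above can be sketched as follows. This is a minimal k-means in plain Python; the data points, the deterministic initialization, and the two-blob example are illustrative stand-ins, not the actual FNA attributes.

```python
# Minimal k-means sketch: alternate between assigning each point to its
# nearest centroid and recomputing each centroid as its cluster mean,
# until the assignments stop changing.

def kmeans(points, k, iters=100):
    centroids = [list(p) for p in points[:k]]   # simple deterministic init
    assign = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid (squared distance).
        new_assign = [min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[j])))
                      for p in points]
        if new_assign == assign:                # converged
            break
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, assign

# Two well-separated toy blobs standing in for benign/malignant clusters.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(data, 2)
```

As the slide notes, this yields cluster centroids and memberships but no discriminating function for new samples.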
KNN and ML
• For 100 nearest neighbors we have:
• For 20 nearest neighbors we have:
• For the maximum likelihood algorithm we have:
[Figure residue: confusion matrices (Cmat) and classification rates (C_rate) for the three methods above; the numeric values were garbled in extraction and are not recoverable.]
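A k-nearest-neighbor classifier like the one used above can be sketched in a few lines. The training points, labels, and choice of k below are illustrative, not the actual FNA data.

```python
# Minimal k-NN sketch: label a query point by majority vote among its
# k closest training points (squared Euclidean distance).

def knn_predict(train, labels, x, k):
    # Indices of training points, sorted by distance to the query x.
    order = sorted(range(len(train)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(train[i], x)))
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)    # majority vote

train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
         (1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
labels = ['benign', 'benign', 'benign',
          'malignant', 'malignant', 'malignant']
print(knn_predict(train, labels, (0.15, 0.1), k=3))   # → benign
```

The choice of k trades noise sensitivity (small k) against blurring of the class boundary (large k), which is why the slide reports results for both 100 and 20 neighbors.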
BP-MLP
• After careful choice of network parameters, we get the same Cmat and C_rate for the 30-attribute problem and for any 2-attribute problem.
• It is interesting to note that they are the same as the result we get with the ML method.
• The low classification rate can be due to the fact that the data is not linearly separable.
[Figure residue: Cmat and C_rate for the BP-MLP; the numeric values are not recoverable from the transcript.]
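The back-propagation training used by the BP-MLP can be sketched on XOR, a standard toy problem that, like the FNA data discussed above, is not linearly separable. The network size, learning rate, and epoch count here are illustrative assumptions.

```python
import math, random

# One-hidden-layer perceptron (2 inputs -> 2 sigmoid hidden units ->
# 1 sigmoid output) trained with plain back-propagation on XOR.
random.seed(0)
sig = lambda z: 1.0 / (1.0 + math.exp(-z))

w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # incl. bias
w_o = [random.uniform(-1, 1) for _ in range(3)]                      # incl. bias

def forward(x):
    h = [sig(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    y = sig(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    return h, y

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
sse = lambda: sum((forward(x)[1] - t) ** 2 for x, t in data)

before = sse()
lr = 0.5
for _ in range(2000):
    for x, t in data:
        h, y = forward(x)
        d_o = (y - t) * y * (1 - y)                       # output delta
        d_h = [d_o * w_o[j] * h[j] * (1 - h[j])           # hidden deltas
               for j in range(2)]
        for j in range(2):                                # gradient steps
            w_o[j] -= lr * d_o * h[j]
        w_o[2] -= lr * d_o
        for j in range(2):
            w_h[j][0] -= lr * d_h[j] * x[0]
            w_h[j][1] -= lr * d_h[j] * x[1]
            w_h[j][2] -= lr * d_h[j]
after = sse()
```

Because the sigmoid output defines a smooth, fundamentally non-linear boundary, a BP-MLP can in principle handle non-separable data, though as the slide observes it may still converge to a weak solution.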
Support Vector Machine
• For attributes 1 and 23, we have 6 errors in testing.
• For attributes 14 and 28, we have 8 errors in testing.
• It takes a long time to train an SVM for the 30-attribute problem; even the 2-attribute case is time-consuming.
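A linear SVM can be sketched by subgradient descent on the regularized hinge loss; this is a simple stand-in for the quadratic-programming solvers usually used, which is where the training time mentioned above goes. The data, labels, and hyperparameters below are illustrative, not the FNA attributes.

```python
# Linear SVM sketch: minimize  lam*|w|^2 + avg hinge(y*(w.x + b))
# by cyclic subgradient steps over the training points.

X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
     (1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
y = [-1, -1, -1, 1, 1, 1]        # -1 = benign, +1 = malignant (toy labels)

w, b = [0.0, 0.0], 0.0
lam, lr = 0.01, 0.1              # regularization strength, step size

for _ in range(500):
    for xi, yi in zip(X, y):
        margin = yi * (w[0] * xi[0] + w[1] * xi[1] + b)
        if margin < 1:           # inside margin: hinge term is active
            w[0] += lr * (yi * xi[0] - 2 * lam * w[0])
            w[1] += lr * (yi * xi[1] - 2 * lam * w[1])
            b += lr * yi
        else:                    # only the regularizer contributes
            w[0] -= lr * 2 * lam * w[0]
            w[1] -= lr * 2 * lam * w[1]

predict = lambda xi: 1 if w[0] * xi[0] + w[1] * xi[1] + b > 0 else -1
```

The regularizer keeps the weight vector small, which is what maximizes the margin of the separating plane.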
LVQ
• Using LVQ on attributes 1 and 23, the number of errors is 8.
• For attributes 14 and 18, we have 25 errors.
• Training is faster than for the SVM, but so far we are only able to handle the 2-attribute problem, not the 30-attribute problem.
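The LVQ1 update rule behind these results is cheap, which explains the fast training noted above: each step moves only the winning codebook vector. The toy data, initial codebooks, and learning rate below are illustrative assumptions.

```python
# LVQ1 sketch: the codebook vector nearest a training point (the winner)
# is attracted toward it when their labels match, repelled otherwise.

def lvq1(train, labels, codebooks, code_labels, lr=0.1, epochs=50):
    for _ in range(epochs):
        for x, t in zip(train, labels):
            # Winner: nearest codebook vector by squared distance.
            w = min(range(len(codebooks)),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(codebooks[i], x)))
            sign = 1.0 if code_labels[w] == t else -1.0   # attract / repel
            codebooks[w] = [c + sign * lr * (a - c)
                            for c, a in zip(codebooks[w], x)]
    return codebooks

train = [(0.0, 0.1), (0.2, 0.0), (1.0, 1.1), (1.1, 0.9)]
labels = ['benign', 'benign', 'malignant', 'malignant']
books = lvq1(train, labels,
             [[0.5, 0.5], [0.6, 0.6]], ['benign', 'malignant'])
```

After training, a new sample is classified by the label of its nearest codebook vector, so the cost per query is proportional to the (small) number of codebooks rather than the training-set size.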
LVQ Training data and Weights
Linear Program
• The algorithm used is similar to the SVM, but simpler.
• We devise a separating plane and try to minimize the error.
• For 30 attributes we have only 3 errors.
• For 2 attributes, the best combinations give 2 errors.
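The separating-plane objective above is the robust-linear-programming formulation of Bennett and Mangasarian: minimize the average amount by which each class violates its side of the plane w·x = γ. A real implementation hands this to an LP solver; here plain subgradient descent is used as a stand-in, and the two point sets are illustrative toy data.

```python
# LP-style objective: class A should satisfy w.x <= g - 1, class B
# should satisfy w.x >= g + 1; minimize the average violations.

A = [(0.0, 0.0), (0.1, 0.3), (0.3, 0.1)]
B = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2)]

def dot(w, x):
    return w[0] * x[0] + w[1] * x[1]

def objective(w, g):
    err_a = sum(max(0.0, dot(w, x) - g + 1) for x in A) / len(A)
    err_b = sum(max(0.0, g - dot(w, x) + 1) for x in B) / len(B)
    return err_a + err_b

w, g = [0.0, 0.0], 0.0
lr = 0.05
for _ in range(2000):
    for x in A:                         # step only for violated A points
        if dot(w, x) - g + 1 > 0:
            w[0] -= lr * x[0] / len(A); w[1] -= lr * x[1] / len(A)
            g += lr / len(A)
    for x in B:                         # step only for violated B points
        if g - dot(w, x) + 1 > 0:
            w[0] += lr * x[0] / len(B); w[1] += lr * x[1] / len(B)
            g -= lr / len(B)
```

Because each class's violations are averaged, the objective is not skewed by unbalanced class sizes, which is one reason this simple formulation works well on the FNA data.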
Conclusion
• We tried various neural network classification algorithms. So far, the simpler linear programming approach gives the best result; more exploration needs to be done.
• BP is not very good at dealing with non-separable data.
• SVM is a good candidate, but takes a long time to train.
• LVQ is comparable with SVM.
• A question remains to be answered: why does the maximum likelihood method give the same result as BP?