
Page 1

Breast Cancer Diagnosis via Neural Network Classification

Jing Jiang

May 10, 2000

Page 2

Outline

• Introduction and Motivation

• K-means, k-nearest neighbor, and maximum likelihood classification

• Back-propagation multi-layer perceptron (BP-MLP)

• Support vector machine (SVM)

• Learning vector quantization (LVQ)

• Linear programming

Page 3

Introduction and Motivation

• The data file contains 30 attributes of both benign and malignant fine needle aspirates (FNAs).

• Our goals are to find a discriminating function that determines whether an unknown sample is benign or malignant, and to choose a pair of the 30 attributes to use in diagnosis.

• Linear programming has done a good job of solving this problem.

• We expect that neural network classification algorithms can also be useful for this problem.

Page 4

K-means

• First we use the k-means algorithm to find the clusters of the training data set.

• The k-means algorithm does not give us a discriminating function.
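For reference, the clustering step can be sketched as follows: a minimal k-means (Lloyd's algorithm) on synthetic 2-D data, standing in for a pair of FNA attributes rather than the actual data set.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init at random points
    for _ in range(n_iter):
        # distance of every point to every center, then nearest-center labels
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# two synthetic, well-separated 2-D clusters (illustrative stand-in data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(6.0, 1.0, (50, 2))])
centers, labels = kmeans(X, k=2)
```

As the slide notes, this yields cluster centers and assignments but no discriminating function for new samples.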

Page 5

KNN and ML

• For 100 nearest neighbors we have,

• For 20 nearest neighbors we have

• For maximum likelihood algorithm we have,

[The Cmat (confusion matrix) and C_rate (classification rate) for each of the three classifiers were shown as equation images; the values are not legible in this transcript.]
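The k-nearest-neighbor vote itself is simple; below is a minimal sketch on toy 1-D data (illustrative, not the FNA attributes).

```python
import numpy as np

def knn_classify(X_train, y_train, x, k):
    """Vote among the k nearest training samples under Euclidean distance."""
    d = np.linalg.norm(X_train - x, axis=1)       # distance to every training point
    nearest = y_train[np.argsort(d)[:k]]          # labels of the k closest
    return np.bincount(nearest).argmax()          # majority vote

# toy example: class 0 near 0.0, class 1 near 10.0
X_train = np.array([[0.0], [0.5], [1.0], [9.0], [9.5], [10.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X_train, y_train, np.array([9.2]), k=3))  # -> 1
```

The choice of k matters, which is why the slide compares 100 and 20 neighbors.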

Page 6

BP-MLP

• After careful choice of network parameters, we get the same Cmat and C_rate for the 30-attribute problem and any 2-attribute problem.

• It is interesting to note that they are the same as the results we get with the ML method.

• The low classification rate may be due to the fact that the data are not linearly separable.

[The Cmat and C_rate for the BP-MLP were shown as an equation image; the values are not legible in this transcript.]
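A minimal back-propagation MLP can be sketched as below. XOR is used here as a stand-in for non-linearly-separable data; the layer sizes and learning rate are illustrative choices, not the network parameters used in the experiments.

```python
import numpy as np

# Minimal back-propagation MLP: one sigmoid hidden layer, squared-error loss.
# XOR is the classic example of data a linear classifier cannot separate.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # input -> hidden
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # hidden -> output
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    # forward pass
    h = sig(X @ W1 + b1)
    out = sig(h @ W2 + b2)
    # backward pass: delta rules for squared error with sigmoid units
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

pred = (out > 0.5).astype(int)  # usually recovers XOR, depending on the random init
```

The hidden layer is what lets the network bend the decision boundary; without it, BP reduces to a linear classifier and fails on data like this.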

Page 7

Support Vector Machine

• For attributes 1 and 23, we have 6 errors in testing.

• For attributes 14 and 28, we have 8 errors in testing.

• It takes a long time to train an SVM for the 30-attribute problem; even for 2 attributes it is time-consuming.
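Part of the training cost is that SVM fitting is an iterative optimization. The sketch below uses a Pegasos-style stochastic sub-gradient method for a linear SVM on synthetic blobs; this is an illustrative stand-in, not the solver or data used in these experiments.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, n_iter=2000, seed=0):
    """Stochastic sub-gradient descent on the hinge loss (labels in {-1, +1})."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(X))
        eta = 1.0 / (lam * t)                 # decaying step size
        if y[i] * (X[i] @ w) < 1:             # margin violated: hinge term active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                                 # only the regularizer contributes
            w = (1 - eta * lam) * w
    return w

# two separable synthetic blobs standing in for a 2-attribute problem
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (40, 2)), rng.normal(2, 0.5, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)
w = train_linear_svm(X, y)
acc = np.mean(np.sign(X @ w) == y)
```

Each pass touches one sample, so many iterations are needed; with kernels and many attributes the cost grows quickly, consistent with the slow training observed above.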

Page 8

LVQ

• When using LVQ for attributes 1 and 23, the number of errors is 8.

• For attributes 14 and 18, we have 25 errors.

• Training is faster than for the SVM, but so far we are only able to handle the 2-attribute problem, not the 30-attribute problem.
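A minimal LVQ1 sketch on synthetic data follows; the prototype initialization (one prototype per class at the class mean) and the learning rate are illustrative assumptions.

```python
import numpy as np

def lvq1(X, y, n_epochs=30, lr=0.1, seed=0):
    """LVQ1: winning prototype moves toward same-class points, away from others."""
    classes = np.unique(y)
    protos = np.array([X[y == c].mean(axis=0) for c in classes])  # init at class means
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            j = np.linalg.norm(protos - X[i], axis=1).argmin()    # winning prototype
            sign = 1.0 if classes[j] == y[i] else -1.0
            protos[j] += sign * lr * (X[i] - protos[j])
    return classes, protos

def lvq_predict(x, classes, protos):
    return classes[np.linalg.norm(protos - x, axis=1).argmin()]

# synthetic 2-D stand-in for a 2-attribute problem
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 0.6, (40, 2)), rng.normal(2, 0.6, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
classes, protos = lvq1(X, y)
acc = np.mean([lvq_predict(x, classes, protos) == t for x, t in zip(X, y)])
```

Each update touches only the single winning prototype, which is why LVQ trains faster than an SVM.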

Page 9

LVQ Training data and Weights

Page 10

Linear Program

• The algorithm used is similar to the SVM, but simpler.

• We devise a separating plane and try to minimize the error.

• For 30 attributes we have only 3 errors.

• For 2 attributes, the best combinations give 2 errors.
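The separating-plane idea can be sketched as a linear program: minimize the sum of slacks u_i subject to y_i(w·x_i + b) >= 1 - u_i with u_i >= 0. The data below are synthetic, and the exact objective of the original formulation may differ from this sketch.

```python
import numpy as np
from scipy.optimize import linprog

# synthetic separable blobs standing in for two diagnostic attributes
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2, 0.5, (30, 2)), rng.normal(2, 0.5, (30, 2))])
y = np.array([-1.0] * 30 + [1.0] * 30)
n, d = X.shape

# variables z = (w, b, u); cost falls only on the slack variables u
c = np.concatenate([np.zeros(d + 1), np.ones(n)])
# constraints: -y_i*(w.x_i + b) - u_i <= -1, i.e. y_i*(w.x_i + b) >= 1 - u_i
A = np.hstack([-y[:, None] * X, -y[:, None], -np.eye(n)])
res = linprog(c, A_ub=A, b_ub=-np.ones(n),
              bounds=[(None, None)] * (d + 1) + [(0, None)] * n)

w, b = res.x[:d], res.x[d]
errors = int(np.sum(np.sign(X @ w + b) != y))
```

Unlike SVM training, this is a single LP solve with no kernel machinery, which fits the observation above that the simpler approach is both faster and, here, more accurate.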

Page 11

Linear Program

Page 12

Conclusion

• We tried various neural network classification algorithms. So far, the simpler linear programming approach gives the best results; more exploration needs to be done.

• BP is not very good at dealing with non-separable data.

• SVM is a good candidate, but takes a long time to train.

• LVQ is comparable with SVM.

• A question remains to be answered: why does the maximum likelihood method give the same result as BP?