classification of microarray data. sample preparation hybridization array design probe design...
Post on 20-Dec-2015
223 views
TRANSCRIPT
![Page 1: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/1.jpg)
Classification of Microarray Data
![Page 2: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/2.jpg)
Sample PreparationHybridization
Array designProbe design
QuestionExperimental Design
Buy Chip/Array
Statistical AnalysisFit to Model (time series)
Expression IndexCalculation
Advanced Data Analysis
Clustering PCA Classification Promoter Analysis
Meta analysis Survival analysis Regulatory Network
Normalization
Image analysis
The DNA Array Analysis Pipeline
ComparableGene Expression Data
![Page 3: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/3.jpg)
Outline
• Example: Childhood leukemia
• Classification– Method selection
– Feature selection
– Cross-validation
– Training and testing
• Example continued: Results from a leukemia study
![Page 4: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/4.jpg)
Childhood Leukemia
• Cancer in the cells of the immune system
• Approx. 35 new cases in Denmark every year
• 50 years ago – all patients died
• Today – approx. 78% are cured
• Risk groups:
– Standard, Intermediate, High, Very high, Extra high
• Treatment
– Chemotherapy
– Bone marrow transplantation
– Radiation
![Page 5: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/5.jpg)
Prognostic Factors
Good prognosis Poor prognosis
Immuno-phenotype precursor B T
Age 1-9 ≥10
Leukocyte count Low (<50*109/L) High (>100*109/L)
Number of chromosomes Hyperdiploidy (>50)
Hypodiploidy (<46)
Translocations t(12;21) t(9;22), t(1;19)
Treatment response Good response:Low MRD
Poor response:High MRD
![Page 6: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/6.jpg)
Prognostic factors:
ImmunophenotypeAgeLeukocyte countNumber of chromosomesTranslocationsTreatment response
Risk Classification Today
Risk group:
StandardIntermediateHighVery highExtra High
Patient:
Clinical data
Immuno-phenotype
Morphology
Genetic measurements
Microarraytechnology
![Page 7: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/7.jpg)
Study of Childhood Leukemia
• Diagnostic bone marrow samples from leukemia patients
• Platform: Affymetrix Focus Array– 8793 human genes
• Immunophenotype– 18 patients with precursor B immunophenotype
– 17 patients with T immunophenotype
• Outcome 5 years from diagnosis– 11 patients with relapse
– 18 patients in complete remission
![Page 8: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/8.jpg)
Reduction of data
• Dimension reduction– PCA
• Feature selection (gene selection)– Significant genes: t-test
– Selection of a limited number of genes
![Page 9: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/9.jpg)
Microarray DataClass precursorB T ...
Patient1 Patient2 Patient3 Patient4 ...
Gene1 1789.5 963.9 2079.5 3243.9 ...
Gene2 46.4 52.0 22.3 27.1 ...
Gene3 215.6 276.4 245.1 199.1 ...
Gene4 176.9 504.6 420.5 380.4
Gene5 4023.0 3768.6 4257.8 4451.8
Gene6 12.6 12.1 37.7 38.7
... ... ... ... ... ...
Gene8793 312.5 415.9 1045.4 1308.0
![Page 10: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/10.jpg)
Outcome: PCA on all Genes
![Page 11: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/11.jpg)
Outcome Prediction: CC against the Number of Genes
-0.1
0.1
0.3
0.5
0.7
0.9
0 20 40 60 80 100 120 140
Number of top ranking genes
Co
rrel
atio
n c
oef
fici
ent
(CC
)
Nearest centroid
![Page 12: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/12.jpg)
• Assumptions:– Data e.g.. Gaussian distributed
– Variance and covariance the same for all classes
• Calculation of class probability
Linear discriminant analysis
![Page 13: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/13.jpg)
• Calculation of a centroid for each class
• Calculation of the distance between a test sample and each class centroid
• Class prediction by the nearest centroid method
Nearest Centroid
kCj kijik nxx /
![Page 14: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/14.jpg)
• Based on distance measure– For example Euclidian distance
• Parameter k = number of nearest neighbors– k=1– k=3– k=... – Always an odd number
• Prediction by majority vote
K-Nearest Neighbor
![Page 15: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/15.jpg)
Support Vector Machines
• Machine learning
• Relatively new and highly theoretic
• Works on non-linearly separable data
• Finding a hyperplane between the two classes by minimizing of the distance between the hyperplane and closest points
![Page 16: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/16.jpg)
Neural Networks
Gene1 Gene2 Gene3 ...
B T
![Page 17: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/17.jpg)
Comparison of Methods
Linear discriminant analysis
Nearest centroid
KNN
Neural networks
Support vector machines
Simple method
Based on distance calculation
Good for simple problems
Good for few training samples
Distribution of data assumed
Advanced methods
Involve machine learning
Several adjustable parameters
Many training samples required
(e.g.. 50-100)
Flexible methods
![Page 18: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/18.jpg)
Cross-validation
Given data with 100 samples
Cross-5-validation:Training: 4/5 of data (80 samples)Testing: 1/5 of data (20 samples)- 5 different models and performance results
Leave-one-out cross-validation (LOOCV)Training: 99/100 of data (99 samples)Testing: 1/100 of data (1 sample)- 100 different models
![Page 19: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/19.jpg)
Validation
• Definition of – True and False positives
– True and False negatives
Actual class B B T T
Predicted class B T T B
TP FN TN FP
![Page 20: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/20.jpg)
Sensitivity: the ability to detect "true positives"
TP
TP + FN
Specificity: the ability to avoid "false positives"
TN
TN + FP
Positive Predictive Value (PPV)
TP
TP + FP
Sensitivity and Specificity
![Page 21: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/21.jpg)
Accuracy
• Definition: TP + TN
TP + TN + FP + FN
• Range: [0 … 1]
![Page 22: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/22.jpg)
• Definition: TP·TN - FN·FP
√(TN+FN)(TN+FP)(TP+FN)(TP+FP)
• Range: [-1 … 1]
Matthews correlation coefficient
![Page 23: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/23.jpg)
Overview of Classification
Expression data
Subdivision of data for cross-validationinto training sets and test sets
Feature selection (t-test)Dimension reduction (PCA)
Training of classifier: - using cross-validation
- choice of method- choice of optimal parameters
Testing of classifier Independent test set
![Page 24: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/24.jpg)
Important Points
• Avoid over fitting
• Validate performance– Test on an independent test set
– Use cross-validation
• Include feature selection in cross-validation
Why?
– To avoid overestimation of performance!
– To make a general classifier
![Page 25: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/25.jpg)
Multiclass classification
• Same prediction methods– Multiclass prediction often implemented
– Often bad performance
• Solution– Division in small binary (2-class) problems
![Page 26: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/26.jpg)
Study of Childhood Leukemia: Results
• Classification of immunophenotype (precursorB and T)– 100% accuracy
• During the training • When testing on an independent test set
– Simple classification methods applied• K-nearest neighbor• Nearest centroid
• Classification of outcome (relapse or remission)– 78% accuracy (CC = 0.59)– Simple and advanced classification methods applied
![Page 27: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/27.jpg)
Summary
• When do we use classification?• Reduction of dimension often necessary
– T-test, PCA
• Methods– K nearest neighbor– Nearest centroid– Support vector machines– Neural networks
• Validation– Cross-validation– Accuracy and correlation coefficient
![Page 28: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/28.jpg)
Coffee break
Next: Exercise
![Page 29: Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical](https://reader031.vdocument.in/reader031/viewer/2022032309/56649d485503460f94a2380e/html5/thumbnails/29.jpg)
Prognostic factors:
ImmunophenotypeAgeLeukocyte countNumber of chromosomesTranslocationsTreatment response
Risk classification in the future ?
Risk group:
StandardIntermediateHighVery highEkstra high
Patient:
Clincal data
Immunopheno-typing MorphologyGenetic measurements
Microarraytechnology
Customized treatment