Molecular Classification of Cancer Class Discovery and Class Prediction by Gene Expression Monitoring

Upload: isi

Post on 12-Feb-2016




Overview

– Motivation
– Microarray Background
– Our Test Case
– Class Prediction
– Class Discovery


Motivation

– Importance of cancer classification
– Cancer classification has historically relied on specific biological insights
– We will discuss a systematic and unbiased approach for recognizing tumor subtypes


Microarray Background

Microarrays enable simultaneous measurement of the expression levels of thousands of genes in a sample

Microarray:
– Glass slide with a matrix of thousands of spots printed onto it
– Each spot contains probes that bind to sequences from a specific gene


Microarray Background (cont.)

The process:
– RNA samples are taken from the test subjects and converted to cDNA
– The samples are labeled with fluorescent dyes and placed on the microarray
– The labeled sample cDNA hybridizes to the complementary DNA probes on the slide

The result:
– Spots in the array are colored in shades ranging from red to green


Microarray Background (cont.)

Microarray data is translated into an n x p table (n = number of samples, p = number of genes). The example below is shown transposed, with one row per gene and one column per sample:

         Sample 1   Sample 2
Gene 1   1.04       2.08
Gene 2   3.2        10.5
Gene 3   3.34       1.05
Gene 4   1.85       0.09


Demonstration

http://www.bio.davidson.edu/courses/genomics/chip/chip.html


Our Test Case

– 38 bone marrow samples from acute leukemia patients (27 ALL, 11 AML)
– RNA from the samples was hybridized to microarrays containing probes for 6817 human genes
– For each gene, an expression level was obtained


Class Prediction

Initial collection of samples belonging to known classes

Goal: create a “class predictor” to classify new samples
– Look for “informative genes”
– Make a prediction based on these genes
– Test the validity of the predictor


Informative genes

Genes whose expression pattern is strongly correlated with the class distinction

[Figure: example expression patterns of a strongly correlated gene and a poorly correlated gene]
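The slides do not say how "correlated with the class distinction" is measured. A minimal sketch, assuming the signal-to-noise statistic P(g, c) = (µ_AML − µ_ALL) / (σ_AML + σ_ALL) used in the original Golub et al. study (the slides themselves do not define it), ranking genes by the absolute value of that score. The helper names `signal_to_noise` and `rank_informative_genes` and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def signal_to_noise(values, labels, cls_a="AML", cls_b="ALL"):
    """Assumed correlation measure P(g, c) = (mu_a - mu_b) / (sigma_a + sigma_b).

    Positive scores mean the gene is higher in cls_a, negative in cls_b.
    """
    a = [v for v, lab in zip(values, labels) if lab == cls_a]
    b = [v for v, lab in zip(values, labels) if lab == cls_b]
    # Floor the denominator so genes that are constant within both classes
    # do not cause a division by zero.
    spread = max(pstdev(a) + pstdev(b), 1e-9)
    return (mean(a) - mean(b)) / spread


def rank_informative_genes(expr, labels, k):
    """Return the k genes with the largest |P(g, c)| (hypothetical helper).

    expr maps gene name -> expression levels, one per sample, in label order.
    """
    scores = {g: signal_to_noise(vals, labels) for g, vals in expr.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:k]


# Toy data: 3 AML and 3 ALL samples (values invented for illustration).
labels = ["AML"] * 3 + ["ALL"] * 3
expr = {
    "g1": [9, 10, 11, 1, 2, 3],  # high in AML: strongly correlated
    "g2": [1, 2, 3, 1, 2, 3],    # identical in both classes: poorly correlated
    "g3": [2, 1, 3, 8, 9, 7],    # high in ALL: strongly (negatively) correlated
}
print(rank_informative_genes(expr, labels, 2))  # ['g1', 'g3']
```

Ranking purely by magnitude is a simplification; a fixed-size predictor could instead take half of its genes from each sign so that both classes are represented.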


Neighborhood Analysis

Are the observed correlations stronger than would be expected by chance?

– C represents the AML/ALL class distinction
– C* is a random permutation of C; it represents a random class distinction
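A simplified permutation sketch of this comparison: count the genes whose |P(g, C)| reaches a threshold for the real distinction C, then check how often a random permutation C* yields at least as many. It assumes the same signal-to-noise score as a correlation measure and reduces the comparison to a single fraction; the threshold, the number of permutations and the function names are illustrative assumptions, not the study's settings.

```python
import random
from statistics import mean, pstdev


def signal_to_noise(values, labels, cls_a="AML", cls_b="ALL"):
    """Assumed correlation measure P(g, c) = (mu_a - mu_b) / (sigma_a + sigma_b)."""
    a = [v for v, lab in zip(values, labels) if lab == cls_a]
    b = [v for v, lab in zip(values, labels) if lab == cls_b]
    return (mean(a) - mean(b)) / max(pstdev(a) + pstdev(b), 1e-9)


def count_correlated(expr, labels, threshold):
    """Number of genes whose |P(g, c)| reaches the threshold for this labeling."""
    return sum(abs(signal_to_noise(v, labels)) >= threshold for v in expr.values())


def neighborhood_analysis(expr, labels, threshold, n_perm=1000, seed=0):
    """Compare the count for the real distinction C with counts for permutations C*.

    Returns (observed count, fraction of permutations with at least as many genes).
    """
    rng = random.Random(seed)
    observed = count_correlated(expr, labels, threshold)
    at_least_as_many = 0
    for _ in range(n_perm):
        c_star = labels[:]  # copy, then shuffle: a random class distinction
        rng.shuffle(c_star)
        if count_correlated(expr, c_star, threshold) >= observed:
            at_least_as_many += 1
    return observed, at_least_as_many / n_perm


labels = ["AML"] * 3 + ["ALL"] * 3
expr = {
    "g1": [9, 10, 11, 1, 2, 3],
    "g2": [1, 2, 3, 1, 2, 3],
    "g3": [2, 1, 3, 8, 9, 7],
}
observed, frac = neighborhood_analysis(expr, labels, threshold=2.0)
print(observed)  # 2
print(frac)
```

With this toy data only C itself and its mirror image (2 of the 20 possible 3/3 splits) reach two correlated genes, so the printed fraction should land near 0.1; with thousands of real genes the same loop compares the observed count against the chance distribution.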


Application to the Test Case

Roughly 1100 genes were more highly correlated with the AML-ALL class distinction than would be expected by chance


Make a Prediction

Use a fixed subset of “informative genes” (most correlated with the class distinction)

Make a prediction on the basis of the expression level of these genes in a new sample


Prediction Algorithm

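Each gene $G_i$ votes for either AML or ALL, depending on whether its expression level $X_i$ in the sample is closer to $\mu_{AML}$ or $\mu_{ALL}$, the mean expression levels of $G_i$ in the AML and ALL training samples

The magnitude of the vote is $W_i V_i$

– $W_i$ reflects how well the gene is correlated with the class distinction
– $V_i = \left| X_i - \frac{\mu_{AML} + \mu_{ALL}}{2} \right|$ reflects the deviation of $X_i$ from the average of $\mu_{AML}$ and $\mu_{ALL}$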


Prediction Algorithm (cont.)

The votes for each class are summed to obtain total votes $V_{AML}$ and $V_{ALL}$


Prediction Algorithm (cont.)

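The prediction strength (PS) is calculated as

$$PS = \frac{V_{win} - V_{lose}}{V_{win} + V_{lose}}$$

where $V_{win}$ and $V_{lose}$ are the vote totals for the winning and losing classes.

The sample is assigned to the winning class provided that the PS exceeds a predetermined threshold (0.3 in the test case)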
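The weighted vote and the prediction strength can be sketched in a few lines. This assumes $W_i$ is the magnitude of a signal-to-noise score computed on the training samples, with the score's sign saying which class high expression points to (positive: AML); `train_voter`, `predict` and the `"uncertain"` label are hypothetical names.

```python
from statistics import mean, pstdev


def train_voter(expr, labels, genes):
    """For each informative gene store (signed weight, midpoint of the class means).

    |weight| plays the role of W_i (assumed signal-to-noise score);
    a positive weight means the gene is higher in AML.
    """
    model = {}
    for g in genes:
        aml = [v for v, lab in zip(expr[g], labels) if lab == "AML"]
        all_ = [v for v, lab in zip(expr[g], labels) if lab == "ALL"]
        mu_aml, mu_all = mean(aml), mean(all_)
        weight = (mu_aml - mu_all) / max(pstdev(aml) + pstdev(all_), 1e-9)
        model[g] = (weight, (mu_aml + mu_all) / 2)
    return model


def predict(model, sample, threshold=0.3):
    """Weighted vote; returns (class or 'uncertain', prediction strength)."""
    v_aml = v_all = 0.0
    for g, (weight, midpoint) in model.items():
        # |vote| = W_i * V_i with V_i = |X_i - (mu_AML + mu_ALL) / 2|;
        # the sign says whether X_i lies on the AML side or the ALL side.
        vote = weight * (sample[g] - midpoint)
        if vote > 0:
            v_aml += vote
        else:
            v_all -= vote
    v_win, v_lose = max(v_aml, v_all), min(v_aml, v_all)
    total = v_win + v_lose
    ps = (v_win - v_lose) / total if total > 0 else 0.0
    winner = "AML" if v_aml > v_all else "ALL"
    return (winner if ps > threshold else "uncertain"), ps


labels = ["AML"] * 3 + ["ALL"] * 3
expr = {"g1": [9, 10, 11, 1, 2, 3], "g3": [2, 1, 3, 8, 9, 7]}
model = train_voter(expr, labels, ["g1", "g3"])
print(predict(model, {"g1": 10, "g3": 2}))  # ('AML', 1.0)
print(predict(model, {"g1": 7, "g3": 6}))   # 'uncertain', PS close to 1/7
```

In the second call the two genes point to different classes, so the votes largely cancel and the PS stays below the 0.3 threshold.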


Testing the Validity of Class Predictors

Cross validation:
– Withhold a sample
– Build a predictor based on the remaining samples
– Predict the class of the withheld sample
– Repeat for each sample

Assess accuracy on an independent set of samples
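The cross-validation loop can be sketched as follows. One assumption goes beyond the slides: the informative genes are re-selected inside every fold, so the withheld sample never influences which genes are used. The weighted-vote details repeat the assumptions of the signal-to-noise sketch, and all names are hypothetical.

```python
from statistics import mean, pstdev


def snr(values, labels):
    """Assumed signal-to-noise score, positive when the gene is higher in AML."""
    aml = [v for v, lab in zip(values, labels) if lab == "AML"]
    all_ = [v for v, lab in zip(values, labels) if lab == "ALL"]
    return (mean(aml) - mean(all_)) / max(pstdev(aml) + pstdev(all_), 1e-9)


def build_predictor(expr, labels, k):
    """Pick the k genes with the largest |score|; store (weight, midpoint) for each."""
    top = sorted(expr, key=lambda g: abs(snr(expr[g], labels)), reverse=True)[:k]
    model = {}
    for g in top:
        aml = [v for v, lab in zip(expr[g], labels) if lab == "AML"]
        all_ = [v for v, lab in zip(expr[g], labels) if lab == "ALL"]
        model[g] = (snr(expr[g], labels), (mean(aml) + mean(all_)) / 2)
    return model


def predict(model, sample, threshold=0.3):
    """Weighted vote; returns (class or 'uncertain', prediction strength)."""
    votes = {"AML": 0.0, "ALL": 0.0}
    for g, (w, mid) in model.items():
        v = w * (sample[g] - mid)
        votes["AML" if v > 0 else "ALL"] += abs(v)
    win, lose = max(votes.values()), min(votes.values())
    ps = (win - lose) / (win + lose) if win + lose > 0 else 0.0
    label = max(votes, key=votes.get)
    return (label if ps > threshold else "uncertain"), ps


def leave_one_out(expr, labels, k, threshold=0.3):
    """Withhold each sample, rebuild the predictor on the rest, predict the withheld one."""
    results = []
    for i in range(len(labels)):
        rest = [j for j in range(len(labels)) if j != i]
        train_expr = {g: [vals[j] for j in rest] for g, vals in expr.items()}
        model = build_predictor(train_expr, [labels[j] for j in rest], k)
        sample = {g: vals[i] for g, vals in expr.items()}
        call, ps = predict(model, sample, threshold)
        results.append((labels[i], call, ps))
    return results


labels = ["AML"] * 4 + ["ALL"] * 4
expr = {
    "g1": [9, 10, 11, 10, 1, 2, 3, 2],  # informative
    "g2": [5, 1, 9, 3, 4, 8, 2, 6],     # noise
}
results = leave_one_out(expr, labels, k=1)
print(sum(t == c for t, c, _ in results), "of", len(results), "correct")  # 8 of 8 correct
```

In the test case the same loop would run with k = 50 over 38 samples; an "uncertain" call counts as neither correct nor an error.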


Application to the Test Case

50 genes most highly correlated with the AML-ALL distinction were chosen

A class predictor based on these genes was built


Application to the Test Case

Performance in cross validation:
– Out of 38 samples there were 36 predictions and 2 uncertainties (PS < 0.3)
– All 36 predictions were correct (100% accuracy)
– PS median 0.77


Application to the Test Case (cont.)

Performance on an independent set of samples:
– Out of 34 samples there were 29 predictions and 5 uncertainties (PS < 0.3)
– All 29 predictions were correct (100% accuracy)
– PS median 0.73


Comments

– Why 50 genes?
  – Large enough to be robust against noise
  – Small enough to be readily applied in a clinical setting
  – Predictors based on between 10 and 200 genes all performed well
– Genes useful for cancer class prediction may also provide insight into cancer pathogenesis and pharmacology


Comments (cont.)

Creation of a new predictor involves expression analysis of thousands of genes

Application of the predictor then requires monitoring the expression levels of only a few informative genes


Class Discovery

Cluster tumors by gene expression:
– Apply a clustering technique to produce presumed classes

Evaluation of the classes:
– Are the classes meaningful?
– Do they reflect true structure?


Clustering Technique - SOMs

SOMs (Self-Organizing Maps) are well suited for identifying a small number of prominent classes:
– Find an optimal set of “centroids”
– Partition the data set according to the centroids
– Each centroid defines a cluster consisting of the data points nearest to it

We won't go into details about the calculation of SOMs
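For orientation only, here is a minimal sketch of a one-dimensional SOM in plain Python: nodes (the "centroids") are pulled toward randomly drawn samples, the best-matching node moving most and its grid neighbors less, while the learning rate and neighborhood radius shrink over time; afterwards each sample joins the cluster of its nearest node. The linear decay schedule and the values of `lr0`, `sigma0` and `n_iter` are arbitrary assumptions, not the settings used in the study.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, n_nodes=2, n_iter=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Fit n_nodes centroids on a 1-D grid (node k's neighbors are k-1 and k+1)."""
    rng = random.Random(seed)
    nodes = [list(p) for p in rng.sample(data, n_nodes)]  # start at random samples
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)                    # learning rate decays toward 0
        sigma = max(sigma0 * (1 - frac), 0.01)   # neighborhood radius shrinks
        x = rng.choice(data)
        bmu = min(range(n_nodes), key=lambda k: dist2(nodes[k], x))
        for k in range(n_nodes):
            # Gaussian neighborhood on the grid: the best-matching unit gets h = 1.
            h = math.exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
            nodes[k] = [w + lr * h * (xi - w) for w, xi in zip(nodes[k], x)]
    return nodes


def assign_clusters(data, nodes):
    """Each node defines a cluster: the samples nearest to it."""
    return [min(range(len(nodes)), key=lambda k: dist2(nodes[k], x)) for x in data]


# Two well-separated toy groups of 2-gene profiles.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
nodes = train_som(data)
print(assign_clusters(data, nodes))  # first three share one cluster, last three the other
```

With two nodes and a shrinking neighborhood the procedure ends up behaving much like online k-means; the grid neighborhood matters more when there are many nodes.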


Application of a two-cluster SOM to the test case

Class A1: 24 ALL, 1 AML

Class A2: 10 AML, 3 ALL

Quite effective at automatically discovering the two types of leukemia

Not perfect


Evaluation of the Classes

How can we evaluate such classes if the “right” answer is not already known?

Hypothesis: class discovery can be tested by class prediction
– If the classes reflect true structure, then a class predictor based on them should perform well

Let’s test this hypothesis...


Validity of Predictors Based on A1 and A2

Predictors based on different numbers of informative genes performed well

For example: a 20-gene predictor


Validity of Predictors Based on A1 and A2 (cont.)

Performance on independent samples:
– PS median 0.61
– Prediction made for 74% of samples


Validity of Predictors Based on A1 and A2 (cont.)

Performance in cross validation:
– 34 accurate predictions with high prediction strength
– One error
– Three uncertain predictions


[Figure: the one cross-validation error and 2 of the 3 cross-validation uncertain predictions highlighted]


Iterative Procedure

– Use a SOM to initially cluster the data
– Construct a predictor
– Remove samples that are not correctly predicted in cross-validation
– Use the remaining samples to generate an improved predictor
– Test on an independent data set
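The refinement step of this procedure can be sketched generically: given cluster labels (for example, from a SOM), keep only the samples whose label is reproduced in leave-one-out cross validation, then fit the final predictor on those. To keep the sketch short it uses a nearest-centroid classifier as a stand-in for the weighted-voting predictor; `refine_classes`, `fit_centroids`, `nearest_centroid` and the toy data are hypothetical.

```python
from statistics import mean


def fit_centroids(samples, labels):
    """Stand-in predictor: the mean profile of each class."""
    groups = {}
    for s, lab in zip(samples, labels):
        groups.setdefault(lab, []).append(s)
    return {lab: [mean(col) for col in zip(*members)] for lab, members in groups.items()}


def nearest_centroid(model, sample):
    """Predict the class whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], sample)))


def refine_classes(samples, labels, fit=fit_centroids, predict=nearest_centroid):
    """Drop samples whose (e.g. SOM-derived) label is not reproduced in
    leave-one-out cross validation, then refit on the remaining samples."""
    kept = []
    for i in range(len(samples)):
        rest = [j for j in range(len(samples)) if j != i]
        model = fit([samples[j] for j in rest], [labels[j] for j in rest])
        if predict(model, samples[i]) == labels[i]:
            kept.append(i)
    final_model = fit([samples[i] for i in kept], [labels[i] for i in kept])
    return final_model, kept


# The last sample was put in A2 by the clustering but clearly resembles A1.
samples = [[0.0], [0.2], [0.1], [5.0], [5.2], [4.9], [0.15]]
labels = ["A1", "A1", "A1", "A2", "A2", "A2", "A2"]
model, kept = refine_classes(samples, labels)
print(kept)  # [0, 1, 2, 3, 4, 5]
```

The improved predictor returned here would then be tested on an independent data set, as in the last step of the procedure.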


Validity of Predictors Based on Random Clusters

Performance:
– Poor accuracy in cross validation
– Low PS on independent samples


Conclusion

The AML-ALL distinction could have been automatically discovered and confirmed without previous biological knowledge


Application of a 4-cluster SOM to the Test Case


Evaluation of the Classes

Complement approach:
– Construct class predictors to distinguish each class from its complement

Pair-wise approach:
– Construct class predictors to distinguish between each pair of classes Ci, Cj
– Perform cross validation only on samples in Ci and Cj
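The pair-wise approach can be sketched as follows: for each pair of classes, restrict the data to samples in Ci and Cj and run leave-one-out cross validation on just those samples. As a simplification, a nearest-centroid classifier stands in for the weighted-voting predictor; the function names and the toy classes B1 to B4 are hypothetical.

```python
from itertools import combinations
from statistics import mean


def fit_centroids(samples, labels):
    """Stand-in predictor: the mean profile of each class."""
    groups = {}
    for s, lab in zip(samples, labels):
        groups.setdefault(lab, []).append(s)
    return {lab: [mean(col) for col in zip(*members)] for lab, members in groups.items()}


def nearest_centroid(model, sample):
    """Predict the class whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], sample)))


def pairwise_cv(samples, labels, fit=fit_centroids, predict=nearest_centroid):
    """Leave-one-out accuracy for every pair (Ci, Cj), using only samples in Ci and Cj."""
    accuracy = {}
    for ci, cj in combinations(sorted(set(labels)), 2):
        idx = [i for i, lab in enumerate(labels) if lab in (ci, cj)]
        correct = 0
        for i in idx:
            rest = [j for j in idx if j != i]
            model = fit([samples[j] for j in rest], [labels[j] for j in rest])
            correct += predict(model, samples[i]) == labels[i]
        accuracy[(ci, cj)] = correct / len(idx)
    return accuracy


# Toy 1-gene profiles: B3 and B4 overlap, the other classes are well separated.
samples = [[0], [1], [2], [10], [11], [12], [20], [21], [22], [19.5], [20.5], [21.5]]
labels = ["B1"] * 3 + ["B2"] * 3 + ["B3"] * 3 + ["B4"] * 3
res = pairwise_cv(samples, labels)
for pair, acc in sorted(res.items()):
    print(pair, acc)
```

Every pair except the overlapping toy classes B3 and B4 reaches an accuracy of 1.0; a low score for one pair is the kind of signal that suggests two discovered classes should be merged.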


Evaluation of the Classes

Class predictors distinguished the classes from one another, with the exception of B3 versus B4


Conclusion

The results suggest the merging of classes B3 and B4

The distinction corresponding to AML, B-ALL and T-ALL was confirmed


Uses of Class Discovery

Identify fundamental subtypes of any cancer

Search for fundamental mechanisms that cut across distinct types of cancers


Questions?

Thank you for listening