

Gene expression-based classification of malignant gliomas

correlates better with survival than histological classification1

Catherine L. Nutt, D. R. Mani, Rebecca A. Betensky, Pablo Tamayo, J. Gregory Cairncross,

Christine Ladd, Ute Pohl, Christian Hartmann, Margaret E. McLaughlin, Tracy T. Batchelor,

Peter M. Black, Andreas von Deimling, Scott L. Pomeroy,

Todd R. Golub2 and David N. Louis2

Molecular Neuro-Oncology Laboratory and Molecular Pathology Unit, Department of Pathology

and Neurosurgical Service [C.L.N., U.P., C.H., T.T.B., D.N.L.] and Brain Tumor Center,

Department of Neurology [T.T.B.], Massachusetts General Hospital and Harvard Medical

School, Boston, Massachusetts 02114; Whitehead Institute/Massachusetts Institute of

Technology Center for Genome Research, Cambridge, Massachusetts 02139 [D.R.M., P.T., C.L.,

T.R.G.]; Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts

02115 [R.A.B.]; Department of Oncology and Clinical Neurological Sciences, University of

Western Ontario and London Regional Cancer Centre, London, Ontario N6A 4L6, Canada

[J.G.C.]; Department of Pathology [M.E.M.] and Neurosurgery [P.M.B.], Brigham and Women’s

Hospital and Division of Neuroscience, Department of Neurology, Children’s Hospital [S.L.P.],

Boston, Massachusetts 02115; Department of Neuropathology, Charité Hospital, Humboldt

University, Berlin, Germany [A.vD.]; Dana-Farber Cancer Institute and Harvard Medical

School, Boston, Massachusetts 02114 [T.R.G.]


Running Title: Microarray-based classification of high grade gliomas

Key Words: microarray, glioblastoma, oligodendroglioma, diagnosis, histology

1 This work was supported in part by NIH CA57683 (D.N.L.); Affymetrix and Bristol-Myers

Squibb (Whitehead Institute/MIT Center for Genome Research); NIH NS35701 (S.L.P.); and

Canadian Institutes of Health Research MOP37849 (J.G.C.).

2Address reprint requests to: David N. Louis, Molecular Pathology Laboratory, CNY7,

Massachusetts General Hospital, 149 13th St., Charlestown, MA 02129. Phone: (617) 726-5690.

Fax: (617) 726-5079. E-mail: [email protected]

Todd R. Golub, Whitehead Institute / Massachusetts Institute of Technology Center for Genome

Research, Building 300, 1 Kendall Square, Cambridge, Massachusetts 02139. E-mail:

[email protected]

3Central Brain Tumor Registry of the United States. http://www.cbtrus.org

4The abbreviations used are: CCNU, 1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea; k-NN,

k-nearest neighbor; S2N, signal-to-noise; WHO, World Health Organization.

5This complete set of data is available at http://www-genome.wi.mit.edu/cancer/pub/glioma

6http://www-genome.wi.mit.edu/cancer/software/software.html


7http://www.r-project.org


ABSTRACT

In modern clinical neuro-oncology, histopathological diagnosis affects therapeutic decisions and

prognostic estimation more than any other variable. Among high grade gliomas, for example,

histologically classic glioblastomas and anaplastic oligodendrogliomas follow markedly different

clinical courses. Unfortunately, many malignant gliomas are diagnostically challenging; these

non-classic lesions are difficult to classify by histological features, generating considerable

interobserver variability and limited diagnostic reproducibility. The resulting tentative

pathological diagnoses create significant clinical confusion. We investigated whether gene

expression profiling, coupled with class prediction methodology, could be used to classify high

grade gliomas in a manner more objective, explicit and consistent than standard pathology.

Microarray analysis was used to determine the expression of approximately 12,000 genes in a set

of 50 gliomas: 28 glioblastomas and 22 anaplastic oligodendrogliomas. Supervised learning

approaches were used to build a two-class prediction model based on a subset of 14

glioblastomas and 7 anaplastic oligodendrogliomas with classic histology. A 20-feature k-nearest

neighbor model correctly classified 18 out of the 21 classic cases in leave-one-out cross

validation when compared to pathological diagnoses. This model was then used to predict the

classification of clinically common, histologically non-classic samples. When tumors were

classified according to pathology, the survival of patients with non-classic glioblastoma and

non-classic anaplastic oligodendroglioma was not significantly different (p=0.19). However, class

distinctions according to the model were significantly associated with survival outcome

(p=0.05). This class prediction model was capable of classifying high grade, non-classic glial

tumors objectively and reproducibly. Moreover, the model provided a more accurate predictor of

prognosis in these non-classic lesions than did pathological classification. These data suggest

that class prediction models, based on defined molecular profiles, classify diagnostically


challenging malignant gliomas in a manner that better correlates with clinical outcome than does

standard pathology.


INTRODUCTION

Malignant gliomas are the most common primary brain tumor and result in an estimated

13,000 deaths each year in the United States.3 Glial tumors are classified histologically, with

pathological diagnosis affecting prognostic estimation and therapeutic decisions more than any

other variable. Among high grade gliomas, anaplastic oligodendrogliomas have a more favorable

prognosis than glioblastomas (1). Moreover, whereas glioblastomas are resistant to most

available therapies, anaplastic oligodendrogliomas are often chemosensitive, with approximately

two-thirds of cases responding to procarbazine, CCNU4 and vincristine (2, 3). Paradoxically,

recognition of the clinical importance of diagnosing anaplastic oligodendroglioma has blurred

the histopathological line separating glioblastoma and oligodendroglioma; to ensure that patients

are not deprived of effective chemotherapy, pathologists have loosened their criteria for

anaplastic oligodendroglioma. Indeed, this diagnostic promiscuity has recently been described as

a “contagion” (4). As such, there is a critical need for an objective, clinically relevant method of

glioma classification.

The most widely used histological system of brain tumor classification is that of the

WHO (1). Gliomas are classified according to defined histological features characteristic of the

presumed normal cell of origin. Tumors of classic histology clearly display these features and

resemble typical depictions in standard textbooks (5, 6); these cases would be diagnosed

similarly by nearly all pathologists. Unfortunately, there are situations in which the WHO

classification system is problematic, primarily because pathological diagnosis remains subjective

(7); for example, intratumoral histological variability is common and high grade gliomas can

display little cellular differentiation, thus lacking defining histological features. The diagnosis of

tumors with such non-classic histology is often controversial. Consequently, diagnostic accuracy

and reproducibility are jeopardized and significant interobserver variability can occur. Coons et


al. found that complete diagnostic concordance among four neuropathologists reviewing gliomas

over four sessions peaked at 69% (8). Giannini et al., in a study of seven neuropathologists and

six surgical pathologists scoring histological features of oligodendroglioma, found that

agreement for identifying features ranged from 0.05 to 0.80, confirming that numerous

classification parameters are not easily reproduced (9).

To develop more objective approaches to glioma classification, recent investigations have

focused on molecular genetic analyses. Sasaki et al. demonstrated loss of chromosome 1p in

86% of oligodendrogliomas with classic histology and maintenance of both 1p alleles in 73% of

“oligodendrogliomas” with astrocytic features (10). Interestingly, tumor genotype more closely

predicted chemosensitivity, demonstrating an ability of tumor genotype to augment standard

pathology. Burger et al. also demonstrated close correlation between classic low grade

oligodendroglioma appearance and allelic losses of 1p and 19q (11). In gene expression studies,

Lu et al. suggested that expression of oligodendrocyte lineage genes (Olig1 and 2) might

augment identification of oligodendroglial tumors (12). Similarly, Popko et al. found three of

four myelin transcripts significantly more often in oligodendrogliomas than in astrocytomas (13).

The advent of expression microarray techniques now allows simultaneous analysis of

thousands of genes. We hypothesized that this approach could identify molecular markers

capable of refining the current method of malignant glioma classification. We therefore

investigated whether gene expression profiling, coupled with the computational methodology of

class prediction (14), could be used to define subgroups of high grade glioma in a manner more

objective, explicit and consistent than standard pathology. To this end, a subset of gliomas with

classic histology was used to build a class prediction model and this model was then utilized to

predict the classification of samples with non-classic histology.


MATERIALS AND METHODS

Glioma tissue samples

These investigations have been approved by the Massachusetts General Hospital Institutional

Review Board. Tissue samples were collected from Canadian Brain Tumor Tissue Bank

(London, Ontario, Canada), Massachusetts General Hospital (Boston, Massachusetts), Brigham

and Women’s Hospital (Boston, Massachusetts), and Charité Hospital (Berlin, Germany).

Samples were collected immediately following surgical resection, snap frozen, and stored at

-80°C. Hematoxylin and eosin-stained frozen sections were reviewed histologically for every

specimen (DNL); samples containing significant regions of normal cell contamination (greater

than 10%) and/or excessively large amounts of necrotic material were excluded. Using these

criteria, 50 high grade glioma samples were selected (Table 1): 28 glioblastomas and 22

anaplastic oligodendrogliomas; all were primary tumors sampled prior to therapy. All cases had

been diagnosed at the primary hospital by board certified neuropathologists. Original pathology

slides were obtained and reviewed centrally by two additional neuropathologists (DNL, MEM)

for diagnostic confirmation and selection of the classic tumor subset. Anaplastic

oligodendrogliomas designated as having classic histopathology exhibited relatively evenly

distributed, uniform and rounded nuclei and frequent perinuclear halos (10). In contrast, classic

glioblastomas were characterized by irregularly distributed, pleomorphic and hyperchromatic

nuclei, sometimes with conspicuous eosinophilic cytoplasm. The classic subset of tumors were

cases diagnosed similarly by all examining pathologists and each case resembled typical

depictions in standard textbooks (5, 6). A total of 21 classic tumors were selected and the

remaining 29 samples were considered non-classic tumors, lesions for which diagnosis might be

controversial. Of the 21 classic tumors, 14 were glioblastomas and 7 were anaplastic

oligodendrogliomas.


Gene expression profiling

Tissues were homogenized in guanidinium isothiocyanate and RNA was isolated using a CsCl

gradient. RNA integrity was confirmed by gel electrophoresis. For each sample, fifteen

micrograms of total RNA were used to generate biotinylated cRNAs, which were hybridized

overnight to Affymetrix U95Av2 GeneChips as described previously (14, 15). Based on prior

experience, one array per sample provided reproducible results with a sample set of the size used

in this study (14, 16). Arrays were scanned on Affymetrix scanners and data were collected using

GENECHIP software (Affymetrix, Santa Clara, California). Scan quality was assured based on a

priori quality control criteria which included the absence of visible microarray artifacts (e.g.

scratches) and significant differences in microarray intensity, and the presence of greater than

30% “present” calls for the approximately 12,600 genes and ESTs on the U95Av2 GeneChips.

Class prediction methodology

The subset of classic gliomas was used to build a class prediction model. This model was then

used to predict the classification of the non-classic samples. Raw expression values were

normalized by linear scaling so that mean array intensity for active (“present”) genes was

identical for all scans.5 Data filtration settings were based on prior studies (14, 16). Intensity

thresholds were set at 20 and 16,000 units. Gene expression data were subjected to a variation

filter that excluded genes showing minimal variation across the samples; genes whose expression

levels varied less than 100 units between samples, and genes whose expression varied less than

3-fold between any two samples, were removed. The variation filters excluded 2/3 of the genes,

leaving approximately 3,900 genes for building class prediction models. Further feature (gene)

selection was effected, as described previously (14, 16), using the S2N statistic. Signal-to-noise

ratio ranks genes based on their correlation to each of the two class distinctions (i.e., classic


glioblastoma and classic anaplastic oligodendroglioma). In addition, the significance of the

highly ranked genes was confirmed by random permutation testing; the sample classification

labels were permuted and the S2N ratio was recomputed to compare the true gene correlations to

what would have been expected by chance. Five different k-NN class prediction models were

built, utilizing different gene numbers (10, 20, 50, 100 and 250 genes), using GeneCluster.6

Training error (on the classic cases) for these k-NN models was determined using leave-one-out

cross validation, where one sample is withheld and the class membership of this withheld sample

is predicted using a model built upon the remaining samples. Class prediction for the withheld

sample was the majority class membership of the k (k = 3 in these experiments) closest

“neighboring” samples based on the Euclidean distance between the sample under consideration

and samples used in training the k-NN model. This process was repeated for each sample in the

training set and a cumulative training error was calculated. Finally, a k-NN model was built

using all 21 classic cases (with no samples left out), which was then used to predict classification

of the remaining gliomas based on the class labels of the k nearest neighbors of each sample.
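The preprocessing and ranking steps described above can be sketched in Python with NumPy. This is a minimal illustration, not the GeneCluster implementation: the function names, the boolean `present` mask, the choice of a common scaling target, and the small constant that guards against a zero denominator are all assumptions. The variation filter is read here as keeping a gene only if it passes both the 100-unit and the 3-fold criterion, and the S2N statistic follows the form used in reference 14, (mean of class 0 minus mean of class 1) divided by the sum of the two class standard deviations.

```python
import numpy as np

def scale_arrays(expr, present):
    """Rescale each array (row) so its mean over 'present' genes is the same for all arrays."""
    masked = np.where(present, expr, np.nan)
    row_means = np.nanmean(masked, axis=1, keepdims=True)
    target = row_means.mean()  # assumption: common target = average of the per-array means
    return expr * (target / row_means)

def threshold_and_filter(expr, floor=20.0, ceiling=16000.0, min_diff=100.0, min_fold=3.0):
    """Clip intensities to [floor, ceiling], then drop genes with too little variation."""
    clipped = np.clip(expr, floor, ceiling)
    gene_max = clipped.max(axis=0)
    gene_min = clipped.min(axis=0)
    # Assumed reading of the text: a gene must vary by at least 100 units AND 3-fold.
    keep = ((gene_max - gene_min) >= min_diff) & ((gene_max / gene_min) >= min_fold)
    return clipped[:, keep], keep

def signal_to_noise(expr, labels):
    """Per-gene S2N between class 0 and class 1; expr is samples x genes."""
    labels = np.asarray(labels)
    a, b = expr[labels == 0], expr[labels == 1]
    spread = a.std(axis=0) + b.std(axis=0)       # population standard deviations (assumption)
    spread = np.where(spread == 0, 1e-12, spread)  # guard against division by zero
    return (a.mean(axis=0) - b.mean(axis=0)) / spread
```

Genes with the most positive scores are the ones most associated with class 0 and those with the most negative scores with class 1, so a feature list can be drawn from both ends of the ranking.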

Survival analyses: Statistical methods

Survival distributions were compared between groups defined by pathology or gene

expression profiling using permutation logrank tests, computed by drawing 50,000 samples from

the relevant permutation distribution. The statistical programming language, R,7 was used to

compute permutation p-values. Kaplan-Meier plots were generated with GraphPad Prism

(Version 3.02, GraphPad Software, San Diego, California).
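A permutation logrank test of the kind described above can be sketched as follows. The original analysis was run in R; this Python version is only an illustration, and the function names, the two-sided chi-square form of the statistic, and the fixed random seed are assumptions. Group labels are shuffled while survival times and censoring indicators stay attached to each patient, and the p-value is the fraction of shuffles whose statistic is at least as large as the observed one.

```python
import numpy as np

def logrank_stat(time, event, group):
    """Two-group logrank chi-square statistic. event: 1 = death, 0 = censored."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    obs_minus_exp = 0.0
    variance = 0.0
    for t in np.unique(time[event]):          # loop over distinct death times
        at_risk = time >= t
        n = at_risk.sum()                     # patients at risk just before t
        n1 = (at_risk & group).sum()          # of those, how many are in group 1
        dying = event & (time == t)
        d = dying.sum()                       # deaths at t
        d1 = (dying & group).sum()            # deaths at t in group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0

def permutation_logrank(time, event, group, n_perm=50000, seed=0):
    """Return (observed statistic, permutation p-value) from n_perm label shuffles."""
    rng = np.random.default_rng(seed)
    group = np.asarray(group, dtype=bool)
    observed = logrank_stat(time, event, group)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(group)
        if logrank_stat(time, event, shuffled) >= observed - 1e-9:
            hits += 1
    return observed, hits / n_perm
```

With small groups such as those in this study, the permutation distribution avoids relying on the large-sample chi-square approximation, which is presumably why a permutation test was preferred here.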


RESULTS AND DISCUSSION

Training of the k-NN class prediction models. We investigated whether gene expression

profiling could be used to define subgroups of high grade glioma more objectively and

consistently than standard pathology. To this end, we examined the expression profile of 14

glioblastomas and 7 anaplastic oligodendrogliomas with classic histology (Fig. 1A). Features

(genes) correlating with each of the two class distinctions were ranked according to S2N as

described; diagrammatic results for the top 50 features of each class are illustrated (Fig. 1B; the

complete list of genes is available online5). Since the expression profiles demonstrated robust

class distinctions, we proceeded to construct five k-NN class prediction models. The number of

features used in the models was chosen to give a range of prediction accuracy; increasing the

number of genes in a model can improve prediction accuracy by providing additional

biologically relevant input and affording robust signals against noise, whereas using too many

genes can increase inaccuracy by generating excess noise. Models were built using 10, 20, 50,

100 or 250 features and the training error for each model was calculated using leave-one-out

cross validation (Table 2). Although accuracy of the models was comparable, the 20-feature

k-NN model was chosen for further study as it predicted most accurately the class distinctions of

the classic glioma training set (18/21 correct calls; 86% accuracy).
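The leave-one-out procedure behind these training errors can be sketched in Python. This is an illustration on synthetic data, not the GeneCluster implementation and not the glioma dataset. Two choices are assumptions: features are re-ranked by S2N inside each fold, and each model takes half of its features from either end of the ranking. The function names are hypothetical; k = 3 and the Euclidean distance follow the Methods.

```python
import numpy as np

def s2n(x, y):
    """Per-gene signal-to-noise score between class 0 and class 1 (x: samples x genes)."""
    a, b = x[y == 0], x[y == 1]
    spread = a.std(axis=0) + b.std(axis=0)
    spread[spread == 0] = 1e-12               # guard against division by zero
    return (a.mean(axis=0) - b.mean(axis=0)) / spread

def knn_predict(train_x, train_y, test_x, k=3):
    """Majority vote among the k training samples closest in Euclidean distance."""
    dist = np.sqrt(((train_x - test_x) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    return int(train_y[nearest].sum() * 2 > k)

def loocv_errors(x, y, n_features, k=3):
    """Count leave-one-out misclassifications for a model using n_features genes."""
    errors = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i         # withhold sample i
        train_x, train_y = x[mask], y[mask]
        order = np.argsort(s2n(train_x, train_y))
        half = n_features // 2
        feats = np.concatenate([order[:half], order[-half:]])
        pred = knn_predict(train_x[:, feats], train_y, x[i, feats], k)
        errors += int(pred != y[i])
    return errors

# Synthetic stand-in for the training set: 14 samples of class 0, 7 of class 1.
rng = np.random.default_rng(0)
y = np.array([0] * 14 + [1] * 7)
x = rng.normal(size=(21, 200))
x[y == 1, :20] += 3.0                         # 20 genes carry a class signal
for n in (10, 20, 50):
    print(n, "features:", loocv_errors(x, y, n), "errors out of", len(y))
```

Ranking features inside each fold keeps the withheld sample from influencing which genes are selected, which would otherwise make the training error look better than it is.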

The 20 features used for prediction in this model correspond to 19 genes due to the

presence of redundant probe sets (Table 3). Genes highly correlated with glioblastoma included a

mixture of metabolic, structural, and signaling proteins. In particular, Rho GTPases (ARHC) and

MAP kinases are members of Ras signal transduction pathways known to play a role in

tumorigenesis and cell migration (17, 18). A large proportion of genes highly correlated with

anaplastic oligodendroglioma were found to be involved in protein translation and ribosome


biogenesis; translation factors have been implicated previously as effectors of tumorigenesis

(19). Paradoxically, ribosomal protein-encoding genes were found recently to be correlated with

poor outcome in medulloblastoma (16). These models thus provide a substantial number of

features that correlate with glioma class distinction, but determination of the biological and

clinical significance of these genes requires additional studies.

Training “errors” of the class prediction model. Although a class prediction was made for all

21 classic gliomas using the model, such techniques typically classify some samples with more

confidence than others. For this reason, confidence values were calculated for all predictions

(Table 4). Of the three “errors” within the classic training set, one prediction was made with

relatively high confidence (“Brain_CO_4”; ranked 9 out of 21) and two were classified as low

confidence predictions (“Brain_CG_5” and “Brain_CG_10”; ranked 16 and 18, respectively).

“Brain_CO_4”, a classic anaplastic oligodendroglioma, displayed a gene expression profile

strikingly more similar to that of glioblastoma (Fig. 1B) and was classified as a glioblastoma

with relatively high confidence in all five k-NN models examined (mean confidence value of 0.17).

Reexamination of reports from the initial diagnosis and slides from the central pathology review

gave no justification for a histological classification of glioblastoma. Although some evidence of

nuclear pleomorphism and hyperchromasia was noted in the original pathology report, the

presence of prominent perinuclear halos and a fine capillary network indicated a classic

anaplastic oligodendroglioma. Furthermore, glial fibrillary acidic protein, an astrocytic marker,

was not expressed in the neoplastic cells. Notably, however, although the histological features of

“Brain_CO_4” were consistent with anaplastic oligodendroglioma, clinical data suggested a

course more characteristic of a glioblastoma, with survival of only seven months from diagnosis.


Independent validation of class prediction through survival analysis. The prediction model

classified 18 of 21 classic gliomas identically to the pathological classification during

leave-one-out cross validation. The discrepancies in tumor classification could be the result of a class

prediction model “error” or a diagnostic “error”; preliminary examination of the clinical behavior

of “Brain_CO_4” suggested that the class prediction model provided more pertinent tumor

classification. Ideally, the designation of “error” requires independent validation. Differences in

survival between patients with glioblastomas and those with anaplastic oligodendrogliomas have

been well documented (1); consequently, as an independent validation of the gene expression

prediction model, prediction model classifications were compared to pathological diagnoses with

respect to survival. When the classic gliomas were sorted according to pathology, a clear

distinction was found between survival of patients with glioblastoma and those with anaplastic

oligodendroglioma (Fig. 2). Although this comparison was not statistically significant (n=21,

P=0.210), most likely due to the small sample size and relatively short follow-up time on three of

the seven anaplastic oligodendrogliomas, statistically significant differences in survival were

seen within the pathologically defined classes when all glioblastomas and anaplastic

oligodendrogliomas were compared (n=50, P=0.009; data not shown). Remarkably however,

when the classic gliomas were sorted using class distinctions according to the model, survival

differences were statistically significant (n=21, P=0.031; Fig. 2). These results demonstrate that,

even within high grade gliomas of classic histology, the biologically and clinically relevant

information afforded by the genetic profiles augments that provided by pathology alone.

Furthermore, the clinical outcome data suggest that the discrepancies in tumor classification are

more likely due to a diagnostic “error” than a class prediction model “error”.
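The survival curves compared in this section are Kaplan-Meier estimates. As an illustration only, the sketch below computes such estimates in Python for the classic tumors grouped by pathology, using survival days and vital status transcribed from Table 1. The published plots were produced with GraphPad Prism; the function and variable names here are hypothetical.

```python
import numpy as np

def kaplan_meier(days, dead):
    """Return [(time, survival probability)] at each distinct death time."""
    days = np.asarray(days, dtype=float)
    dead = np.asarray(dead, dtype=bool)
    surv = 1.0
    curve = []
    for t in np.unique(days[dead]):
        at_risk = (days >= t).sum()            # still under observation just before t
        deaths = (dead & (days == t)).sum()
        surv *= 1 - deaths / at_risk
        curve.append((float(t), surv))
    return curve

# Classic glioblastomas (Brain_CG_1 to Brain_CG_14), transcribed from Table 1.
gbm_days = [308, 281, 501, 670, 729, 21, 630, 263, 219, 408, 242, 323, 213, 97]
gbm_dead = [1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1]
# Classic anaplastic oligodendrogliomas (Brain_CO_1 to Brain_CO_7), from Table 1.
ao_days = [231, 1674, 1604, 215, 359, 171, 272]
ao_dead = [0, 0, 0, 1, 0, 0, 1]

for name, days, dead in (("GBM", gbm_days, gbm_dead), ("AO", ao_days, ao_dead)):
    for t, s in kaplan_meier(days, dead):
        print(f"{name}: day {t:.0f}, estimated survival {s:.3f}")
```

Patients alive at last follow-up are treated as censored: they count in the at-risk denominator up to their follow-up time but never contribute a death.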


Class prediction of non-classic high grade gliomas. Next, we examined the ability of this

model to classify the common, non-classic high grade gliomas that currently cause such clinical

uncertainty regarding therapy and prognosis (Fig. 3A). The ability to identify these lesions in a

uniform and reproducible manner would facilitate more accurate therapeutic decisions and

prognostic estimation, allowing for improved clinical management of individual patients. The

prediction model classifications were compared to pathological diagnoses with respect to

survival. When these diagnostically challenging tumors were classified according to pathology,

survival of patients with non-classic glioblastoma was not significantly different from that of

patients with non-classic anaplastic oligodendroglioma (n=29, P=0.194; Fig. 3B). These results

demonstrate clearly the difficulty in distinguishing these challenging cases in a clinically relevant

manner based exclusively on histological parameters. In contrast, class distinctions according to

the gene expression-based model trained on the classic gliomas were statistically significant

(P=0.051), giving much better separation between the anaplastic oligodendroglioma and

glioblastoma survival curves (Fig. 3B). Thus, gene expression profiles have a remarkable ability

to distinguish histologically ambiguous glioblastomas and anaplastic oligodendrogliomas in a

clinically relevant manner. Indeed, gene expression profiles provide a more objective and

accurate predictor of prognosis in high grade non-classic gliomas than does traditional histology.

In addition, the ability to distinguish histologically ambiguous gliomas enables appropriate

therapies to be tailored to specific tumor subtypes, sparing patients who would not respond from

unnecessary treatments. Moreover, uniform and reproducible classification of these non-classic

lesions would provide improved stratification of patients in clinical trials and molecular marker

studies.


Summary. We investigated whether gene expression profiling, coupled with the computational

methodology of class prediction, could be used to define subgroups of high grade glioma in a

manner more objective, explicit and consistent than standard pathology. Not only was this

method effective at classifying high grade gliomas objectively and reproducibly, it also appeared

to provide a more accurate predictor of prognosis. Although the training sample sets for these

models were selected based on classic histological features, the biologically and clinically

relevant information afforded by the genetic profiles greatly augments that provided by

pathology alone. These data therefore suggest that class prediction models, based on defined

molecular profiles, classify diagnostically challenging malignant gliomas in a manner that better

correlates with clinical outcome than does standard pathology.


ACKNOWLEDGMENTS

The authors thank Magdalena Zlatescu and Loc Pham for valuable assistance with collecting

patient data; Marcela White and Jennifer Roy for accessing tissue samples and information; Lisa

Sturla for technical assistance; members of the Program in Cancer Genomics, Whitehead

Institute/MIT Center for Genome Research for valuable discussions; and Anat

Stemmer-Rachamimov for critical review of the manuscript.


REFERENCES

1. Kleihues, P. and Cavenee, W. K. World Health Organization Classification of Tumours

of the Nervous System. Lyon: WHO/IARC, 2000.

2. Cairncross, J. G. and Macdonald, D. R. Successful chemotherapy for malignant

oligodendroglioma. Ann Neurol, 23: 360-364, 1988.

3. Cairncross, J. G., Ueki, K., Zlatescu, M. C., Lisle, D. K., Finkelstein, D. M., Hammond,

R. R., Silver, J. S., Stark, P. C., Macdonald, D. R., Ino, Y., Ramsay, D. A., and Louis, D.

N. Specific chromosomal losses predict chemotherapeutic response and survival in

patients with anaplastic oligodendrogliomas. J Natl Cancer Inst, 90: 1473-1479, 1998.

4. Burger, P. C. What is an oligodendroglioma? Brain Pathol, 12: 257-259, 2002.

5. Ironside, J. W., Moss, T. H., Louis, D. N., Lowe, J. S., and Weller, R. O. Diagnostic

Pathology of Nervous System Tumours. London: Churchill Livingstone, 2002.

6. Burger, P. C., Scheithauer, B. W., and Vogel, F. S. Surgical Pathology of the Nervous

System and its Coverings, 4th edition, p. 592. London: Churchill Livingstone, 2002.

7. Louis, D. N., Holland, E. C., and Cairncross, J. G. Glioma classification: a molecular

reappraisal. Am J Path, 159: 779-786, 2001.

8. Coons, S. W., Johnson, P. C., Scheithauer, B. W., Yates, A. J., and Pearl, D. K.

Improving diagnostic accuracy and interobserver concordance in the classification and

grading of primary gliomas. Cancer, 79: 1381-1393, 1997.

9. Giannini, C., Scheithauer, B. W., Weaver, A. L., Burger, P. C., Kros, J. M., Mork, S.,

Graeber, M. B., Bauserman, S., Buckner, J. C., Burton, J., Riepe, R., Tazelaar, H. D.,

Nascimento, A. G., Crotty, T., Keeney, G. L., Pernicone, P., and Altermatt, H.


Oligodendrogliomas: Reproducibility and prognostic value of histologic diagnosis and

grading. J Neuropathol Exp Neurol, 60: 248-262, 2001.

10. Sasaki, H., Zlatescu, M. C., Betensky, R. A., Johnk, L., Cutone, A., Cairncross, J. G., and

Louis, D. N. Histopathological-molecular genetic correlations in referral

pathologist-diagnosed low-grade "oligodendroglioma". J Neuropathol Exp Neurol, 61: 58-63, 2002.

11. Burger, P. C., Minn, A. Y., Smith, J. S., Borell, T. J., Jedlicka, A. E., Huntley, B. K.,

Goldthwaite, P. T., Jenkins, R. B., and Feuerstein, B. G. Losses of chromosomal arms 1p

and 19q in the diagnosis of oligodendroglioma. A study of paraffin-embedded sections.

Mod Pathol, 14: 842-853, 2001.

12. Lu, Q. R., Park, J. K., Noll, E., Chan, J. A., Alberta, J., Yuk, D., Alzamora, M. G., Louis,

D. N., Stiles, C. D., Rowitch, D. H., and Black, P. M. Oligodendrocyte lineage genes

(OLIG) as molecular markers for human glial brain tumors. Proc Natl Acad Sci USA, 98:

10851-10856, 2001.

13. Popko, B., Pearl, D. K., Walker, D. M., Comas, T. C., Baerwald, K. D., Burger, P. C.,

Scheithauer, B. W., and Yates, A. J. Molecular markers that identify human astrocytomas

and oligodendrogliomas. J Neuropathol Exp Neurol, 61: 329-338, 2002.

14. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P.,

Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander,

E. S. Molecular classification of cancer: class discovery and class prediction by gene

expression monitoring. Science, 286: 531-537, 1999.

15. Bhattacharjee, A., Richards, W. G., Staunton, J., Li, C., Monti, S., Vasa, P., Ladd, C.,

Beheshti, J., Bueno, R., Gillette, M., Loda, M., Weber, G., Mark, E. J., Lander, E. S.,

Wong, W., Johnson, B. E., Golub, T. R., Sugarbaker, D. J., and Meyerson, M.


Classification of human lung carcinomas by mRNA expression profiling reveals distinct

adenocarcinoma subclasses. Proc Natl Acad Sci USA, 98: 13790-13795, 2001.

16. Pomeroy, S. L., Tamayo, P., Gaasenbeek, M., Sturla, L. M., Angelo, M., McLaughlin, M.

E., Kim, J. Y. H., Goumnerova, L. C., Black, P. M., Lau, C., Allen, J. C., Zagzag, D.,

Olson, J. M., Curran, T., Wetmore, C., Biegel, J. A., Poggio, T., Mukherjee, S., Rifkin,

R., Califano, A., Stolovitzky, G., Louis, D. N., Mesirov, J. P., Lander, E. S., and Golub,

T. R. Prediction of central nervous system embryonal tumour outcome based on gene

expression. Nature, 415: 436-442, 2002.

17. Boettner, B. and Van Aelst, L. The role of Rho GTPases in disease development. Gene,

286: 155-174, 2002.

18. Ridley, A. J. Rho GTPases and cell migration. J Cell Sci, 114: 2713-2722, 2001.

19. Clemens, M. J. and Bommer, U.-A. Translational control: the cancer connection. Int J

Biochem Cell Biol, 31: 1-23, 1999.


Table 1 Summary of Clinical Parameters for the High Grade Glioma Dataset

Pathological diagnosis and survival from date of initial diagnosis are given for all patients. For living patients, survival is given to time of last follow-up.

GBM, glioblastoma; AO, anaplastic oligodendroglioma

Sample Name    Pathology          Vital Status    Survival (Days)
Brain_CG_1     Classic GBM        Dead            308
Brain_CG_2     Classic GBM        Dead            281
Brain_CG_3     Classic GBM        Dead            501
Brain_CG_4     Classic GBM        Dead            670
Brain_CG_5     Classic GBM        Alive           729
Brain_CG_6     Classic GBM        Dead            21
Brain_CG_7     Classic GBM        Alive           630
Brain_CG_8     Classic GBM        Dead            263
Brain_CG_9     Classic GBM        Dead            219
Brain_CG_10    Classic GBM        Dead            408
Brain_CG_11    Classic GBM        Dead            242
Brain_CG_12    Classic GBM        Dead            323
Brain_CG_13    Classic GBM        Dead            213
Brain_CG_14    Classic GBM        Dead            97
Brain_NG_1     Non-classic GBM    Dead            1375
Brain_NG_2     Non-classic GBM    Alive           1644
Brain_NG_3     Non-classic GBM    Dead            406
Brain_NG_4     Non-classic GBM    Dead            308
Brain_NG_5     Non-classic GBM    Dead            177
Brain_NG_6     Non-classic GBM    Dead            103
Brain_NG_7     Non-classic GBM    Alive           992
Brain_NG_8     Non-classic GBM    Dead            41
Brain_NG_9     Non-classic GBM    Alive           1354
Brain_NG_10    Non-classic GBM    Dead            276
Brain_NG_11    Non-classic GBM    Dead            519
Brain_NG_12    Non-classic GBM    Dead            368
Brain_NG_13    Non-classic GBM    Dead            157
Brain_NG_14    Non-classic GBM    Dead            1162
Brain_CO_1     Classic AO         Alive           231
Brain_CO_2     Classic AO         Alive           1674
Brain_CO_3     Classic AO         Alive           1604
Brain_CO_4     Classic AO         Dead            215
Brain_CO_5     Classic AO         Alive           359
Brain_CO_6     Classic AO         Alive           171
Brain_CO_7     Classic AO         Dead            272
Brain_NO_1     Non-classic AO     Dead            63
Brain_NO_2     Non-classic AO     Alive           585
Brain_NO_3     Non-classic AO     Alive           1804
Brain_NO_4     Non-classic AO     Dead            916
Brain_NO_5     Non-classic AO     Dead            793
Brain_NO_6     Non-classic AO     Dead            803
Brain_NO_7     Non-classic AO     Dead            559
Brain_NO_8     Non-classic AO     Alive           1137
Brain_NO_9     Non-classic AO     Alive           1100
Brain_NO_10    Non-classic AO     Dead            498
Brain_NO_11    Non-classic AO     Alive           795
Brain_NO_12    Non-classic AO     Dead            790
Brain_NO_13    Non-classic AO     Dead            789
Brain_NO_14    Non-classic AO     Alive           439
Brain_NO_15    Non-classic AO     Alive           638

Table 2 Training Error of k-NN Models

Class prediction models were built using 10, 20, 50, 100 or 250 features and the training error for each model was

calculated using leave-one-out cross validation.

Number of Features    Error
10 features           4/21
20 features           3/21
50 features           5/21
100 features          4/21
250 features          6/21
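The leave-one-out procedure behind these error counts can be sketched as follows. This is only an illustration under stated assumptions: the value k=3, the Euclidean distance, the simple majority vote and the toy two-feature samples are all hypothetical and are not taken from the study, and the sketch omits any feature selection step.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples nearest to `query` (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def loocv_errors(samples, labels, k=3):
    """Hold out each sample once, predict it from the remaining samples, count misclassifications."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, samples[i], k) != labels[i]:
            errors += 1
    return errors

# Hypothetical two-feature expression values; NOT data from the paper.
# The last sample is labelled AO but sits among the GBM-like samples.
toy_samples = [(5.0, 1.0), (5.2, 0.8), (4.8, 1.1), (5.1, 0.9),
               (1.0, 4.9), (0.9, 5.1), (1.2, 5.0), (4.9, 1.2)]
toy_labels = ["GBM", "GBM", "GBM", "GBM", "AO", "AO", "AO", "AO"]

n_errors = loocv_errors(toy_samples, toy_labels, k=3)
print(f"{n_errors}/{len(toy_samples)}")  # same "errors/total" form as the Error column
```

Each sample is predicted by a model that never saw it, so the count estimates how the classifier would behave on new tumors rather than how well it memorizes the training set.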

Table 3 Features of the 20-feature k-NN Class Prediction Model

Genes highly correlated with the class distinction of either GBM or AO in the 20-feature k-NN class prediction model. Affymetrix feature numbers, fold increase in gene expression (GBM>AO;

AO>GBM), accession numbers and gene identifications are shown.

GBM, glioblastoma; AO, anaplastic oligodendroglioma

Class Correlation  Feature Number  Fold Increase  Accession Number  Gene Description
GBM                34091_s_at      2.55           Z19554            VIM: vimentin
GBM                630_at          4.83           L39874            DCTD: dCMP deaminase
GBM                631_g_at        2.80           L39874            DCTD: dCMP deaminase
GBM                39691_at        1.80           AB007960          SH3GLB1: SH3-domain GRB2-like endophilin B1
GBM                160039_at       5.57           NM_002747         MAPK4: mitogen-activated protein kinase 4
GBM                35016_at        1.89           M13560            CD74: CD74 antigen (invariant polypeptide of major histocompatibility complex, class II antigen-associated)
GBM                38791_at        1.78           D29643            DDOST: dolichyl-diphosphooligosaccharide-protein glycosyltransferase
GBM                1395_at         2.10           L25081            ARHC: ras homolog gene family, member C
GBM                37542_at        2.41           D86961            LHFPL2: lipoma HMGIC fusion partner-like 2
GBM                935_at          1.49           L12168            CAP: adenylyl cyclase-associated protein
AO                 33619_at        2.20           L01124            RPS13: ribosomal protein S13
AO                 34679_at        2.64           X02596            BCR: breakpoint cluster region
AO                 37573_at        3.96           AF007150          ANGPTL2: angiopoietin-like 2
AO                 33677_at        1.81           M94314            RPL24: ribosomal protein L24
AO                 326_i_at        2.03           HG1800-HT1823     RPS20: ribosomal protein S20
AO                 41325_at        2.43           AF006823          KCNK3: potassium channel, subfamily K, member 3 (TASK-1)
AO                 38681_at        1.76           U62962            EIF3S6: eukaryotic translation initiation factor 3, subunit 6 (48kD)
AO                 41792_at        2.16           L78207            ABCC8: ATP-binding cassette, sub-family C (CFTR/MRP), member 8
AO                 37249_at        3.40           AF079529          PDE8B: phosphodiesterase 8B
AO                 37953_s_at      2.77           U78181            ACCN2: amiloride-sensitive cation channel 2, neuronal

Table 4 Summary of Training Sample Set Class Predictions

Set includes the 21 classic high grade gliomas. The “call” is the classification given by the 20-feature k-NN model during leave-one-out cross validation

and appears along with the confidence value. “Errors” are those tumors whose classification differed from the pathological classification.

GBM, glioblastoma; AO, anaplastic oligodendroglioma

Sample Name    Call    Confidence    Pathology    “Error”
Brain_CG_8     GBM     0.677         GBM
Brain_CG_11    GBM     0.610         GBM
Brain_CG_3     GBM     0.558         GBM
Brain_CG_4     GBM     0.524         GBM
Brain_CG_14    GBM     0.455         GBM
Brain_CG_2     GBM     0.445         GBM
Brain_CO_5     AO      0.377         AO
Brain_CO_1     AO      0.234         AO
Brain_CO_4     GBM     0.224         AO           *
Brain_CG_1     GBM     0.182         GBM
Brain_CO_6     AO      0.166         AO
Brain_CG_9     GBM     0.158         GBM
Brain_CO_2     AO      0.143         AO
Brain_CO_7     AO      0.141         AO
Brain_CG_6     GBM     0.101         GBM
Brain_CG_5     AO      0.028         GBM          *
Brain_CO_3     AO      0.023         AO
Brain_CG_10    AO      0.021         GBM          *
Brain_CG_13    GBM     0.008         GBM
Brain_CG_12    GBM     0.006         GBM
Brain_CG_7     GBM     0.000         GBM

FIGURE LEGENDS

Fig. 1. Characterization of classic high grade gliomas. A, Histological features of classic high

grade gliomas. “Brain_CG_3” (top), classic glioblastoma featuring cells with copious

eosinophilic cytoplasm and fibrillary processes; “Brain_CG_7” (middle), classic glioblastoma

illustrating pleomorphic and spindled cells; “Brain_CO_1” (bottom), classic anaplastic

oligodendroglioma illustrating monomorphic cells with rounded nuclei and perinuclear halos. B,

Classification of high grade gliomas by gene expression. Genes were ranked by the S2N metric

according to their correlation with the classic glioblastoma (GBM) versus classic anaplastic

oligodendroglioma (AO) distinction. Results are shown for the top 50 genes of each distinction.

Each column represents a single glioma sample and each row represents a single gene. For each

gene, red indicates a high level of expression relative to the mean; blue indicates a low level of

expression relative to the mean. The standard deviation from the mean is indicated (σ). Asterisk

indicates “Brain_CO_4” sample.
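The gene ranking described in panel B can be illustrated with a short sketch. It assumes the commonly used signal-to-noise definition, the difference of the two class means divided by the sum of the two class standard deviations; the legend does not spell out the exact variant used, and the gene names and expression values below are hypothetical.

```python
from statistics import mean, stdev

def s2n(class_a, class_b):
    """Signal-to-noise score: difference of class means over the sum of class standard deviations."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))

# Hypothetical expression values, gene -> (GBM samples, AO samples); NOT data from the paper.
toy_expression = {
    "gene_up_in_GBM": ([10.0, 12.0, 14.0], [1.0, 2.0, 3.0]),
    "gene_up_in_AO": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),
    "gene_flat": ([4.0, 5.0, 6.0], [4.0, 5.0, 6.0]),
}

# Positive scores favour GBM, negative scores favour AO.
scores = {gene: s2n(gbm, ao) for gene, (gbm, ao) in toy_expression.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Genes at the top of such a ranking are the candidates for the GBM side of the distinction and genes at the bottom for the AO side; a gene with similar means in both classes scores near zero regardless of its absolute expression level.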

Fig. 2. Survival curves of patients with the 14 classic glioblastomas (dashed line) and 7 classic

anaplastic oligodendrogliomas (solid line) used to train the 20-feature k-NN class prediction

model. Survival curves were plotted according to classifications based on either traditional

pathology or the class prediction model. When classic tumors were sorted according to

pathology, a clear distinction was found between survival of patients with glioblastoma and those

with anaplastic oligodendroglioma, although this difference was not statistically significant

(P=0.210). Survival curves generated using class distinctions according to the class prediction

model were significantly different (P=0.031).
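As an illustration of how a survival curve of this kind can be derived from the data in Table 1, the sketch below computes a Kaplan-Meier estimate for the 14 classic glioblastomas. The legend does not state how the curves or P values were obtained, so the choice of estimator is an assumption, and the function name is hypothetical. Patients listed as alive are treated as censored at last follow-up.

```python
def kaplan_meier(times, died):
    """Return [(time, survival probability)] at each distinct death time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, d in zip(times, died) if ti == t and d)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at time t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Survival (days) and vital status for Brain_CG_1 .. Brain_CG_14, copied from Table 1.
cg_days = [308, 281, 501, 670, 729, 21, 630, 263, 219, 408, 242, 323, 213, 97]
cg_died = [True, True, True, True, False, True, False,
           True, True, True, True, True, True, True]

cg_curve = kaplan_meier(cg_days, cg_died)
for day, prob in cg_curve:
    print(f"day {day}: {prob:.3f}")
```

Censored patients leave the risk set without lowering the curve, which is why the two living patients (630 and 729 days) change only the denominator for later deaths.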

Fig. 3. Characterization of non-classic high grade gliomas. A, Histological features of non-

classic high grade gliomas. “Brain_NG_1” (top), non-classic glioblastoma with region having

microgemistocytes that raise the differential diagnosis of anaplastic oligodendroglioma;

“Brain_NG_3” (middle), non-classic glioblastoma with an area of rounded cells that resemble

oligodendroglioma and more spindled cells that resemble glioblastoma; “Brain_NO_14”

(bottom), non-classic anaplastic oligodendroglioma with a region displaying the typical

branching vasculature and calcification (arrowhead) of oligodendroglioma, but with more

spindled cells. B, Survival curves of patients with the 14 non-classic glioblastomas (dashed line)

and 15 non-classic anaplastic oligodendrogliomas (solid line). Survival curves were plotted

according to classifications based on either traditional pathology or the class prediction model

trained on the classic gliomas. When tumors were classified according to pathology, survival of

patients with non-classic glioblastoma was not significantly different from that of patients with

non-classic anaplastic oligodendroglioma (P=0.194). In contrast, class distinctions according to

the class prediction model were significantly different (P=0.051).
