
Experimental Cell Research 312 (2006) 1798–1812

available at www.sciencedirect.com

www.elsevier.com/locate/yexcr

Research Article

Catalog of gene expression in adult neural stem cells and their in vivo microenvironment

Cecilia Williams a,b,1, Valtteri Wirta a,1, Konstantinos Meletis c, Lilian Wikström d, Leif Carlsson e, Jonas Frisén c, Joakim Lundeberg a,⁎

a School of Biotechnology, Department of Gene Technology, KTH-Royal Institute of Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
b Department of Biosciences at Novum, Karolinska Institutet, SE-14157 Huddinge, Sweden
c Department of Cell and Molecular Biology, Medical Nobel Institute, Karolinska Institutet, SE-171 77 Stockholm, Sweden
d NeuroNova AB, SE-114 33 Stockholm, Sweden
e Umeå Center for Molecular Medicine, Umeå University, SE-90187 Umeå, Sweden

ARTICLE INFORMATION

Article Chronology:
Received 21 October 2005
Revised version received 9 February 2006
Accepted 13 February 2006
Available online 20 March 2006

Keywords:
Neural stem cell
Transcriptome
Lateral ventricle wall
Neurosphere
EST
Gene expression
Microenvironment
Digital signature

ABSTRACT

Stem cells generally reside in a stem cell microenvironment, where cues for self-renewal and differentiation are present. However, the genetic program underlying stem cell proliferation and multipotency is poorly understood. Transcriptome analysis of stem cells and their in vivo microenvironment is one way of uncovering the unique stemness properties and provides a framework for the elucidation of stem cell function. Here, we characterize the gene expression profile of the in vivo neural stem cell microenvironment in the lateral ventricle wall of adult mouse brain and of in vitro proliferating neural stem cells. We have also analyzed an Lhx2-expressing hematopoietic-stem-cell-like cell line in order to define the transcriptome of a well-characterized and pure cell population with stem cell characteristics. We report the generation, assembly and annotation of 50,792 high-quality 5′-end expressed sequence tag sequences. We further describe a shared expression of 1065 transcripts by all three stem cell libraries and a large overlap with previously published gene expression signatures for neural stem/progenitor cells and other multipotent stem cells. The sequences and cDNA clones obtained within this framework provide a comprehensive resource for the analysis of genes in adult stem cells that can accelerate future stem cell research.

© 2006 Elsevier Inc. All rights reserved.

⁎ Corresponding author. Fax: +46 8 5537 8481.
E-mail address: joakim.lundeberg@biotech.kth.se (J. Lundeberg).
1 The authors contributed equally to the work.
0014-4827/$ – see front matter © 2006 Elsevier Inc. All rights reserved.
doi:10.1016/j.yexcr.2006.02.012

Introduction

Stem cells are generally defined as unspecialized, self-renewing cells that can generate one or more specialized cell types [1]. The main obstacle in the study of stem cells has been the difficulty of isolating a pure stem cell population and propagating it in vitro. In the hematopoietic system, which is the most well-described stem cell system, a panel of surface markers can be used to sort and isolate long-term self-renewing stem cells, short-term self-renewing stem cells and progenitors [2–5]. Several studies have also reported the isolation of neural stem cells (NSCs) using surface markers [6–10]. However, consensus has not yet been achieved regarding the use of these markers and the identity of the isolated cells, and hence NSCs are identified retrospectively on the basis of their behavior.

Multipotent NSCs are present in small numbers in the adult central nervous system (CNS) where they reside in regions that support self-renewal and differentiation [11]. The main neurogenic regions in the adult brain are the subventricular zone (SVZ) of the lateral ventricle wall (LVW) where interneurons for the olfactory bulb are generated and the subgranular zone of the dentate gyrus in the hippocampus. Ependymal cells, SVZ astrocytes, immature precursors and neuroblasts are the main cell types of the LVW. The exact identity of the putative adult NSCs is under debate, but both SVZ astrocytes and ependymal cells have been identified as such [7,12,13]. Previous studies have shown that cells from the lateral ventricles of adult forebrain form sphere-like aggregates in vitro when exposed to EGF and cultured under appropriate conditions. These neurospheres contain cells that exhibit stem cell characteristics; that is, they are self-renewing in the presence of appropriate mitogens, they can be clonally expanded and they are capable of producing the main cell types of the brain: neurons, astrocytes and oligodendrocytes [7,12]. However, it is not yet fully understood how closely the neurospheres resemble their initiating in vivo cells and, furthermore, it has been suggested that neurospheres originate from Dlx2-positive transit-amplifying neural progenitor cells [14], and not from the relatively quiescent stem cells.

Studies using subtractive hybridization approaches [15,16], EST sequencing [17] and microarray-based approaches [18,19] have provided an insight into the unique features of stem cell gene expression. Importantly, these studies have identified genes and pathways that are active in neurospheres and neural stem/progenitor cells. The shared capabilities of stem cells make it likely that genes involved in core stem cell functions, such as self-renewal and multipotency, are expressed by different stem cells regardless of their tissue origin or culture conditions. The existence of a particular genetic program in stem cells has been suggested; however, several microarray-based gene expression analysis efforts aiming at identifying such a program have not yet yielded results that survive extrapolation to other stem cell systems [16,20,21]. A possible explanation is the expression of several key genes below the detection limit for the microarray studies or the lack of corresponding probes on the arrays. The latter explanation is supported by the findings in a large-scale EST sequencing study where 977 possibly novel genes in various cDNA libraries related to stem cells and early embryonic development were found [17], which indicates that a substantial number of important genes or transcript variants may be missing in the public databases. It is therefore important to remember that, while the microarray technology can provide useful information on the differences in gene expression levels between cell types, it is restricted to the genes present on the array and does not always give a complete description of the cells' transcriptional activity.

To further increase our knowledge of the transcriptome of the stem cells, we carried out a large-scale sequencing of expressed sequence tags (ESTs) derived from the NSC in vitro model (neurospheres) and from the NSCs' in vivo microenvironment (lateral ventricle wall). To control for sample heterogeneity and to avoid the influence of culture adaptation, we used small, early-passage neurospheres with a high proportion of cells retaining the capability to reinitiate neurosphere formation. We used a normalized neurosphere cDNA library in order to increase the probability of detecting genes expressed at very low levels. For analysis of the NSC microenvironment, we chose an inclusive strategy where the LVW library was created from unfractionated tissue. This approach ensures that most cell types in the region, including the NSC and early progenitors, are included. Furthermore, we compared the obtained gene expression signatures to previously published sets of 13,286 and 14,514 ESTs from unrelated sources of neurospheres and differentiated neurospheres [17] and to previously published stem cell gene expression signatures [20–22]. Finally, we compared the ESTs obtained from the neural stem/progenitor cells to ESTs we obtained from a hematopoietic stem cell line in order to identify shared transcripts.

Material and methods

Sample preparation

Lateral ventricle wall library

Microdissected LVW tissue from one hundred twenty 8- to 10-week-old C57bl6 mice (Charles River) was placed in RNAlater (Ambion) immediately after dissection and pooled for RNA isolation. Total RNA was prepared using Qiagen shredder columns and the RNeasy mini kit (Qiagen) and eluted in RNase-free water supplemented with SuperaseIn (Ambion).

Neurosphere library

The lateral wall of the lateral ventricle of eighty 8- to 10-week-old mice was enzymatically dissociated in 0.8 mg/ml hyaluronidase and 0.5 mg/ml trypsin in Dulbecco's modified Eagle medium (DMEM) containing 4.5 mg/ml glucose and 80 U/ml DNase at 37°C for 20 min. The cells were gently triturated and mixed with three volumes of neurosphere medium (DMEM/F12, B27 supplement, 12.5 mM HEPES, pH 7.4) containing 20 ng/ml EGF, 100 U/ml penicillin and 100 μg/ml streptomycin. After passing through a 70-μm strainer, the cells were pelleted at 250 × g for 5 min. The supernatant was subsequently removed, and the cells were resuspended in neurosphere medium supplemented as above, plated in uncoated culture dishes and incubated at 37°C. Neurospheres were split according to [7] 6–7 days after plating. Dissociated cells were plated and grown in neurosphere medium supplemented with EGF for a further 3–4 days, by which time secondary neurospheres had developed. Throughout the neurosphere preparation procedure the size of the spheres was kept small (<50 cells) to avoid development into heterogeneous cell aggregates. To verify the in vitro differentiation capacity, passaged neurospheres were transferred onto laminin-coated slides and allowed to differentiate for 7 days. Primary antibodies were used at 4°C overnight: anti-Gfap (1:2000, DakoCytomation), anti-βIII-tubulin (Tuj1, 1:1000, Biosite) and anti-O4 (1:100, Chemicon). All secondary fluorescent antibodies (Jackson Immunoresearch) were applied at room temperature for 1 h. Images were acquired on a Zeiss Axioplan2 microscope. Total RNA was prepared using the RNeasy RNA extraction system (Qiagen).

Hematopoietic-stem-cell-like cell line library

The generation and maintenance of the bone-marrow-derived HSC-like cell lines, denoted BM-HPC lines, have been previously described in detail [23]. Lhx2 immortalizes the hematopoietic progenitor cell lines through a cell nonautonomous mechanism, and the transduced cells were therefore kept at a density of at least 2 × 10⁶ cells until a population of blast-like cells was obtained. The BM-HPC clone #5 was cultured in Iscove's modified Dulbecco's media (IMDM) supplemented with 1.5 × 10⁻⁴ M monothioglycerol (MTG) (Sigma-Aldrich), 5% fetal calf serum (Integro), 100 ng/ml stem cell factor (R&D Systems), 10 ng/ml IL-6 (R&D Systems), 100 U/ml penicillin and 100 μg/ml streptomycin. Cells were cultured at 5 × 10⁵–2 × 10⁶ cells/ml. Total RNA from 10⁸ cells was prepared using the RNeasy RNA extraction system (Qiagen).

Library manufacturing

The library construction was performed by Invitrogen. Briefly, first-strand cDNA synthesis was primed with Oligo-dT with an internal NotI restriction site. Following second-strand synthesis and T4 DNA polymerase treatment to yield blunt ends, the cDNA was digested with NotI and ligated into NotI/EcoRV-cleaved pCMVSport6.0 vector (pCMVSport 6.1 for the BM-HPC library) and transformed into E. coli EMDH10B-TON A cells (Invitrogen) through electroporation. The NS library was normalized in order to facilitate detection of low-abundance transcripts (Invitrogen).

Sequencing

Transformed bacteria were plated and colonies inoculated using an automated colony picker into 96-well plates (50 μl LB medium supplemented with ampicillin) and incubated overnight at 37°C. From these, 1-ml LB cultures were grown, and plasmids prepared according to a high-throughput 96-well microwave boiling protocol [24]. Sequencing into the 5′-end of the cDNA inserts was performed using the M13 reverse primer and BigDye Terminator chemistry on an ABI 3700 instrument (Applied Biosystems). In addition to the sequences obtained and analyzed within this work, we also included two publicly available EST libraries in the analysis. These libraries were downloaded from the UniGene library browser using accession numbers Lib. 12356 (NIA Mouse Neural Stem Cell (Undifferentiated) cDNA Library, called NS-NIA) and Lib. 12357 (NIA Mouse Neural Stem Cell (Differentiated) cDNA Library, called NS-DIFF).

Table 1 – Summary of the EST libraries

| Library | Library ID | Notes | Cfu | ESTs | End redundancy | UniGene clusters | Unknowns | 25 largest clusters |
|---|---|---|---|---|---|---|---|---|
| LVW | Lib. 16789 | – | 1.2 × 10⁶ | 14,884 | 65% | 5417 | 1076 (7.2%) | 1112 (7.5%) |
| NS* | Lib. 16808 | normalized | 4 × 10⁵ | 25,501 | 75% | 7848 | 1590 (6.2%) | 647 (2.5%) |
| BM-HPC | Lib. 16809 | pre-amplified | 6 × 10⁷ | 10,407 | 93% | 2592 | 481 (4.6%) | 1690 (16.2%) |
| In total | | | | 50,792 | | | 3147 (6.2%) | |

The library id (Lib.) refers to the NCBI's dbEST library number. Cfu indicates the estimated number of colony forming units in the library. ESTs represents the number of sequences that remain after low quality, vector and contaminating sequences have been removed. End redundancy represents the redundancy when the sequencing was discontinued (at 93%, only 7 of 100 new high-quality sequences represented transcripts not already detected in the library). Sequences were submitted to dbEST and annotated against Mus musculus UniGene build 145. UniGene clusters indicates the number of different clusters represented in the library. Sequences not included in the UniGene build 145 were called unknowns. The 25 largest clusters shows the number and percentage of sequences included in the 25 largest UniGene clusters. * Indicates that the library was normalized.

Fig. 1 – Neurosphere culturing and library analysis. (A) Image of a representative neurosphere cultured under the conditions used for extraction of RNA for the construction of the NS cDNA library. Scale bar is 10 μm. (B) Multilineage differentiation of a neurosphere cultured under the same conditions as in panel A. (C) The EST frequencies in each library were converted to transcripts per million (TPM) to assist in comparison of the within-library redundancies. The EST frequencies are visualized by using boxplots where the widths of the boxes are drawn in proportion to the number of high-quality ESTs. (D) ESTs that were not mapped to M. musculus UniGene build 145 are called unknowns and were annotated using sequence similarity searches against several databases. A hit was considered significant if the requirements shown on the right were fulfilled. E, E value for the Blast search; qa, proportion of query sequence that must align with the hit sequence.

Sequence data analysis

Base calling of the electropherograms was performed using phred [25] and TraceTuner (Paracel). Sequence quality control and assembly of ESTs were performed using Paracel Transcript Assembler (Paracel) with the default settings optimized for clustering and assembly of ESTs. Low-quality end sequence (defined by iterating a 30-bp window inwards from start and end of the sequence until reaching a mean QV > 13) and vector, E. coli, rRNA, mitochondrial and viral sequences were filtered, and low-complexity sequence and avian repeats were masked. ESTs with <100 bp sequence were discarded. All sequences were deposited in the dbEST sequence database (http://www.ncbi.nlm.nih.gov/dbEST/) and are available using the library accession numbers listed in Table 1.

EST and transcript annotation

The annotation of the ESTs is based on the UniGene clustering provided by NCBI [26]. UniGene cluster IDs (Mus musculus build 145) were linked to other data sources using files downloaded in March 2005 from the NCBI UniGene and Gene ftp sites. All mappings were carried out using functions available in the kth-package (version 0.4.3; http://www.ktharray.se) implemented in R [27]. For the ESTs which did not receive a UniGene cluster ID, annotation and similarity searches were carried out using stand-alone Blast [28] version 2.2.9 searches in NCBI's RefSeq (version 9; April 2005), UniGene (build 145), Ensembl transcript database (April 2005), Ensembl Mouse genome (build NCBIm33) and Ensembl unmapped contigs (April 2005). A hit was considered high quality if the E value was 0.0, the identity >97% and at least 80% of the query aligned to the hit. If no high-quality hit was found, a hit was accepted if the E value was <10⁻⁵⁰, the identity was >90% and >20/60% of the query aligned with the hit (see Fig. 1D for additional details). All sequences were searched against the Rfam and Ensembl databases for non-coding RNA. A match was considered significant if the E value was <10⁻²⁰ and the identity >90%. The remaining sequences were searched against the mouse sequences in the dbEST section of the NCBI GenBank.
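The two-tier acceptance rule for Blast hits described above can be written as a small classifier. The following Python sketch is illustrative only: the `BlastHit` record, the function name and the `min_query_aligned_relaxed` parameter are hypothetical and not part of the original analysis. The relaxed query-coverage cut-off is left as a parameter because the text gives it as ">20/60%", with the database-specific values shown in Fig. 1D.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Summary of one Blast alignment (hypothetical record, not a Blast output format)."""
    e_value: float        # Blast E value
    identity: float       # percent identity, 0-100
    query_aligned: float  # percent of the query aligned to the hit, 0-100


def classify_hit(hit: BlastHit, min_query_aligned_relaxed: float = 20.0) -> str:
    """Return 'high-quality', 'accepted' or 'rejected' for a single hit."""
    # Tier 1: E value of 0.0, identity >97%, at least 80% of the query aligned.
    if hit.e_value == 0.0 and hit.identity > 97.0 and hit.query_aligned >= 80.0:
        return "high-quality"
    # Tier 2: E value <1e-50, identity >90%, coverage above the relaxed cut-off
    # (20% or 60%, depending on the database searched; assumption: passed by the caller).
    if (hit.e_value < 1e-50 and hit.identity > 90.0
            and hit.query_aligned > min_query_aligned_relaxed):
        return "accepted"
    return "rejected"
```

A hit with an E value of 1e-60, 95% identity and 30% query coverage would be accepted under the 20% cut-off but rejected under the 60% cut-off, which is why the cut-off is passed explicitly.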

Functional annotations were derived by parsing the Gene Ontology annotations provided by the Mouse Genome Informatics [29]. Analysis of overrepresentation of functional groups was carried out using the EASE software [30] and the complete mouse genome as background for the frequency calculations. A theme was considered significantly overrepresented if the EASE score was <0.05. The results of the functional analyses using EASE are summarized in the Results section, but not included as Supplementary Material. If required, the analyses can easily be repeated using the on-line version of EASE and the transcript lists provided as Supplementary Material.
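Overrepresentation of a Gene Ontology theme in a transcript list is, at its core, a one-tailed test on a 2 × 2 table. The Python sketch below computes the plain one-sided hypergeometric (Fisher) probability. It is not the EASE score itself, which is a more conservative adjustment of this probability, and the function name and the example counts are illustrative assumptions rather than values from this study.

```python
from math import comb


def overrepresentation_p(list_hits: int, list_size: int,
                         genome_hits: int, genome_size: int) -> float:
    """One-sided hypergeometric probability P(X >= list_hits).

    X is the number of theme-annotated genes among list_size genes drawn
    without replacement from genome_size genes, of which genome_hits carry
    the theme annotation.
    """
    max_hits = min(list_size, genome_hits)
    total = comb(genome_size, list_size)
    # Sum the upper tail exactly with integer arithmetic, then divide once.
    tail = sum(
        comb(genome_hits, k) * comb(genome_size - genome_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 30 of 500 listed genes carry a theme that annotates
# 400 of 25,000 genes in the background genome.
p = overrepresentation_p(30, 500, 400, 25_000)
print(f"one-sided P = {p:.3g}")
```

Exact integer arithmetic is used here for clarity; for many themes a statistics library would be the more practical choice.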

Analyses of differentially represented transcripts and library overlaps

Differences in EST representations between the libraries were analyzed using Fisher's exact test. The libraries LVW, BM-HPC, NS-NIA and NS-DIFF were included and all pair-wise over- and underrepresentations calculated. The P values were adjusted for multiple hypothesis testing using the conservative Bonferroni adjustment, and transcripts with P < 0.05 were considered significant. The general overlap between the expressed transcripts of two or more libraries was analyzed using Venn diagrams. The observed overlap was compared to an expected overlap using a standard t test. An expected overlap was derived by repeated random sampling of the complete UniGene using transcript-wise probabilities. These were derived by dividing the EST count for each UniGene cluster by the total number of ESTs included in the M. musculus UniGene build 145. The sampling procedure was iterated 500 times, and a P value <0.05 was considered significant.
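The pair-wise comparison of EST counts amounts to a 2 × 2 contingency table per transcript, followed by a Bonferroni adjustment over all tests. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the two-sided P value is obtained by summing all tables no more probable than the observed one (a common convention, not necessarily the one used in the original analysis), and the number of tests in the usage example is invented.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(p_values: list) -> list:
    """Bonferroni adjustment: multiply each P value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hspa8 (Mm.290774): 39 of 14,884 LVW ESTs versus 197 of 10,407 BM-HPC ESTs (Table 2).
p_hspa8 = fisher_exact_two_sided(39, 14_884 - 39, 197, 10_407 - 197)
# Assumption for illustration only: 5,000 transcripts tested in this library pair.
significant = min(1.0, p_hspa8 * 5_000) < 0.05
print(p_hspa8, significant)
```

Each row of the table holds one library (ESTs in the cluster, all other ESTs), so the test asks whether the cluster's share of ESTs differs between the two libraries.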

Results

Neurosphere culturing and library construction

Three cDNA libraries (Table 1) were generated from samples containing stem and progenitor cells: from the in vivo neural stem cell microenvironment referred to as the lateral ventricle wall (LVW) library, from in vitro cultured neural stem cells (neurospheres, referred to as NS) and from an in vitro cultured hematopoietic-stem-cell-like cell line (referred to as BM-HPC). Typically, prolonged culturing of the neurospheres to large aggregates of hundreds to thousands of cells reduces the proportion of cells with true stem cell properties. To avoid this and to enrich for NSCs, the cultured neurospheres were always collected after two passages when the size of the cell aggregates was small (<50 cells) (Fig. 1A). These neurospheres also retained the multilineage differentiation capacity (Fig. 1B). At harvesting, the fraction of cells exhibiting stem cell properties as judged by neurosphere reinitiation capacity was estimated at 30% (data not shown), which is one order of magnitude higher than what is commonly achieved with larger neurospheres. The low passage number and the short culturing time yield only a limited number of cells, and therefore, to obtain a sufficient amount of RNA for the NS library generation, the LVW region was dissected from 80 mice. The NS library was normalized in order to facilitate detection of transcripts expressed at low levels. Randomly selected clones were sequenced from the 5′-end and sequencing continued until a high coverage was achieved (Table 1).

Analyses of the EST profiles

Lateral ventricle wall library

For the NSC microenvironment, 14,884 ESTs resulted in 5417 UniGene clusters with the frequency distribution shown in Fig. 1C and 1076 unknown sequences (7.2% of the ESTs). EST frequencies were normalized to transcripts per million (TPM) in order to assist in the comparison between the libraries. The 25 most highly expressed transcripts are listed in Table 2, and a complete annotation is available on-line as Supplementary Material 1. Of the 5417 annotated transcripts, approximately 50% were assigned a Gene Ontology theme in the Biological Processes branch. Genes with the highest expression levels were involved in protein modification, cell differentiation, transcription, metabolism and several processes specific for the CNS, such as ‘synaptic transmission’, ‘ionic insulation of neurons’ and ‘neuropeptide signaling pathway’ (data not shown). Compared to the complete mouse genome, transcripts that belong to various metabolism, biosynthesis, protein localization and signal transduction related themes were significantly overrepresented. Furthermore, several brain and neuron-specific themes, such as ‘synaptic transmission’, ‘regulation of neurotransmitter levels’, ‘behavior’, ‘locomotory behavior’ and ‘axonogenesis’, were overrepresented (data not shown).
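The TPM normalization is a rescaling of each cluster's EST count by the number of high-quality ESTs in its library. A minimal Python sketch follows; the function name is illustrative, and rounding to the nearest integer is an assumption that reproduces the values reported in Table 2 for the library sizes given in Table 1.

```python
def transcripts_per_million(est_count: int, library_size: int) -> int:
    """Scale an EST count to transcripts per million (TPM), rounded to an integer."""
    return round(est_count / library_size * 1_000_000)


# Apoe in LVW: 76 of 14,884 ESTs
print(transcripts_per_million(76, 14_884))   # 5106
# Metrn in NS: 39 of 25,501 ESTs
print(transcripts_per_million(39, 25_501))   # 1529
# Hspa8 in BM-HPC: 197 of 10,407 ESTs
print(transcripts_per_million(197, 10_407))  # 18930
```

Because the NS library was normalized, its TPM values describe representation in the library and should not be read as estimates of true expression levels.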

Neurosphere library

Sequencing of the NS library generated 25,501 ESTs. Assembly into UniGene clusters generated 7848 clusters (Supplementary Material 2), which together contain 23,911 (93.8%) of all ESTs in the library. An additional 1590 ESTs (6.2%) were not incorporated into UniGene (Table 1). The purpose of cDNA library normalization is to increase the detection probability of low abundance genes by minimizing the influence of highly expressed genes [31,32], which facilitates the identification of the majority of genes expressed in a certain cell or tissue. However, the possibility to estimate the true expression level of a given gene is lost. As expected, fewer transcripts are represented multiple times in the NS library compared to the other libraries (Fig. 1C) and the 25 largest clusters (Table 2) contain only 2.5% (647) of all ESTs, indicative of successful library normalization. The highest observed EST count in the NS library represents Metrn, a 31-kDa secreted protein. The UniGene cluster for Metrn consists of 139 ESTs, out of which 39 originate from the normalized NS library. However, only one EST for Metrn was detected in the LVW library. Interestingly, several previous studies have shown a high Metrn expression in the olfactory bulb [33]; in unfertilized, fertilized and in 2- and 4-cell stage embryos and a strongly reduced expression at later stages (NCBI GEO [34] dataset GDS578); in retinal progenitor cell containing cell layers of the developing retina [35]; and a high expression in neurospheres derived from central nervous system and peripheral nervous system [36]. This 31-kDa secreted protein is involved in regulation of glial cell differentiation, probably through autocrine signaling, but may also be involved in the maintenance of the undifferentiated state of neural progenitors [35]. Analysis of GO themes indicated that several metabolism-related themes were overrepresented, but also themes relating to protein localization and cell proliferation (data not shown).

Hematopoietic stem cell library

As a model system for hematopoietic stem cells (HSCs), we used an HSC-like cell line (BM-HPC) generated by expressing the LIM-homeobox gene Lhx2 in adult bone marrow cells. This cell line is strictly stem-cell-factor- and interleukin-6-dependent and shares many characteristics with normal HSCs, such as growth factor response, transcription factor expression, cell surface marker expression and, most importantly, long-term engraftment upon transplantation into HSC-deficient mice [23]. The BM-HPC cell line represents a homogeneous population of hematopoietic multipotent progenitor/stem cells. Sequencing of this library resulted in 10,407 ESTs that were further assigned into 2592 UniGene clusters (Supplementary Material 3). The top 25 clusters contain 16.2% (1690) of all ESTs in the library, which indicates that the redundancy is much higher in the BM-HPC library than in the NS or LVW library (Table 2). The most highly expressed genes are involved in protein folding, regulation of cell cycle, signal transduction, protein transport and microtubule-based processes. A total of 481 (4.6%) ESTs were not assembled into UniGene clusters.

Differentially represented transcripts

Statistically significant differences between EST libraries were detected using a 2 × 2 contingency table and the conservative

Table 2 – UniGene clusters with the highest representations in the three stem cell cDNA libraries

UniGene ID Library ESTs TPM TPM, UniGene Gene symbol Name

Mm.305152 LVW 76 5106 511 Apoe Apolipoprotein EMm.282093 LVW 74 4972 552 Gm1821 Gene model 1821, (NCBI)Mm.335315 LVW 63 4233 100 Eef1a1 Eukaryotic translation elongation

factor 1 alpha 1Mm.45372 LVW 61 4098 71 Ppp1r1b Protein phosphatase 1, regulatory

(inhibitor) subunit 1BMm.371591 LVW 56 3762 1244 Tuba1 Tubulin, alpha 1Mm.331905 LVW 54 3628 128 Hpca HippocalcinMm.29846 LVW 51 3426 392 Ndrg4 N-myc downstream regulated gene 4Mm.252063 LVW 50 3359 1543 Mbp Myelin basic proteinMm.285993 LVW 49 3292 1244 Calm1 Calmodulin 1Mm.329243 LVW 49 3292 861 Calm2 Calmodulin 2Mm.297444 LVW 43 2889 386 MGI:107562 Cyclic AMP-regulated phosphoprotein, 21Mm.16831 LVW 42 2822 356 Ckb Creatine kinase, brainMm.1268 LVW 41 2755 912 Plp1 Proteolipid protein (myelin) 1Mm.275831 LVW 40 2687 454 Aldoa Aldolase 1, A isoformMm.290774 LVW 39 2620 300 Hspa8 Heat shock protein 8Mm.29870 LVW 37 2486 376 Itm2c Integral membrane protein 2CMm.31395 LVW 37 2486 481 Cpe Carboxypeptidase EMm.216135 LVW 36 2419 654 Pkm2 Pyruvate kinase, muscleMm.212703 LVW 35 2352 407 Mdh1 Malate dehydrogenase 1, NAD (soluble)Mm.238973 LVW 32 2150 1150 Atp5b ATP synthase, H+ transporting mitochondrial

F1 complex, beta subunit
Mm.3304 LVW 32 2150 366 Nsg2 Neuron-specific gene family member 2
Mm.196173 LVW 29 1948 350 Actg1 Actin, gamma, cytoplasmic 1
Mm.277498 LVW 29 1948 1831 Psap Prosaposin
Mm.278458 LVW 29 1948 390 Fkbp1a FK506 binding protein 1a
Mm.1776 LVW 28 1881 749 Fth1 Ferritin heavy chain 1
Mm.41925 NS 39 1529 49 Metrn Meteorin, glial cell differentiation regulator
Mm.30016 NS 37 1451 63 Arhgdia Rho GDP dissociation inhibitor (GDI) alpha
Mm.289707 NS 34 1333 284 Fscn1 Fascin homolog 1, actin bundling protein (Strongylocentrotus purpuratus)
Mm.44173 NS 34 1333 83 Lrrc4b Leucine-rich repeat containing 4B
Mm.306946 NS 30 1176 100 Wnt7b Wingless-related MMTV integration site 7B
Mm.5021 NS 30 1176 243 Ddr1 Discoidin domain receptor family, member 1
Mm.30141 NS 29 1137 157 Gnb2 Guanine nucleotide binding protein, beta 2
Mm.236443 NS 27 1059 407 Fasn Fatty acid synthase
Mm.280175 NS 27 1059 40 Mmp14 Matrix metalloproteinase 14 (membrane-inserted)
Mm.30214 NS 25 980 296 0610006O14Rik RIKEN cDNA 0610006O14 gene
Mm.329616 NS 25 980 151 Nrxn2 Neurexin II
Mm.41665 NS 25 980 66 Grina Glutamate receptor, ionotropic, N-methyl D-aspartate-associated protein 1 (glutamate binding)
Mm.242644 NS 24 941 108 Pcdh10 Protocadherin 10
Mm.29737 NS 24 941 149 Mlf2 Myeloid-leukemia factor 2
Mm.254134 NS 23 902 266 Gpiap1 GPI-anchored membrane protein 1
Mm.338476 NS 23 902 204 Pbp Phosphatidylethanolamine binding protein
Mm.4598 NS 23 902 216 Bcan Brevican
Mm.16898 NS 22 863 165 Phgdh 3-Phosphoglycerate dehydrogenase
Mm.331129 NS 22 863 118 Nes Nestin
Mm.42855 NS 22 863 143 Ptov1 Prostate tumor over expressed gene 1
Mm.22524 NS 21 823 364 AI481750 Expressed sequence AI481750
Mm.29196 NS 21 823 77 Cyc1 Cytochrome c-1
Mm.1451 NS 20 784 139 Mfge8 Milk fat globule-EGF factor 8 protein
Mm.207432 NS 20 784 440 Atp1a2 ATPase, Na+/K+ transporting, alpha 2 polypeptide
Mm.278701 NS 20 784 77 Srebf1 Sterol regulatory element binding factor 1
Mm.290774 BM-HPC 197 18,930 4638 Hspa8 Heat shock protein 8
Mm.240490 BM-HPC 169 16,239 266 1200009K13Rik RIKEN cDNA 1200009K13 gene
Mm.274926 BM-HPC 132 12,684 853 Emb Embigin
Mm.289630 BM-HPC 116 11,146 293 Ywhaq Tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, theta polypeptide
Mm.16660 BM-HPC 81 7783 826 P4hb Prolyl 4-hydroxylase, beta polypeptide


Mm.16771 BM-HPC 77 7399 479 H2-D1 Histocompatibility 2, D region locus 1
Mm.4266 BM-HPC 73 7015 666 Itm2b Integral membrane protein 2B
Mm.273538 BM-HPC 71 6822 639 Tubb5 Tubulin, beta 5
Mm.298 BM-HPC 63 6054 1252 Slc25a3 Solute carrier family 25 (mitochondrial carrier, phosphate carrier), member 3
Mm.258204 BM-HPC 62 5958 26 Nsep1 Nuclease-sensitive element binding protein 1
Mm.125770 BM-HPC 53 5093 319 Gnas GNAS (guanine nucleotide binding protein, alpha stimulating) complex locus
Mm.289800 BM-HPC 53 5093 613 Eif3s3 Eukaryotic translation initiation factor 3, subunit 3 (gamma)
Mm.166372 BM-HPC 50 4804 0 Rnf130 Ring finger protein 130
Mm.371570 BM-HPC 48 4612 933 Pabpc1 Poly A binding protein, cytoplasmic 1
Mm.296454 BM-HPC 46 4420 106 1700030G06Rik RIKEN cDNA 1700030G06 gene
Mm.4668 BM-HPC 46 4420 506 Mpo Myeloperoxidase
Mm.336743 BM-HPC 44 4228 3918 Hspa8 Heat shock protein 8
Mm.19187 BM-HPC 42 4036 319 Ptma Prothymosin alpha
Mm.288974 BM-HPC 41 3940 639 Arpc5 Actin-related protein 2/3 complex, subunit 5
Mm.300095 BM-HPC 41 3940 53 Ugt1a2 UDP-glucuronosyltransferase 1 family, member 2
Mm.30010 BM-HPC 39 3747 239 Arpc1b Actin-related protein 2/3 complex, subunit 1B
Mm.221440 BM-HPC 37 3555 293 Raly hnRNP-associated with lethal yellow
Mm.315962 BM-HPC 37 3555 426 Pgam1 Phosphoglycerate mutase 1
Mm.18516 BM-HPC 36 3459 106 LOC433382 LOC433382
Mm.30043 BM-HPC 36 3459 106 Spcs2 Signal peptidase complex subunit 2 homolog (S. cerevisiae)

The UniGene ID refers to build 145 of Mus musculus UniGene, the ESTs to the number of ESTs that map to the UniGene cluster in each library, TPM (transcripts per million) to the normalized proportion of each transcript in each library and TPM, UniGene to the normalized expression of this transcript in the brain (for NS and LVW) or bone marrow (for BM-HPC) (obtained using UniGene EST Profile Viewer available on-line at the NCBI site [26]). Each UniGene cluster is further mapped to gene IDs, which were used to obtain Gene Symbol and Name.
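As a minimal sketch of the TPM normalization described in the legend above, the Python snippet below scales an EST count by the size of its library. The function name `tpm` and the library size of 15,000 ESTs are illustrative assumptions, not values taken from the paper; the actual library sizes are those reported in Table 1.

```python
def tpm(est_count: int, library_size: int) -> float:
    """Transcripts per million: the share of a library's ESTs that map
    to one UniGene cluster, scaled to a library of one million ESTs."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return est_count / library_size * 1_000_000


# Hypothetical example: 30 ESTs in a library of 15,000 mapped ESTs.
example_tpm = round(tpm(30, 15_000))
```

Because TPM is a simple proportion, values from libraries of different depth can be compared directly, although the estimate is noisy for clusters supported by only a few ESTs.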


Fisher's Exact Test, which is also implemented in the Digital Differential Display included in UniGene [26]. For the NS library, the normalization procedure has removed true expression level information, and the library was therefore excluded from the analysis. We carried out all pair-wise over- and underrepresentation analyses for the remaining libraries (LVW, BM-HPC, NS-NIA and NS-DIFF) and adjusted the P values for multiple hypothesis testing using the conservative Bonferroni adjustment. At the P < 0.05 significance level, 27 to 110 transcripts were found to be differentially represented (Table 3). In the six comparisons carried out, a total of 187 transcripts were identified as differentially represented between any two libraries (Supplementary Material 4).
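The pair-wise comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names and the EST counts are hypothetical, and the two-sided P value is computed by summing hypergeometric probabilities no larger than that of the observed table.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, c: ESTs of one transcript in library 1 and library 2.
    b, d: all remaining ESTs in library 1 and library 2.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x transcript ESTs in library 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Sum over all tables that are at least as unlikely as the observed one.
    total = sum(px for px in map(prob, range(lo, hi + 1))
                if px <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni(p_values: list[float]) -> list[float]:
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: 40 of 15,000 ESTs in one library, 5 of 10,000 in another.
p_raw = fisher_exact_two_sided(40, 15_000 - 40, 5, 10_000 - 5)
p_adjusted = bonferroni([p_raw, 0.2, 0.8])
```

The Bonferroni step is deliberately conservative: with thousands of transcripts tested per comparison, only strongly skewed EST counts remain significant after adjustment.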

Table 3 – Transcripts differentially represented between the different libraries

Comparison             Overrepresented in first library   Overrepresented in second library   In total
LVW vs. BM-HPC         24 (298)                           55 (423)                            79 (721)
LVW vs. NS-NIA         40 (345)                           28 (601)                            68 (946)
LVW vs. DIFF-NS        38 (324)                           35 (486)                            73 (810)
BM-HPC vs. NS-NIA      63 (532)                           20 (404)                            83 (936)
BM-HPC vs. DIFF-NS     76 (524)                           34 (472)                            110 (996)
NS-NIA vs. DIFF-NS     12 (315)                           15 (313)                            27 (628)

The numbers refer to the number of transcripts with P < 0.05 after the Bonferroni adjustment for multiple hypothesis testing. The numbers in parenthesis refer to the number of transcripts with unadjusted P < 0.05.

Analyses of unknowns and rare transcripts

A total of 3147 ESTs were not incorporated in the UniGene build 145 from the three EST libraries (Table 1). We attempted to annotate these using sequence similarity searches against known mouse genes and the mouse genome. After lowering the criteria for alignment (Fig. 1D), 1021 previously unseen ESTs remained without acceptable annotation. These were further analyzed against the Rfam database of non-coding RNA [37], and 16 potential hits to tRNA and 1 to an internal ribosome entry site for the Bag1 gene were obtained. No hits were obtained in the non-coding RNA database available at Ensembl. Additional research, in combination with full-length sequencing, is required to completely characterize the unknown and rare transcripts, which may represent unspliced mRNA, UTRs, novel transcript splice variants, non-coding RNA and sequences of low quality.

Furthermore, we searched for transcripts that were detected multiple times in the NS-NIA (see below) or our NS or LVW libraries but had very few ESTs (<30) incorporated into the UniGene clusters. This comparison identifies transcripts that have a restricted expression pattern and may identify genes that have specific stem cell functions. We identified 167 UniGene clusters that fulfilled these criteria, and 57 of these were supported by at least three ESTs in our data. We calculated the fold enrichment of the normalized TPM counts for each transcript and report these, together with a preliminary annotation, in Table 4.
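The rare-transcript screen described above can be sketched as a simple filter followed by a ratio of normalized counts. The snippet is a hedged illustration: the function name, the cluster labels and all counts are hypothetical, and the enrichment shown is a plain ratio of library TPM to reference TPM, which is not guaranteed to reproduce the values in Table 4.

```python
def rare_enriched(library_counts, reference_counts, library_size,
                  reference_size, max_public=30, min_support=3):
    """Return (cluster, fold enrichment) for clusters that are rare in the
    public reference but supported by several ESTs in our library."""
    hits = []
    for cluster, n_lib in library_counts.items():
        n_ref = reference_counts.get(cluster, 0)
        # Keep clusters with few public ESTs and enough support in our data.
        if n_lib >= min_support and 0 < n_ref < max_public:
            # Ratio of TPM values; the factor of one million cancels out.
            enrichment = (n_lib * reference_size) / (n_ref * library_size)
            hits.append((cluster, enrichment))
    # Most strongly enriched clusters first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical counts for three clusters.
library = {"cluster_A": 4, "cluster_B": 2, "cluster_C": 5}
reference = {"cluster_A": 8, "cluster_B": 5, "cluster_C": 400}
hits = rare_enriched(library, reference,
                     library_size=10_000, reference_size=4_000_000)
```

In this toy input only `cluster_A` passes both thresholds; `cluster_B` lacks support in the library and `cluster_C` is too common in the reference.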

Chromosome distribution of transcripts

We used Pearson's Chi-square test to identify chromosomes that had a deviating representation of UniGene clusters as compared to the expected distribution derived through calculation of the fractions of total UniGene clusters mapped to each chromosome (see Fig. 2A legend for details). In the four libraries included in the analysis (LVW, NS, BM-HPC and the NS-NIA), six chromosomes had a lower and four had a higher number of UniGene clusters expressed compared to a random distribution of ESTs. ESTs on chromosome 11 were overrepresented in the LVW, NS and NS-NIA libraries (P < 0.05 in all). To investigate whether the EST distribution was restricted to a certain region of the chromosome, we used a sliding-window approach to calculate the fraction of genes expressed along each position of the chromosome (Fig. 2B). The analysis was carried out using the ‘known genes’ track of the UCSC Genome Browser [38] and a window size of 5 Mb. For chromosome 11, the fraction of genes expressed in each window varied between 0.2 and 0.6. The variability is exemplified by a detailed view of the 70–89 Mb region of chromosome 11 (Fig. 2C), which has previously been identified as a major quantitative trait locus associated with hematopoietic stem cell turnover [39]. In addition to expression-dense regions, the example region contains clusters of olfactory receptors and chemokine ligands for which no ESTs were detected in the stem cell libraries, which is not surprising given their narrow expression pattern [40].
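A sliding-window calculation of this kind might look like the sketch below. The 5 Mb window follows the text; the 1 Mb default step, the function name and the input format (gene start position paired with an expressed/not-expressed flag) are assumptions made for illustration.

```python
def sliding_window_fraction(genes, chrom_length,
                            window=5_000_000, step=1_000_000):
    """Fraction of genes with at least one EST in each window.

    genes: iterable of (position_bp, expressed) pairs for one chromosome.
    Returns a list of (window_start, fraction); fraction is None for
    windows that contain no genes.
    """
    genes = list(genes)
    results = []
    start = 0
    while start < chrom_length:
        end = start + window
        flags = [expressed for pos, expressed in genes if start <= pos < end]
        fraction = sum(flags) / len(flags) if flags else None
        results.append((start, fraction))
        start += step
    return results


# Hypothetical toy chromosome of 10 Mb with three annotated genes.
toy_genes = [(1_000_000, True), (2_000_000, False), (6_000_000, True)]
profile = sliding_window_fraction(toy_genes, 10_000_000, step=5_000_000)
```

Reporting `None` for empty windows keeps gene deserts distinguishable from regions where genes are present but not expressed.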

Comparison between in vivo and in vitro NSC-containing libraries

The LVW and NS libraries share the same tissue origin, and hence we expect a large overlap between the libraries. Sixty-seven percent of the LVW transcripts were found in the NS library (Fig. 3A). However, only 46% of the NS-expressed transcripts were detected in the LVW library, which is not surprising given the higher number of clones sequenced from the NS library and the normalization procedure that is expected to increase the detection of low abundance transcripts. The accumulation of transcripts expressed at low levels in the non-overlapping NS and LVW sets is also evident when the global UniGene EST counts for the transcripts are analyzed; transcripts detected only in the NS or LVW libraries have significantly lower expression levels (P < 1 × 10⁻⁴, Welch two-sample t test) than transcripts detected in both libraries (Fig. 3B). The observed overlap (3590 transcripts) between the libraries was significantly larger than would be expected for two unrelated libraries (∼3000 transcripts), which indicates that the in vivo expression pattern is partly retained in the in vitro model. Genes with the highest EST count in the NS library that went undetected in the LVW library include Lrrc4b (34 ESTs in NS), Wnt7b (30), Mmp14 (27), Gpiap1 (23), Srebf1 (20), Fkbp4 (17), Lfng (17), Bmp1 (17), Igfbp4 (16), Med25 (15) and Traf4 (15). Genes with high EST counts in the LVW library and absent from the NS library include Eef1a1 (63 ESTs in LVW), Ppp1r1b (61), Hpca (54), MGI:107562 (43), Eif4a2 (28), Pcp4l1 (28), Chn1 (27), Gpr88 (24), LOC14433 (21) and Adcy5 (20). A complete listing of these genes is provided in Supplementary Material 5. The expression of such genes may indicate processes performed specifically in the neurospheres and in the stem cell microenvironment, respectively, including the specialized actions carried out by the differentiated cells present in the LVW.

Functional annotation using EASE [30] further identified specific themes being overrepresented in the in vitro stem/progenitor cells (NS library) versus in the in vivo microenvironment (LVW library). The biological processes significant for the NS library were ‘cell cycle’ (especially ‘mitotic cell cycle’ and ‘S phase of mitotic cell cycle’), ‘cell proliferation’ and ‘DNA replication’. For the LVW library, ‘cell–cell signaling’, ‘synaptic transmission’, ‘cell communication’, ‘transmission of nerve impulse’, ‘G-protein-coupled receptor pathway’ and ‘signal transduction’ were overrepresented. Genes expressed preferentially in both libraries were those involved in ‘energy derivation by oxidation of organic compounds’, ‘energy pathways’ and ‘main pathway of carbohydrate metabolism’.
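The library overlaps reported in this section reduce to set operations on UniGene IDs. The sketch below shows one way to summarize such a comparison; the function name and the toy identifiers are hypothetical and do not correspond to the actual library contents.

```python
def overlap_summary(lib_a: set[str], lib_b: set[str]) -> dict[str, float]:
    """Summarize the overlap between two sets of UniGene cluster IDs."""
    shared = lib_a & lib_b
    return {
        "shared": len(shared),
        "only_a": len(lib_a - lib_b),
        "only_b": len(lib_b - lib_a),
        # Fraction of each library that is also found in the other one.
        "fraction_of_a": len(shared) / len(lib_a),
        "fraction_of_b": len(shared) / len(lib_b),
    }


# Hypothetical toy libraries identified by made-up cluster IDs.
toy_lvw = {"Mm.1", "Mm.2", "Mm.3"}
toy_ns = {"Mm.2", "Mm.3", "Mm.4", "Mm.5"}
summary = overlap_summary(toy_lvw, toy_ns)
```

The two fractions differ whenever the libraries differ in size, which is why the same overlap can correspond to 67% of one library and 46% of the other.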

Overlap between the libraries

To identify transcripts that are expressed by the stem-cell-containing populations, but not by differentiated cells, we first compared the overlap between the libraries LVW, NS and BM-HPC and continued by comparing the obtained intersection of 1065 transcripts (Supplementary Material 6) to the transcripts expressed by the differentiated neurospheres (NS-DIFF; Library ID 12357). The purpose of this second comparison is to remove ‘housekeeping’ genes from further analysis and to enrich the initial overlap for genes involved in stem-cell-related functions. The BM-HPC is a well-characterized in vitro model with all common stem cell characteristics and has been shown to resemble hematopoietic stem cells to a large extent [23]. A comparison that includes this library may therefore be enriched for genes involved in stem/progenitor cell characteristics and exclude genes involved in secondary downstream neural differentiation processes. In total, 639 of the 1065 transcripts that were expressed in LVW, NS and BM-HPC libraries were not expressed in the NS-DIFF library. Functional annotation of these using EASE [30] and KEGG [41] identified several significantly overrepresented metabolism themes (‘proteasome’ (P = 0.000482), ‘pentose phosphate pathway’ (0.00364), ‘glycolysis’ (0.00926), ‘glutamate metabolism’ (0.0124) and others). Overrepresentation analysis using the Biological Process branch of Gene Ontology similarly identified several metabolism-related themes as enriched, but interestingly also ‘protein localization’, ‘DNA methylation’, ‘cell proliferation’ and ‘cell death’ (Supplementary Material 7).

Table 4 – Analysis of rare transcripts

UniGene ID ESTs in LVW ESTs in NS ESTs in NS-NIA Sum of ESTs ESTs in UniGene Enrichment Symbol

Mm.245186 0 0 4 4 4 268.3
Mm.310490 0 0 4 4 5 268.3
Mm.373963 0 3 0 3 4 201.2
Mm.361153 0 0 3 3 4 201.2
Mm.245264 0 0 6 6 9 134.2
Mm.41972 0 0 4 4 6 134.2 Phxr4
Mm.297496 1 14 0 15 28 77.4
Mm.359113 0 3 0 3 6 67.1 Rap2b
Mm.360092 0 0 3 3 6 67.1
Mm.122235 0 0 5 5 11 55.9
Mm.354926 0 1 3 4 9 53.7
Mm.312348 0 0 3 3 7 50.3
Mm.44137 2 6 0 8 19 48.8 Egr4
Mm.152937 0 10 0 10 25 44.7 Dbx2
Mm.352316 0 6 0 6 16 40.2 LOC432901
Mm.132710 0 0 3 3 8 40.2
Mm.140700 1 2 3 6 17 36.6 2610316D01Rik
Mm.372412 1 1 2 4 12 33.5
Mm.319313 0 3 0 3 9 33.5
Mm.359123 4 0 0 4 13 29.8
Mm.357990 1 3 0 4 13 29.8 Cacng8
Mm.259924 1 6 0 7 23 29.3 BC025575
Mm.315457 2 2 0 4 14 26.8
Mm.355640 0 0 3 3 11 25.2
Mm.157588 0 0 3 3 12 22.4
Mm.72944 1 4 0 5 21 21.0 B230396O12Rik
Mm.131870 0 3 1 4 17 20.6
Mm.252064 0 1 3 4 17 20.6
Mm.244443 0 0 4 4 17 20.6 9230119C12Rik
Mm.359259 0 3 0 3 13 20.1
Mm.272495 1 3 2 6 27 19.2
Mm.322371 2 1 0 3 14 18.3 Gm162
Mm.290693 3 0 0 3 14 18.3 Gpr6
Mm.214994 0 3 0 3 14 18.3 Cacng7
Mm.41158 0 3 0 3 14 18.3
Mm.351476 1 0 3 4 19 17.9
Mm.326723 3 0 0 3 15 16.8 Serpina9
Mm.196699 0 1 4 5 25 16.8 Pcdhb14
Mm.372081 0 5 0 5 27 15.2
Mm.259785 0 2 1 3 17 14.4
Mm.350587 0 4 0 4 23 14.1
Mm.145175 0 0 3 3 19 12.6 9530085L02Rik
Mm.317894 2 0 2 4 26 12.2
Mm.44882 0 4 0 4 26 12.2
Mm.326648 0 0 4 4 27 11.7 LOC245297
Mm.183264 0 3 0 3 21 11.2 Lrfn1
Mm.3159 2 1 0 3 22 10.6 Phkg1
Mm.333252 0 3 1 4 30 10.3 Adamts14
Mm.360547 0 3 0 3 23 10.1
Mm.347681 3 0 0 3 24 9.6 BC038167
Mm.158483 0 3 0 3 24 9.6
Mm.330639 1 2 0 3 26 8.7 Nkx2-2
Mm.347730 0 3 0 3 28 8.0 C230009H10Rik
Mm.32171 0 0 3 3 28 8.0 Gm1805
Mm.479 0 1 2 3 29 7.7 Mybl1
Mm.289463 2 0 1 3 30 7.5
Mm.341540 3 0 0 3 30 7.5 Aldoa-ps1

We identified transcripts that had few supporting ESTs (<30) in the public databases but were present multiple times (≥3) in our LVW or NS, or the NS-NIA data set. The number of ESTs in each library, as well as an enrichment factor (ratio between the transcripts per million (TPM) counts in our library versus the complete UniGene) and gene symbol are reported. This analysis was carried out using UniGene build 146.

Fig. 2 – Distribution of the expressed UniGene clusters on the different chromosomes. (A) The chromosomal distribution pattern for each library was calculated in order to identify chromosomes with either under- or overrepresentation of UniGene clusters. The expected chromosomal frequencies were derived by calculating the distribution pattern for the 39,543 M. musculus UniGene build 145 clusters that were mapped to one of the autosomes or the X chromosome. UniGene clusters that were mapped to the Y chromosome (182), chromosome “Un” (1323) or lacked chromosomal assignment (6183) were excluded from further analysis. Significant deviation from the expected pattern was calculated using the Pearson's Chi-square test. A Bonferroni-adjusted P value <0.05 was considered significant. The significantly deviating chromosomes are marked with an arrow. The coloring of the arrows assists in interpretation of which library is significantly deviating from the expected pattern. (B) For the LVW and the NS libraries, the fraction of transcripts expressed for different regions of chromosome 11 shows uneven distribution. Certain chromosomal regions are strongly underrepresented, while other regions show overrepresentation. This is exemplified in panel C, where the UniGene clusters that map to the region between 70 and 89 Mb of chromosome 11 are shown. Detected transcripts are shown in green. The regions with little or no expression contain olfactory receptors or chemokine ligands, which are not expected to be expressed in these libraries. (D) The overrepresentation of transcripts on chromosome 11 and underrepresentation on chromosome X is present also for the 1065 transcripts shared by the three libraries. The reduced set of 639 transcripts not expressed in the NS-DIFF library (see Results for details) also shows a significant underrepresentation of transcripts on chromosome X.

Fig. 3 – Overlap between data sets. The overlap between different sets of transcripts was analyzed using Venn diagrams. All comparisons were carried out using UniGene IDs. (A) The overlap between the two NSC-containing libraries. (B) The distribution of the number of ESTs in each UniGene cluster, stratified by the three groups in the Venn diagram comparison between LVW and NS in panel A. This indicates that the shared genes have higher expression levels than the non-overlapping groups of genes. The differences are statistically significant (Welch two-sample t test). Data are plotted on log-scale, and the widths of the boxes are drawn in proportion to the number of observations in each group. (C) Analysis of the overlap between the three libraries. The NS library shows the largest proportion of non-overlapping genes, which is expected for a normalized library sequenced to high redundancy. (D) The overlap between the two neurosphere libraries NS-NIA and NS. The NS-NIA library is a non-normalized library generated elsewhere and described in more detail in Sharov et al. [17]. (E) A stem cell signature described previously [17] and comparison with the NS library. (F) Comparison between the neural stem/progenitor gene expression signature described in [22] and the NS library shows a large overlap that includes the majority of the genes in the stemness signature. In the figures, n refers to the number of observations in each data set.

The set of 639 transcripts found in all three libraries, and excluded from the NS-DIFF library, also included 40 transcripts, such as Atf4, Ccnh, Ezh2, Ncor2, Sin3b and Tfdp1, involved in ‘transcription’ (GO:0006350) (Supplementary Material 8). We further investigated the chromosomal distribution of the shared 1065 transcripts (Fig. 2D) in order to search for loci regulating stem cell behavior. When comparing to the complete UniGene, we identified an overrepresentation of genes located on chromosome 11 (adjusted P = 0.0384, Pearson's Chi-square test) and a deviating representation on chromosome X (P = 0.0095), which the Fig. 2D legend describes as an underrepresentation. For the shared 639 transcripts not expressed by the differentiated neurospheres, chromosome X had a transcript frequency deviating from the expected (P = 0.0213) (Fig. 2D).
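The chromosome-level test outlined in the Fig. 2A legend can be sketched as a per-chromosome Chi-square comparison against the reference distribution, followed by a Bonferroni adjustment. The exact test layout used by the authors is not specified, so the design below (each chromosome versus all others, 1 degree of freedom) is an assumption, as are the function name and the toy counts.

```python
from math import erfc, sqrt


def chromosome_chi_square(observed, reference):
    """Per-chromosome Pearson Chi-square test with Bonferroni adjustment.

    observed:  dict chromosome -> clusters detected in the library.
    reference: dict chromosome -> clusters in the complete reference set.
    Returns dict chromosome -> (chi2, adjusted P value).
    """
    n_obs = sum(observed.values())
    n_ref = sum(reference.values())
    tested = [c for c, count in reference.items() if 0 < count < n_ref]
    results = {}
    for chrom in tested:
        expected_in = n_obs * reference[chrom] / n_ref
        expected_out = n_obs - expected_in
        obs_in = observed.get(chrom, 0)
        obs_out = n_obs - obs_in
        chi2 = ((obs_in - expected_in) ** 2 / expected_in
                + (obs_out - expected_out) ** 2 / expected_out)
        # Upper tail of the Chi-square distribution with 1 degree of freedom.
        p_value = erfc(sqrt(chi2 / 2))
        results[chrom] = (chi2, min(1.0, p_value * len(tested)))
    return results


# Hypothetical counts for a two-chromosome toy genome.
toy_result = chromosome_chi_square({"1": 70, "2": 30}, {"1": 50, "2": 50})
```

A chromosome is flagged when its adjusted P value falls below 0.05; the sign of the difference between observed and expected counts then tells whether it is over- or underrepresented.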

Comparison between two independent neurosphere libraries

Gene expression in neurospheres has been previously analyzed on a smaller scale using a non-normalized cDNA library (NS-NIA; UniGene library ID 12356) [17,42]. An updated annotation of these ESTs shows that 11,387 (85.7%) map to UniGene and cluster into 3358 transcripts (Supplementary Material 9). The frequency distribution is similar to the LVW library, but deviates from the NS library, which is in line with what would be expected in a comparison between a normalized and a non-normalized library. Sixty-four percent of the transcripts in the NS-NIA library were detected in our NS library (Fig. 3D and Supplementary Material 10). However, several transcripts expressed at low levels went undetected in the NS-NIA library (data not shown). An overrepresentation analysis of the shared transcripts identified several genes involved in protein and DNA metabolism, cell-to-cell contacts and signaling as enriched. The NS-NIA library contained 175 transcription-related genes, of which 14 overlapped with the 40 expressed in our three libraries (Supplementary Material 8).

Comparison of gene expression signatures for neural and other stem cells

A stemness gene expression signature has previously been suggested [20,21]. However, the overlap in candidate stemness genes between the different studies has been surprisingly small [22]. A subsequent re-analysis of the data, limited to the neural stem/progenitor cells and supplemented with a third data set, identified a gene expression signature consisting of 236 genes enriched in neural stem/progenitor cells [22]. To further validate this signature, we analyzed the presence of the candidate stemness genes in our NS library. We first updated the annotation of the Affymetrix IDs used using EASE; 228 probes received an annotation and were mapped to 223 unique UniGene clusters. We could confirm the expression of 174 genes (78%) in the NS library (Fig. 3F). These are listed in Supplementary Material 11 together with their expression values in the NS, LVW and BM-HPC libraries. A different stem cell gene expression signature was described by Sharov et al. [17] and obtained by comparing ESTs from embryonic stem cells, embryonic germ cells, neural stem/progenitor cells and mesenchymal stem cells. The signature consists of 140 ESTs, which we re-annotated (as described in the legend of Supplementary Material 12) and mapped to 130 UniGene clusters. A comparison between the NS library and this signature reveals an overlap of 80 transcripts (Fig. 3E), which are listed in Supplementary Material 12. An additional 11 transcripts were expressed in the LVW library, which brings the total number of common transcripts detected in the LVW or NS library to 91. A comparison of the overlap between the transcripts expressed in the NS library and either of the two signatures (i.e. sets of 174 and 80 transcripts) shows an overlap of only three transcripts (Tnc (Mm.980), Rnf138 (Mm.253542) and BC031407 (Mm.270044)). Interestingly, Tnc has been indicated to play an important role in the establishment of the NSC in vivo microenvironment through modulation of growth factor signaling [43]. Furthermore, both Rnf138 and BC031407 (Gatad2a) encode Zn-finger proteins, possibly functioning as transcription factors.

Expression of stem cell genes

We further analyzed the expression of transcripts representing known stem cell markers and found several of the expected genes to be expressed (Table 5). The number of ESTs per transcript appears to correctly reflect the abundance of mRNA in vivo, as exemplified by the well-characterized NSC marker Nestin, which is represented by 22 ESTs. Notably, the NS library expressed all but one of the known NSC marker genes whereas the BM-HPC library did not express any, providing further validation of the experimental setup. Furthermore, the combined effect of heterogeneity of the LVW library and the low proportion of true NSC in the SVZ region is reflected by the absence of detected ESTs for Msi1h, Egfr and Sox2, all known NSC markers, indicating that the sampling of the LVW library is not complete and some transcripts may therefore be missed.

Table 5 – Genes expected to be expressed in the neural stem/progenitor cells

Gene name Gene symbol Gene ID UniGene ID LVW NS NS-NIA BM-HPC

Epidermal growth factor receptor Egfr 13649 Mm.8534 nd 4 17 nd
Fibroblast growth factor receptor 1 Fgfr1 14182 Mm.265716 2 6 nd nd
Fibroblast growth factor receptor 3 Fgfr3 14184 Mm.6904 2 8 1 nd
Forkhead box G1 Foxg1 15228 Mm.4704 4 1 nd nd
Forkhead box J1 Foxj1 15223 Mm.4985 2 3 nd nd
Forkhead box J3 Foxj3 230700 Mm.259227 1 1 nd nd
Kit ligand Kitl 17311 Mm.45124 nd 3 4 nd
Lunatic fringe gene homolog (Drosophila) Lfng 16848 Mm.12834 nd 17 nd nd
Musashi homolog 1 (Drosophila) Msi1h 17690 Mm.5077 nd 6 nd nd
Musashi homolog 2 (Drosophila) Msi2h 76626 Mm.167586 2 ⁎ 3 nd
Nestin Nes 18008 Mm.331129 2 22 nd nd
Notch gene homolog 1 (Drosophila) Notch1 18128 Mm.290610 nd 5 nd nd
Notch gene homolog 2 (Drosophila) Notch2 18129 Mm.254017 nd 1 nd nd
Notch gene homolog 3 (Drosophila) Notch3 18131 Mm.4945 nd 2 nd nd
Numb gene homolog (Drosophila) Numb 18222 Mm.4390 2 2 nd nd
Paired box gene 6 Pax6 18508 Mm.3608 nd 2 nd nd
Prominin 1 Prom1 19126 Mm.6250 nd 1 nd nd
Recombining binding protein suppressor of hairless (Drosophila) Rbpsuh 19664 Mm.209292 nd 1 nd nd
SRY-box containing gene 1 Sox1 20664 Mm.39088 1 2 nd nd
SRY-box containing gene 2 Sox2 20674 Mm.4541 nd 1 2 nd
SRY-box containing gene 3 Sox3 20675 Mm.35784 nd 7 nd nd
SRY-box containing gene 6 Sox6 20679 Mm.377113 3 4 7 nd
SRY-box containing gene 8 Sox8 20681 Mm.258220 1 6 nd nd

We mined the literature for genes associated with stem and progenitor cells and analyzed their expression in the NS, LVW and NS-NIA libraries.
nd indicates that the transcript was not detected in the library.
⁎ Contains the EST CX211077, which shares 92% sequence similarity with Msi2h.

Discussion

To contribute to the understanding of NSC, we have, within the framework of this study, provided an analysis of the genes expressed by neurospheres (NSCs' in vitro model system) and the NSCs' in vivo microenvironment (LVW region). To control for neurosphere heterogeneity, we used early-passage neurospheres with high neurosphere reinitiation capacity. At harvesting, the size of the neurospheres was kept small to further avoid the formation of heterogeneous spheres. In sequencing of the neurosphere library, we used a normalization approach that increases the probability of detecting all expressed transcripts by enhancing the detection of low-abundant transcripts and by avoiding excessive sampling of highly expressed genes. Sequencing of the neurosphere library has generated the highest number of publicly available, stem-cell-related ESTs, which serve as a potential future reference for identification of genes expressed in NSCs. We also analyzed the LVW in vivo NSC microenvironment to further facilitate the identification of genes important for maintenance of the stem cell phenotype and genes involved in stem and progenitor cell differentiation. Sequencing of the unfractionated LVW region provides a comprehensive list of genes expressed by cells, including stem cells, in this location. This is an inclusive strategy that provides information on the full complement of the cells in this region. Together, these digital representations are important resources and can be used to resolve discrepancies in defining a stem cell signature, as evident from previous studies based on array analysis [20,21].

Use of EST data for estimation of gene expression levels is approximate and provides reliable expression estimates only for moderate to highly expressed genes. However, on the global level, the EST profile reflects the transcriptional activity of the sample and allows for an unbiased detection of the expressed genes. The LVW, NS and BM-HPC libraries differ in the number of unique transcripts represented, as well as in the size distribution of the UniGene clusters (Fig. 1C and Table 1). As expected, the normalized NS library contains the largest number of unique transcripts, with few transcripts detected at high levels and several at low levels. The corresponding LVW in vivo microenvironment library showed fewer expressed transcripts and a size distribution that was shifted towards higher number of ESTs per cluster. The BM-HPC library differed in its cluster size distribution from the two other libraries by its expression of a few genes at very high levels. This reflects the homogeneity of the cell population used for the construction of the library, but probably also the lack of external stimuli.

To identify transcripts that were significantly over- or underrepresented between the libraries, we used a conservative Fisher's Exact Test and a strict Bonferroni adjustment to control for multiple hypothesis testing. The number of transcripts detected as differentially represented is therefore probably an underestimate. Furthermore, transcript abundance estimates are not possible for the normalized NS library as the within-library transcript redundancy is removed by the normalization. In the comparisons between the other libraries, the largest differences were observed in comparisons that included the BM-HPC library, which reflects the different tissue origin for this library. This is also shown by the large overlap between the sets of transcripts that are overrepresented in the BM-HPC library when compared one-by-one to either the LVW, NS-NIA or NS-DIFF library (47 of 55 (LVW), 63 (NS-NIA) or 76 (NS-DIFF) transcripts overrepresented in BM-HPC library are shared by all three comparisons; data not shown).

We have also identified a set of rare transcripts and novel ESTs, which are expressed by stem/progenitor cells or by the cells in the LVW microenvironment, and that are not available on other platforms. A comparison between the probes on the currently most recent Affymetrix array for transcriptional profiling in mouse (GeneChip Mouse Genome 430 2.0 Array) and the transcripts expressed in the NS, LVW and BM-HPC libraries showed 9903 overlapping transcripts (analysis carried out using the September 2005 annotation of the Affymetrix array). However, 1118 (10.2% of the transcripts from the NS, LVW and BM-HPC libraries) were not present on the Affymetrix array, and hence any genes among these that function in maintenance of the undifferentiated state or are involved in subsequent differentiation processes would go undetected.

We used the Gene Ontology classification to carry out a functional annotation of the transcripts expressed in the three libraries. Themes such as ‘mitotic cell cycle’ and ‘cell proliferation’ were overrepresented in the in vitro neurosphere library, which correlates with the characteristics of dividing stem/progenitor cells. The LVW library, on the other hand, expressed genes that were involved in ‘cell–cell signaling’, ‘transmission of nerve impulse’ and ‘G-protein-coupled receptor pathway’, which is in line with the library originating from a tissue where different cell types interact and where there is a mixture of stem, progenitor and differentiated cells. Genes that were expressed by both libraries were also involved in various carbohydrate-metabolism-related themes. It must be taken into account that low-abundant genes are more likely to be detected in the normalized NS library and that abundant genes expressed in both have a higher probability of being detected.

We also used the GO classification to mine the data for expression of members of pathways previously associated with various stem cell characteristics (Supplementary Materials 13 and 14). The Wnt (GO ID GO:0016055), Smoothened (GO:0007224), BMP (GO:0030509), Notch (GO:0007219) and TGFβ receptor (GO:0007179) signaling pathways have all previously been shown to be involved in processes related to stem cell proliferation, self-renewal or differentiation. It is interesting to note that in the NS library the Wnt and the Notch pathways may be active, since at least 40 (51%) and 10 (90%) of the known members of the respective pathways are expressed. In the analysis of the signaling pathways, it should be noted that the libraries are not sequenced to completion, and hence some false negatives are expected, in particular for pathway members expressed at low levels.

The EST sequencing approach we have presented has picked up several NSC or NSC-microenvironment-associated transcripts as potential candidates for further research. Instead of selecting a few genes for further characterization, we describe in Supplementary Material genes in our material that have been previously analyzed on a large scale using either in situ hybridizations or tissue microarray immunohistochemistry. Supplementary Material 15 lists 121 genes in our material that have been analyzed in the large-scale in situ hybridization localization of transcription factors during nervous system development by Gray et al. [44]. We found that several of the transcription factors (e.g. FoxJ1, FoxG1, Tcf12, Tcf4, Ndrg2 and Uhrf1) identified in our screen as candidate stem/progenitor cell markers are expressed in the ventricular zone of the developing nervous system in a pattern that suggests a potential functional role for these genes in neural stem/progenitor cells. Supplementary Materials 16 and 17 give the genes in our material that have been analyzed in the GeneAtlas and GenePaint in situ hybridization efforts by Carson et al. and Visel et al. [45,46]. Supplementary Material 18 lists the genes in our material for which antibodies have been generated to their predicted gene products and analyzed on human tissue and cancer microarrays [47].

In conclusion, to contribute to the analysis of the neural stem cell transcriptome, we have presented a catalog of genes expressed in neural stem/progenitor cells and in the in vivo lateral ventricle wall microenvironment. We have shown an extensive overlap in gene expression between the microenvironment and the neurosphere in vitro neural stem model. In addition to Supplementary Material that contains detailed results of the analyses, we have made the EST data publicly available through GenBank. Using the UniGene EST clustering to annotate the clones, we have identified rare transcripts that are overrepresented in our libraries compared to the complete Mouse UniGene and we have also shown statistically significant differences in EST representations between the different libraries. The comparison of the in vivo microenvironment and the in vitro NSC expression pattern to that of a multipotent progenitor/stem cell line generated a list of genes shared by the three libraries. In particular, we have identified 40 transcription factors and transcription-related genes that are expressed in all three libraries, but not by differentiated neurospheres. We have also shown a large overlap between stem cell gene expression signatures with previously published studies [17,22]. Altogether, these digital signatures serve as valuable resources for future investigations of the neural stem cell transcriptome and for elucidation of a stem cell signature.

Acknowledgments

We wish to thank the personnel at the KTH Genome Center for excellent technical assistance with the sequencing, and Peter Savolainen, Rikard Erlandsson, Karl Nyberg and Fredrik Sterky for valuable discussions on the EST sequence analysis. This work was supported by grants from the Knut and Alice Wallenberg Foundation, the Wallenberg Consortium North, the Swedish Cancer Foundation, the Swedish Scientific Research Council and the Foundation for Strategic Research. LC is supported by the Tobias Foundation.

Appendix A. Supplementary data

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.yexcr.2006.02.012.

References

[1] I.L. Weissman, D.J. Anderson, F. Gage, Stem and progenitor cells: origins, phenotypes, lineage commitments, and transdifferentiations, Annu. Rev. Cell Dev. Biol. 17 (2001) 387–403.

[2] G.J. Spangrude, S. Heimfeld, I.L. Weissman, Purification and characterization of mouse hematopoietic stem cells, Science 241 (1988) 58–62.

[3] N. Uchida, L. Jerabek, I.L. Weissman, Searching for hematopoietic stem cells. II. The heterogeneity of Thy-1.1(lo)Lin(-/lo)Sca-1+ mouse hematopoietic stem cells separated by counterflow centrifugal elutriation, Exp. Hematol. 24 (1996) 649–659.

[4] J.F. Zhong, Y. Zhao, S. Sutton, A. Su, Y. Zhan, L. Zhu, C. Yan, T. Gallaher, P.B. Johnston, W.F. Anderson, M.P. Cooke, Gene expression profile of murine long-term reconstituting vs. short-term reconstituting hematopoietic stem cells, Proc. Natl. Acad. Sci. U. S. A. 102 (2005) 2448–2453.

[5] S.J. Morrison, I.L. Weissman, The long-term repopulating subset of hematopoietic stem cells is deterministic and isolatable by phenotype, Immunity 1 (1994) 661–673.

[6] A. Capela, S. Temple, LeX/ssea-1 is expressed by adult mouse CNS stem cells, identifying them as nonependymal, Neuron 35 (2002) 865–875.

[7] C.B. Johansson, S. Momma, D.L. Clarke, M. Risling, U. Lendahl, J. Frisen, Identification of a neural stem cell in the adult mammalian central nervous system, Cell 96 (1999) 25–34.

[8] R.L. Rietze, H. Valcanis, G.F. Brooker, T. Thomas, A.K. Voss, P.F. Bartlett, Purification of a pluripotent neural stem cell from the adult mouse brain, Nature 412 (2001) 736–739.

[9] N. Uchida, D.W. Buck, D. He, M.J. Reitsma, M. Masek, T.V. Phan, A.S. Tsukamoto, F.H. Gage, I.L. Weissman, Direct isolation of human central nervous system stem cells, Proc. Natl. Acad. Sci. U. S. A. 97 (2000) 14720–14725.

[10] A. Lee, J.D. Kessler, T.A. Read, C. Kaiser, D. Corbeil, W.B. Huttner, J.E. Johnson, R.J. Wechsler-Reya, Isolation of neural stem cells from the postnatal cerebellum, Nat. Neurosci. 8 (2005) 723–729.

[11] F.H. Gage, Mammalian neural stem cells, Science 287 (2000) 1433–1438.

[12] F. Doetsch, I. Caille, D.A. Lim, J.M. Garcia-Verdugo, A. Alvarez-Buylla, Subventricular zone astrocytes are neural stem cells in the adult mammalian brain, Cell 97 (1999) 703–716.

[13] C.M. Morshead, Adult neural stem cells: attempting to solve the identity crisis, Dev. Neurosci. 26 (2004) 93–100.

[14] F. Doetsch, L. Petreanu, I. Caille, J.M. Garcia-Verdugo, A. Alvarez-Buylla, EGF converts transit-amplifying neurogenic precursors in the adult brain into multipotent stem cells, Neuron 36 (2002) 1021–1034.

[15] J. Aubert, M.P. Stavridis, S. Tweedie, M. O'Reilly, K. Vierlinger, M. Li, P. Ghazal, T. Pratt, J.O. Mason, D. Roy, A. Smith, Screening for mammalian neural genes via fluorescence-activated cell sorter purification of neural precursors from Sox1-gfp knock-in mice, Proc. Natl. Acad. Sci. U. S. A. 100 (Suppl. 1) (2003) 11836–11841.

[16] D.H. Geschwind, J. Ou, M.C. Easterday, J.D. Dougherty, R.L. Jackson, Z. Chen, H. Antoine, A. Terskikh, I.L. Weissman, S.F. Nelson, H.I. Kornblum, A genetic analysis of neural progenitor differentiation, Neuron 29 (2001) 325–339.

[17] A.A. Sharov, Y. Piao, R. Matoba, D.B. Dudekula, Y. Qian, V. VanBuren, G. Falco, P.R. Martin, C.A. Stagg, U.C. Bassey, Y. Wang, M.G. Carter, T. Hamatani, K. Aiba, H. Akutsu, L. Sharova, T.S. Tanaka, W.L. Kimber, T. Yoshikawa, S.A. Jaradat, S. Pantano, R. Nagaraja, K.R. Boheler, D. Taub, R.J. Hodes, D.L. Longo, D. Schlessinger, J. Keller, E. Klotz, G. Kelsoe, A. Umezawa, A.L. Vescovi, J. Rossant, T. Kunath, B.L. Hogan, A. Curci, M. D'Urso, J. Kelso, W. Hide, M.S. Ko, Transcriptome analysis of mouse stem cells and early embryos, PLoS Biol. 1 (2003) E74.

[18] S.L. Karsten, L.C. Kudo, R. Jackson, C. Sabatti, H.I. Kornblum, D.H. Geschwind, Global analysis of gene expression in neural progenitors reveals specific cell-cycle, signaling, and metabolic networks, Dev. Biol. 261 (2003) 165–182.

[19] M.C. Easterday, J.D. Dougherty, R.L. Jackson, J. Ou, I. Nakano, A.A. Paucar, B. Roobini, M. Dianati, D.K. Irvin, I.L. Weissman, A.V. Terskikh, D.H. Geschwind, H.I. Kornblum, Neural progenitor genes. Germinal zone expression and analysis of genetic overlap in stem cell populations, Dev. Biol. 264 (2003) 309–322.

[20] M. Ramalho-Santos, S. Yoon, Y. Matsuzaki, R.C. Mulligan, D.A. Melton, “Stemness”: transcriptional profiling of embryonic and adult stem cells, Science 298 (2002) 597–600.

[21] N.B. Ivanova, J.T. Dimos, C. Schaniel, J.A. Hackney, K.A. Moore, I.R. Lemischka, A stem cell molecular signature, Science 298 (2002) 601–604.

[22] N.O. Fortunel, H.H. Otu, H.H. Ng, J. Chen, X. Mu, T. Chevassut, X. Li, M. Joseph, C. Bailey, J.A. Hatzfeld, A. Hatzfeld, F. Usta, V.B. Vega, P.M. Long, T.A. Libermann, B. Lim, Comment on “‘Stemness’: transcriptional profiling of embryonic and adult stem cells” and “a stem cell molecular signature”, Science 302 (2003) 393 (author reply 393).

[23] O.P. Pinto do, K. Richter, L. Carlsson, Hematopoietic progenitor/stem cells immortalized by Lhx2 generate functional hematopoietic cells in vivo, Blood 99 (2002) 3939–3946.

[24] M.A. Marra, T.A. Kucaba, L.W. Hillier, R.H. Waterston, High-throughput plasmid DNA purification for 3 cents per sample, Nucleic Acids Res. 27 (1999) e37.

[25] B. Ewing, L. Hillier, M.C. Wendl, P. Green, Base-calling of automated sequencer traces using phred. I. Accuracy assessment, Genome Res. 8 (1998) 175–185.

[26] D.L. Wheeler, T. Barrett, D.A. Benson, S.H. Bryant, K. Canese, V. Chetvernin, D.M. Church, M. DiCuccio, R. Edgar, S. Federhen, L.Y. Geer, W. Helmberg, Y. Kapustin, D.L. Kenton, O. Khovayko, D.J. Lipman, T.L. Madden, D.R. Maglott, J. Ostell, K.D. Pruitt, G.D. Schuler, L.M. Schriml, E. Sequeira, S.T. Sherry, K. Sirotkin, A. Souvorov, G. Starchenko, T.O. Suzek, R. Tatusov, T.A. Tatusova, L. Wagner, E. Yaschenko, Database resources of the National Center for Biotechnology Information, Nucleic Acids Res. 34 (2006) D173–D180.

[27] R Development Core Team, R: a language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria.

[28] S.F. Altschul, T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D.J. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389–3402.

[29] J.A. Blake, J.E. Richardson, C.J. Bult, J.A. Kadin, J.T. Eppig, MGD: the mouse genome database, Nucleic Acids Res. 31 (2003) 193–195.

[30] D.A. Hosack, G. Dennis Jr., B.T. Sherman, H.C. Lane, R.A. Lempicki, Identifying biological themes within lists of genes with EASE, Genome Biol. 4 (2003) R70.

[31] M.B. Soares, M.F. Bonaldo, P. Jelene, L. Su, L. Lawton, A. Efstratiadis, Construction and characterization of a normalized cDNA library, Proc. Natl. Acad. Sci. U. S. A. 91 (1994) 9228–9232.

[32] J.G. Wetmur, N. Davidson, Kinetics of renaturation of DNA, J. Mol. Biol. 31 (1968) 349–370.

[33] A.I. Su, T. Wiltshire, S. Batalov, H. Lapp, K.A. Ching, D. Block, J. Zhang, R. Soden, M. Hayakawa, G. Kreiman, M.P. Cooke, J.R. Walker, J.B. Hogenesch, A gene atlas of the mouse and human protein-encoding transcriptomes, Proc. Natl. Acad. Sci. U. S. A. 101 (2004) 6062–6067.

[34] T. Barrett, T.O. Suzek, D.B. Troup, S.E. Wilhite, W.C. Ngau, P. Ledoux, D. Rudnev, A.E. Lash, W. Fujibuchi, R. Edgar, NCBI GEO: mining millions of expression profiles--database and tools, Nucleic Acids Res. 33 (2005) D562–D566.

[35] J. Nishino, K. Yamashita, H. Hashiguchi, H. Fujii, T. Shimazaki, H. Hamada, Meteorin: a secreted protein that regulates glial cell differentiation and promotes axonal extension, EMBO J. 23 (2004) 1998–2008.

[36] A.V. Molofsky, R. Pardal, T. Iwashita, I.K. Park, M.F. Clarke, S.J. Morrison, Bmi-1 dependence distinguishes neural stem cell self-renewal from progenitor proliferation, Nature 425 (2003) 962–967.

[37] S. Griffiths-Jones, S. Moxon, M. Marshall, A. Khanna, S.R. Eddy, A. Bateman, Rfam: annotating non-coding RNAs in complete genomes, Nucleic Acids Res. 33 (2005) D121–D124.

[38] W.J. Kent, C.W. Sugnet, T.S. Furey, K.M. Roskin, T.H. Pringle, A.M. Zahler, D. Haussler, The human genome browser at UCSC, Genome Res. 12 (2002) 996–1006.

[39] L. Bystrykh, E. Weersing, B. Dontje, S. Sutton, M.T. Pletcher, T. Wiltshire, A.I. Su, E. Vellenga, J. Wang, K.F. Manly, L. Lu, E.J. Chesler, R. Alberts, R.C. Jansen, R.W. Williams, M.P. Cooke, G. de Haan, Uncovering regulatory pathways that affect hematopoietic stem cell function using ‘genetical genomics’, Nat. Genet. 37 (2005) 225–232.

[40] J.M. Young, B.M. Shykind, R.P. Lane, L. Tonnes-Priddy, J.A. Ross, M. Walker, E.M. Williams, B.J. Trask, Odorant receptor expressed sequence tags demonstrate olfactory expression of over 400 genes, extensive alternate splicing and unequal expression levels, Genome Biol. 4 (2003) R71.

[41] M. Kanehisa, S. Goto, KEGG: Kyoto encyclopedia of genes and genomes, Nucleic Acids Res. 28 (2000) 27–30.

[42] R. Galli, R. Fiocco, L. De Filippis, L. Muzio, A. Gritti, S. Mercurio, V. Broccoli, M. Pellegrini, A. Mallamaci, A.L. Vescovi, Emx2 regulates the proliferation of stem cells of the adult mammalian central nervous system, Development 129 (2002) 1633–1644.

[43] E. Garcion, A. Halilagic, A. Faissner, C. ffrench-Constant, Generation of an environmental niche for neural stem cell development by the extracellular matrix molecule tenascin C, Development 131 (2004) 3423–3432.

[44] P.A. Gray, H. Fu, P. Luo, Q. Zhao, J. Yu, A. Ferrari, T. Tenzen, D.I. Yuk, E.F. Tsung, Z. Cai, J.A. Alberta, L.P. Cheng, Y. Liu, J.M. Stenman, M.T. Valerius, N. Billings, H.A. Kim, M.E. Greenberg, A.P. McMahon, D.H. Rowitch, C.D. Stiles, Q. Ma, Mouse brain organization revealed through direct genome-scale TF expression analysis, Science 306 (2004) 2255–2257.

[45] J.P. Carson, T. Ju, H.C. Lu, C. Thaller, M. Xu, S.L. Pallas, M.C. Crair, J. Warren, W. Chiu, G. Eichele, A digital atlas to characterize the mouse brain transcriptome, PLoS Comput. Biol. 1 (2005) e41.

[46] A. Visel, C. Thaller, G. Eichele, GenePaint.org: an atlas of gene expression patterns in the mouse embryo, Nucleic Acids Res. 32 (2004) D552–D556.

[47] M. Uhlen, E. Bjorling, C. Agaton, C.A. Szigyarto, B. Amini, E. Andersen, A.C. Andersson, P. Angelidou, A. Asplund, C. Asplund, L. Berglund, K. Bergstrom, H. Brumer, D. Cerjan, M. Ekstrom, A. Elobeid, C. Eriksson, L. Fagerberg, R. Falk, J. Fall, M. Forsberg, M.G. Bjorklund, K. Gumbel, A. Halimi, I. Hallin, C. Hamsten, M. Hansson, M. Hedhammar, G. Hercules, C. Kampf, K. Larsson, M. Lindskog, W. Lodewyckx, J. Lund, J. Lundeberg, K. Magnusson, E. Malm, P. Nilsson, J. Odling, P. Oksvold, I. Olsson, E. Oster, J. Ottosson, L. Paavilainen, A. Persson, R. Rimini, J. Rockberg, M. Runeson, A. Sivertsson, A. Skollermo, J. Steen, M. Stenvall, F. Sterky, S. Stromberg, M. Sundberg, H. Tegel, S. Tourle, E. Wahlund, A. Walden, J. Wan, H. Wernerus, J. Westberg, K. Wester, U. Wrethagen, L.L. Xu, S. Hober, F. Ponten, A human protein atlas for normal and cancer tissues based on antibody proteomics, Mol. Cell. Proteomics 4 (2005) 1920–1932.