Eukaryotic Genomes:Fungi
Wednesday, October 22, 2003
Introduction to Bioinformatics ME:440.714 J. Pevsner
Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics by J. Pevsner (ISBN 0-471-21004-8). Copyright © 2003 by Wiley.
These images and materials may not be used without permission from the publisher.
Visit http://www.bioinfbook.org
Copyright notice
We are in the last third of the course:
Today: Fungi. Exam #2 is due at the start of class.
Next Monday: Functional genomics (Jef Boeke)
Next Wednesday: Pathways (Joel Bader)
Monday Nov. 3: Eukaryotic genomes
Wednesday Nov. 5: Human genome
Monday Nov. 10: Human disease
Wednesday Nov. 12: Final exam (in class)
Announcements
Outline of today’s lecture
Description and classification of fungi
The Saccharomyces cerevisiae genome
Duplication of the yeast genome
Functional genomics in yeast
Comparative genomics of fungi
Introduction to fungi: phylogeny
Fungi are eukaryotic organisms that can be filamentous (e.g. molds) or unicellular (e.g. the yeast Saccharomyces cerevisiae).
Most fungi are aerobic (but S. cerevisiae can grow anaerobically). Fungi have major roles in the ecosystem in degrading organic waste. They also have important roles in fermentation, including the manufacture of steroids and penicillin.
Several hundred fungal species are known to cause disease in humans.
Eukaryotes (Baldauf et al., 2000)
Fungi and metazoa are sister groups
Fig. 15.1, Page 504 (Baldauf et al., 2000)
Classification of fungi
About 70,000 fungal species have been described (as of 1995), but 1.5 million species may exist.
Four phyla:
Ascomycota: yeasts, truffles, lichens
  Hemiascomycetae: Génolevures project
  Euascomycetae: Neurospora
  Loculoascomycetae
  Laboulbeniomycetae: parasites of insects
Basidiomycota: rusts, smuts, mushrooms
Chytridiomycota: Allomyces
Zygomycota: feed on decaying vegetation
Box 15-1, Page 505
Page 505
Page 505
Introduction to Saccharomyces cerevisiae
First species domesticated by humans
Called baker’s yeast (or brewer’s yeast)
Ferments glucose to ethanol and carbon dioxide
Model organism for studies of biochemistry, genetics, molecular and cell biology:
• rapid growth rate
• easy to modify genetically
• features typical of eukaryotes
• relatively simple (unicellular)
• relatively small genome
Page 505
Sequencing the S. cerevisiae genome
The genome was sequenced by a highly cooperative consortium in the early 1990s, chromosome by chromosome (the whole-genome shotgun approach was not used).
This involved 600 researchers in more than 100 laboratories.
-- A physical map was created for all 16 chromosomes (I to XVI)
-- A library of 10 kb inserts was constructed in phage
-- The inserts were assembled into contigs
The sequence was released in 1996 and published in 1997 (Goffeau et al., 1996; Mewes et al., 1997).
Page 505
Features of the S. cerevisiae genome
Sequenced length: 12,068 kb (12,068,000 base pairs)
Length of repeats: 1,321 kb
Total length: 13,389 kb (~13 Mb)
Open reading frames (ORFs): 6,275
Questionable ORFs (qORFs): 390
Hypothetical proteins: 5,885
Introns in ORFs: 220
Introns in UTRs: 15
Intact Ty elements: 52
tRNA genes: 275
snRNA genes: 40
Page 506
Features of the S. cerevisiae genome
A notable feature of the genome is its high gene density (about one gene every 2 kilobases). Most bacteria have about one gene per kb, but most eukaryotes have a much sparser gene density.
Also, only 4% of S. cerevisiae genes are interrupted by introns. By contrast, 40% of Schizosaccharomyces pombe genes have introns.
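The gene-density and intron figures follow directly from the genome feature table. A minimal Python sketch of that arithmetic, assuming that "genes" means the 5,885 ORFs left after removing questionable ORFs, and that each of the 220 ORF introns lies in a different gene (an approximation, so the intron fraction is a rough estimate):

```python
# Values from the S. cerevisiae genome feature table.
SEQUENCED_KB = 12_068
ALL_ORFS = 6_275
QUESTIONABLE_ORFS = 390
INTRONS_IN_ORFS = 220

def kb_per_gene(length_kb: float, n_genes: int) -> float:
    """Average amount of sequence per gene, in kilobases."""
    return length_kb / n_genes

# Assumption: count only ORFs not flagged as questionable.
likely_genes = ALL_ORFS - QUESTIONABLE_ORFS         # 5,885 hypothetical proteins
density = kb_per_gene(SEQUENCED_KB, likely_genes)  # about 2.05 kb per gene

# Assumption: one intron per intron-containing ORF.
intron_fraction = INTRONS_IN_ORFS / ALL_ORFS        # about 0.035, i.e. roughly 4%

print(f"{density:.2f} kb per gene; {intron_fraction:.1%} of ORFs with an intron")
```

Both numbers agree with the slide: roughly one gene every 2 kb, and introns in only a few percent of genes.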
What are the most common protein families and protein domains? You can see the answer at EBI’s website: http://www.ebi.ac.uk/proteome/
Page 506
Fig. 15.2, Page 508
Page 506
Fig. 15.3, Page 509: http://www.ebi.ac.uk/proteome/
The EBI website offers a variety of proteome analysis tools, such as this summary of protein length distribution in S. cerevisiae.
ORFs in the S. cerevisiae genome
How are ORFs defined? In the initial genome analysis, an ORF was defined as >100 codons (thus specifying a protein of ~11 kilodaltons).
390 ORFs were listed as “questionable” because they were considered unlikely to be authentic genes: for example, they were short or showed unlikely codon-usage preferences.
How many ORFs are there in the yeast genome? There are 40,000 ORFs > 20 amino acids; how many of these are authentic?
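The length cutoff is what turns a raw sequence scan into a gene list. A minimal Python sketch of threshold-based ORF calling, assuming the standard stop codons, ATG as the only start codon, no introns, and a rule-of-thumb average of ~110 Da per amino acid for the size estimate. It is an illustration, not the pipeline used in the original annotation. Lowering `min_codons` from 100 to 20 corresponds to the relaxed cutoff behind the 40,000-ORF figure.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int, int]]:
    """Return (strand, start, stop_pos, n_codons) for ORFs longer than min_codons.

    An ORF runs from an ATG to the next in-frame stop codon; n_codons counts the
    ATG and the following sense codons but not the stop. Coordinates are 0-based
    on the strand that was scanned ("-" means the reverse complement).
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons > min_codons:
                        orfs.append((strand, start, i, n_codons))
                    start = None
    return orfs

def approx_mass_kda(n_codons: int, daltons_per_residue: float = 110.0) -> float:
    """Rough protein mass, assuming ~110 Da per amino acid residue."""
    return n_codons * daltons_per_residue / 1000.0

# A 121-codon ORF passes the >100-codon cutoff; 100 codons is about 11 kDa.
example = "ATG" + "GCT" * 120 + "TAA"
print(find_orfs(example))      # [('+', 0, 363, 121)]
print(approx_mass_kda(100))    # 11.0
```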
Page 506-507
ORFs in the S. cerevisiae genome
Several criteria may be applied to decide whether ORFs are authentic protein-coding genes:
[1] evidence of conservation in other organisms
[2] experimental evidence of gene expression (microarrays, SAGE, functional genomics)
The groups of Elizabeth Winzeler and Michael Snyder each recently described hundreds of previously unannotated genes that are transcribed and translated.
Page 507
ORFs in the S. cerevisiae genome
The MIPS Comprehensive Yeast Genome Database lists criteria for assigning ORFs, based on FASTA search scores:
Category of protein: number of proteins
Known protein: 3400
Strong similarity to known protein: 230
Similarity or weak similarity to known protein: 825
Similarity to unknown protein: 1007
No similarity: 516
Questionable ORF: 472
Total: 6450
Page 507, 510
Exploring a typical S. cerevisiae chromosome
We will next familiarize ourselves with the S. cerevisiae genome by exploring a typical chromosome, XII.
This chromosome features:
• 38% GC content
• very little repetitive DNA
• few introns
• six Ty elements (transposable elements)
• a high ORF density: 534 ORFs > 100 aa, with 72% of the chromosome occupied by protein-coding genes
Page 508-511
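Both the GC content and the coding fraction of chromosome XII are simple sequence statistics. A minimal Python sketch, assuming the chromosome is available as a plain string and ORF coordinates are half-open (start, end) intervals in base pairs; the function names are illustrative and not part of any yeast database API. Overlapping ORFs are merged so that shared bases are not counted twice.

```python
from typing import Iterable, Tuple

def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return gc / acgt if acgt else 0.0

def coding_fraction(orfs: Iterable[Tuple[int, int]], chrom_length: int) -> float:
    """Fraction of a chromosome covered by ORFs, merging overlapping intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(orfs):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent ORF: extend the current interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / chrom_length

print(gc_content("ATGCGCAT"))                    # 0.5
print(coding_fraction([(0, 50), (40, 80)], 100))  # 0.8
```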
Key S. cerevisiae databases
Web resources include:
NCBI (Entrez Genome Eukaryotic genome projects)
EBI: http://www.ebi.ac.uk/proteome/
SGD: Saccharomyces Genome Database http://genome-www.stanford.edu/Saccharomyces/
MIPS Comprehensive Yeast Genome Database (MIPS = Munich Information Center for Protein Sequences) http://mips.gsf.de/proj/yeast/CYGD/db/
Page 508
NCBI: Entrez genomes for yeast resources
Fig. 15.4, Page 510
NCBI: Entrez genomes for yeast resources
~Fig. 15.5, Page 511
Fig. 15.6, Page 512
MIPS offers a Comprehensive Yeast Genome Database
http://mips.gsf.de/genre/proj/yeast/index.jsp
Fig. 15.7, Page 513: http://www.yeastgenome.org/
Saccharomyces Genome Database (SGD)
S. cerevisiae gene nomenclature
YKL159c
Y = yeast
K = 11th chromosome (XI)
L = left arm (R = right arm)
159 = 159th ORF (counting outward from the centromere)
c = Crick (bottom) strand (W = Watson, top strand)
RCN1 = wild-type gene
Rcn1p = protein
rcn1 = mutant allele
Box 15-2, Page 514
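Because systematic names encode position, they can be decoded mechanically. A minimal Python sketch, assuming the standard nuclear ORF pattern shown above (Y, chromosome letter A to P for I to XVI, arm, three-digit number, strand), plus an optional "-A"-style suffix as an assumption; names for mitochondrial genes and other feature types are not handled. The helper names are hypothetical.

```python
import re
from typing import Dict, Union

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]

# Y + chromosome letter (A-P = I-XVI) + arm (L/R) + 3-digit ORF number
# + strand (W/C), with an optional "-A"-style suffix (assumed format).
SYSTEMATIC = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])(?:-([A-Z]))?$", re.IGNORECASE)

def parse_systematic_name(name: str) -> Dict[str, Union[str, int, None]]:
    m = SYSTEMATIC.match(name)
    if m is None:
        raise ValueError(f"not a nuclear ORF systematic name: {name!r}")
    chrom, arm, number, strand, suffix = m.groups()
    return {
        "chromosome": ROMAN[ord(chrom.upper()) - ord("A")],
        "arm": "left" if arm.upper() == "L" else "right",
        "orf_number": int(number),  # counted outward from the centromere
        "strand": "Watson" if strand.upper() == "W" else "Crick",
        "suffix": suffix,
    }

def protein_name(gene: str) -> str:
    """RCN1 (wild-type gene) -> Rcn1p (protein)."""
    return gene[0].upper() + gene[1:].lower() + "p"

def mutant_allele(gene: str) -> str:
    """RCN1 (wild-type gene) -> rcn1 (mutant allele)."""
    return gene.lower()

print(parse_systematic_name("YKL159c"))
print(protein_name("RCN1"), mutant_allele("RCN1"))
```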
Duplication of the S. cerevisiae genome
Analysis of the S. cerevisiae genome revealed that many regions are duplicated, both intrachromosomally and interchromosomally (within and between chromosomes). These duplicated regions include both genes and nongenic regions.
Such duplications reflect a fundamental aspect of genome evolution.
What are the mechanisms by which regions of the genome duplicate?
Page 511
Duplication of the S. cerevisiae genome
Mechanisms of gene duplication:
• tandem repeat (slippage during recombination)
• gene conversion
• lateral gene transfer
• segmental duplication
• polyploidy (e.g. genome tetraploidy)
Fig. 15.8, Page 514
Duplication of the S. cerevisiae genome
Fate of duplicated genes:
• both copies persist
• one copy is deleted
• one copy becomes a pseudogene
• one copy functionally diverges
Fig. 15.8, Page 514
Duplication of the S. cerevisiae genome
In 1970, Susumu Ohno published the book Evolution by Gene Duplication.
He hypothesized that vertebrate genomes evolved by two rounds of whole genome duplication. This provided genomes with the “raw materials” (new genes) with which to introduce various innovations.
Page 512
Duplication of the S. cerevisiae genome
Ohno (1970):
“Had evolution been entirely dependent upon natural selection, from a bacterium only numerous forms of bacteria would have emerged. The creation of metazoans, vertebrates, and finally mammals from unicellular organisms would have been quite impossible, for such big leaps in evolution required the creation of new gene loci with previously nonexistent function. Only the cistron that became redundant was able to escape from the relentless pressure of natural selection. By escaping, it accumulated formerly forbidden mutations to emerge as a new gene locus.”
Page 512
Duplication of the S. cerevisiae genome
Wolfe and Shields (1997, Nature) provided support for Ohno’s paradigm. They hypothesized that the yeast genome duplicated about 100 million years ago: an ancestral diploid yeast genome with about 5,000 genes doubled to a tetraploid with about 10,000 genes, and massive gene loss and chromosomal rearrangement then yielded the present-day ~6,000 genes.
Page 515
Fig. 15.9, Page 515: comparison of chromosomes X and XI (axes: distance along chromosome X (kb) and distance along chromosome XI (kb)).
Wolfe and Shields (1997) performed blastp and found 55 blocks of duplicated regions. They proposed that the entire S. cerevisiae genome underwent a duplication.
Matches with scores >200 are shown. These are arranged in blocks of genes.
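The idea of grouping high-scoring matches into blocks can be sketched as greedy chaining of gene pairs. This is a simplified illustration, not Wolfe and Shields' actual procedure: the score cutoff of 200 comes from the figure, while representing hits as gene indices, `max_gap`, and `min_pairs` are assumptions made for the example. Blocks may run in the same or inverted orientation, but each block must keep one orientation.

```python
from typing import List, Sequence, Tuple

# A hit is (gene index on chromosome A, gene index on chromosome B, blastp score).
Hit = Tuple[int, int, float]
Pair = Tuple[int, int]

def find_blocks(hits: Sequence[Hit], min_score: float = 200.0,
                max_gap: int = 5, min_pairs: int = 3) -> List[List[Pair]]:
    """Greedily chain high-scoring gene pairs into collinear duplicated blocks.

    Pairs are sorted by position on chromosome A; a pair extends the current block
    if it advances 1..max_gap genes on A, moves 1..max_gap genes on B, and keeps
    the block's direction on B (same or inverted orientation).
    """
    pairs = sorted((a, b) for a, b, score in hits if score > min_score)
    blocks: List[List[Pair]] = []
    current: List[Pair] = []
    direction = 0  # +1 = same orientation, -1 = inverted, 0 = not yet known
    for a, b in pairs:
        if current:
            last_a, last_b = current[-1]
            da, db = a - last_a, b - last_b
            step = (db > 0) - (db < 0)
            if (0 < da <= max_gap and 0 < abs(db) <= max_gap
                    and (direction == 0 or step == direction)):
                current.append((a, b))
                direction = step
                continue
            if len(current) >= min_pairs:
                blocks.append(current)
        current, direction = [(a, b)], 0
    if len(current) >= min_pairs:
        blocks.append(current)
    return blocks

hits = [(1, 10, 300.0), (2, 11, 250.0), (4, 13, 400.0), (20, 50, 500.0), (30, 5, 150.0)]
print(find_blocks(hits))  # [[(1, 10), (2, 11), (4, 13)]]
```

The isolated hit (20, 50) and the low-scoring hit (30, 5) do not form a block, which mirrors how scattered matches differ from the collinear runs seen in the figure.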
Duplication of the S. cerevisiae genome
Evidence of genome duplication in yeast:
-- Systematic BLAST searches show 55 blocks of duplicated sequences.
-- There are 376 pairs of homologous genes.
You can see the results of chromosomal comparisons on Ken Wolfe’s web site and at the SGD web site.
Page 515
Fig. 15.10, Page 516
The SGD website includes a pairwise chromosome similarity viewer.
Kenneth Wolfe offers a website that permits analysis of yeast duplications: http://oscar.gen.tcd.ie/~khwolfe/yeast/
Page 516
Page 516
As an example, note the SSO1 gene on chromosome XVI.
SSO1 (XVI) and SSO2 (XIII) are part of a duplicated block.
Duplication of the S. cerevisiae genome
Two models for the presence of duplication blocks
[1] Whole genome duplication (tetraploidy) followed by gene loss and rearrangements
[2] Successive, independent duplication events
Page 516
Duplication of the S. cerevisiae genome
Model [1] is favored for several reasons:
-- For 50 of 55 duplicated regions, the orientation of the entire block is preserved with respect to the centromere. The orientation is not random.
-- For model [2] we would expect 7 triplicated regions. We observe only 0 or 1.
-- Gene order is maintained in 14 hemiascomycetes (the Génolevures project)
Page 516
Duplication of the S. cerevisiae genome
The Génolevures project:
-- Partial sequencing of 13 hemiascomycetes
-- Gene order can be compared in 14 fungi
-- 70% of the S. cerevisiae genome maps to sister regions with only minimal overlap
-- Proposal that the 16 centromeres form 8 pairs
Phylogenetic analyses place the divergence of S. cerevisiae and Kluyveromyces lactis prior to the whole genome duplication (~100 million years ago). Perhaps the genome duplication enabled S. cerevisiae to acquire new properties such as the capacity for anaerobic growth.
Page 517
Duplication of the S. cerevisiae genome
What is the fate of duplicated genes?
A duplicated gene (overall in eukaryotes) has a half-life of just several million years (Lynch and Conery, 2000).
50% to 92% of duplicated genes are lost (Wagner, 2001).
Consider four possible fates of a duplicated gene:
[1] Both copies persist (gene dosage effect)
[2] One copy is deleted (a common fate)
[3] One copy accumulates mutations and becomes a pseudogene (no functional protein product)
[4] One copy (or both) diverges functionally, and the organism can perform a novel function.
Page 517
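A half-life implies exponential decay of duplicate copies over time. A minimal Python sketch, assuming loss is a constant-rate (first-order) process; the 4-million-year half-life below is a hypothetical stand-in for "several million years", not a value taken from Lynch and Conery.

```python
def surviving_fraction(t_myr: float, half_life_myr: float) -> float:
    """Fraction of duplicate copies expected to survive t_myr million years,
    assuming constant-rate (exponential) loss with the given half-life."""
    return 0.5 ** (t_myr / half_life_myr)

# Hypothetical half-life standing in for "several million years".
HALF_LIFE_MYR = 4.0
for t in (4.0, 8.0, 12.0):
    print(f"{t:>4} Myr: {surviving_fraction(t, HALF_LIFE_MYR):.3f} surviving")
```

Under this assumption, each half-life halves the remaining copies, so three half-lives leave one-eighth of them.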
Duplication of the S. cerevisiae genome
Why are duplicated genes commonly lost? It might seem highly advantageous to have a second copy of a gene, thus permitting functional divergence.
Ohno suggested two reasons:
[1] After duplication, a deleterious mutation in one of the two genes might now persist. Without duplication, the individual would have been selected against by such a mutation.
[2] The presence of a new paralogous sequence could lead to unequal crossing over of homologous chromosomes during meiosis.
Page 518
Duplication of the S. cerevisiae genome
To explore the fate of duplicated genes, consider the example of genes involved in vesicle transport.
Vesicles carry cargo from one destination to another. Proteins on vesicles (e.g. vesicle-associated membrane protein, VAMP; Snc1p in yeast) bind to proteins on target membranes (e.g. syntaxin in mammalian and other eukaryotic systems, or Sso1p in yeast).
In S. cerevisiae, genome duplication appears to be responsible for the presence of two syntaxins (SSO1 and SSO2) and two VAMPs (SNC1 and SNC2).
Page 518
Duplication of the S. cerevisiae genome
Fig. 15.11, Page 518: the syntaxins Sso1p and Sso2p and the VAMPs Snc1p and Snc2p
Search for information on SSO1 (or any yeast gene) at the SGD website
Fig. 15.12, Page 519
The SGD record for SSO1 provides information on function
Duplication of the S. cerevisiae genome
The SGD website reveals that the SSO1 gene is nonessential (i.e. the null mutant is viable), but the double knockout of SSO1 and SSO2 is lethal. Thus, these paralogs may offer functional redundancy to the organism.
Also, these proteins could participate in distinct (but complementary) intracellular trafficking steps.
Page 519
Duplication of the S. cerevisiae genome
Andreas Wagner (2000) considered two ways an organism can compensate for mutations: via genes with overlapping functions (e.g. paralogs), or via genes with unrelated functions that participate in regulatory networks.
He reported that, overall, gene duplications did not provide robustness. Instead, interactions among unrelated genes provide robustness against mutations.
Page 519
Functional genomics in yeast
Functional genomics refers to the assignment of function to genes based on genome-wide screens and analyses.
Next week, Jef Boeke will describe functional genomics (Monday), and Joel Bader will describe proteomics in yeast (Wednesday).
Page 520
Fig. 15.13, Page 520
We can consider functional genomics in yeast in terms of high-throughput approaches at the levels of genes, transcripts, and proteins
Functional genomics in yeast (next week)
Protein level:
• two-hybrid screens
• affinity purification and mass spectrometry
• pathways
RNA level:
• microarrays
• SAGE
• transposon tagging
Gene level:
• genetic footprinting
• transposon insertion: random mutagenesis
• gene deletion: targeted deletion of all ORFs!
Today’s final topic: comparative analysis of fungal genomes
The fungi offer unprecedented opportunities for comparative genomic analyses:
-- relatively small genome sizes
-- they are eukaryotes
-- they exhibit significant differences in biology
-- opportunities to apply functional genomics approaches in a comprehensive, genome-wide manner
Page 528
Fungal and metazoan phylogeny
Baldauf et al., 2000; Page 528
A variety of fungal genome sequencing projects
Genome size and number of chromosomes (where listed):
Aspergillus fumigatus: 30 Mb, 8 chromosomes
Aspergillus nidulans: 29 Mb, 8 chromosomes
Aspergillus parasiticus
Candida albicans: 16 Mb, 8 chromosomes
Cryptococcus neoformans: 21 Mb
Fusarium sporotrichioides
Magnaporthe grisea: 40 Mb, 7 chromosomes
Neurospora crassa: 43 Mb, 7 chromosomes
Phanerochaete chrysosporium: 30 Mb, 10 chromosomes
Saccharomyces cerevisiae: 13 Mb, 16 chromosomes
Schizosaccharomyces pombe: 14 Mb, 3 chromosomes
Ustilago maydis: 20 Mb
An atypical fungus: Encephalitozoon cuniculi
Microsporidia are single-celled eukaryotes that lack mitochondria and peroxisomes. Consistent with their roles as parasites, the E. cuniculi genome is severely reduced in size (only 2.9 Mb, encoding about 2,000 proteins). Microsporidia were thought to represent deep-branching protozoans, but recent phylogenetic studies place them as an outgroup to fungi.
Page 529
Fig. 15.22, Page 529
Encephalitozoon cuniculi as a fungal outgroup
Orange bread mold: Neurospora crassa
Beadle and Tatum chose N. crassa as a model organism to study gene-protein relationships. The genome sequence was reported: 39 Mb, 7 chromosomes, 10,082 ORFs (Galagan et al., 2003).
N. crassa has only 10% repetitive DNA and, remarkably, only 8 pairs of duplicated genes that encode proteins >100 amino acids. This is because Neurospora uses “repeat-induced point mutation” (RIP), a mechanism by which, during the sexual cycle, the genome is scanned for duplicated (repeated) sequences, which are then heavily mutated by C:G to T:A transitions. This appears to serve as a genomic defense system, inactivating potentially harmful transposons.
Page 530
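The two-step logic of RIP (detect duplicated sequence, then mutate every copy) can be illustrated with a deliberately simplified Python model. Everything here is an assumption for illustration: exact repeated k-mers stand in for RIP's detection of duplicated DNA, and C-to-T and G-to-A changes are applied to every covered position. Real RIP acts on much longer duplications and has sequence-context preferences that this toy ignores.

```python
from typing import Dict, List

def rip_toy(seq: str, k: int = 8) -> str:
    """Toy RIP: mutate C->T and G->A at every position covered by a k-mer
    that occurs more than once in seq. Unique sequence is left unchanged."""
    seq = seq.upper()
    starts_by_kmer: Dict[str, List[int]] = {}
    for i in range(len(seq) - k + 1):
        starts_by_kmer.setdefault(seq[i:i + k], []).append(i)

    # Mark every base that lies inside a repeated k-mer.
    repeated = [False] * len(seq)
    for starts in starts_by_kmer.values():
        if len(starts) > 1:
            for s in starts:
                for j in range(s, s + k):
                    repeated[j] = True

    # C:G -> T:A transitions, written on one strand as C->T and G->A.
    transition = {"C": "T", "G": "A"}
    return "".join(transition.get(b, b) if repeated[i] else b
                   for i, b in enumerate(seq))

print(rip_toy("CAGCTAAAAACAGCT", k=5))  # both CAGCT copies become TAATT
print(rip_toy("ACGTTGCA", k=4))         # no repeats, unchanged
```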
Schizosaccharomyces pombe
The S. pombe genome is 13.8 Mb and encodes ~4,900 predicted proteins. Some bacterial genomes encode more proteins (e.g. Mesorhizobium loti with 6,752 and Streptomyces coelicolor with 7,825 genes).
Chromosome 1: 5.6 Mb, 2,255 genes, 59% coding
Chromosome 2: 4.4 Mb, 1,790 genes, 58% coding
Chromosome 3: 2.5 Mb, 884 genes, 55% coding
Total: 12.5 Mb, 4,929 genes, 58% coding
See: TIGR (www.tigr.org); Sanger Institute (www.sanger.ac.uk/Projects/S_pombe). Page 530
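The table's totals can be checked, and gene density derived, with a few lines of Python. The kb-per-gene figures are computed here rather than given on the slide, and they use the 12.5 Mb of sequence in the table rather than the 13.8 Mb genome estimate, so they are approximate.

```python
# Per-chromosome values from the S. pombe table: (size in Mb, number of genes).
POMBE = {1: (5.6, 2255), 2: (4.4, 1790), 3: (2.5, 884)}

total_mb = sum(size for size, _ in POMBE.values())
total_genes = sum(genes for _, genes in POMBE.values())

# Derived here (not in the source): average kb of sequence per gene.
kb_per_gene = {chrom: size * 1000 / genes for chrom, (size, genes) in POMBE.items()}
overall_kb_per_gene = total_mb * 1000 / total_genes

print(f"total: {total_mb:.1f} Mb, {total_genes} genes, {overall_kb_per_gene:.2f} kb/gene")
for chrom, kb in kb_per_gene.items():
    print(f"chromosome {chrom}: {kb:.2f} kb/gene")
```

The result, roughly 2.5 kb per gene, indicates a gene density somewhat lower than the ~2 kb per gene of S. cerevisiae.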
Schizosaccharomyces pombe
S. pombe diverged from S. cerevisiae about 330 to 420 million years ago.
Many genes are as divergent between these two fungi as they are from their human counterparts.
To see this, try TaxPlot at NCBI.
Page 530
Perspective and pitfalls
The budding yeast S. cerevisiae is one of the most significant organisms in biology:
• Its genome was the first eukaryotic genome to be sequenced
• Its biology is simple relative to metazoans
• Through yeast genetics, powerful functional genomics approaches have been applied to study all yeast genes
It is important to note that even for yeast, our knowledge of basic biological questions is highly incomplete. We still understand little about how the genotype of an organism leads to its characteristic phenotype.
Page 531