Comparative Genomics of Aspergilli
William Nierman, TIGR
Electrophoretic Karyotyping (5-day run)
CHEF DR II; 1.2% CGA, 1x TAE, 14 °C, 1.8 V/cm; switch time 2200 s for 48 h, then 2200-1800 s for 68 h. Sizes in Mb.
[Figure: CHEF gel with lanes Sc, Sp and Af; size labels 5.7, 4.6, 3.5, 5.0, 4.0, 3.5 and 1.8 Mb]
A. fumigatus Chromosomes
[Figure: the eight A. fumigatus chromosomes, with centromeric areas and telomeres marked and ~35 copies of rDNA indicated; chromosome sizes (Mb): 4.891, 4.834, 4.018, 3.933, 3.922, 3.779, 2.021, 1.789]
Centromeres and Telomeres
• Telomere repeat TTAGGG, 7-21 repeat units
– Subtelomeric regions: identical sequences for several kb, helicase pseudogenes, 7 secondary metabolite clusters; a role in niche adaptation? (Mark Farman)
• Centromeres
– Uncloned in shotgun libraries; 36.2-55.9 kb
– Flanked on each side by a low-complexity, AT-rich repeat region
– Chromosome 2 centromere: 12 kb PCR product, 75% AT; overall centromeric AT content of 63%, 40 kb
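The two measurements above, telomere repeat units and AT content, are simple sequence statistics. The sketch below shows one way to compute them. It assumes the telomere is read on the strand that runs 5' to 3' toward the chromosome end (so it ends in ...TTAGGG), counts only perfect tandem copies, and ignores ambiguity codes such as N when computing AT content. The function names and toy sequences are hypothetical.

```python
def at_content(seq: str) -> float:
    """Fraction of A + T among unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are ignored, so gaps do not dilute the value.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("A") + seq.count("T")) / acgt


def terminal_repeat_units(seq: str, unit: str = "TTAGGG") -> int:
    """Count perfect tandem copies of `unit` at the 3' end of `seq`.

    Walks backwards from the end one unit at a time; the first imperfect
    (degenerate) copy stops the count.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    seq, unit = seq.upper(), unit.upper()
    count, end = 0, len(seq)
    while end >= len(unit) and seq[end - len(unit):end] == unit:
        count += 1
        end -= len(unit)
    return count


# Hypothetical toy sequences, not real A. fumigatus data.
chrom_end = "ACGTACGT" + "TTAGGG" * 7
print(terminal_repeat_units(chrom_end))   # 7
print(round(at_content("AATTGC"), 3))     # 0.667
```

For the opposite chromosome end, the same count can be run on the reverse complement, where the telomere appears as leading CCCTAA units.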
Annotation Pipeline
Eukaryotic Genome Control (EGC) is the annotation pipeline that processes the genomic sequence:
Finished chromosome sequences → Masked genomic sequence → Gene prediction, protein alignments, EST alignments → Optimize predictions
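The masking step replaces repeats before gene prediction and alignment. The slides do not say which repeat finder or masking convention EGC used, so the sketch below simply assumes the repeat coordinates are already known as 0-based, half-open intervals and shows both hard masking (repeats become N) and soft masking (repeats become lowercase). The function name is hypothetical.

```python
def mask_repeats(seq: str, repeats: list[tuple[int, int]], soft: bool = False) -> str:
    """Mask 0-based, half-open [start, end) repeat intervals in `seq`.

    Hard masking (default) replaces each repeat base with N; soft masking
    lowercases it so downstream tools can still see the sequence.
    """
    chars = list(seq)
    for start, end in repeats:
        if not 0 <= start <= end <= len(seq):
            raise ValueError(f"interval ({start}, {end}) lies outside the sequence")
        for i in range(start, end):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


print(mask_repeats("ACGTACGTAC", [(2, 5)]))             # ACNNNCGTAC
print(mask_repeats("ACGTACGTAC", [(2, 5)], soft=True))  # ACgtaCGTAC
```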
Training Data
Gene and splice-site predictors, including Glimmer, Exonomy, Unveil, Phat and GeneSplicer, were trained with the following experimental data:
– 625 full-length cDNAs and 42 partial cDNAs from 589 loci in 19 Aspergillus species
– 2,633 A. fumigatus ESTs from UK and Spanish collaborators
Optimize Predictions
Combiner combines gene model evidence from:
• Gene prediction programs
• Splice site prediction programs
• Alignments to protein, cDNA and EST databases
Combiner then generates the final gene model.
All genes were manually reviewed, and gene models that had been incorrectly split or merged were corrected.
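To illustrate the idea of combining evidence, here is a toy weighted vote over exon coordinates. This is not Combiner's actual algorithm: a real combiner also checks splice-site consistency and reading frame across the whole gene model. The source names, weights and threshold below are hypothetical.

```python
from collections import defaultdict

Exon = tuple[int, int]  # (start, end) on one strand; hypothetical convention


def vote_exons(evidence: dict[str, set[Exon]],
               weights: dict[str, float],
               threshold: float) -> list[Exon]:
    """Keep exons whose summed source weights reach `threshold`.

    Each evidence source (gene predictor, splice-site predictor, alignment)
    votes for the exons it reports, weighted by how much we trust it.
    """
    score: dict[Exon, float] = defaultdict(float)
    for source, exons in evidence.items():
        for exon in exons:
            score[exon] += weights.get(source, 0.0)
    return sorted(exon for exon, s in score.items() if s >= threshold)


evidence = {
    "gene_predictor": {(100, 250), (400, 520)},
    "splice_predictor": {(100, 250), (400, 520), (700, 800)},
    "est_alignment": {(100, 250)},
}
weights = {"gene_predictor": 1.0, "splice_predictor": 0.5, "est_alignment": 2.0}
print(vote_exons(evidence, weights, threshold=1.5))  # [(100, 250), (400, 520)]
```

Giving experimental evidence (EST alignments here) a higher weight than ab initio predictions is one common design choice; the exon at (700, 800), supported only by a splice-site predictor, falls below the threshold.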
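Annotation Station Screenshot
[Figure: Annotation Station view of a gene cluster with the labels Brown 2, Brown 1 and Yellowish-green and genes annotated as 1,3,6,8-tetrahydroxynaphthalene reductase, scytalone dehydratase and polyketide synthase]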
Gene Summary Statistics
AFU = A. fumigatus, ANA = A. nidulans, AOA = A. oryzae

Genome                              AFU          ANA          AOA
Size (bp)                           28,635,699   30,068,514   36,746,653
GC content (%)                      49.9         50.3         48.3
Number of genes                     9,746        9,967        14,063
Mean gene length (bp, exons only)   1,442.4      1,535.9      1,177.5
Gene density (bp per gene)          2,938.2      3,016.8      2,613
Percent coding                      49.1         50.9         45.1
Percent genes with introns          75.8         88.7         80.7

Exons                               AFU          ANA          AOA
Number                              26,181       36,249       40,133
Mean number per gene                2.7          3.6          2.9
GC content (%)                      54           53.4         52
Mean length (bp)                    536.9        422.3        412.6
Total length (bp)                   14,057,166   15,308,196   16,559,586

Introns                             AFU          ANA          AOA
Number                              16,432       26,282       26,070
GC content (%)                      46.3         46.1         45.5
Mean length (bp)                    121.8        104.6        129.7
Total length (bp)                   2,000,799    2,748,240    3,380,731

Intergenic Regions                  AFU          ANA          AOA
GC content (%)                      46           47.5         45.3
Mean length (bp)                    1,276.4      1,159.5      1,174.3

Functional Annotation               AFU          ANA          AOA
Genes with Pfam hits                4,403        4,512        5,306
Genes in computed families          4,603        4,536        6,263
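Several rows of these tables are derived from others. Recomputing them from the raw counts is a quick consistency check (our addition, not part of the original analysis). It shows that gene density is genome size divided by gene count, and that "Mean Gene Length" and "Percent of Coding" equal total exon length divided by gene count and by genome size, respectively. The dictionary keys are hypothetical names.

```python
# Raw values copied from the Gene Summary Statistics tables.
RAW = {
    "AFU": {"size_bp": 28_635_699, "genes": 9_746, "exons": 26_181,
            "exon_bp": 14_057_166, "introns": 16_432, "intron_bp": 2_000_799},
    "ANA": {"size_bp": 30_068_514, "genes": 9_967, "exons": 36_249,
            "exon_bp": 15_308_196, "introns": 26_282, "intron_bp": 2_748_240},
    "AOA": {"size_bp": 36_746_653, "genes": 14_063, "exons": 40_133,
            "exon_bp": 16_559_586, "introns": 26_070, "intron_bp": 3_380_731},
}


def derived_stats(r: dict[str, int]) -> dict[str, float]:
    """Recompute the derived table rows, rounded to one decimal place."""
    return {
        # Genome size divided by gene count.
        "gene_density_bp_per_gene": round(r["size_bp"] / r["genes"], 1),
        # Total exon length per gene (reproduces "Mean Gene Length").
        "mean_gene_length_bp": round(r["exon_bp"] / r["genes"], 1),
        # Total exon length as a share of the genome ("Percent of Coding").
        "percent_coding": round(100 * r["exon_bp"] / r["size_bp"], 1),
        "exons_per_gene": round(r["exons"] / r["genes"], 1),
        "mean_exon_length_bp": round(r["exon_bp"] / r["exons"], 1),
        "mean_intron_length_bp": round(r["intron_bp"] / r["introns"], 1),
    }


for species, r in RAW.items():
    print(species, derived_stats(r))
```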
Most Common Domains in A. fumigatus
Domain   Domain name                                          #Proteins
PF00172  Fungal Zn(2)-Cys(6) binuclear cluster domain         147
PF00083  Major facilitator superfamily                        109
PF00400  WD domain, G-beta repeat                             105
PF00069  Protein kinase domain                                105
PF00106  Oxidoreductase, short-chain dehydrogenase/reductase  95
PF00271  Helicase conserved C-terminal domain                 75
PF00023  Ankyrin repeat                                       64
PF00067  Cytochrome P450                                      65
PF00096  Zinc finger, C2H2 type                               61
PF00107  Oxidoreductase, zinc-binding dehydrogenase           61
PF00076  RNA recognition motif                                59
PF00005  ABC transporter                                      51
PF00501  AMP-binding enzyme                                   44
PF00270  DEAD/DEAH box helicase                               39
PF01360  Monooxygenase                                        39
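The domain table counts proteins, not domain occurrences. The difference matters for repeat domains such as WD40 and ankyrin, where one protein can carry many copies. Assuming the "#Proteins" column counts each protein once per domain, a tally from raw (protein, Pfam accession) hits would look like the sketch below. The protein IDs are hypothetical.

```python
from collections import Counter


def proteins_per_domain(hits: list[tuple[str, str]]) -> Counter:
    """Count distinct proteins containing each Pfam domain.

    `hits` holds (protein_id, pfam_accession) pairs and may repeat a pair
    when a protein carries several copies of the same domain; the set()
    collapses those repeats so each protein is counted once per domain.
    """
    return Counter(pfam for _, pfam in set(hits))


hits = [
    ("prot1", "PF00400"), ("prot1", "PF00400"),  # two WD repeats, one protein
    ("prot2", "PF00400"),
    ("prot2", "PF00069"),
]
print(proteins_per_domain(hits).most_common())  # [('PF00400', 2), ('PF00069', 1)]
```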
Synteny Map of A. fumigatus and A. nidulans
Synteny Map of A.fumigatus and A. oryzae
Synteny Map of A. fumigatus, A. nidulans, A. oryzae
Orthologs were computed by an all-vs-all BLASTP of the three proteomes with an E-value cut-off of 1e-15 (no length requirement). Mutual best hits were then organized into clusters based on shared protein nodes.
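The mutual-best-hit clustering described above can be sketched with a union-find structure. This sketch rests on several assumptions: gene IDs carry a species prefix such as "AFU|", hits are (query, subject, E-value) rows from the all-vs-all search, the 1e-15 cut-off keeps hits with E-value at or below it, no length or coverage filter is applied, and ties go to the first hit seen. Genes with no mutual best hit (singletons) do not appear in any cluster.

```python
from collections import defaultdict


def best_hits(hits, evalue_cutoff=1e-15):
    """Map (query, other_species) -> subject with the lowest E-value."""
    best = {}
    for query, subject, evalue in hits:
        q_sp, s_sp = query.split("|")[0], subject.split("|")[0]
        if q_sp == s_sp or evalue > evalue_cutoff:
            continue  # skip within-species hits and weak hits
        key = (query, s_sp)
        if key not in best or evalue < best[key][1]:
            best[key] = (subject, evalue)
    return {key: subject for key, (subject, _) in best.items()}


def mutual_best_hit_clusters(hits, evalue_cutoff=1e-15):
    """Join reciprocal best hits into clusters that share protein nodes."""
    best = best_hits(hits, evalue_cutoff)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (query, s_sp), subject in best.items():
        q_sp = query.split("|")[0]
        if best.get((subject, q_sp)) == query:  # reciprocal best hit
            parent[find(query)] = find(subject)

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return sorted(sorted(c) for c in clusters.values())


example = [
    ("AFU|a1", "ANA|n1", 1e-80), ("ANA|n1", "AFU|a1", 1e-80),
    ("AFU|a1", "AOA|o1", 1e-60), ("AOA|o1", "AFU|a1", 1e-60),
    ("ANA|n1", "AOA|o1", 1e-70), ("AOA|o1", "ANA|n1", 1e-70),
    ("AFU|a2", "ANA|n2", 1e-20), ("ANA|n2", "AFU|a3", 1e-30),  # not reciprocal
]
print(mutual_best_hit_clusters(example))  # [['AFU|a1', 'ANA|n1', 'AOA|o1']]
```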
COG        A. fumigatus  A. oryzae  A. nidulans  avg_pctid  avg_coverage  num_cogs
3 member   +             +          +            70%        86%           5899
2 member   two of the three species              65%        84%           967
2 member   two of the three species              61%        79%           533
2 member   two of the three species              61%        80%           936

Species        Genes included in COGs  Percent of predicted proteome
A. fumigatus   7507                    79%
A. nidulans    7429                    75%
A. oryzae      7988                    57%
Total          22924                   68% (22924/33552)
Overview – Comparative Statistics
TIGR Autoannotation vs Sanger Curated Annotation
Status: Count
• Total Sanger genes analyzed: 360
• Same gene structure: 137
• Different gene structure: 177
• Sanger missing in TIGR annotation: 37
• Sanger matches multiple TIGR annotations: 2
• Sanger, TIGR annotations on opposite strands: 7
• TIGR missing in Sanger annotation: 12
• TIGR matches multiple Sanger annotations: 9
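The five Sanger-centred statuses partition the 360 Sanger genes (137 + 177 + 37 + 2 + 7 = 360). The last two statuses count TIGR models and fall outside that total. Expressing the partition as percentages makes the comparison easier to read: only about 38% of the curated genes had an identical structure in the automatic annotation. The sketch below simply recomputes this breakdown; the dictionary and function names are hypothetical.

```python
# Counts copied from the slide: statuses defined relative to Sanger genes.
SANGER_STATUS = {
    "Same gene structure": 137,
    "Different gene structure": 177,
    "Sanger missing in TIGR annotation": 37,
    "Sanger matches multiple TIGR annotations": 2,
    "Sanger, TIGR annotations on opposite strands": 7,
}
TOTAL_SANGER = 360


def percent_breakdown(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percent of `total` in each category, after checking they partition it."""
    if sum(counts.values()) != total:
        raise ValueError("categories do not add up to the total")
    return {status: round(100 * n / total, 1) for status, n in counts.items()}


for status, pct in percent_breakdown(SANGER_STATUS, TOTAL_SANGER).items():
    print(f"{status}: {SANGER_STATUS[status]} ({pct}%)")
```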
Using Ortholog Clusters to Identify Potential Annotation Problems
Different exon number due to annotation discrepancy
We need to be able to distinguish annotation inconsistencies from real, interesting phenomena
In some cases, differences in exon number are real
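A first pass at this problem is to flag every ortholog cluster whose members disagree on exon number, then review the flagged clusters by hand. As the slide points out, such a flag cannot by itself separate annotation errors from real intron gain or loss. The sketch assumes clusters are lists of species-prefixed gene IDs and that exon counts come from a lookup table. Both conventions are hypothetical.

```python
def exon_count_discrepancies(clusters: list[list[str]],
                             exon_counts: dict[str, int]) -> list[dict[str, int]]:
    """Return {gene: exon_count} for clusters whose members disagree.

    Flagged clusters are candidates for manual review, not proven errors:
    some differences in exon number are real.
    """
    flagged = []
    for cluster in clusters:
        counts = {gene: exon_counts[gene] for gene in cluster}
        if len(set(counts.values())) > 1:
            flagged.append(counts)
    return flagged


clusters = [["AFU|a1", "ANA|n1", "AOA|o1"], ["AFU|a2", "ANA|n2"]]
exon_counts = {"AFU|a1": 3, "ANA|n1": 3, "AOA|o1": 4, "AFU|a2": 2, "ANA|n2": 2}
print(exon_count_discrepancies(clusters, exon_counts))
# [{'AFU|a1': 3, 'ANA|n1': 3, 'AOA|o1': 4}]
```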
Apoptosis in Fungi
• Apoptosis-like process detected in S. cerevisiae, S. pombe, and Aspergilli.
• Fungal genomes lack metazoan upstream machinery.
• Metacaspase-dependent phenotype observed in A. fumigatus and A. nidulans.
• Analysis by Geoff Robson
Domains (S. cerevisiae; S. pombe; A. fumigatus; A. nidulans; A. oryzae)
• NB-ARC: X; X; 57.m05394, 56.m02424, 72.m19821, 66.m04653, asfu05688; 10025.m00126, 10051.m00442, 10115.m00081, 10157.m00054, 10176.m00005, 10016.m00178, 10150.m00052, 10062.m00136, 10153.m00210; 20175.m00427, 20175.m00347, 20116.m00078, 20180.m00891, 20167.m00347, 20122.m00102, 20168.m00299
• Caspase-activated nuclease: X; X; X; X; X
• CAS/CSE: CSE-1; CSE-1; X; X; X
• MATH: UBPF; UBPF, UBP5; 53.m03780, 53.m04162; 10139.m00184; 20147.m00277
Protein families (S. cerevisiae; S. pombe; A. fumigatus; A. nidulans; A. oryzae)
• Metacaspase: MCA1; AL031179; 59.m08486, 54.m06827; 10098.m00299, 10042.m00047, 10062.m00137; 20149.m0027, 20166.m00204, 20161.m00321
• Anti-silencing protein 1: ASF1; ASF1; 59.m08789; 10084.m00239; 20175.m00377
• STM1: STM1/MPT4; Q42914; X; X; X
• CDC48p: CDC48; CDC48; 72.m19795; 10124.m00023; 20134.m00118
Aspergillus fumigatus Secondary Metabolites
• Heterogeneous group of low molecular weight products.
• Toxic, antibiotic and immunosuppressant activities, e.g. fumagillin, gliotoxin (apoptosis and phagocyte dysfunction), fumitremorgin, verruculogen, fumigaclavine, helvolic acid, phthioic acid (granulomas when injected into mice) and sphingofungins.
• A. fumigatus virulence may be augmented by its numerous secondary metabolites.
Gene type               A. oryzae  A. fumigatus  A. nidulans
PKS                     30         14            27
NRPS                    18         14            14
FAS                     5          1             6
Sesquiterpene cyclase   1          (1)           (1)
DMATS                   2          7             2
Secondary Metabolite Genes
Analysis by G. Turner, N. Keller, Dr. Kitamoto, and R. Kulkarni
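[Figure: predicted precursors and biosynthetic genes for four A. fumigatus metabolites: gliotoxin (serine + phenylalanine; 2-module NRPS?), fumagillin (terpene; sesquiterpene cyclase), fumigaclavines (tryptophan; DMAT synthetase ×2), fumitremorgens (tryptophan + proline; NRPS?, DMAT synthetase)]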
Five 2-module NRPS
A. fumigatus Secondary Metabolite Genes
• Few true orthologues across the genus Aspergillus. Each species has its own repertoire.
• Gene/product relationship requires functional analysis in most cases
• Indole alkaloid pathway in A. fumigatus only. Closely related to Claviceps purpurea ergotamine pathway
• Penicillin and aflatoxin pathways are absent.
• A hybrid PKS/monomodular NRPS seems to be present in several fungi.
Identify A. fumigatus-specific genes
• A. fumigatus genes (9746): all-vs-all BLASTP of the AFU1, ANA1 and AOA1 proteomes, E-value cut-off 1e-15, with results filtered for mutual best hits between genomes → A. fumigatus singletons (2075)
• BLASTP of the singletons vs the ANA1 and AOA1 proteomes → A. fumigatus singletons with E-value > e-10 (1081); e-10 < E-value < e-5 (203); E-value > e-5 (808); 1011 = 203 + 808
• For the 808 genes with E-value > e-5: extend each gene by 50 bp on both ends in the genome and TBLASTX its genomic sequence vs the A. nidulans and A. oryzae genomic sequences → e-50 < E-value < e-10 (181); e-10 < E-value < e-5 (75); E-value > e-5 (552)
• A. fumigatus-specific gene candidates: E-value > e-50
Aspergillus fumigatus Unique Genes
• Vast majority are hypothetical
• Includes:
– Several transcriptional regulators
– A chaperonin
– An Hsp70-related protein
Arsenic Fungi
• 19th-century poisonings were associated with green pigments.
• 1892: B. Gosio showed that certain fungi could metabolize arsenic pigments, producing toxic trimethylarsine ("Gosio gas").
• A screen in the 1930s (Thom & Raper) found A. fumigatus to be an arsenic fungus.
• Napoleon: imperial colors green and gold; copper arsenite (Jones 1982).
• Analysis of history and genome by J. Bennett, N. Hall, J. Wortman and C. Lu.
A. fumigatus Arsenate Genes
• Arsenite efflux pump
• Arsenite-translocating ATPase
• Two possibly duplicated clusters
– arsC: arsenate reductase (unique to A. fumigatus)
– arsB: arsenite symporter
– arsH: methyltransferase
[Figure: ars gene clusters on chromosome 1 and chromosome 5, each containing arsB, arsC and arsH (methyltransferase)]
A. fumigatus Teichoic Acid Biosynthesis Protein
• Good homology along the full length of the Streptomyces griseus protein.
• A secretion signal peptide may direct it to the cell wall.
• Teichoic acids have been demonstrated to be a virulence factor for Staphylococcus aureus.
• No intervening sequences (introns) in the gene.
Analysis by Neil Hall
A. fumigatus Thermotolerance
[Figure: genes more highly expressed at 48 °C and genes more highly expressed at 37 °C]
• Relatively few genes altered
• Some HSPs transiently or stably (weakly) induced, and repressed at 37 °C
• HSPs induced throughout the 180 min period at 48 °C
• Transposases induced at 48 °C (Mariner 4)
• Stress-related genes upregulated at 48 °C
• Metabolic proteins downregulated at 48 °C
“This fungus likes it hot.” (J. Bennett)
Microarray Detection of Clusters
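The slide does not describe the method used. One simple way to spot gene clusters in expression data is to look for runs of physically adjacent genes that are induced together, for example when comparing 48 °C with 37 °C. The sketch below is such a heuristic under stated assumptions: genes are supplied in chromosome order with one positive signal per condition, "induced" means a log2 ratio at or above a threshold, and a cluster is at least three consecutive induced genes. Both thresholds and all names are hypothetical.

```python
import math


def induced_runs(genes, min_log2_ratio=1.0, min_run=3):
    """Find runs of adjacent genes that are all induced.

    genes: list of (gene_id, signal_48C, signal_37C) in chromosome order.
    A gene counts as induced when log2(signal_48C / signal_37C) >= min_log2_ratio;
    runs of at least `min_run` consecutive induced genes are returned.
    """
    runs, current = [], []
    for gene_id, hot, warm in genes:
        induced = hot > 0 and warm > 0 and math.log2(hot / warm) >= min_log2_ratio
        if induced:
            current.append(gene_id)
            continue
        if len(current) >= min_run:
            runs.append(current)
        current = []
    if len(current) >= min_run:  # a run that reaches the chromosome end
        runs.append(current)
    return runs


genes = [("g1", 10, 10), ("g2", 40, 10), ("g3", 25, 10), ("g4", 80, 20), ("g5", 5, 10)]
print(induced_runs(genes))  # [['g2', 'g3', 'g4']]
```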
Aspergillus fumigatus AF293 Project Participants
• The University of Manchester, UK
• The Wellcome Trust Sanger Centre, UK
• The Institute for Genomic Research, USA
• The University of Salamanca, Spain
• Complutense University, Spain
• Centro de Investigaciones Biológicas, Spain
Aspergillus fumigatus AF293
David Denning, Michael Anderson, Arnab Pain, Geoff Robson, Javier Arroyo, Geoff Turner, David Archer
Joan Bennett, Matt Berriman, Jean Paul Latge, Paul Dyer, Paul Bowyer, Neil Hall
Aspergillus nidulans – James Galagan
Aspergillus oryzae – Masayuki Machida
TIGR
Sequencing and Closure: Tamara Feldblyum, Hoda Khouri
Annotation: Jennifer Wortman, Jiaqi Huang, Resham Kulkarni, Natalie Fedorova, Charles Lu
Claire Fraser
Lab Group: Heenam Kim, Dan Chen
NIAID and Dennis Dixon