Comparative transcriptomic analysis of fungi
Group Nicotiana
Daan van Vliet, Dou Hu, Joost de Jong, Krista Kokki
Research objective
To study differences in gene expression in related fungus species
Studied species:
- Reference genome
- RNA reads > 100 bp
- Preferably: paired-end
- Related species
- Similar conditions
Comparison
Comparison between different species:
- Saccharomyces cerevisiae (yeast)
- Komagataella pastoris (Pichia, yeast)
- Aspergillus oryzae (fungus)
Methods – Data [Daan]
RNA-seq: SRA
Genome and annotation: Ensembl Fungi
Read quality analysis: FastQC
Methods - Data processing
Cleaning reads: SolexaQA
Mapping reads: TopHat
Assembly/Quantification: Cufflinks
Optional replicate assembly: Cuffmerge
Extracting transcript seqs: gffread
Selection of top 100 genes: Linux
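The last step, selecting the top 100 genes, was done with Linux tools according to the slide. The same idea can be sketched in Python. This is only an illustration under assumptions: it ranks by the FPKM column of a tab-separated table laid out like the Cufflinks genes.fpkm_tracking file (columns gene_id, FPKM, FPKM_status), whereas the slides define expression as a count of mapped reads. The function name and the example rows are hypothetical.

```python
import csv
import io


def top_expressed_genes(tracking_text, n=100):
    """Return the n (gene_id, FPKM) pairs with the highest FPKM.

    tracking_text is the content of a tab-separated table with at least
    the columns gene_id and FPKM (assumed Cufflinks-style layout).
    """
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    rows = []
    for row in reader:
        # Skip genes whose quantification status is not OK, if the column exists.
        if row.get("FPKM_status", "OK") != "OK":
            continue
        rows.append((row["gene_id"], float(row["FPKM"])))
    # Highest expression first.
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]


# Hypothetical miniature table, for illustration only.
example = (
    "tracking_id\tgene_id\tFPKM\tFPKM_status\n"
    "g1\tg1\t12.5\tOK\n"
    "g2\tg2\t900.0\tFAIL\n"
    "g3\tg3\t250.0\tOK\n"
    "g4\tg4\t3.0\tOK\n"
)
top_two = top_expressed_genes(example, n=2)
```

With the example table, g2 is dropped because of its status and the two remaining highest-FPKM genes are returned in descending order.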
Methods – Gene properties
Property        Explanation                                       Tool (input data file)
Expression      Count of mapped reads                             Perl script (fasta)
Length          Count of base pairs of whole gene                 Perl script (fasta)
Intron length   Count of base pairs within introns                Perl script (gtf)
GC content      GC count / Length                                 Perl script (fasta)
Nc              Effective number of codons, range 20-61:          CodonW (fasta)
                20 = one codon per amino acid,
                61 = random codon use
GC3s            GC content of 3rd synonymous codon position       CodonW (fasta)
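The simpler properties in the table can be sketched in a few lines. The group used Perl scripts and CodonW; the Python below is not their code, only an illustration of the definitions. Two assumptions are made: GC3s is approximated by ignoring the codons without synonymous alternatives (ATG, TGG) and the stop codons, and intron length is taken as the gaps between the sorted, non-overlapping exons of one transcript, with 1-based inclusive coordinates as in GTF. All function names are hypothetical.

```python
# Codons with no synonymous alternative (Met, Trp) plus the three stop codons.
EXCLUDED_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G and C bases in a sequence (GC count / length)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc3s(cds):
    """Approximate GC content at the third position of synonymous codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    informative = [c for c in codons if c not in EXCLUDED_CODONS]
    if not informative:
        return 0.0
    return sum(c[2] in "GC" for c in informative) / len(informative)


def total_intron_length(exons):
    """Sum of gaps between consecutive exons given as (start, end) pairs."""
    exons = sorted(exons)
    return sum(
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    )


# Small hand-checkable examples.
gc_example = gc_content("ATGC")
gc3s_example = gc3s("ATGGCCGCTTAA")  # codons: ATG, GCC, GCT, TAA
intron_example = total_intron_length([(1, 100), (201, 300)])
```

In the GC3s example only GCC and GCT are counted, one of which ends in C, so the value is 0.5. Nc is not sketched here because its calculation is considerably more involved.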
Methods – Interaction
Top 100 genes were mapped to the interactome file and visualised through Cytoscape.
Hypotheses for yeast - Validation
• GC-content correlates positively with gene length.
• Gene length correlates negatively with the degree of codon bias.
• Codon bias is more extreme in highly expressed genes.
• Genes with longer introns show higher bias in codon usage.
• The overall codon usage matches the known bias.
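Most of the hypotheses above are statements about the sign of a correlation between two per-gene properties. The slides do not say which correlation statistic was used, so the following is only a sketch that assumes Spearman rank correlation (Pearson correlation computed on ranks, with ties given their average rank). The function names and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: gene lengths and a codon-bias measure for four genes.
lengths = [300, 900, 1500, 2400]
bias = [0.9, 0.7, 0.4, 0.1]
rho = spearman(lengths, bias)
```

A negative rho for length against codon bias would be in line with the second hypothesis; the example data are made up so that the relationship is perfectly monotonic.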
GO-terms and gene locations

The top 5 most over-represented GO-terms for all the found genes:

    GOBPID      Pvalue    OddsRatio  ExpCount  Count  Size  Term
1   GO:0002181  3.58E-97  54.125     6.305508  95     171   cytoplasmic translation
2   GO:0044238  1.80E-14  3.670035   51.04701  96     1319  primary metabolic process
3   GO:0071843  2.06E-12  3.421344   21.49811  57     587   cellular component biogenesis at cellular level
4   GO:0006407  7.92E-12  37.62835   0.700612  11     19    rRNA export from nucleus
5   GO:0070925  4.44E-11  8.76313    2.949945  19     80    organelle assembly

The chromosomes the genes are found in:

Chromosome    I   II  III  IV  V   VI  VII  VIII  IX  X   XI  XII  XIII  XIV  XV  XVI
Nr. of genes  3   15  5    46  5   16  14   5     0   11  12  37   17    11   1   2
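The columns of the GO table (ExpCount, Count, Size, Pvalue) are the kind of values reported by a hypergeometric over-representation test. The slides do not name the tool or the gene universe that was used, so the sketch below only illustrates the general calculation, with hypothetical function names and toy numbers, and is not expected to reproduce the values in the table.

```python
from math import comb


def expected_count(universe, annotated, selected):
    """Expected number of selected genes carrying the term by chance."""
    return selected * annotated / universe


def enrichment_p_value(universe, annotated, selected, observed):
    """P(X >= observed) for a hypergeometric X.

    universe:  number of genes considered in total
    annotated: genes in the universe annotated with the GO term (Size)
    selected:  number of genes in the list being tested
    observed:  selected genes annotated with the term (Count)
    """
    total = comb(universe, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 4 annotated, 3 selected, 2 of them annotated.
toy_expected = expected_count(10, 4, 3)
toy_p = enrichment_p_value(10, 4, 3, 2)
```

For the toy numbers the tail probability is (36 + 4) / 120, that is one third, and the expected count is 1.2.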
Results – Correlations: Gene expression vs. Gene length
[Figure, three panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris]
Results – Correlations: Gene expression vs. Intron length
[Figure, three panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris]
Results – Correlations: Gene expression vs. Effective Nr of Codons
[Figure, three panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris]
Results – Correlations: Effective Nr of Codons vs. GC-content 3rd position
[Figure, three panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris]
Results – Correlations: Gene length vs. Effective Nr of Codons
[Figure, three panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris]
Results – Correlations: Gene length vs. GC-content
[Figure, three panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris]
Results – Correlations: Gene length vs. Intron length
[Figure, three panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris]
Results – Correlations: Intron length vs. Nc
[Figure, three panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris]
Results – Correlations: Overall
- Within species: few correlations between gene properties
- Between species: different patterns(?)
Cytoscape
• GO terms
The top 100 genes show different interaction networks when grouped by GO term
Results - Database choosing
First choice: Yeast Interactome Project for S. cerevisiae
• High-throughput yeast two-hybrid (Y2H) provides high-quality binary interaction information.
• The high-throughput Y2H dataset covers ~20% of all yeast binary interactions.
• This binary map is enriched for transient signalling interactions and inter-complex connections, with a highly significant clustering between essential proteins.
• Interactions from CCSB-YI1: 1,809 interactions among 1,278 proteins
Results - Database choosing
Second choice: YeastNet v.2
• A probabilistic functional gene network of yeast genes, constructed from ~1.8 million experimental observations from DNA microarrays, physical protein interactions, genetic interactions, literature, and comparative genomics methods.
• In total, YeastNet v.2 covers 102,803 linkages among 5,483 yeast proteins.
• A modified Bayesian integration of diverse data types, with each data type weighted according to how well it links genes that are known to share functions (LLS).
• Interactors were found in YeastNet v.2 for all of the top 100 genes.
• We found 9,896 candidate linkages among the 102,803 linkages.
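Mapping the top genes onto a network such as YeastNet amounts to keeping the linkages that involve at least one gene from the list, and checking that every gene has a partner. The sketch below illustrates this under assumptions: the network is taken to be a list of (gene_a, gene_b, score) triples, and the function names, gene identifiers and scores are placeholders rather than real YeastNet entries.

```python
def linkages_for_genes(edges, genes_of_interest):
    """Keep the linkages in which at least one partner is a gene of interest."""
    wanted = set(genes_of_interest)
    return [(a, b, score) for a, b, score in edges if a in wanted or b in wanted]


def genes_without_partner(edges, genes_of_interest):
    """Genes of interest that do not occur in any linkage of the network."""
    seen = set()
    for a, b, _ in edges:
        seen.add(a)
        seen.add(b)
    return sorted(set(genes_of_interest) - seen)


# Placeholder network: identifiers and scores are made up for illustration.
network = [
    ("geneA", "geneB", 2.1),
    ("geneB", "geneC", 1.4),
    ("geneD", "geneE", 3.0),
]
top_genes = ["geneA", "geneC", "geneX"]

selected_linkages = linkages_for_genes(network, top_genes)
unmapped = genes_without_partner(network, top_genes)
```

In the placeholder data two linkages touch the gene list and geneX has no interactor; on the slides the corresponding check found interactors for all top 100 genes. The selected linkages could then be exported as a table for Cytoscape.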
The end
Questions?
Results – Correlations: Gene expression vs. GC content
[Figure, three panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris]