Post on 22-Dec-2015
Whole genome transcriptome variation in Arabidopsis thaliana
Xu Zhang
Borevitz Lab
Arabidopsis thaliana has adapted to highly variable environments
Transcription and splicing
Chromosomal DNA
Transcription
Nuclear RNA: Exon 1, Intron 1, Exon 2, Intron 2, Exon 3
RNA splicing
Messenger RNA: Exon 1, Exon 2, Exon 3; or Exon 1, Exon 3
Whole genome tiling array
Genetic hybridization polymorphisms could affect the estimation of gene expression
High density and resolution: 1.6M unique probes at 35bp spacing
Without bias toward known transcripts
Col ♀ x Col ♂, Van ♀ x Van ♂, Col ♀ x Van ♂, Van ♀ x Col ♂
Parental strains and reciprocal F1 hybrids; mRNA from total RNA; genomic DNA
The experiment
Double-stranded random labeling
Random reverse transcription
Double-stranded cDNA
Random priming
Sequence polymorphisms
Gene expression variation
Splicing variation
A functional network of differentially spliced genes
HMM for de novo transcription profiling
Outlines
Sequence polymorphisms
Single Feature Polymorphisms (SFPs) and indels
[Figure: examples of SFPs and of a deletion or duplication in Van]
SFPs and indels (>200bp) were removed before gene expression analysis
SFPs
FDR       Col > Van   Van > Col   Total
11.82%    135769      14934       150703
7.66%     126443      9479        135922
5.22%     118381      6662        125043
3.88%     110861      4979        115840
3.15%     104115      3820        107935
Indels
Model selection   Deletion   Duplication   Total
BIC               518        22            540
AIC               1645       136           1781
Deletions vs duplications
Distribution of indels along chromosomes
Gene expression variation
Additive, dominant and maternal effects of gene expression
The linear model
Gene probe intensity ~ additive + dominant + maternal + ε
[Figure: intensity plotted by genotype (Col, Van, F1c, F1v), marking the additive, dominant and maternal effects]
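A minimal sketch of how the three effects in the linear model could be read off the four genotype means. The contrast coding is an assumption, not taken from the slides: additive is half the Col minus Van difference, dominant is the mean F1 deviation from the mid-parent value, and maternal is half the difference between the reciprocal hybrids, with F1c assumed to be Col ♀ x Van ♂. The function name `genotype_effects` and the intensities are hypothetical.

```python
from statistics import mean

def genotype_effects(col, van, f1c, f1v):
    """Estimate effects from replicate probe intensities of one gene.

    Assumed model (saturated, so it reproduces the four genotype means):
      Col = mu + a,  Van = mu - a,  F1c = mu + d + m,  F1v = mu + d - m
    """
    m_col, m_van, m_f1c, m_f1v = mean(col), mean(van), mean(f1c), mean(f1v)
    midparent = (m_col + m_van) / 2            # mu
    additive = (m_col - m_van) / 2             # a
    dominant = (m_f1c + m_f1v) / 2 - midparent # d
    maternal = (m_f1c - m_f1v) / 2             # m
    return {"additive": additive, "dominant": dominant, "maternal": maternal}

# Made-up log-scale intensities, two replicates per genotype.
effects = genotype_effects(
    col=[10.0, 10.2], van=[8.0, 8.2], f1c=[9.8, 10.0], f1v=[9.4, 9.6]
)
print(effects)
```

Because the assumed model has four parameters for four genotypes, these estimates coincide with the least-squares fit; testing their significance would still need the replicate-level residuals.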
Gene expression variation between genotypes
Delta   Sig+   Sig-   Total   False   FDR
additive
0.5     4911   3967   8878    901     10.15%
1       2674   1736   4410    215     4.88%
1.5     1626   923    2549    70      2.76%
1.8     1249   676    1925    39      2.03%
2.5     690    334    1024    13      1.24%
dominant
0.5     1511   3190   4701    767     16.31%
1       405    1521   1926    186     9.65%
1.5     157    811    968     67      6.93%
1.8     92     575    667     40      5.99%
2.5     41     270    311     14      4.65%
maternal
0.5     5998   95     6093    735     12.06%
1       2046   8      2054    151     7.37%
1.5     480    0      480     49      10.29%
1.8     163    0      163     28      17.33%
2.5     41     0      41      9       22.84%
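The FDR column appears to be the estimated number of false calls divided by the total number of calls, up to rounding of the displayed False counts (for example 901 / 8878 ≈ 10.15%). A small sketch of that arithmetic follows; the helper name `estimated_fdr` is hypothetical.

```python
def estimated_fdr(false_calls, total_calls):
    """Estimated false discovery rate as a fraction: expected false calls / calls made."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    return false_calls / total_calls

# Additive effect at Delta = 0.5: 901 expected false calls among 8878 calls.
additive_fdr = estimated_fdr(901, 8878)
print(f"{additive_fdr:.2%}")
```

The ratio is not capped at 100%: when the expected number of false calls exceeds the number of calls made, as with 559 false against 477 calls in the exon splicing table, the estimate is above 1 and the call set is essentially uninformative.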
The pattern of gene expression inheritance
[Figure: mean gene intensity across Col, Van, F1v and F1c for each inheritance pattern: Van dominant, Col dominant, over dominant, F1v dominant, F1c dominant, maternal, paternal]
Enrichment in GO functional categories
GO enrichment for additive, dominant and maternal effect genes
Defense response genes are highly expressed in F1 hybrid lines, while many growth-related pathways are down-regulated
Splicing variation
Default expression status of exon and intron
Exons: correction for gene expression
corrected by gene mean
corrected by gene median
splicing index (mean exon / mean gene)
Introns: direct comparison
Exon/intron probe Intensity ~ additive + dominant + maternal + ε
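The three exon measures listed above can be sketched as follows. The slides do not say how the mean and median corrections are applied; subtracting the gene-level value from the exon mean on a log intensity scale is an assumption, as are the function name `exon_measures` and the example intensities. Only the splicing index (mean exon / mean gene) is stated on the slide.

```python
from statistics import mean, median

def exon_measures(exon_probes, gene_probes):
    """Return three expression-corrected measures for one exon.

    exon_probes: intensities of the probes inside the exon
    gene_probes: intensities of all exon probes of the gene (assumed to include this exon)
    """
    exon_mean = mean(exon_probes)
    return {
        # Assumption: correction by subtraction on a log scale.
        "mean_corrected": exon_mean - mean(gene_probes),
        "median_corrected": exon_mean - median(gene_probes),
        # As on the slide: mean exon intensity over mean gene intensity.
        "splicing_index": exon_mean / mean(gene_probes),
    }

# Made-up log-scale intensities: the first exon is lower than the rest of the gene.
measures = exon_measures(exon_probes=[6.0, 6.0], gene_probes=[6.0, 6.0, 8.0, 8.0, 10.0, 10.0])
print(measures)
```

Each measure would then be used as the response in the exon model, so that a change in overall gene expression is not mistaken for a change in splicing.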
Differential exon splicing
Exon probe Intensity ~ additive + dominant + maternal + ε
Delta   Sig+   Sig-   Total   False   FDR
Corrected by gene mean
0.3     287    190    477     559     117%
0.4     177    129    306     205     67.0%
0.5     127    109    236     97      41.0%
0.6     92     86     178     55      30.8%
0.7     77     69     146     34      23.4%
Corrected by gene median
0.3     523    280    803     556     69.2%
0.4     328    172    500     203     40.6%
0.5     223    120    343     96      28.0%
0.6     154    76     230     54      23.5%
0.7     123    52     175     34      19.3%
Splicing index
0.3     407    235    642     425     66.0%
0.4     292    175    467     132     28.0%
0.5     230    143    373     50      13.0%
0.6     178    104    282     21      7.50%
0.7     148    86     234     10      4.30%
Differential intron splicing
Intron probe Intensity ~ additive + dominant + maternal + ε
Delta   Sig+   Sig-   Total   False   FDR
0.3     561    1034   1595    332     20.8%
0.4     405    523    928     85      9.17%
0.5     316    352    668     28      4.26%
0.6     239    220    459     12      2.61%
0.7     202    155    357     7       1.91%
0.8     176    120    296     5       1.53%
Differential exon splicing is predominantly additive in F1 hybrids
Some dominant effect in differential intron splicing in F1 hybrids
Comparison for enrichment in known alternatively spliced exons
                          Threshold 1             Threshold 2
                          Called    Not called    Called    Not called
Corrected by gene mean
  Known                   28        991           7         1012
  Not known               397       55145         90        55452
  Fold enrichment         3.92                    4.26
  p-value                 5.97E-09                1.90E-03
Corrected by gene median polish
  Known                   24        995           6         1013
  Not known               430       55112         85        55457
  Fold enrichment         3.09                    3.86
  p-value                 3.60E-06                6.14E-03
Splicing index
  Known                   24        1093          5         1112
  Not known               537       72328         88        72777
  Fold enrichment         2.96                    3.72
  p-value                 6.84E-06                1.36E-02
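The fold enrichment values in this table agree with the odds ratio of each 2x2 table (for example 28 × 55145 / (991 × 397) ≈ 3.92). The test behind the p-values is not stated on the slide; the one-sided Fisher exact test below is an assumption, shown only as one common choice, and it is not claimed to reproduce the listed p-values. Function names are hypothetical.

```python
from math import comb

def odds_ratio(known_called, known_not, other_called, other_not):
    """Odds ratio of a 2x2 table: (a * d) / (b * c)."""
    return (known_called * other_not) / (known_not * other_called)

def fisher_one_sided(known_called, known_not, other_called, other_not):
    """P(at least `known_called` known exons among the called ones)
    under the hypergeometric null of no association."""
    known = known_called + known_not
    called = known_called + other_called
    total = known + other_called + other_not
    upper = min(known, called)
    numerator = sum(
        comb(known, k) * comb(total - known, called - k)
        for k in range(known_called, upper + 1)
    )
    return numerator / comb(total, called)

# Corrected by gene mean, threshold 1.
fold = odds_ratio(28, 991, 397, 55145)
p_value = fisher_one_sided(28, 991, 397, 55145)
print(round(fold, 2), p_value)
```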
AT1G21350
AT1G34180
AT1G76170
AT1G29120
AT1G51350
AT1G80960
AT1G07350
Experimentally determined FDR for differential splicing
                              Significant calls   Estimated FDR   Tested   Confirmed   Experimental FDR
Exon (corrected by mean)      477                 117%            45       22          51.1%
                              111                 20.8%           18       10          44.4%
Exon (corrected by median)    500                 40.6%           40       21          47.5%
                              103                 15.60%          17       10          41.2%
Exon (splicing index)         642                 66.0%           50       23          54.0%
                              102                 1.00%           20       10          50.0%
Intron                        459                 2.61%           65       38          41.5%
                              195                 1.15%           58       33          43.1%
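The experimental FDR values are consistent with the fraction of tested calls that were not confirmed, for example (45 - 22) / 45 ≈ 51.1%. A short sketch of that calculation, with a hypothetical helper name:

```python
def experimental_fdr(tested, confirmed):
    """Fraction of experimentally tested calls that failed to confirm."""
    if tested <= 0 or not 0 <= confirmed <= tested:
        raise ValueError("need 0 <= confirmed <= tested and tested > 0")
    return (tested - confirmed) / tested

# Exon calls corrected by gene mean: 45 tested, 22 confirmed.
exon_mean_fdr = experimental_fdr(45, 22)
print(f"{exon_mean_fdr:.1%}")
```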
A functional network of differentially spliced genes
Enrichment of differentially spliced genes in chloroplast thylakoid
Differentially spliced genes located in the chloroplast thylakoid
Photosynthesis related genes
AT5G38660 APE1 (Acclimation of Photosynthesis to Environment); mutant has altered acclimation responses
Splicing regulators tend to be differentially spliced
AT1G07350 transformer serine/arginine-rich ribonucleoprotein, putative
AT1G55310 SC35-like splicing factor, 33 kD (SCL33)
AT2G29210 splicing factor PWI domain-containing protein
AT5G04430 KH domain-containing protein NOVA, putative
HMM for de novo transcription profiling
Generalized tiling array HMM
3-state HMM
Discrete distribution for emission probability
Transition probability accounts for probe spacing
Baum-Welch parameter estimation
(by Jake Byrnes)
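To make the ingredients above concrete, here is a minimal forward-backward sketch for a 3-state HMM with discrete emissions and spacing-dependent transitions. It is not the implementation used for these slides: the state names, the emission table, the `memory` parameter and the way the transition matrix relaxes toward uniform as the gap between probes grows are all illustrative assumptions, and Baum-Welch re-estimation of the parameters is left out.

```python
STATES = ("Col>Van", "no difference", "Van>Col")

def normalise(row):
    total = sum(row)
    return [x / total for x in row]

def spacing_transition(gap_bp, memory=0.9, unit_bp=35):
    """Hypothetical spacing-aware transition matrix: the probability of staying
    in the same state decays toward 1/3 as the gap between probes grows."""
    n = len(STATES)
    p_stay = 1.0 / n + (1.0 - 1.0 / n) * memory ** (gap_bp / unit_bp)
    p_move = (1.0 - p_stay) / (n - 1)
    return [[p_stay if i == j else p_move for j in range(n)] for i in range(n)]

def posteriors(obs, positions, start, emit):
    """Posterior state probabilities per probe (scaled forward-backward)."""
    n, k = len(obs), len(start)
    trans = [spacing_transition(positions[t + 1] - positions[t]) for t in range(n - 1)]
    # Forward pass, rescaled at every probe to avoid underflow.
    alpha = [normalise([start[s] * emit[s][obs[0]] for s in range(k)])]
    for t in range(1, n):
        row = [
            emit[j][obs[t]] * sum(alpha[t - 1][i] * trans[t - 1][i][j] for i in range(k))
            for j in range(k)
        ]
        alpha.append(normalise(row))
    # Backward pass, also rescaled.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        row = [
            sum(trans[t][i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(k))
            for i in range(k)
        ]
        beta[t] = normalise(row)
    return [normalise([a * b for a, b in zip(alpha[t], beta[t])]) for t in range(n)]

# Symbols: 0 = Col lower, 1 = similar, 2 = Col higher (a made-up discretisation).
emit = [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1]]
start = [1 / 3, 1 / 3, 1 / 3]
obs = [1] * 5 + [2] * 5 + [1] * 5
positions = [35 * i for i in range(len(obs))]
post = posteriors(obs, positions, start, emit)
print(STATES[max(range(3), key=lambda s: post[7][s])])
```

Making the transition matrix a function of the gap means two probes far apart are treated as nearly independent, while adjacent probes strongly favour the same state.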
An example of HMM detected segments
A good model also needs a better array
Array density is not enough to distinguish exon/intron boundaries
Probe quality
Differential segments
≥3 contiguous probes with posterior probability > 0.99.
Differentially expressed genes
Annotated genes for which ≥33% of their probes reside within the observed differential segments.
Differentially spliced genes
Annotated genes for which <33% of their probes reside within a differential segment, or annotated genes containing ≥2 differential segments with different states.
Novel gene boundaries
Differential segments with ≥5 probes extending beyond an annotated gene boundary.
Novel transcripts
Differential segments with ≥5 probes that lie outside any annotated gene boundary.
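The calling rules above can be sketched as two small functions: one that finds runs of at least 3 consecutive probes with posterior above 0.99, and one that classifies an annotated gene from the segments overlapping it. The function names, the index-based probe representation and the order in which the two splicing rules are checked are assumptions made for illustration.

```python
def call_segments(posterior, min_probes=3, cutoff=0.99):
    """Return (first, last) probe indices of runs with posterior > cutoff."""
    segments, run_start = [], None
    for i, p in enumerate(posterior):
        if p > cutoff:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_probes:
                segments.append((run_start, i - 1))
            run_start = None
    if run_start is not None and len(posterior) - run_start >= min_probes:
        segments.append((run_start, len(posterior) - 1))
    return segments

def classify_gene(gene_probes, segments, min_fraction=0.33):
    """Classify a gene from differential segments given as (first, last, state)."""
    probes = set(gene_probes)
    covered, states = set(), set()
    for first, last, state in segments:
        hit = probes.intersection(range(first, last + 1))
        if hit:
            covered |= hit
            states.add(state)
    if not covered:
        return "no call"
    if len(states) >= 2:  # segments with different states inside one gene
        return "differentially spliced"
    if len(covered) / len(probes) >= min_fraction:
        return "differentially expressed"
    return "differentially spliced"

runs = call_segments([0.995, 0.999, 0.992, 0.5, 0.995, 0.995])
print(runs)
print(classify_gene(range(10), [(0, 4, "Col>Van")]))
print(classify_gene(range(10), [(0, 2, "Col>Van")]))
```

In this sketch the second run in the example is dropped because it spans only two probes, and a gene with 3 of its 10 probes covered falls under the 33% cutoff and is therefore called differentially spliced rather than differentially expressed.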
Length distribution of segments called by HMM
Comparison of annotation-based analysis and HMM
                                   Col > Van   Van > Col   Total
Annotation
  differential expression          1626        923         2549
  differential exonic splicing     287         190         477
  differential intronic splicing   202         155         357
HMM
  differential expression          1654        962         2616
  differential splicing            874         530         1404
  un-annotated transcript          34          42          76
  un-annotated 5'                  30          19          49
  un-annotated 3'                  28          8           36
Comparison of annotation-based analysis and HMM
Annotation
Expression(Col>Van)  Expression(Van>Col)  Splicing(Col>Van)  Splicing(Van>Col)
HMM 1654 962 921 550
Expression(Col>Van) 1626 1270 225
Expression(Van>Col) 923 727 132
Splicing(Col>Van) 441 181 47
Splicing(Van>Col) 300 90 38
Acknowledgements
Justin Borevitz
Yan Li
Christos Noutsos
Geoff Morris
Andy Cal
Jake Byrnes
Josh Rest