Whole genome transcriptome variation in Arabidopsis thaliana

Xu Zhang, Borevitz Lab

Posted on 22-Dec-2015


Page 1

Whole genome transcriptome variation in Arabidopsis thaliana

Xu Zhang

Borevitz Lab

Page 2

Arabidopsis thaliana has adapted to highly variable environments

Page 3

Transcription and splicing

[Figure: chromosomal DNA is transcribed into nuclear RNA (Exon 1, Intron 1, Exon 2, Intron 2, Exon 3); RNA splicing produces the messenger RNAs Exon 1-Exon 2-Exon 3 and Exon 1-Exon 3]

Page 4

Whole genome tiling array

Sequence polymorphisms that affect probe hybridization could bias the estimation of gene expression

High density and resolution: 1.6M unique probes at 35 bp spacing

Without bias toward known transcripts

Page 5

The experiment

Parental strains and reciprocal F1 hybrids: Col♀ x Col♂, Van♀ x Van♂, Col♀ x Van♂, Van♀ x Col♂

mRNA from total RNA; genomic DNA

Page 6

Double-stranded random labeling

[Figure: poly(A) mRNA (AAAAA) → random reverse transcription → double-stranded cDNA → random priming]

Page 7

Outline

Sequence polymorphisms

Gene expression variation

Splicing variation

A functional network of differentially spliced genes

HMM for de novo transcription profiling

Page 8

Outline: Sequence polymorphisms

Page 9

Single Feature Polymorphisms and indels

[Figure: examples of SFPs and of a deletion or duplication in Van]

Page 10

Sequence polymorphisms

SFPs and indels (>200 bp) were removed before gene expression analysis

SFPs

FDR       Col > Van   Van > Col   Total
11.82%    135769      14934       150703
7.66%     126443      9479        135922
5.22%     118381      6662        125043
3.88%     110861      4979        115840
3.15%     104115      3820        107935

Indels

Model selection   Deletion   Duplication   Total
BIC               518        22            540
AIC               1645       136           1781
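BIC calls roughly a third as many indels as AIC (540 vs 1781). A minimal sketch of why, assuming each candidate indel is kept when a model with extra segment parameters beats the no-indel model on the criterion: AIC = 2k - 2 ln L while BIC = k ln n - 2 ln L, so BIC charges ln n per parameter instead of 2, a larger penalty whenever n > e² ≈ 7.4 probes. The log-likelihoods, parameter counts and the `accepts_indel` rule are hypothetical, not values or code from this study.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion, 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion, k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_lik


def accepts_indel(criterion, ll_null, k_null, ll_alt, k_alt, **kwargs) -> bool:
    """Hypothetical rule: keep the indel model if its criterion value is lower."""
    return criterion(ll_alt, k_alt, **kwargs) < criterion(ll_null, k_null, **kwargs)


# Hypothetical candidate region covered by 200 probes: the indel model adds
# 3 parameters and improves the log-likelihood by 6 units.
N_PROBES = 200
LL_NULL, K_NULL = -350.0, 2
LL_ALT, K_ALT = -344.0, 5

print("AIC keeps indel:", accepts_indel(aic, LL_NULL, K_NULL, LL_ALT, K_ALT))
print("BIC keeps indel:", accepts_indel(bic, LL_NULL, K_NULL, LL_ALT, K_ALT, n=N_PROBES))
```

With these numbers AIC keeps the indel (698 < 704) and BIC rejects it (about 714.5 > 710.6), which is the same direction as the difference between the two rows of the table.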

Page 11

Deletions vs duplications

Page 12

Distribution of indels along chromosomes

Page 13

Outline: Gene expression variation

Page 14

Additive, dominant and maternal effects of gene expression

Page 15

The linear model

Gene probe Intensity ~ additive + dominant + maternal + ε

[Figure: probe intensity by genotype (Col, Van, F1c, F1v), annotated with the additive, dominant and maternal effects]
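For a single probe or gene, four genotypes and four parameters (a mean plus additive, dominant and maternal terms) make the model saturated, so the least-squares estimates reduce to contrasts of genotype means. A sketch, assuming log2 intensities, the coding Col = mu + a, Van = mu - a, F1c = mu + d + m, F1v = mu + d - m, and that F1c is the Col♀ x Van♂ cross; the coding, `fit_effects` and `genotype_means` are illustrative assumptions, and probe effects and significance testing are left out.

```python
from dataclasses import dataclass


@dataclass
class GeneticEffects:
    mean: float      # midparent value, (Col + Van) / 2
    additive: float  # half the parental difference, (Col - Van) / 2
    dominant: float  # mean of the two F1s minus the midparent value
    maternal: float  # half the reciprocal difference, (F1c - F1v) / 2


def genotype_means(replicates: dict[str, list[float]]) -> dict[str, float]:
    """Average replicate log2 intensities within each genotype."""
    return {g: sum(values) / len(values) for g, values in replicates.items()}


def fit_effects(col: float, van: float, f1c: float, f1v: float) -> GeneticEffects:
    """Least-squares effects for one probe or gene from its genotype means.

    Assumed coding (illustrative): Col = mu + a, Van = mu - a,
    F1c = mu + d + m, F1v = mu + d - m, with F1c the Col-mother cross.
    Four genotypes and four parameters make the model saturated, so the
    fitted values equal the genotype means and the solution is exact.
    """
    mu = (col + van) / 2
    return GeneticEffects(
        mean=mu,
        additive=(col - van) / 2,
        dominant=(f1c + f1v) / 2 - mu,
        maternal=(f1c - f1v) / 2,
    )


# Hypothetical log2 intensities, two replicates per genotype.
reps = {"col": [10.25, 9.75], "van": [8.0, 8.0], "f1c": [9.5, 9.5], "f1v": [8.25, 8.75]}
print(fit_effects(**genotype_means(reps)))
# GeneticEffects(mean=9.0, additive=1.0, dominant=0.0, maternal=0.5)
```

Here the hybrids sit exactly at the midparent value on average (no dominance), while the F1 with the Col mother is higher than its reciprocal, which shows up as a maternal effect.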

Page 16

Gene expression variation between genotypes

Effect     Delta   Sig+   Sig-   Total   False   FDR
additive   0.5     4911   3967   8878    901     10.15%
additive   1       2674   1736   4410    215     4.88%
additive   1.5     1626   923    2549    70      2.76%
additive   1.8     1249   676    1925    39      2.03%
additive   2.5     690    334    1024    13      1.24%
dominant   0.5     1511   3190   4701    767     16.31%
dominant   1       405    1521   1926    186     9.65%
dominant   1.5     157    811    968     67      6.93%
dominant   1.8     92     575    667     40      5.99%
dominant   2.5     41     270    311     14      4.65%
maternal   0.5     5998   95     6093    735     12.06%
maternal   1       2046   8      2054    151     7.37%
maternal   1.5     480    0      480     49      10.29%
maternal   1.8     163    0      163     28      17.33%
maternal   2.5     41     0      41      9       22.84%
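In each row, FDR is the estimated number of false calls divided by the total number of calls (Sig+ plus Sig-). The Delta/Sig+/Sig-/False layout resembles output from SAM (Significance Analysis of Microarrays), where the false-call count is estimated from permutations; that is an assumption, since the slides do not name the method. A minimal sketch recomputing the additive rows:

```python
# Additive-effect rows from the table: (delta, sig_plus, sig_minus, estimated_false)
ADDITIVE_ROWS = [
    (0.5, 4911, 3967, 901),
    (1.0, 2674, 1736, 215),
    (1.5, 1626, 923, 70),
    (1.8, 1249, 676, 39),
    (2.5, 690, 334, 13),
]


def estimated_fdr(sig_plus: int, sig_minus: int, estimated_false: float) -> float:
    """Estimated false calls divided by all significant calls (up plus down)."""
    total = sig_plus + sig_minus
    return estimated_false / total if total else 0.0


for delta, up, down, n_false in ADDITIVE_ROWS:
    print(f"Delta {delta}: total {up + down}, FDR {estimated_fdr(up, down, n_false):.2%}")
```

Recomputing from the rounded False column reproduces the reported additive FDRs to within about 0.03 percentage points (for example 2.75% vs 2.76% at Delta 1.5). Because the numerator is an estimate rather than a subset of the calls, the ratio can exceed 100%, as in the exon splicing table where 559 estimated false calls exceed 477 calls.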

Page 17

The pattern of gene expression inheritance

[Figure: mean gene intensity for Col, Van, F1v and F1c; panels: Van dominant, Col dominant, overdominant, F1v dominant, F1c dominant, maternal, paternal]

Page 18

The pattern of gene expression inheritance

Page 19

Enrichment in GO functional categories

GO enrichment for additive, dominant and maternal effect genes

Defense response genes are highly expressed in the F1 hybrid lines, while many growth-related pathways are down-regulated

Page 20

Outline: Splicing variation

Page 21

Default expression status of exon and intron

Exons: correction for gene expression

corrected by gene mean

corrected by gene median

splicing index (Mean_exon / Mean_gene)

Introns: direct comparison

Exon/intron probe Intensity ~ additive + dominant + maternal + ε
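A minimal sketch of the three exon summaries, under stated assumptions: the gene-mean and gene-median corrections subtract the gene-level center from each exon probe on the log2 scale (the median polish named on a later slide is simplified to a plain median), and the splicing index is the ratio of mean linear-scale intensities, Mean_exon / Mean_gene. Function names are hypothetical.

```python
from statistics import mean, median


def corrected_exon_probes(exon_log2, gene_log2, center=mean):
    """Exon probe log2 intensities minus the gene-level center (mean or median).

    center=mean mimics 'corrected by gene mean'; center=median is a plain
    median, a simplification of median polish.
    """
    gene_level = center(gene_log2)
    return [x - gene_level for x in exon_log2]


def splicing_index(exon_linear, gene_linear):
    """Mean_exon / Mean_gene, computed on linear-scale intensities."""
    return mean(exon_linear) / mean(gene_linear)


# Hypothetical gene with three exons of two probes each; exon 2 is weaker.
exon2_log2 = [7.0, 7.0]
gene_log2 = [9.0, 9.0] + exon2_log2 + [9.0, 9.0]

print(corrected_exon_probes(exon2_log2, gene_log2))                 # mean-corrected
print(corrected_exon_probes(exon2_log2, gene_log2, center=median))  # [-2.0, -2.0]
print(splicing_index([2 ** x for x in exon2_log2], [2 ** x for x in gene_log2]))  # about 0.333
```

The corrected values, or a per-sample splicing index, would then replace raw intensity as the response in the additive + dominant + maternal model, so that a shift in the whole gene's expression is not mistaken for differential splicing.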

Page 22

Differential exon splicing

Exon probe Intensity ~ additive + dominant + maternal + ε

Correction       Delta   Sig+   Sig-   Total   False   FDR
gene mean        0.3     287    190    477     559     117%
gene mean        0.4     177    129    306     205     67.0%
gene mean        0.5     127    109    236     97      41.0%
gene mean        0.6     92     86     178     55      30.8%
gene mean        0.7     77     69     146     34      23.4%
gene median      0.3     523    280    803     556     69.2%
gene median      0.4     328    172    500     203     40.6%
gene median      0.5     223    120    343     96      28.0%
gene median      0.6     154    76     230     54      23.5%
gene median      0.7     123    52     175     34      19.3%
splicing index   0.3     407    235    642     425     66.0%
splicing index   0.4     292    175    467     132     28.0%
splicing index   0.5     230    143    373     50      13.0%
splicing index   0.6     178    104    282     21      7.50%
splicing index   0.7     148    86     234     10      4.30%

Page 23

Differential intron splicing

Intron probe Intensity ~ additive + dominant + maternal + ε

Delta   Sig+   Sig-   Total   False   FDR
0.3     561    1034   1595    332     20.8%
0.4     405    523    928     85      9.17%
0.5     316    352    668     28      4.26%
0.6     239    220    459     12      2.61%
0.7     202    155    357     7       1.91%
0.8     176    120    296     5       1.53%

Page 24

Differential exon splicing is predominantly additive in F1 hybrids

Page 25

Some dominant effects in differential intron splicing in F1 hybrids

Page 26

Comparison for enrichment in known alternatively spliced exons

                                    Threshold 1              Threshold 2
                                    Called     Not called    Called     Not called
Corrected by gene mean
  Known                             28         991           7          1012
  Not known                         397        55145         90         55452
  Fold enrichment                   3.92                     4.26
  p-value                           5.97E-09                 1.90E-03
Corrected by gene median polish
  Known                             24         995           6          1013
  Not known                         430        55112         85         55457
  Fold enrichment                   3.09                     3.86
  p-value                           3.60E-06                 6.14E-03
Splicing index
  Known                             24         1093          5          1112
  Not known                         537        72328         88         72777
  Fold enrichment                   2.96                     3.72
  p-value                           6.84E-06                 1.36E-02
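The fold-enrichment values in this table equal the sample odds ratio ad/bc of each 2x2 table, with a = known and called, b = known and not called, c = not known and called, d = not known and not called; for example 28 x 55145 / (991 x 397) ≈ 3.92. The slides do not name the test behind the p-values, so the one-sided Fisher exact test below is an assumption and its p-values need not match the table. A pure-Python sketch using log-space hypergeometric terms:

```python
from math import exp, lgamma


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio ad / bc.

    a = known & called, b = known & not called,
    c = not known & called, d = not known & not called.
    """
    return (a * d) / (b * c)


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value: P(X >= a) for X ~ hypergeometric.

    X counts known exons among the called exons, given the row and column totals.
    """
    n_total = a + b + c + d
    n_known = a + b
    n_called = a + c
    log_denom = log_comb(n_total, n_called)
    return sum(
        exp(log_comb(n_known, x) + log_comb(n_total - n_known, n_called - x) - log_denom)
        for x in range(a, min(n_known, n_called) + 1)
    )


# "Corrected by gene mean", threshold 1 and threshold 2
for table in [(28, 991, 397, 55145), (7, 1012, 90, 55452)]:
    print(f"odds ratio {odds_ratio(*table):.2f}, one-sided Fisher p {fisher_greater(*table):.2e}")
```

Working in log space with `lgamma` avoids computing binomial coefficients with thousands of digits for the roughly 56,000 to 74,000 exons in each table.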

Page 27

Experimentally determined FDR for differential splicing

[Figure: AT1G21350, AT1G34180, AT1G76170, AT1G29120, AT1G51350, AT1G80960, AT1G07350]

Method                       Significant calls   Estimated FDR   Tested   Confirmed   Experimental FDR
Exon (corrected by mean)     477                 117%            45       22          51.1%
                             111                 20.8%           18       10          44.4%
Exon (corrected by median)   500                 40.6%           40       21          47.5%
                             103                 15.60%          17       10          41.2%
Exon (splicing index)        642                 66.0%           50       23          54.0%
                             102                 1.00%           20       10          50.0%
Intron                       459                 2.61%           65       38          41.5%
                             195                 1.15%           58       33          43.1%
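The experimental FDR column equals the fraction of tested calls that were not confirmed, (tested - confirmed) / tested; for example (45 - 22) / 45 ≈ 51.1%. A minimal sketch (the dictionary labels are illustrative):

```python
def experimental_fdr(n_tested: int, n_confirmed: int) -> float:
    """Share of experimentally tested calls that were not confirmed."""
    if n_tested <= 0:
        raise ValueError("need at least one tested call")
    return (n_tested - n_confirmed) / n_tested


# (tested, confirmed) pairs copied from the first row of each method in the table
ROWS = {
    "Exon (corrected by mean), 477 calls": (45, 22),
    "Exon (corrected by median), 500 calls": (40, 21),
    "Exon (splicing index), 642 calls": (50, 23),
    "Intron, 459 calls": (65, 38),
}

for label, (tested, confirmed) in ROWS.items():
    print(f"{label}: experimental FDR {experimental_fdr(tested, confirmed):.1%}")
```

With only 17 to 65 tested calls per row, these experimental estimates carry wide sampling uncertainty, which is worth keeping in mind when comparing them with the model-based estimates.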

Page 28

Outline: A functional network of differentially spliced genes

Page 29

Enrichment of differentially spliced genes in chloroplast thylakoid

Page 30

Chloroplast thylakoid

Page 31

Differentially spliced genes located in the chloroplast thylakoid

Photosynthesis-related genes

AT5G38660 APE1 (Acclimation of Photosynthesis to Environment): the mutant has altered acclimation responses

Page 32

Splicing regulators tend to be differentially spliced

AT1G07350 transformer serine/arginine-rich ribonucleoprotein, putative

AT1G55310 SC35-like splicing factor 33 kD (SCL33)

AT2G29210 splicing factor PWI domain-containing protein

AT5G04430 KH domain-containing protein NOVA, putative

Page 33

Outline: HMM for de novo transcription profiling

Page 34

Generalized tiling array HMM

3-state HMM

Discrete distribution for emission probabilities

Transition probabilities account for probe spacing

Baum-Welch parameter estimation

(by Jake Byrnes)
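A minimal sketch of posterior decoding with the scaled forward-backward algorithm for a 3-state HMM with discrete emissions. Assumptions, not taken from the slides: the states are labelled Van > Col, no difference and Col > Van; observations are probe-level Col-Van differences binned into three symbols; the transition matrix is fixed, whereas the slides' model lets transitions depend on probe spacing; and all parameters are hand-set rather than estimated with Baum-Welch. This is not Jake Byrnes's implementation.

```python
def posterior_decode(obs, start, trans, emit):
    """Scaled forward-backward: P(state at probe t = s | all observations).

    obs:   symbol index per probe (e.g. binned Col-Van difference)
    start: start[s], initial state probabilities
    trans: trans[r][s] = P(state s at probe t+1 | state r at probe t)
    emit:  emit[s][o] = P(symbol o | state s), a discrete emission distribution
    """
    n = len(start)
    alpha, scale = [], []
    for t, o in enumerate(obs):
        if t == 0:
            row = [start[s] * emit[s][o] for s in range(n)]
        else:
            prev = alpha[-1]
            row = [sum(prev[r] * trans[r][s] for r in range(n)) * emit[s][o] for s in range(n)]
        c = sum(row)  # scaling factor keeps the forward values from underflowing
        scale.append(c)
        alpha.append([x / c for x in row])

    beta = [[1.0] * n for _ in obs]
    for t in range(len(obs) - 2, -1, -1):
        o_next = obs[t + 1]
        beta[t] = [
            sum(trans[s][r] * emit[r][o_next] * beta[t + 1][r] for r in range(n)) / scale[t + 1]
            for s in range(n)
        ]

    posteriors = []
    for a_row, b_row in zip(alpha, beta):
        joint = [a * b for a, b in zip(a_row, b_row)]
        z = sum(joint)  # equals 1 up to rounding with this scaling; renormalize anyway
        posteriors.append([x / z for x in joint])
    return posteriors


# Hypothetical states and symbols: 0 = Van > Col, 1 = no difference, 2 = Col > Van.
START = [0.05, 0.90, 0.05]
TRANS = [[0.95, 0.05, 0.00], [0.025, 0.95, 0.025], [0.00, 0.05, 0.95]]
EMIT = [[0.80, 0.15, 0.05], [0.10, 0.80, 0.10], [0.05, 0.15, 0.80]]
OBS = [1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1]

for t, p in enumerate(posterior_decode(OBS, START, TRANS, EMIT)):
    print(t, [round(x, 3) for x in p])
```

The per-probe posteriors are what a cutoff such as 0.99 would be applied to when calling differential segments; Baum-Welch would reuse the same forward and backward quantities to re-estimate `TRANS` and `EMIT`.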

Page 35

An example of HMM detected segments

Page 36

A good model also needs a better array

Array density is not high enough to distinguish exon/intron boundaries

Probe quality
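The density limitation can be made concrete with the 35 bp probe spacing quoted for the tiling array: a feature of length L always contains at least floor(L / 35) probe start positions, so only features of at least 3 x 35 = 105 bp are guaranteed the three probes needed to call a differential segment. A rough sketch that ignores probe length, strand and non-unique probes; the helper names are hypothetical:

```python
PROBE_SPACING_BP = 35   # probe spacing quoted for the tiling array
MIN_SEGMENT_PROBES = 3  # differential segments need >= 3 contiguous probes


def guaranteed_probes(feature_bp: int, spacing_bp: int = PROBE_SPACING_BP) -> int:
    """Minimum number of evenly spaced probe starts inside a feature.

    A stretch of feature_bp bases always contains at least
    feature_bp // spacing_bp grid positions (one more if it is well aligned).
    Probe length, strand and non-unique probes are ignored.
    """
    return feature_bp // spacing_bp


def min_length_for(n_probes: int, spacing_bp: int = PROBE_SPACING_BP) -> int:
    """Shortest feature guaranteed to contain n_probes probe starts."""
    return n_probes * spacing_bp


for length in (60, 100, 105, 500):
    n = guaranteed_probes(length)
    print(f"{length} bp: >= {n} probes, enough for a segment: {n >= MIN_SEGMENT_PROBES}")
print("shortest feature guaranteed 3 probes:", min_length_for(MIN_SEGMENT_PROBES), "bp")
```

Short exons and introns therefore may not be resolvable at all, and any boundary call is only accurate to within roughly one probe spacing.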

Page 37

Differential segments

≥3 contiguous probes with posterior probability >0.99

Differentially expressed genes

annotated genes for which ≥33% of their probes reside within the observed differential segments

Differentially spliced genes

annotated genes for which <33% of their probes reside within the differential segments, or annotated genes containing ≥2 differential segments with different states

Novel gene boundaries

differential segments with ≥5 probes extending beyond an annotated gene boundary

Novel transcripts

differential segments with ≥5 probes lying outside any annotated gene boundary
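A sketch of these segment-calling and classification rules, assuming per-probe posteriors from a 3-state HMM (the state names are hypothetical) and genes represented as sets of probe indices. Two choices are assumptions because the definitions leave them open: a gene with ≥2 segments in different states is called differentially spliced even if coverage is ≥33%, and a segment "extends beyond" a gene if any of its probes lie outside it.

```python
from dataclasses import dataclass
from typing import Optional

POSTERIOR_CUTOFF = 0.99
MIN_SEGMENT_PROBES = 3
MIN_NOVEL_PROBES = 5
COVERAGE_CUTOFF = 0.33


@dataclass
class Segment:
    start: int  # first probe index (inclusive)
    end: int    # last probe index (inclusive)
    state: str

    def probes(self) -> set[int]:
        return set(range(self.start, self.end + 1))


def differential_segments(posteriors, states, background="no difference"):
    """Runs of >= 3 contiguous probes whose most probable state is a
    non-background state with posterior > 0.99."""
    calls = []
    for row in posteriors:
        k = max(range(len(row)), key=row.__getitem__)
        confident = row[k] > POSTERIOR_CUTOFF and states[k] != background
        calls.append(states[k] if confident else None)
    segments, i = [], 0
    while i < len(calls):
        j = i
        while j + 1 < len(calls) and calls[j + 1] == calls[i]:
            j += 1
        if calls[i] is not None and j - i + 1 >= MIN_SEGMENT_PROBES:
            segments.append(Segment(i, j, calls[i]))
        i = j + 1
    return segments


def classify_gene(gene_probes: set[int], segments: list[Segment]) -> str:
    """Expression vs splicing call for one annotated gene (a set of probe indices)."""
    hits = [s for s in segments if s.probes() & gene_probes]
    if not hits:
        return "not differential"
    if len({s.state for s in hits}) >= 2:  # >= 2 segments with different states
        return "differentially spliced"
    covered = set().union(*(s.probes() for s in hits)) & gene_probes
    if len(covered) / len(gene_probes) >= COVERAGE_CUTOFF:
        return "differentially expressed"
    return "differentially spliced"


def classify_unannotated(segment: Segment, genes: list[set[int]]) -> Optional[str]:
    """Novel transcript or novel gene boundary call for a segment of >= 5 probes."""
    probes = segment.probes()
    if len(probes) < MIN_NOVEL_PROBES:
        return None
    overlapping = [g for g in genes if g & probes]
    if not overlapping:
        return "novel transcript"
    if probes - set().union(*overlapping):
        return "novel gene boundary"
    return None


STATES = ["Van > Col", "no difference", "Col > Van"]
post = [[0.0, 1.0, 0.0]] * 2 + [[0.0, 0.005, 0.995]] * 5 + [[0.0, 1.0, 0.0]] * 3
segs = differential_segments(post, STATES)  # one segment: probes 2-6, "Col > Van"
print(classify_gene(set(range(10)), segs))  # 5 of 10 probes covered
print(classify_gene(set(range(30)), segs))  # 5 of 30 probes covered
```

The same five-probe segment is an expression call for a short gene but a splicing call for a long one, which is why the 33% coverage cutoff matters for the split between the two categories.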

Page 38

Length distribution of segments called by HMM

Page 39

Comparison of annotation-based analysis and HMM

                                   Col > Van   Van > Col   Total
Annotation
  differential expression          1626        923         2549
  differential exonic splicing     287         190         477
  differential intronic splicing   202         155         357
HMM
  differential expression          1654        962         2616
  differential splicing            874         530         1404
  un-annotated transcript          34          42          76
  un-annotated 5'                  30          19          49
  un-annotated 3'                  28          8           36

Page 40

Comparison of annotation-based analysis and HMM

HMM: Expression(Col>Van) 1654, Expression(Van>Col) 962, Splicing(Col>Van) 921, Splicing(Van>Col) 550

Annotation            Total   Overlap with HMM calls
Expression(Col>Van)   1626    1270   225
Expression(Van>Col)   923     727    132
Splicing(Col>Van)     441     181    47
Splicing(Van>Col)     300     90     38

Page 41

Acknowledgements

Justin Borevitz

Yan Li

Christos Noutsos

Geoff Morris

Andy Cal

Jake Byrnes

Josh Rest