Alternative Splicing from ESTs
DESCRIPTION
Alternative Splicing from ESTs. Eduardo Eyras, Bioinformatics UPF, February 2004. Topics: introduction, ESTs, prediction of alternative splicing from ESTs.
TRANSCRIPT
![Page 1: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/1.jpg)
Alternative Splicing from ESTs
Eduardo Eyras
Bioinformatics UPF – February 2004
![Page 2: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/2.jpg)
Intro
ESTs
Prediction of Alternative Splicing from ESTs
![Page 3: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/3.jpg)
Figure: transcription of the gene (exons and introns) gives the pre-mRNA; splicing gives the mature mRNA (5’ CAP, poly-A tail); translation gives the peptide.
![Page 4: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/4.jpg)
Figure: a different splicing of the same pre-mRNA gives a different mature mRNA (5’ CAP, poly-A tail); translation gives a different peptide.
![Page 5: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/5.jpg)
Alternative splicing as a mechanism of gene regulation
Functional domains can be added or subtracted: protein diversity
It can introduce early stop codons, resulting in truncated proteins or unstable mRNAs
It can modify the activity of transcription factors, affecting the expression of genes
It is observed in nearly all metazoans
Estimated to occur in 30%-60% of human genes
![Page 6: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/6.jpg)
Forms of alternative splicing
Exon skipping / inclusion
Alternative 3’ splice site
Alternative 5’ splice site
Mutually exclusive exons
Intron retention
Figure legend: constitutive exon; alternatively spliced exons
![Page 7: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/7.jpg)
How to study alternative splicing?
![Page 8: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/8.jpg)
ESTs (Expressed Sequence Tags)
Single-pass sequencing of a small (end) piece of cDNA
Typically 200-500 nucleotides long
It may contain coding and/or non-coding regions
![Page 9: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/9.jpg)
ESTs
Cells from a specific organ, tissue or developmental stage
mRNA extraction
Add oligo-dT primer
Reverse transcriptase (RNA/DNA hybrid)
Ribonuclease H and DNA polymerase
Double stranded cDNA
![Page 10: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/10.jpg)
ESTs
Clone cDNA into a vector
Multiple cDNA clones
Single-pass sequence reads: 5’ EST and 3’ EST
![Page 11: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/11.jpg)
Sampling the Transcriptome with ESTs
Genomic
Primary transcript
Splicing
Splice variants
oligo-dT primer, reverse transcriptase
cDNA clones (double stranded)
EST sequences (single-pass sequence reads, 5’ and 3’)
![Page 12: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/12.jpg)
Large scale EST-sequencing coupled to Genome sequencing
![Page 13: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/13.jpg)
EST sequencing
Is fast and cheap
Gives direct information about the gene sequence
Partial information
Resulting ESTs (DB searches):
Known gene
Similar to known gene
Contaminant
Novel gene
![Page 14: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/14.jpg)
Number of public entries: 20,039,613
Summary by organism
Homo sapiens (human): 5,472,005
Mus musculus + domesticus (mouse): 4,056,481
Rattus sp. (rat): 583,841
Triticum aestivum (wheat): 549,926
Ciona intestinalis: 492,511
Gallus gallus (chicken): 460,385
Danio rerio (zebrafish): 450,652
Zea mays (maize): 391,417
Xenopus laevis (African clawed frog): 359,901
…
dbEST release 20 February 2004
![Page 15: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/15.jpg)
EST lengths
Human EST length distribution (dbEST Sep. 2003 )
~ 450 bp
![Page 16: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/16.jpg)
ESTs provide expression data
eVOC Ontologies http://www.sanbi.ac.za/evoc/
Anatomical System: The tissue, organ or anatomical system from which the sample was prepared. Examples are digestive, lung and retina.
Cell Type: The precise cell type from which a sample was prepared. Examples are: B-lymphocyte, fibroblast and oocyte.
Pathology: The pathological state of the sample from which the sample was prepared. Examples are: normal, lymphoma, and congenital.
Developmental Stage: The stage during the organism's development at which the sample was prepared. Examples are: embryo, fetus, and adult.
Pooling: Indicates whether the tissue used to prepare the library was derived from single or multiple samples. Examples are pooled, pooled donor and pooled tissue.
J Kelso et al. Genome Research 2002
![Page 17: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/17.jpg)
ESTs provide expression data
eVOC Ontologies http://www.sanbi.ac.za/evoc/
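Ontologies: Anatomical System, Cell Type, Pathology, Developmental Stage, Pooling
Figure: ontology terms (e.g. in Anatomical System: … nervous, brain, cerebellum …) are linked to libraries (Library 1, Library 2, …), and each library to its ESTs.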
![Page 18: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/18.jpg)
Linking the expression vocabulary to gene annotations
ESTs
Genes
V Curwen et al. Genome Research (2004)
![Page 19: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/19.jpg)
Gene expression vocabulary
![Page 20: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/20.jpg)
Normalized vs. non-normalized libraries
![Page 21: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/21.jpg)
The down side of the ESTs
Cannot detect lowly/rarely expressed genes or non-expressed sequences (regulatory)
Random sampling: the more ESTs we sequence, the fewer new useful sequences we will get
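The diminishing return of random sampling can be illustrated with a small calculation. This is a sketch under a strong simplifying assumption that is not in the slides: every gene is equally likely to be sampled by each EST. Real libraries are dominated by highly expressed genes, so saturation is even faster for those and slower for rare transcripts. The function names are hypothetical.

```python
def expected_distinct_genes(n_ests, n_genes):
    """Expected number of distinct genes hit after n_ests random draws,
    assuming (unrealistically) that all n_genes are equally expressed."""
    return n_genes * (1.0 - (1.0 - 1.0 / n_genes) ** n_ests)


def expected_new_genes(n_ests, n_genes, batch):
    """Expected number of genes seen for the first time in the next batch of ESTs."""
    return (expected_distinct_genes(n_ests + batch, n_genes)
            - expected_distinct_genes(n_ests, n_genes))


if __name__ == "__main__":
    # The same batch size finds fewer and fewer new genes as sequencing proceeds.
    for already_sequenced in (0, 1000, 5000, 20000):
        print(already_sequenced, round(expected_new_genes(already_sequenced, 1000, 100), 2))
```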
![Page 22: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/22.jpg)
Using ESTs to study Alternative Splicing
![Page 23: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/23.jpg)
ESTs aligned to the genome
Figure labels: EST; true match (best in genome); paralog; processed pseudogene; GT-AG splice sites; poly-A; stop codon (*)
It defines the location of exons and introns
We can verify the splice sites of introns and check the correct strand of spliced ESTs
It helps prevent chimeras
It can avoid putting together ESTs from paralogous genes
We can prevent including pseudogenes in our analysis
Poly-A tails must be clipped before aligning
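A minimal sketch of poly-A clipping before alignment. The minimum run length of 10 and the function name are assumptions made for illustration; the slides do not give a threshold. A 3’ EST read from the poly-A end shows the tail as a leading poly-T run, so both ends are handled.

```python
import re


def clip_poly_a(seq, min_run=10):
    """Remove a trailing poly-A run and a leading poly-T run of at least min_run bases."""
    seq = seq.upper()
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)  # tail as seen in the mRNA sense
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)  # tail as seen in a reverse-strand read
    return seq


if __name__ == "__main__":
    print(clip_poly_a("ACGTACGT" + "A" * 12))
    print(clip_poly_a("T" * 15 + "GGCC"))
```

Short A or T runs are left untouched, since they may be genuine transcript sequence.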
![Page 24: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/24.jpg)
Alternative Exons/ 3´ PolyA sites from ESTs
ESTs can also provide information about potential alternative splicing when aligned to the genome (and when aligned to mRNA data)
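As an illustration of how aligned ESTs can reveal a potential alternative exon, the sketch below detects exon skipping between two exon structures. The representation (sorted lists of `(start, end)` genomic intervals, end exclusive) and the function names are assumptions, not part of the original slides.

```python
def introns(exons):
    """Introns implied by a sorted list of (start, end) exons."""
    return [(a[1], b[0]) for a, b in zip(exons, exons[1:])]


def skipped_exons(long_form, short_form):
    """Exons of long_form that are spliced out by a single intron of short_form.

    An internal exon counts as skipped when an intron of short_form starts at
    the donor of the upstream intron and ends at the acceptor of the
    downstream intron of long_form.
    """
    long_introns = introns(long_form)
    found = []
    for donor, acceptor in introns(short_form):
        for i in range(1, len(long_form) - 1):
            if long_introns[i - 1][0] == donor and long_introns[i][1] == acceptor:
                found.append(long_form[i])
    return found


if __name__ == "__main__":
    with_exon = [(100, 200), (300, 400), (500, 600)]
    without_exon = [(100, 200), (500, 600)]
    print(skipped_exons(with_exon, without_exon))
```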
![Page 25: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/25.jpg)
Aligning ESTs to the Genome
Many ESTs: fast programs, fast computers
Nearly exact matches: Coverage >= 97%, Percent_id >= 97%
Splice sites: GT-AG, AT-AC, GC-AG
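The thresholds above can be sketched as a simple filter. The definitions used here are assumptions: coverage is taken as aligned EST bases over EST length, and percent identity as matching bases over aligned bases; alignment programs may define them differently. The splice-site check assumes the intron is given in forward-strand coordinates (end exclusive); a reverse-strand intron would need to be reverse-complemented first.

```python
KNOWN_SPLICE_SITES = {("GT", "AG"), ("AT", "AC"), ("GC", "AG")}


def passes_thresholds(est_length, aligned_bases, matches, min_cov=97.0, min_id=97.0):
    """True when both coverage and percent identity reach their thresholds."""
    coverage = 100.0 * aligned_bases / est_length
    percent_id = 100.0 * matches / aligned_bases
    return coverage >= min_cov and percent_id >= min_id


def intron_has_known_splice_sites(genome, intron_start, intron_end):
    """Check the first and last two bases of the intron genome[intron_start:intron_end]."""
    donor = genome[intron_start:intron_start + 2].upper()
    acceptor = genome[intron_end - 2:intron_end].upper()
    return (donor, acceptor) in KNOWN_SPLICE_SITES


if __name__ == "__main__":
    genome = "AAAC" + "GT" + "CCCCCC" + "AG" + "TTT"
    print(passes_thresholds(400, 392, 385))
    print(intron_has_known_splice_sites(genome, 4, 14))
```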
![Page 26: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/26.jpg)
Genomics as a Technology
Development of special software: fast versus accurate alignment
Development of special technology: efficient use of computer farms (~2000 CPUs)
![Page 27: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/27.jpg)
Recovering full transcripts from ESTs
![Page 28: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/28.jpg)
Recover the mRNA from the ESTs
![Page 29: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/29.jpg)
The Problem
What are the transcripts represented in this set of mapped ESTs?
ESTs
Genome
![Page 30: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/30.jpg)
Transcript predictions
ESTs
Predict Transcripts from ESTs
Merge ESTs according to splicing structure compatibility
![Page 31: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/31.jpg)
Redundant ESTs
Consider 2 ESTs, x and z, in a Genomic Cluster with more ESTs
z gives redundant splicing information; we could keep only x
However, the relation with other ESTs in the cluster is important: a third EST, w, is compatible with z but not with x, so we keep all relations:
x + z
z + w
![Page 32: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/32.jpg)
Extension of the exon structure
Consider 2 ESTs, x and y, in a Genomic Cluster with more ESTs
y extends x; we can assume that they are from the same mRNA: x + y
Our success will depend on the coverage of the exons. However, ESTs are 3’ and 5’ biased (ESTs like z, in the figure with x, z and w, are not so frequent), hence we will have fragmentation.
![Page 33: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/33.jpg)
Representation
For every 2 ESTs in a Genomic Cluster, we decide if they represent equivalent splicing structures
The compatibility relation is a graph:
Extension: y extends x (edge between x and y)
Inclusion: z is included in x (edge between x and z)
E Eyras et al. Genome Research (2004)
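A minimal sketch of a pairwise comparison that labels two aligned ESTs as inclusion, extension or incompatible. This is a strict simplification written for illustration, not the ClusterMerge implementation: ESTs are sorted lists of `(start, end)` exons (end exclusive), two ESTs are compatible when they imply exactly the same introns inside the region they both cover, and no mismatch tolerance is applied.

```python
def introns(exons):
    """Introns implied by a sorted list of (start, end) exons."""
    return [(a[1], b[0]) for a, b in zip(exons, exons[1:])]


def compare(x, y):
    """Classify the relation between two ESTs aligned to the same strand."""
    xs, xe = x[0][0], x[-1][1]
    ys, ye = y[0][0], y[-1][1]
    lo, hi = max(xs, ys), min(xe, ye)  # region covered by both ESTs
    if lo >= hi:
        return "no overlap"

    def shared(est):
        # Introns that overlap the common region.
        return {i for i in introns(est) if i[0] < hi and i[1] > lo}

    if shared(x) != shared(y):
        return "incompatible"
    if xs <= ys and ye <= xe:
        return "y included in x"
    if ys <= xs and xe <= ye:
        return "x included in y"
    return "extension"


if __name__ == "__main__":
    x = [(100, 200), (300, 400), (500, 600)]
    z = [(150, 200), (300, 350)]
    y = [(350, 400), (500, 600), (700, 800)]
    print(compare(x, z), "|", compare(x, y))
```

Comparing intron sets only inside the shared region is what lets an EST that merely stops early still count as compatible.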
![Page 34: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/34.jpg)
Criteria of “merging”
Allow internal mismatches
Allow intron mismatches
Allow edge-exon mismatches
mismatches
Is this intron real?
![Page 35: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/35.jpg)
Transitivity
Figure: the extension relation (ESTs x, y, z) and the inclusion relation (ESTs x, z, w) are transitive
This reduces the number of comparisons needed
![Page 36: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/36.jpg)
ClusterMerge graph
Each node defines an inclusion sub-tree
Extensions form acyclic graphs
Figure: example graphs built from ESTs x, y, z and w
E Eyras et al. Genome Research (2004)
![Page 37: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/37.jpg)
Mergeable sets
Example: seven aligned ESTs, numbered 1 to 7 (figure)
![Page 38: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/38.jpg)
Mergeable sets
Example: seven aligned ESTs, numbered 1 to 7, and the graph built from them (figure)
![Page 39: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/39.jpg)
Mergeable sets
Example: seven aligned ESTs, numbered 1 to 7, and the graph built from them, with its root and leaves marked (figure)
![Page 40: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/40.jpg)
Mergeable sets
Example: seven aligned ESTs, numbered 1 to 7, and the graph built from them, with its root and leaves marked (figure)
Lists produced: (1,2,3,5,6,7) (1,2,3,4,5,7)
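One way to read the step from graph to lists is sketched below: follow every path through the extension graph from a root to a leaf, and add to each path the ESTs included in its nodes. This is an illustrative interpretation, not the published algorithm, and the toy graph uses hypothetical EST names (a to e) because the edges of the slide's example cannot be recovered from the transcript.

```python
def mergeable_lists(extends, includes):
    """Enumerate candidate mergeable sets.

    extends:  dict mapping each EST to the ESTs that extend it (acyclic).
    includes: dict mapping an EST to the ESTs included in it.
    """
    nodes = set(extends) | {c for cs in extends.values() for c in cs}
    extended_by_someone = {c for cs in extends.values() for c in cs}
    roots = sorted(nodes - extended_by_someone)

    def with_included(node):
        # The node plus its whole inclusion sub-tree.
        out = [node]
        for child in includes.get(node, []):
            out.extend(with_included(child))
        return out

    lists = []

    def walk(node, path):
        path = path + [node]
        children = extends.get(node, [])
        if not children:  # leaf: one complete list
            lists.append(sorted({m for n in path for m in with_included(n)}))
        for child in children:
            walk(child, path)

    for root in roots:
        walk(root, [])
    return lists


if __name__ == "__main__":
    extends = {"a": ["b"], "b": ["c", "d"]}
    includes = {"a": ["e"]}
    print(mergeable_lists(extends, includes))
```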
![Page 41: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/41.jpg)
Deriving the transcripts from the lists
Internal Splice Sites: external coordinates of the 5’ and 3’ exons are not allowed to contribute
![Page 42: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/42.jpg)
Deriving the transcripts from the lists
Splice Sites: are set to the most common coordinate
5’ and 3’ coordinates: are set to the exon coordinate that extends the potential UTR the most
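The two rules above can be sketched as follows, assuming each EST in a list is a sorted list of `(start, end)` exons and that overlapping exons from different ESTs represent the same transcript exon. Function and variable names are hypothetical. Boundaries at the ends of an EST are excluded from the splice-site vote, and the transcript ends are taken as the outermost coordinates.

```python
from collections import Counter


def derive_transcript(ests):
    """Build one exon structure from a list of mutually compatible ESTs."""
    # Tag each exon with whether its start/end is a splice site (internal)
    # or just the end of the EST (external).
    tagged = []
    for exons in ests:
        last = len(exons) - 1
        for i, (start, end) in enumerate(exons):
            tagged.append((start, end, i > 0, i < last))
    tagged.sort()

    # Group overlapping exons into clusters, one per transcript exon.
    clusters = []
    for exon in tagged:
        if clusters and exon[0] < clusters[-1]["end"]:
            clusters[-1]["exons"].append(exon)
            clusters[-1]["end"] = max(clusters[-1]["end"], exon[1])
        else:
            clusters.append({"end": exon[1], "exons": [exon]})

    transcript = []
    for cluster in clusters:
        exons = cluster["exons"]
        starts = Counter(s for s, _, internal, _ in exons if internal)
        ends = Counter(e for _, e, _, internal in exons if internal)
        # Splice sites: most common coordinate. Transcript ends: furthest extension.
        start = starts.most_common(1)[0][0] if starts else min(e[0] for e in exons)
        end = ends.most_common(1)[0][0] if ends else max(e[1] for e in exons)
        transcript.append((start, end))
    return transcript


if __name__ == "__main__":
    ests = [
        [(100, 200), (300, 400)],
        [(150, 200), (300, 400), (500, 580)],
        [(150, 201), (300, 400), (500, 600)],
        [(90, 200), (300, 350)],
    ]
    print(derive_transcript(ests))
```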
![Page 43: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/43.jpg)
Single exon transcripts
Reject resulting single exon transcripts when using ESTs
![Page 44: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/44.jpg)
Alternative splicing and comparative genomics
![Page 45: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/45.jpg)
Conservation of Alternative Splicing
Degree of conservation: 30-60%
Methods:
1.- Comparison of single events
2.- Cross-alignment of full transcripts
![Page 46: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/46.jpg)
Exon Skipping Events
Introns flanking alternatively spliced (skipped) exons have high sequence conservation, higher on average than introns flanking constitutive exons.
R Sorek & G Ast. Genome Research 13:1631-1637, 2003
![Page 47: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/47.jpg)
Sequences regulating the (Alternative) splicing
Overrepresented sequences in conserved introns (between human and mouse) may be involved in the regulation of alternative splicing.
Overrepresented: found in these introns more often than expected at random AND not found in intronic sequences flanking constitutive exons (and upstream of skipped ones)
R Sorek & G Ast. Genome Research (2003) 13:1631-1637
Figure: conserved alternative exon and its flanking introns; overrepresented hexamer (downstream)
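A minimal sketch of how overrepresented hexamers might be ranked: count all hexamers in the introns flanking alternative exons (foreground) and in those flanking constitutive exons (background), then compare frequencies. The simple frequency ratio with a pseudocount is an assumption made for illustration; it is not a statistical test and not necessarily the procedure used in the cited paper. The sequences in the usage example are made up.

```python
from collections import Counter


def kmer_counts(seqs, k=6):
    """Count every overlapping k-mer in a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def enrichment(foreground, background, k=6, pseudocount=1.0):
    """Rank k-mers by foreground frequency over background frequency."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    scores = {}
    for kmer, n in fg.items():
        fg_freq = (n + pseudocount) / (fg_total + pseudocount)
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount)
        scores[kmer] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    alt_flanks = ["AATGCATGAA", "CCTGCATGCC"]       # toy foreground
    const_flanks = ["AACCGGTTAACC", "GGTTAACCGGTT"]  # toy background
    print(enrichment(alt_flanks, const_flanks)[:3])
```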
![Page 48: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/48.jpg)
Sequences regulating the (Alternative) splicing
Not all types of events are equally conserved. Introns flanking alternative 5’ and 3’ exons, and retained introns, have higher sequence conservation.
Sugnet CW, Kent WJ, Ares M Jr, Haussler D. Pac Symp Biocomput. 2004;:66-77
Figure: conserved alternative exon and its flanking introns; overrepresented hexamer
![Page 49: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/49.jpg)
Frame preservation
| Frame preserving | Constitutive exons | Alternative exons |
| --- | --- | --- |
| All exons | 39.7% (Human), 39.5% (Mouse) | 41.6% (Human), 44.7% (Mouse) |
| Conserved exons | 40.9% (Human), 38% (Mouse) | 51.8% (Human), 51.9% (Mouse) |
A Resch et al. Nucleic Acids Research 2004, 32 (4) 1261-1269
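For reference, an exon is frame preserving when its length is a multiple of 3, so including or skipping it does not shift the downstream reading frame. A minimal sketch, with exons as `(start, end)` intervals (end exclusive) and hypothetical function names:

```python
def is_frame_preserving(start, end):
    """True when the exon length is a multiple of 3."""
    return (end - start) % 3 == 0


def fraction_frame_preserving(exons):
    """Fraction of exons in the list whose length is a multiple of 3."""
    if not exons:
        return 0.0
    return sum(is_frame_preserving(s, e) for s, e in exons) / len(exons)


if __name__ == "__main__":
    exons = [(0, 87), (0, 128), (10, 100), (5, 12)]  # lengths 87, 128, 90, 7
    print(fraction_frame_preserving(exons))
```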
![Page 50: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/50.jpg)
Predicting alternative exons
![Page 51: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/51.jpg)
Features Differentiating Between Alternatively Spliced and Constitutively Spliced Exons

| Feature | Alternative exons | Constitutive exons |
| --- | --- | --- |
| Average size | 87 | 128 |
| Length = multiple of 3 | 73% | 37% |
| Average human-mouse exon conservation | 94% | 89% |
| (A) Exons with upstream intron conserved in mouse | 92% | 45% |
| (B) Exons with downstream intron conserved in mouse | 82% | 35% |
| (A) + (B) | 77% | 17% |
R Sorek et al. Genome Research (2004) 14:1617-1623
(A), (B): an intron is considered conserved if there are at least 12 consecutive matches over 100bp of the intron
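The conservation criterion for (A) and (B) can be sketched as a longest-run check. The inputs are assumed to be two rows of a pairwise human-mouse alignment of the 100bp intronic flank, of equal length and with `-` for gaps; how that alignment is produced is outside this sketch, and the function names are hypothetical.

```python
def longest_match_run(aligned_a, aligned_b):
    """Length of the longest run of identical, non-gap aligned positions."""
    best = run = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == b and a != "-":
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def intron_flank_conserved(human_flank, mouse_flank, min_run=12):
    """Apply the 'at least 12 consecutive matches' criterion."""
    return longest_match_run(human_flank, mouse_flank) >= min_run


if __name__ == "__main__":
    print(intron_flank_conserved("ACGTACGTACGTACTTTT", "ACGTACGTACGTACGGGG"))
```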
![Page 52: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/52.jpg)
Build a classifier to make predictions
• Rule: Set of conditions over the parameters, e.g. “at least 99% conservation with mouse AND divisible by 3, etc…”
• Try all the possible combinations of parameters
• Select the rule that would correctly identify a maximum number of true alternative exons while minimizing the number of false positives
This rule achieved 31% sensitivity and no false positives in a set of known exons:
At least 95% identity with mouse orthologous exon
Exon size is a multiple of 3
An upstream intronic alignment of at least 15bp with at least 85% identity
A downstream intronic exact alignment of at least 12bp
R Sorek et al. Genome Research (2004) 14:1617-1623
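The four conditions of the rule translate directly into a boolean test. In this sketch the feature container and its field names are hypothetical; only the thresholds come from the slide, and computing the features themselves (orthologous exon identity, intronic alignments) is assumed to be done elsewhere.

```python
from dataclasses import dataclass


@dataclass
class ExonFeatures:
    length: int                      # exon size in bp
    mouse_identity: float            # % identity with the mouse orthologous exon
    upstream_aln_length: int         # best upstream intronic alignment, in bp
    upstream_aln_identity: float     # % identity of that alignment
    downstream_exact_length: int     # longest exact downstream intronic match, in bp


def predicted_alternative(f):
    """Apply the four conditions of the rule; all must hold."""
    return (
        f.mouse_identity >= 95.0
        and f.length % 3 == 0
        and f.upstream_aln_length >= 15
        and f.upstream_aln_identity >= 85.0
        and f.downstream_exact_length >= 12
    )


if __name__ == "__main__":
    candidate = ExonFeatures(87, 96.5, 20, 90.0, 14)
    print(predicted_alternative(candidate))
```

A conjunction of strict conditions like this trades sensitivity for specificity, which matches the reported 31% sensitivity with no false positives.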
![Page 53: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/53.jpg)
Summary
Alternative splicing is a mechanism to generate functional diversity
We can study alternative splicing using ESTs (Expressed Sequence Tags)
EST data is fragmented and full of noise: it needs to be processed
Some alternative splicing is conserved across species (Human-Mouse)
Prediction of alternative (conserved) exons is possible (with a classifier) but not ab initio
Evolution of alternative splicing?
![Page 54: Alternative Splicing from ESTs](https://reader035.vdocument.in/reader035/viewer/2022062305/56815d83550346895dcb917d/html5/thumbnails/54.jpg)
THE END