Genomic identification of Structural RNAs using phylo-SCFGs
Jakob Skou Pedersen
Bioinformatics Center, University of Copenhagen
Structural RNA identification problem
independently transcribed ncRNAs
ncRNA protein-coding gene
Structural RNA: any transcribed region with functional structure.
ncRNAs co-transcribed with protein-coding genes
Highly diverse set
Such as:
tRNA
Xist (~20 kb long)
miRNA
UTR
RNase P
Little single sequence signal:
Lack of common nucleotide biases
Lack of common sequence motifs
Evolutionary signal
Structure functionally important
Primary sequence substitutions tolerated
Signal Unprecedented comparative data
Introduction to phylogenetic models
Captures:
Nucleotide biases
Patterns of substitution
Evolutionary sequence correlations
Correlated changes (multi nucleotide models)
Continuous-time Markov chain acting on the branches of the phylogenetic tree
Alignment column
Substitution rates of Markov chain
Felsenstein 81:
Transition prob.:
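A minimal sketch of the two ingredients named above: transition probabilities P(t) = exp(Qt) for a continuous-time Markov chain, and Felsenstein's 1981 pruning recursion for the likelihood of one alignment column. The Jukes-Cantor rate matrix (chosen only because its exp(Qt) has a closed form), the uniform root distribution, the toy tree, and all function names are assumptions for illustration, not the models used in the talk.

```python
import math

NUCS = "ACGT"

def jc_transition(t):
    """Jukes-Cantor P(t) = exp(Qt), with Q scaled to one expected substitution per unit time."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return {a: {b: (same if a == b else diff) for b in NUCS} for a in NUCS}

def partial(node):
    """Return L[x] = P(observed leaves below node | node is in state x).

    A node is either a leaf (a single nucleotide character) or a list of
    (child, branch_length) tuples.
    """
    if isinstance(node, str):
        return {x: 1.0 if x == node else 0.0 for x in NUCS}
    like = {x: 1.0 for x in NUCS}
    for child, t in node:
        p = jc_transition(t)
        child_like = partial(child)
        for x in NUCS:
            like[x] *= sum(p[x][y] * child_like[y] for y in NUCS)
    return like

def column_likelihood(tree):
    """Likelihood of one alignment column, assuming a uniform root distribution."""
    root = partial(tree)
    return sum(0.25 * root[x] for x in NUCS)

# Hypothetical three-species column: two close relatives with A, an outgroup with G.
tree = [([("A", 0.1), ("A", 0.1)], 0.05), ("G", 0.2)]
lik = column_likelihood(tree)
```

The recursion visits each node once, so the cost per column grows linearly with the number of sequences.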
EvoFold Phylogenetic models
Di-nucleotide model
• 16x16 rate matrix
• Learned from data
• Favors pairing di-nucs
• Slow substitution rate
Single-nucleotide model
• 4x4 rate matrix
• Marginal average of di-nucleotide matrix
• Fast substitution rate
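The slide does not spell out how the single-nucleotide model is derived beyond "marginal average". One plausible reading, shown here for equilibrium frequencies only, is to average the left and right marginals of a 16-state di-nucleotide distribution. The pair frequencies below are hypothetical toy values, and marginal_average is an illustrative name, not part of EvoFold.

```python
from itertools import product

NUCS = "ACGU"
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}

# Hypothetical di-nucleotide frequencies that favor pairing di-nucs:
# 6 canonical pairs at 0.125 each, the other 10 at 0.025 each (total 1.0).
pair_freq = {
    a + b: (0.125 if a + b in CANONICAL else 0.025)
    for a, b in product(NUCS, repeat=2)
}

def marginal_average(pair_freq):
    """Average the left and right marginals of a 16-state distribution."""
    single = {}
    for x in NUCS:
        left = sum(pair_freq[x + y] for y in NUCS)   # x in first position
        right = sum(pair_freq[y + x] for y in NUCS)  # x in second position
        single[x] = 0.5 * (left + right)
    return single

single_freq = marginal_average(pair_freq)
```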
EvoFold SCFGs
Structural model:
Non-structural model:
Structure derivation
Structure
Single nucleotide phylogenetic model
Di-nucleotide phylogenetic model
fold
Algorithms and training
Algorithms: Traditional SCFG algorithms (CYK and inside-outside) combined with Felsenstein 81.
Training of EvoFold: Rfam structures mapped onto genomic alignments
Complexities: For an alignment of length n with m sequences:
Space:
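A rough sketch of how CYK can sit on top of per-column scores. It uses a deliberately simplified grammar (S -> unpaired S | paired S paired S | empty), not the EvoFold grammar. In a phylo-SCFG the scores would be log-likelihoods of alignment columns under the single-nucleotide model (unpaired) and of column pairs under the di-nucleotide model (paired). Here they are hypothetical constants on a single toy sequence, and the minimum loop length of 3 is also an assumption.

```python
from functools import lru_cache

def cyk_fold(single, pair):
    """Return (best log-score, dot-bracket structure).

    single[i]    : log-score of column i left unpaired
    pair[(i, j)] : log-score of columns i and j paired (absent = not allowed)
    """
    n = len(single)

    @lru_cache(maxsize=None)
    def best(i, j):
        if i > j:                      # empty span
            return 0.0, ""
        # Rule S -> unpaired S
        score, struct = best(i + 1, j)
        top = (single[i] + score, "." + struct)
        # Rule S -> paired S paired S
        for k in range(i + 1, j + 1):
            if (i, k) in pair:
                s_in, t_in = best(i + 1, k - 1)
                s_out, t_out = best(k + 1, j)
                cand = (pair[(i, k)] + s_in + s_out, "(" + t_in + ")" + t_out)
                if cand[0] > top[0]:
                    top = cand
        return top

    return best(0, n - 1)

# Toy input: unpaired columns cost -2.0, an allowed pair costs -1.0 in total.
seq = "GGGAAACCC"
watson_crick = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
single = [-2.0] * len(seq)
pair = {
    (i, j): -1.0
    for i in range(len(seq))
    for j in range(i + 4, len(seq))    # at least 3 unpaired columns in a loop
    if (seq[i], seq[j]) in watson_crick
}
score, structure = cyk_fold(single, pair)
# structure is "(((...)))" with score -9.0
```

This memoized form fills a table over all (i, j) spans and tries every split point k, which is the usual cubic-time, quadratic-space shape of CYK.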
vertebrates & drosophilids
Input: conserved segments
sub-fold
……(((((((..))).(((….))))))).………………(((((….)))))..
sub-fold
Output: sub-folds
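As an illustration of what "sub-folds" means here, the sketch below splits a dot-bracket structure into its top-level paired regions. The function name sub_folds and the ASCII example string (modeled on the one above) are assumptions; the talk does not show how EvoFold reports them internally.

```python
def sub_folds(structure):
    """Return (start, end) pairs, 0-based and inclusive, one per top-level paired region."""
    folds, depth, start = [], 0, None
    for i, ch in enumerate(structure):
        if ch == "(":
            if depth == 0:
                start = i              # a new top-level region opens here
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure")
            if depth == 0:
                folds.append((start, i))
    if depth != 0:
        raise ValueError("unbalanced structure")
    return folds

# Hypothetical structure over one conserved segment: a branched fold and a hairpin.
s = ".." + "(((((((..))).(((....)))))))" + "......" + "(((((....)))))" + ".."
regions = sub_folds(s)
# regions == [(2, 28), (35, 48)]
```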
Performance
Sensitivity: 43%
Experimentally studied vertebrate cases
HAR1
Expression in developing neocortex
GABRA3 A-to-I RNA editing substrate
I/M site
Editing in mouse brain
High confidence subset from Drosophila screen
Selection criteria:
Min. two compensatory substitutions
#compensatory subs > 2 x #contradictory subs
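The two selection criteria can be written as a single predicate. This is a sketch under the assumption that the counts of compensatory and contradictory substitutions have already been computed for each prediction; the function name is hypothetical.

```python
def is_high_confidence(n_compensatory, n_contradictory):
    """Apply both criteria: at least two compensatory substitutions, and
    strictly more than twice as many compensatory as contradictory ones."""
    return n_compensatory >= 2 and n_compensatory > 2 * n_contradictory
```

For example, a prediction with 5 compensatory and 2 contradictory substitutions passes, while one with 4 and 2 does not, because the second criterion is a strict inequality.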
Predictions (394 total) Genomic background
UTR structures
Gene involved in biogenesis and assembly of the ribosome
(by homology to RPL24)
High fraction of UTR predictions on transcribed strand (5’UTR: 80% & 3
Significant enrichment of genes with regulatory roles.
Acknowledgments
GABRA3 study: Johan Ohlson, Marie Ohman (Stockholm University), and David Haussler (UCSC)
HAR1 study: Katherine S. Pollard (UCSC), Sofie R. Salama (UCSC), Nelle Lambert (ULB), Marie-Alexandra Lambot (ULB), Sandra Coppens (ULB), Sol Katzman (UCSC), Bryan King (UCSC), Courtney Onodera (UCSC), Adam Siepel (Cornell), Andrew D. Kern (UCSC), Colette Dehay (Lyon), Haller Igel (UCSC), Manuel Ares, Jr (UCSC), Pierre Vanderhaeghen (ULB)
EvoFold: Gill Bejerano (Stanford), Adam Siepel (Cornell), Kate Rosenbloom (UCSC), Kerstin Lindblad-Toh (Broad), Eric S. Lander (Broad), Jim Kent (UCSC), Webb Miller (Penn State), and David Haussler (UCSC)
Fly screen: Manolis Kellis (MIT), David Haussler (UCSC), Drosophila Sequencing and Analysis Consortium
Structure predictions
[Figure: predicted RNA secondary structure with the A -> I (G) editing site marked]
New case of A-to-I RNA editing
[Figure: genomic vs. cDNA sequence at the I/M site, mouse brain]
Slide with structures, perhaps browser screenshot.
Highlight collaborations
RNA editing
regulation
Intronic hairpin in RDL flanked by A-to-I edited exons
Spen
Spen function: Transcription co-factor involved in neuronal cell fate, survival, and axon guidance. It has three RNA recognition motifs.
Staufen hairpin similar to known localization element in Orb
Stau
Orb
Hairpin in highly expressed intergenic region
Structure can be extended
EvoFold str.
RNAfold str.
Alignment & full structure