Post on 21-Dec-2015
RiboSearch
Ben Daniel Ariel, Kirshner Naomi
Instructor: Dr. Danny Barash
Adaya Cohen
Introduction
Biological Introduction
Method Layout
"The merge strategy"
Results and Conclusions
RNA
A single-stranded nucleic acid made up of 4 nucleotides:
Purines: adenine (A), guanine (G)
Pyrimidines: cytosine (C), uracil (U)
WC (Watson-Crick) pairs: A-U, G-C
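The pairing rule above can be written down directly. The following is a minimal sketch, not part of the original slides; the function names are illustrative, and it deliberately accepts only the two WC pairs listed (no G-U wobble pairs).

```python
# Watson-Crick pairs in RNA, as listed on the slide: A-U and G-C.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_wc_pair(x: str, y: str) -> bool:
    """Return True if nucleotides x and y form a Watson-Crick pair."""
    return (x.upper(), y.upper()) in WC_PAIRS


def can_form_stem(five_prime: str, three_prime: str) -> bool:
    """Check whether two equal-length strands pair antiparallel, base by base.

    The first base of the 5' strand pairs with the last base of the 3' strand.
    """
    if len(five_prime) != len(three_prime):
        return False
    return all(is_wc_pair(a, b) for a, b in zip(five_prime, reversed(three_prime)))
```

For example, `can_form_stem("GGCA", "UGCC")` holds because each base of the first strand pairs with the second strand read backwards.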
Biological Introduction
Old scheme
Proteins carry out all biological functions
RNA: only an intermediate stage between DNA and protein, with no catalytic function
DNA -> RNA -> Protein
Biological Introduction
New scheme
Since the discovery of self-splicing RNAs in the early 1980s, a number of new structural and catalytic RNAs have been discovered.
Recent studies focusing on non-coding and small RNAs have led to the discovery of RNA molecules that possess essential regulatory functions.
DNA RNA Protein
RNA Secondary Structure
The secondary structure of many RNAs is more conserved than their sequence.
Structural elements (figure labels):
a. Hairpin
b. Internal loop
c. Bulge loop
d. Junction
e. Stem (double strand)
f. Pseudoknot
Riboswitch
RNA control elements that regulate gene expression without the participation of proteins
Use a unique mechanism whereby small molecules bind to the aptamer/box region, causing a conformational switch
Found initially in the 5' UTRs of bacteria, with successive discoveries in prokaryotes
There is evidence suggesting that riboswitches could also be found in eukaryotes.
[Figure: riboswitch layout, 5' to 3'. Labels: 5' UTR, aptamer, expression platform, coding section, 3' UTR]
Riboswitch mechanism
Guanine binds to the aptamer region, which causes a conformational change in the expression platform; this in turn regulates guanine metabolism.
G-box
Regulates genes related to purine metabolism and transport
Binds purines
Consists of 2 hairpins and 1 internal junction
RiboSearch
Goal
Finding G-boxes in eukaryotic genomes
Method
Combining existing search methods into one overall package
Search Methods
Whiffer - CS department, BGU
RNAMotif - Macke et al., 2001
RNAProfile - Pavesi et al., 2004
STR2 - CS department, BGU
Whiffer
Input
Pattern that consists of:
  Sequence information
  Variable gaps
  Base pairing brackets representing WC pairs
Output
Candidate locations that meet the constraints imposed by the method
[ <<<<2 ]TA ]5[ GTNTCTAC ]3[ <<<<< ]3[ CCNNNAA ]3[ <<<<< ]5[ <<<<
Whiffer
Method
Uses simple matching based on the constraints, as opposed to dynamic programming.
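To illustrate what constraint-based matching without dynamic programming can look like, here is a small backtracking matcher. It is a sketch under stated assumptions, not Whiffer's implementation: the tuple-based pattern format (`"seq"`, `"gap"`, `"open"`, `"close"`) and all function names are hypothetical and do not reproduce Whiffer's pattern syntax.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(s):
    """Reverse complement of a DNA string (unknown bases map to N)."""
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(s))


def match_from(dna, pos, elements, stems):
    """Yield end positions of every way `elements` can match dna starting at pos.

    Hypothetical element forms:
      ("seq", "GTNTCTAC")  fixed motif, N matches any base
      ("gap", lo, hi)      between lo and hi arbitrary bases
      ("open", name, n)    5' half of a stem, n bases, remembered under `name`
      ("close", name)      3' half: must be the reverse complement of `name`
    """
    if not elements:
        yield pos
        return
    el, rest = elements[0], elements[1:]
    kind = el[0]
    if kind == "seq":
        motif = el[1]
        window = dna[pos:pos + len(motif)]
        if len(window) == len(motif) and all(m in ("N", b) for m, b in zip(motif, window)):
            yield from match_from(dna, pos + len(motif), rest, stems)
    elif kind == "gap":
        for n in range(el[1], el[2] + 1):
            if pos + n <= len(dna):
                yield from match_from(dna, pos + n, rest, stems)
    elif kind == "open":
        name, n = el[1], el[2]
        if pos + n <= len(dna):
            yield from match_from(dna, pos + n, rest, {**stems, name: dna[pos:pos + n]})
    elif kind == "close":
        want = revcomp(stems[el[1]])
        if dna[pos:pos + len(want)] == want:
            yield from match_from(dna, pos + len(want), rest, stems)
    else:
        raise ValueError("unknown pattern element: %r" % (el,))


def find_candidates(dna, elements):
    """Return sorted (start, end) spans where the whole pattern matches."""
    spans = set()
    for start in range(len(dna)):
        for end in match_from(dna, start, tuple(elements), {}):
            spans.add((start, end))
    return sorted(spans)
```

A pattern such as `[("open", "s", 4), ("gap", 3, 5), ("close", "s")]` describes a 4-base stem closed over a loop of 3 to 5 bases. Every constraint is checked exactly, which is one way to read the high specificity attributed to Whiffer later in the deck.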
RNAMotif
Input
Database of nucleotide sequences
Description file that consists of:
  Descriptor section
  Score section (optional)
Output
Candidates that meet the conditions of the descriptor and the scoring scheme
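RNAMotif
Sample descriptor file:

descr
    h5( minlen=6, maxlen=8 )
        ss( minlen=4, maxlen=6 )
    h3

score
    {
      # count G and C bases over all strand elements
      gcnt = 0;
      glen = 0;
      for( i = 1; i <= NSE; i++ ){
        llen = length( se[ i ] );
        glen = glen + llen;
        for( j = 1; j <= llen; j++ ){
          b = se[ i, j, 1 ];
          if( b == "g" || b == "c" )
            gcnt++;
        }
      }
      # reject candidates whose GC fraction is below 0.4
      SCORE = 1.0 * gcnt / glen;
      if( SCORE < .4 )
        REJECT;
    }

[Figure: hairpin with the strands labeled h5 and h3 and the loop labeled ss]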
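The score section of the sample descriptor computes the fraction of G and C bases over all strand elements of a candidate and rejects it when that fraction is below 0.4. The same rule in Python, as a sketch for readers unfamiliar with the RNAMotif score language (function names are illustrative):

```python
def gc_fraction(strands):
    """Fraction of G/C bases over all strand elements (h5, ss, h3 in the sample)."""
    total = sum(len(s) for s in strands)
    if total == 0:
        return 0.0
    gc = sum(1 for s in strands for b in s.lower() if b in "gc")
    return gc / total


def accept(strands, threshold=0.4):
    """Mirror of the REJECT rule: keep a candidate only if GC fraction >= threshold."""
    return gc_fraction(strands) >= threshold
```

A hairpin with strands `["gggccc", "aaaa", "gggccc"]` has 12 G/C bases out of 16 and is kept; an all-A/U hairpin is rejected.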
RNAMotif
Method
Two-stage algorithm
Stage I: Compilation stage
Analyzing the specific motif, called a descriptor, and converting it into a search tree based on the helical nesting of the motif
RNAMotif
Method
Two-stage algorithm
Stage II: DFS
Depth-first search of the tree that was created by the compilation stage
Each time a complete solution to the descriptor is found, the candidate is passed to an optional score section for scoring and ranking
In the absence of a score section, the candidate is accepted
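The two stages can be illustrated with a toy version: a descriptor is held as a tree in which each helix node nests the elements it encloses, and a depth-first search tries every allowed length for each node. This is a sketch under simplifying assumptions (strict WC pairing only, no mismatches, no scoring), not RNAMotif's actual code; the `SS` and `Helix` classes and the `dfs` and `search` functions are hypothetical names.

```python
from dataclasses import dataclass, field

WC = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g")}


@dataclass
class SS:
    """Single-stranded element with a length range."""
    minlen: int
    maxlen: int


@dataclass
class Helix:
    """Helix (h5 ... h3) whose `inner` list holds the elements it encloses."""
    minlen: int
    maxlen: int
    inner: list = field(default_factory=list)


def pairs(h5, h3):
    """True if h5 and h3 have equal length and pair antiparallel."""
    return len(h5) == len(h3) and all((a, b) in WC for a, b in zip(h5, reversed(h3)))


def dfs(nodes, seq, pos):
    """Yield every end position at which the node list matches seq from pos."""
    if not nodes:
        yield pos
        return
    node, rest = nodes[0], nodes[1:]
    for n in range(node.minlen, node.maxlen + 1):
        if isinstance(node, SS):
            if pos + n <= len(seq):
                yield from dfs(rest, seq, pos + n)
        else:
            h5 = seq[pos:pos + n]
            # Descend into the enclosed elements before closing the helix.
            for inner_end in dfs(node.inner, seq, pos + n):
                h3 = seq[inner_end:inner_end + n]
                if len(h5) == n and pairs(h5, h3):
                    yield from dfs(rest, seq, inner_end + n)


def search(descriptor, seq):
    """Return sorted (start, end) spans of all complete solutions."""
    seq = seq.lower()
    return sorted({(s, e) for s in range(len(seq)) for e in dfs(descriptor, seq, s)})
```

The sample descriptor (h5 of 6 to 8 bases, ss of 4 to 6 bases, h3) corresponds to `[Helix(6, 8, [SS(4, 6)])]`. Each span returned by `search` is a complete solution that could then be passed to a scoring step.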
RNAProfile
Input
Number of distinct hairpins a motif has to contain
Set of unaligned RNA sequences expected to share a common motif
RNAProfile
Output
Regions that are most conserved throughout the sequences, according to:
  Sequence of the regions
  Secondary structure that can be formed according to base-pairing and thermodynamic rules
RNAProfile
Method
Two phases
Phase I: Extracting a set of candidate regions from each input sequence, whose predicted optimal secondary structure contains the number of hairpins given as input
Phase II: The regions selected are compared with each other to find the group of most similar ones, formed by a region taken from each sequence
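Phase II can be pictured as choosing one region per sequence so that the chosen regions resemble each other as much as possible. The sketch below does this exhaustively with a plain positional-identity score. Both choices are assumptions made for illustration only: the slides do not state how RNAProfile scores or searches, and an exhaustive search is practical only for tiny inputs.

```python
from itertools import combinations, product


def similarity(a, b):
    """Fraction of identical positions, relative to the longer region (stand-in score)."""
    if not a or not b:
        return 0.0
    same = sum(x == y for x, y in zip(a, b))
    return same / max(len(a), len(b))


def most_similar_group(regions_per_sequence):
    """Pick one region per sequence so that the summed pairwise similarity is maximal.

    `regions_per_sequence` is a list with one list of candidate regions per
    input sequence (the output of a Phase I-like step).
    """
    best_group, best_score = None, -1.0
    for group in product(*regions_per_sequence):
        score = sum(similarity(a, b) for a, b in combinations(group, 2))
        if score > best_score:
            best_group, best_score = group, score
    return best_group, best_score
```

With three sequences that each contribute two candidate regions, the function inspects all eight combinations and returns the triple whose members agree at the most positions.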
Method Summary
Whiffer
  Combines sequence and structure similarity
  Very high specificity: potential candidates may be ruled out
RNAMotif
  Similarity based mostly on structural elements, according to the descriptor
RNAProfile
  Similarity based on both sequence and structure
  Recommended as a post-processing step
The merge strategy
Input: query sequence and structure (bracket notation), e.g. (((..((((...)))).))
Parsing: the query is parsed separately for Whiffer and for RNAMotif
Whiffer -> Candidates
RNAMotif -> Candidates
Filtering: RNAProfile -> Final candidates

Post processing
1. The location is contained within a gene
2. The gene is relevant to the requested function (purine metabolism)
Final candidates -> Sequence alignment -> Biological experiments
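The merging and post-processing steps can be sketched as plain set and interval operations. Everything here is hypothetical scaffolding: the `Candidate` and `Gene` records, the keyword test for "relevant to the requested function", and the function names are assumptions, and the RNAProfile filtering step between merging and post processing is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int
    end: int
    function: str


def merge_candidates(*candidate_lists):
    """Union of the tools' candidate lists with exact duplicates removed."""
    merged = set().union(*candidate_lists)
    return sorted(merged, key=lambda c: (c.chrom, c.start, c.end))


def post_process(candidates, genes, keyword="purine"):
    """Keep candidates that lie inside a gene whose annotation mentions `keyword`."""
    kept = []
    for c in candidates:
        for g in genes:
            inside = g.chrom == c.chrom and g.start <= c.start and c.end <= g.end
            if inside and keyword in g.function.lower():
                kept.append(c)
                break
    return kept
```

A candidate reported by both tools appears once after merging, and only candidates located in a purine-related gene survive post processing.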
Results - prokaryote: Bacillus halodurans

                 Whiffer  RNAMotif  Merge
Candidates             4         7      7
True positives         4         2      4
False positives        0         5      3
False negatives        0         2      0
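The counts in this table (Whiffer 4 candidates, 4 true positives; RNAMotif 7 candidates, 2 true positives, 5 false positives, 2 false negatives; Merge 7 candidates, 4 true positives, 3 false positives) can be turned into the usual summary ratios. Sensitivity and precision are not reported on the slides; they are added here only as a reading aid.

```python
def sensitivity(tp, fn):
    """Share of the true sites that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    """Share of the reported candidates that are true sites: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


# Counts taken from the Bacillus halodurans table.
RESULTS = {
    "Whiffer": {"tp": 4, "fp": 0, "fn": 0},
    "RNAMotif": {"tp": 2, "fp": 5, "fn": 2},
    "Merge": {"tp": 4, "fp": 3, "fn": 0},
}

SUMMARY = {
    name: (sensitivity(r["tp"], r["fn"]), precision(r["tp"], r["fp"]))
    for name, r in RESULTS.items()
}
```

This gives a sensitivity of 1.0, 0.5 and 1.0 and a precision of 1.0, about 0.29 and about 0.57 for Whiffer, RNAMotif and Merge respectively: merging recovers the sites RNAMotif misses while reporting fewer false positives than RNAMotif alone.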
Results - eukaryote: Arabidopsis thaliana

                  Whiffer  RNAMotif Run #1  RNAMotif Run #2  Merge
Candidates              0               30            70000      -
Final candidates        0                0               17     11
Results - eukaryote: Arabidopsis thaliana
Most promising candidates

c2__11199940_11199996

queryGBox               CGTGGATATGGCACGCAAGTTTCTACCGGGCACCGTAAATGTCCGACTAT 50
c2__11199940_11199996_  --TTCAGGTC-CATCTTTGGCTAGACCGAAGTCAGATAATTTGGCGTTAT 47
                          *  *  *  **     *  *  ****    * *  *** *     ***

queryGBox               G-------- 51
c2__11199940_11199996_  AGTCCTGAA 56

c3_20894864_20894920

c3_sequences  GGATGAGGAACCAATTGACCCTGGATTTCAAGATT-TACAAAAGAACGTA 49
queryGBox     -------------CGTGGATATGGCACGCAAGTTTCTACCGGGCACCGTA 37
                             **    ***    **** ** ***     * ****

c3_sequences  AGCATCC------- 56
queryGBox     AATGTCCGACTATG 51
              *   ***
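The asterisks under each alignment block mark columns where both sequences carry the same base. A minimal sketch of how such a consensus line can be recomputed from two aligned rows (function names are illustrative; gap-to-gap columns are not counted as matches):

```python
def consensus_line(a, b):
    """Return a string with '*' under identical, non-gap columns of two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    return "".join("*" if x == y and x != "-" else " " for x, y in zip(a, b))


def identity_count(a, b):
    """Number of identical, non-gap columns."""
    return consensus_line(a, b).count("*")


# First alignment block of candidate c2__11199940_11199996 against the query G-box.
QUERY = "CGTGGATATGGCACGCAAGTTTCTACCGGGCACCGTAAATGTCCGACTAT"
C2 = "--TTCAGGTC-CATCTTTGGCTAGACCGAAGTCAGATAATTTGGCGTTAT"
```

Applied to the rows above, `identity_count(QUERY, C2)` counts the matching columns of the first c2 block, which is a quick way to compare how closely each candidate follows the query sequence.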
RiboSearch - Conclusions
Filters false positives
Sequences are far less conserved within eukaryotes than within prokaryotes
The merge strategy is essential in eukaryotic genome searches
Our thanks
Dr. Danny Barash
Adaya Cohen