SO meets RNAO
Karen Eilbeck, University of Utah. RNAO Consortium Meeting, May 28-29 2007. Topics: what SO is; how SO is used; how SO is managed; where SO and RNAO meet; how SO and RNAO can work together; if we have time, a demo of OBO-Edit.
![Page 1: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/1.jpg)
SO meets RNAO
Karen Eilbeck
University of Utah
RNAO Consortium Meeting
May 28-29 2007
![Page 2: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/2.jpg)
• What SO is.
• How SO is used
• How SO is managed
• Where do SO and RNAO meet
• How SO and RNAO can work together
• If we have time - a demo of OBO-Edit
![Page 3: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/3.jpg)
The Sequence Ontology describes the features of biological sequence
• Genome sequence
• Annotation of regions
• Coordinates
• Need to agree on meaning of terms. E.g. Does the CDS include the stop codon?
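To see why the stop-codon question matters in practice, here is a minimal sketch. The coordinates and the helper name `cds_end` are hypothetical (1-based, inclusive, forward strand); the point is only that two annotators using different conventions report CDS ends three bases apart for the same gene model.

```python
STOP_CODON_LENGTH = 3


def cds_end(stop_codon_start: int, include_stop: bool) -> int:
    """Return the last base of the CDS under the chosen convention."""
    if include_stop:
        # CDS runs through the last base of the stop codon.
        return stop_codon_start + STOP_CODON_LENGTH - 1
    # CDS ends on the base just before the stop codon.
    return stop_codon_start - 1


stop_start = 7598  # hypothetical position of the first base of the stop codon
print(cds_end(stop_start, include_stop=True))   # 7600
print(cds_end(stop_start, include_stop=False))  # 7597
```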
![Page 4: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/4.jpg)
An annotation captures what we know about a gene
[Figure: annotations and evidence for 3 alternate transcripts of the Glut1 gene. Labels: 5’ UTR, start codon, coding exon, transposon within intron.]
![Page 5: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/5.jpg)
Structure of the ontology
• SO is structured into a directed acyclic graph.
[Figure: graph of SO terms: transcript, exon, processed transcript, primary transcript, intron, clip, splice site, polyA site, protein coding primary transcript, nc primary transcript, mRNA, ncRNA, CDS, UTR, five_prime_UTR, three_prime_UTR, tRNA, rRNA. Edges are labeled P, i and d.]
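A directed acyclic graph of terms can be sketched as a list of typed edges plus an upward traversal. The edges below are illustrative assumptions chosen from the term names on this slide; they are not a transcription of the figure, and the relation names `is_a` and `part_of` are assumed.

```python
from collections import defaultdict

# (child, relation, parent) edges. Illustrative assumptions only.
EDGES = [
    ("mRNA", "is_a", "processed transcript"),
    ("ncRNA", "is_a", "processed transcript"),
    ("tRNA", "is_a", "ncRNA"),
    ("rRNA", "is_a", "ncRNA"),
    ("processed transcript", "is_a", "transcript"),
    ("primary transcript", "is_a", "transcript"),
    ("exon", "part_of", "transcript"),
    ("intron", "part_of", "primary transcript"),
    ("UTR", "part_of", "mRNA"),
    ("five_prime_UTR", "is_a", "UTR"),
    ("three_prime_UTR", "is_a", "UTR"),
]

# Index the edges by child so we can walk toward the roots.
parents = defaultdict(list)
for child, relation, parent in EDGES:
    parents[child].append((relation, parent))


def ancestors(term):
    """Collect every term reachable by following edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("five_prime_UTR")))
# ['UTR', 'mRNA', 'processed transcript', 'transcript']
```

Because a term may have more than one parent, the structure is a graph rather than a tree; the `seen` set keeps shared ancestors from being visited twice.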
![Page 6: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/6.jpg)
GFF3
• SO is used to ‘type’ the features and relationships.
Id type start end strand attributes
ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN
ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;Parent=gene00001
ctg123 . mRNA 1050 9000 . + . ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123 . mRNA 1050 9000 . + . ID=mRNA00002;Parent=gene00001;Name=EDEN.2
ctg123 . mRNA 1300 9000 . + . ID=mRNA00003;Parent=gene00001;Name=EDEN.3
ctg123 . exon 1300 1500 . + . ID=exon00001;Parent=mRNA00003
ctg123 . exon 1050 1500 . + . ID=exon00002;Parent=mRNA00001,mRNA00002
ctg123 . exon 3000 3902 . + . ID=exon00003;Parent=mRNA00001,mRNA00003
ctg123 . exon 5000 5500 . + . ID=exon00004;Parent=mRNA00001,mRNA00002,mRNA00003
ctg123 . exon 7000 9000 . + . ID=exon00005;Parent=mRNA00001,mRNA00002,mRNA00003
[Callouts on the slide: relationships, terms]
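A small sketch of how a program might read rows like those on this slide: the third column gives the SO term that types the feature, and the `Parent` attribute gives the relationships. The function name `parse_row` is hypothetical, and whitespace splitting is an assumption made because the slide text uses spaces (real GFF3 files are tab-delimited).

```python
from collections import defaultdict

# A few rows copied from the slide.
ROWS = """\
ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN
ctg123 . mRNA 1050 9000 . + . ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123 . mRNA 1300 9000 . + . ID=mRNA00003;Parent=gene00001;Name=EDEN.3
ctg123 . exon 1300 1500 . + . ID=exon00001;Parent=mRNA00003
ctg123 . exon 3000 3902 . + . ID=exon00003;Parent=mRNA00001,mRNA00003
"""


def parse_row(row):
    """Split one row into its nine columns and unpack the attributes."""
    seqid, source, so_type, start, end, score, strand, phase, attrs = row.split(None, 8)
    attributes = dict(pair.split("=", 1) for pair in attrs.strip().split(";"))
    return {
        "id": attributes["ID"],
        "type": so_type,  # the SO term
        "start": int(start),
        "end": int(end),
        # A feature may list several parents, separated by commas.
        "parents": attributes["Parent"].split(",") if "Parent" in attributes else [],
    }


features = [parse_row(row) for row in ROWS.splitlines() if row.strip()]

# Invert the Parent links to get each feature's children.
children = defaultdict(list)
for feature in features:
    for parent_id in feature["parents"]:
        children[parent_id].append(feature["id"])

print(children["gene00001"])  # ['mRNA00001', 'mRNA00003']
print(children["mRNA00003"])  # ['exon00001', 'exon00003']
```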
![Page 7: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/7.jpg)
Why we made SO
• Standardize vocabulary used in genomics.
• Clarify the relationships between the terms.
• Make genomics data more computable by adding semantics to the sequence. It's not just about sequence similarity.
![Page 8: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/8.jpg)
What is the scope of SO?
• Features that can be located on a sequence with coordinates, e.g. exon, promoter, binding_site
• Properties of these features:
– Sequence attributes
• Maternally_imprinted
– Consequences of mutation
• mutation_affecting_editing
– Chromosome variation
• aneuploid
![Page 9: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/9.jpg)
The SO community
• Model Organism DB
– SGD
– (MGI)
– FlyBase
– WormBase
– DictyBase
– Pombe
• GMOD
• Comparative genomics
• MGED Ontology
• NLP
![Page 10: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/10.jpg)
Genome annotation unification
• The model organism databases use SO to type their features.
• The GFF3 file format for annotation, the Chado db schema and DAS2 annotation protocol rely on SO to type features.
![Page 11: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/11.jpg)
Genomic analysis
• The Comparative Genomics Library, written in Perl, uses SO-based annotations to perform complex analysis over multiple genomes.
– Yandell M, Mungall CJ, Smith C, Prochnik S, Kaminker J, Hartzell G, Lewis S, Rubin GM. 2006. Large-Scale Trends in the Evolution of Gene Structures within 11 Animal Genomes. PLoS Comput Biol. 2:e15
![Page 12: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/12.jpg)
Genome data integration
• Multiple genomes are organized using SO:
– FlyMine
– Gramene
– the BRCs
![Page 13: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/13.jpg)
NLP/text mining
• Recently SO has been used for some new projects:
– Semantic enrichment by the Royal Society of Chemistry.
– Anaphora resolution by the NLIP group in Cambridge.
![Page 14: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/14.jpg)
How SO is managed
• SO uses CVS to manage and version the ontology.
• There is a mailing list for developers to get things off their chest.
• There is a tracker for term suggestions.
• There are workshops when we get a critical mass for a given problem. We want to do more workshops.
• SO is expressed in OBO format.
![Page 15: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/15.jpg)
Example of OBO format
• http://www.geneontology.org/GO.format.obo-1_2.shtml
[Term]
id: SO:0000587
name: group_I_intron
def: "Group I catalytic introns are large self-splicing ribozymes. They catalyse their own excision from mRNA, tRNA and rRNA precursors in a wide range of organisms. The core secondary structure consists of 9 paired regions (P1-P9). These fold to essentially two domains, the P4-P6 domain (formed from the stacking of P5, P4, P6 and P6a helices) and the P3-P9 domain (formed from the P8, P3, P7 and P9 helices). Group I catalytic introns often have long ORFs inserted in loop regions." [http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00028]
subset: SOFA
is_a: SO:0000188 ! intron
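The tag-value layout of an OBO stanza is simple enough to read with a few lines of code. This is a minimal sketch, not a full OBO parser: the function name `parse_stanza` is hypothetical, the `def` tag is left out of the sample because quoted text needs more careful handling, and the stripping of `! comment` suffixes assumes the comment marker is surrounded by spaces.

```python
STANZA = """\
[Term]
id: SO:0000587
name: group_I_intron
subset: SOFA
is_a: SO:0000188 ! intron
"""


def parse_stanza(text):
    """Parse one stanza into its type and a tag -> list-of-values mapping."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    stanza_type = lines[0].strip("[]")
    tags = {}
    for line in lines[1:]:
        # Split on the first colon only, because ids such as SO:0000587 contain colons.
        tag, value = line.split(":", 1)
        # Drop the trailing "! comment" that may follow a value.
        value = value.split(" ! ", 1)[0].strip()
        # A tag such as is_a may repeat, so values are kept in a list.
        tags.setdefault(tag, []).append(value)
    return stanza_type, tags


stanza_type, tags = parse_stanza(STANZA)
print(stanza_type)    # Term
print(tags["id"])     # ['SO:0000587']
print(tags["is_a"])   # ['SO:0000188']
```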
![Page 16: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/16.jpg)
OBO and OWL
• http://purl.org/obo/owl/SO
• Mapping OBO and OWL http://www.bioontology.org/wiki/index.php/OboInOwl:Main_Page
![Page 17: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/17.jpg)
Navigate SO using OBO-Edit
[Screenshot of OBO-Edit with labeled panels: Structure of the ontology; Search the ontology; Details for selected term; All parents of the term]
![Page 18: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/18.jpg)
Annotating with SO and RNAO
The nanos translational control element represses translation in somatic cells by a Bearded box-like motif.
• Duchow HK, Brechbiel JL, Chatterjee S, Gavis ER. Developmental Biology, Volume 282, Issue 1, 1 June 2005, Pages 207-217
• AGAGGGCGAATCCAGCTCTGGAGCAGAGGCTCTGGCAGCTTTTGCAGCGTTTATATAACATGAAATATATATACGCATTCCGATCAAAGCTGGGTTAACCAGATAGATAGATAGTAACGTTTAAATAGCGCCTGGCGCGTTCGATTTTAAAGAGATTTAGAGCGTTATCCCGTGCCTATAGATCTTATAGTATAGACAACGAACGATCACTCAAATCCAAGTCAATAATTCAAGAATTTATGTCTGTTTCTGTGAAAGGGAAACTAATTTTGTTAAAGAAGACTTACAATATCGTAATACTTGTTCAATCGTCGTGGCCGATAGAAATATCTTACAATCCGAAAGTTGATGAATGGAATTGGTCTGCAACTGGTCGCCTTCATTTCGTAAAATGTTCGCTTGCGGCCGAAAAATTTCGATATATCTACAATTGATCTACAATCTTTACTAAATTTTGAAAAAGGAACACTTTGAATTTCGAACTGTCAATCGTATCATTAGAATTTAATCTAAATTTAAATCTTGCTAAAGGAAATAGCAAGGAACACTTTCGTCGTCGGCTACGCATTCATTGTAAAATTTTAAATTTTGACATTCCGCACTTTTTGATAGATAAGCGAAGAGTATTTTTATTACATGTATCGCAAGTATTCATTTCAACACACATATCTATATATATATATATATATATATATATATATATATATATATATGTTATATATTTATTCAATTTTGTTTACCATTGATCAATTTTTCACACATGAAACAACCGCCAGCATTATATAATTTTTTTATTTTTTTAAAAAATGTGTACACATATTCTGAAAATGAAAAATTCAATGGCTCGAGTGCCAAATAAAGAAATGGTTACAATTTAAGG
Translational control element
![Page 19: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/19.jpg)
Overlap with RNAO
• SO provides regions of sequence: start and stop coordinates with regard to the whole sequence, i.e. assembly / chromosome
– Transcripts and parts of transcripts
– Some secondary structure
– Some motifs
– Results of algorithms such as BLAST
![Page 20: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/20.jpg)
SO names features
![Page 21: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/21.jpg)
Secondary structure
• This part of SO needs work.
• Any volunteers?
![Page 22: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/22.jpg)
Divergent from RNAO
• Where do SO and RNAO differ dramatically?
– Multiple sequence alignments. SO does not provide a solution to this. It does, however, provide the terms to describe the results of sequence similarity searches.
– Numerical results. SO has not needed to use values so far.
![Page 23: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/23.jpg)
RNAO working groups
• Motif identification/annotation
• RNA interaction
• Biochemical-structure mapping
• Multiple sequence alignment
• Backbone conformation
• Base stacking
![Page 24: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/24.jpg)
Working together
• Remain 2 separate ontologies.
• Give SO annotators option of ‘importing’ RNAO terms using the OBO programs
• SO and RNAO work together to align key terms in their ontologies.
![Page 25: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/25.jpg)
SO is still evolving
• RNAO could use the SO features to describe regions of sequence
• SO could reference RNAO for detailed annotation of structure and biochemical features.
![Page 26: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/26.jpg)
Multiple ontologies in OBO
• 2 options.
1. The ontologies reference each other:
• Will always need to load both ontologies
2. There is a mapping file that you can load to import external terms.
• Maintain separate ontologies and keep mapping up to date.
http://obofoundry.org/wiki/index.php/Mappings
![Page 27: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/27.jpg)
Example: Importing terms from SCOR.
• 1. Make an OBO file from a subset of SCOR terms
• 2. Work out where there is overlap
• 3. Make an OBO mapping file between the two ontologies
• 4. Load all 3 files at once.
![Page 28: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/28.jpg)
scor.obo:
format-version: 1.2
date: 16:05:2007 15:26
saved-by: kareneilbeck
auto-generated-by: OBO-Edit 1.100
[Term]
id: SC:0000000
name: hairpin_loop
[Term]
id: SC:0000001
name: diloop
is_a: SC:0000000 ! hairpin_loop
[Term]
id: SC:0000002
name: triloop
is_a: SC:0000000 ! hairpin_loop
…
mapping file:
format-version: 1.2
date: 24:05:2007 10:37
saved-by: kareneilbeck
import: so-xp.obo
import: scor2.obo
id: SC:0000015 hairpin loop
is_a: SO:0000715 is_a RNA motif
id: SC:0000016 internal loop
is_a: SO:0000715 is_a RNA motif
id: SC:0000035 tertiary interaction
is_a: SO:0000122 is_a RNA sequence secondary structure
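One reason to load all three files at once is that the mapping file refers to ids defined in the other two. The sketch below illustrates that check under stated assumptions: the dictionaries are small hand-written stand-ins for the loaded files (ids and names taken from the mapping shown on this slide), and `dangling_references` is a hypothetical helper, not part of OBO-Edit.

```python
# Stand-ins for the terms defined by each loaded ontology (id -> name).
so_terms = {
    "SO:0000715": "RNA motif",
    "SO:0000122": "RNA sequence secondary structure",
}
scor_terms = {
    "SC:0000015": "hairpin loop",
    "SC:0000016": "internal loop",
    "SC:0000035": "tertiary interaction",
}

# Mapping file content: SCOR term id -> SO parent id (an is_a link).
mapping = {
    "SC:0000015": "SO:0000715",
    "SC:0000016": "SO:0000715",
    "SC:0000035": "SO:0000122",
}


def dangling_references(mapping, *ontologies):
    """Return ids used in the mapping that none of the loaded ontologies define."""
    known = set().union(*ontologies)
    used = set(mapping) | set(mapping.values())
    return sorted(used - known)


# With both ontologies loaded, every id in the mapping resolves.
print(dangling_references(mapping, so_terms, scor_terms))  # []
# With only SCOR loaded, the SO parents are left dangling.
print(dangling_references(mapping, scor_terms))  # ['SO:0000122', 'SO:0000715']
```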
![Page 29: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/29.jpg)
OBO-Edit DEMO
• Fingers crossed…
![Page 30: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/30.jpg)
Possible action items
• A SO-RNAO mailing list for discussion of collaboration
• Phone/skype/webinars at intervals to keep track of progress.
![Page 31: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/31.jpg)
Resources
• GFF3 http://www.sequenceontology.org/gff3.shtml
• Apollo http://www.fruitfly.org/annot/apollo/
• SO http://www.sequenceontology.org
• OBO-Edit http://sourceforge.net/projects/geneontology
• OBO foundry http://www.obofoundry.org
• GO-perl http://www.godatabase.org/dev/go-perl/doc/go-perl-doc.html
![Page 32: SO meets RNAO](https://reader035.vdocument.in/reader035/viewer/2022062803/56814865550346895db572eb/html5/thumbnails/32.jpg)
Acknowledgements
• SO is funded as part of the Gene Ontology Consortium, via the NIH P41-HG002274
• People:
– Suzi Lewis and Michael Ashburner - the vision
– Chris Mungall - programming infrastructure
– John Richter - made OBO-Edit