bioinformatics enabling knowledge generation from agricultural omics data
![Page 1: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/1.jpg)
AgBase: bioinformatics enabling knowledge generation from agricultural omics data
Fiona McCarthy
![Page 2: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/2.jpg)
Summary
- ‘omics’ technologies: the ‘data deluge’
- organising data: bioinformatics and biocuration
- data sharing and analysis: bio-ontologies
- from data to knowledge: making sense of agricultural data
![Page 3: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/3.jpg)
Databases and Biological Data
- The number of databases has increased
  - Sequence repositories: NCBI, EMBL, DDBJ
  - Model Organism Databases (MODs)
  - Specialist biological databases or ‘knowledge databases’ (e.g. InterPro, interaction databases, gene expression data)
- Need to connect information in different databases
- Databases are increasing in size and complexity
![Page 4: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/4.jpg)
[Figure: two growth charts. First chart: "No." (0 to 25,000) by year, ‘00 to ‘09. Second chart: "No. x 10^6" (0 to 18) by year, 70 to 05.]
![Page 5: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/5.jpg)
Generating Biological Data
- Amount of biological data is increasing exponentially
- Completed and ongoing genome sequencing projects
- High throughput “omics” technologies
  - New sequencing technologies
  - Existing microarrays
  - Proteomics
![Page 6: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/6.jpg)
![Page 7: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/7.jpg)
Biocomputing
- Technologies enable ‘omics’ technologies to move from large databases/consortia into individual laboratories
- Managing this data: acquire, store, access, analyze, visualize, share
![Page 8: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/8.jpg)
NIH WORKING DEFINITION OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY
Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
![Page 9: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/9.jpg)
Bioinformatics
- Managing data
  - different file formats
  - linking between different databases
- Adding value
  - multiple levels of information from one ‘omics’ data set
  - re-analysis
  - linking data sets
- Organizing
  - annotating data
  - biocuration - annotation
![Page 10: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/10.jpg)
Annotation
- ANNOTATE: to denote or demarcate
- Genome annotation is the process of attaching biological information to genomic sequences. It consists of two main steps:
1. identifying functional elements in the genome: “structural annotation”
2. attaching biological information to these elements: “functional annotation”
![Page 11: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/11.jpg)
Community Annotation
- Researchers are the domain experts – but relatively few contribute to annotation
  - time
  - 'reward' & 'employer/funding agency recognition'
  - training – easy to use tools, clear instructions
- Required submission
- Community annotation
  - Groups with special interest do focused annotation or ontology development
  - As part of a meeting/conference or distributed (e.g. wikis)
- Students!
![Page 12: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/12.jpg)
Biocuration
- biocurators are biologists who are trained to annotate biological data (using database structures, bio-ontologies, etc.)
- databases use biocuration to enhance the value of biological data: “knowledge databases”
- but how to ensure data consistency between databases?
![Page 13: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/13.jpg)
What Are Ontologies?
“An ontology is a controlled vocabulary of well defined terms with specified relationships between those terms, capable of interpretation by both humans and computers.”
- Bio-ontologies are used to capture biological information in a way that can be read by both humans and computers
  - annotate data in a consistent way
  - allows data sharing across databases
  - allows computational analysis of high-throughput “omics” datasets
- Objects in an ontology (e.g. genes, cell types, tissue types, stages of development) are well defined.
- The ontology shows how the objects relate to each other
![Page 14: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/14.jpg)
Ontologies
- digital identifier (computers)
- description (humans)
- relationships between terms
Gene Ontology version 1.1348 (27/07/2010):
- 32,091 terms, 99.3% defined
  - 19,169 biological process
  - 2,745 cellular component
  - 8,736 molecular function
- 1,441 obsolete terms (not included in figures above)
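The three parts of a term listed above (identifier, description, relationships) can be sketched as a small data structure. This is only an illustration: the `Term` class, its field names, and the `EX:` identifiers are invented here and are not taken from the Gene Ontology or from AgBase.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One ontology term: an identifier, a human-readable description, and links."""
    term_id: str                                  # digital identifier (for computers)
    name: str                                     # short label (for humans)
    definition: str = ""                          # longer description (for humans)
    parents: dict = field(default_factory=dict)   # parent term_id -> relationship type
    obsolete: bool = False


# Hypothetical toy terms; the "EX:" identifiers are invented for illustration.
ontology = {
    "EX:0000001": Term("EX:0000001", "biological process",
                       "Any process in a living system."),
    "EX:0000002": Term("EX:0000002", "cell death",
                       "A process that ends the life of a cell.",
                       parents={"EX:0000001": "is_a"}),
    "EX:0000003": Term("EX:0000003", "old term", obsolete=True),
}


def fraction_defined(terms):
    """Share of non-obsolete terms that carry a definition."""
    current = [t for t in terms.values() if not t.obsolete]
    return sum(1 for t in current if t.definition) / len(current)


print(fraction_defined(ontology))
```

Keeping relationships as a mapping from parent identifier to relationship type lets the same structure hold more than one kind of link between terms.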
![Page 15: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/15.jpg)
![Page 16: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/16.jpg)
Relationships: the True Path Rule
- Why are relationships between terms important?
- TRUE PATH RULE: all attributes of children must hold for all parents
  - so if a protein is annotated to a term, it must also be true for all the parent terms
  - this enables us to move up the ontology structure from a granular term to a broader term
- Premise of many GO analysis tools
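A minimal sketch of how a tool might apply the True Path Rule: each direct annotation is expanded to include every ancestor term. The toy ontology, the `EX:` identifiers, and the function names are assumptions made for illustration, not part of any GO tool.

```python
# Hypothetical toy ontology: child term -> set of direct parent terms.
PARENTS = {
    "EX:apoptosis": {"EX:programmed_cell_death"},
    "EX:programmed_cell_death": {"EX:cell_death"},
    "EX:cell_death": {"EX:biological_process"},
    "EX:biological_process": set(),
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents=PARENTS):
    """Expand direct annotations so each protein is also annotated to all ancestors."""
    expanded = {}
    for protein, terms in annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        expanded[protein] = full
    return expanded


direct = {"proteinA": {"EX:apoptosis"}}
print(sorted(propagate(direct)["proteinA"]))
```

After propagation, a protein annotated only to the most granular term can be counted under any of its broader terms, which is what allows results to be summarised at a coarser level.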
![Page 17: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/17.jpg)
Genomic Annotation
Structural Annotation:
- Open reading frames (ORFs) predicted during genome assembly
- predicted ORFs require experimental confirmation
Functional Annotation:
- annotation of gene products = Gene Ontology (GO) annotation
- initially, predicted ORFs have no functional literature and GO annotation relies on computational methods (rapid)
- functional literature exists for many genes/proteins prior to genome sequencing
- Gene Ontology annotation does not rely on a completed genome sequence
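To make "predicted ORFs" concrete, here is a deliberately simplified sketch of ORF prediction. It scans only the forward strand, reports the stretch from an ATG to the next in-frame stop codon, and ignores introns and the reverse strand, so it is far cruder than the gene prediction used in real genome projects. The function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs.

    Each ORF runs from an ATG through the next in-frame stop codon;
    `min_codons` counts the start and stop codons as well.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


example = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(example):
    print(frame, example[start:end])
```

A prediction like this is only a candidate, which is why the slide notes that predicted ORFs still require experimental confirmation.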
![Page 18: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/18.jpg)
Genomic Annotation
- Structural Annotation, including Sequence Ontology
- Functional annotation using Gene Ontology
- Nomenclature (species’ genome nomenclature committees)
- Other annotations using other bio-ontologies, e.g. Anatomy Ontology
![Page 19: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/19.jpg)
http://obo.sourceforge.net/
- Gene Ontology
- Plant Ontology
- Sequence Ontology
- Trait Ontology
- Expression/Tissue Ontologies
- Infectious Disease Ontology
- Cell Ontology
![Page 20: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/20.jpg)
Bio-ontology requirements
- bio-ontologies (Open Biomedical Ontologies)
- computational pipelines (‘breadth’)
  - for computational annotations
  - useful for gene products without published information
- manual biocuration (‘depth’)
  - requires trained biocurators
  - community annotation efforts
  - each species has its own body of literature
- biocuration co-ordination
  - MODs? Consortium? Community?
  - biocuration prioritization
  - co-ordination with existing Dbs, annotation, nomenclature initiatives
  - data updates
![Page 21: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/21.jpg)
Gene Ontology (GO)
- de facto method for functional annotation
- Assigns functions based upon Biological Process, Molecular Function, Cellular Component
- Widely used for functional genomics (high throughput)
- Many tools available for gene expression analysis using GO
http://www.geneontology.org
![Page 22: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/22.jpg)
Plant Ontology (PO)
- describes plant structures and growth and developmental stages
- Currently used for Arabidopsis, maize, rice – more being added (soybean, tomato, cotton, etc.)
- Plant Structure: describes morphological and anatomical structures representing organ, tissue and cell types
- Growth and developmental stages: describes (i) whole plant growth stages and (ii) plant structure developmental stages
http://www.plantontology.org/
![Page 23: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/23.jpg)
Use GO for…
1. Determining which classes of gene products are over-represented or under-represented.
2. Grouping gene products.
3. Relating a protein’s location to its function.
4. Focusing on particular biological pathways and functions (hypothesis-testing).
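Point 1 above (over-representation) is often tested with a hypergeometric tail probability. The slides do not say which statistic any particular tool uses, so the following is one common approach sketched under that assumption; the function and parameter names are hypothetical, and multiple-testing correction is left out.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided over-representation p-value, P(X >= hits).

    population: gene products in the background set
    annotated:  background gene products annotated to the GO term
    sample:     gene products in the list of interest
    hits:       gene products in that list annotated to the GO term
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Toy numbers, invented for illustration only.
p = hypergeom_enrichment_p(population=1000, annotated=50, sample=40, hits=8)
print(f"p = {p:.3g}")
```

Because of the True Path Rule, the counts passed in are normally taken after annotations have been propagated to parent terms, so broader terms are tested on all the gene products beneath them.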
![Page 24: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/24.jpg)
Ontologies
Pathways & Networks
GO Cellular Component
GO Biological Process
GO Molecular Function
BRENDA
Pathway Studio 5.0
Ingenuity Pathway Analyses
Cytoscape
Interactome Databases
Functional Understanding
![Page 25: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/25.jpg)
http://www.agbase.msstate.edu/
![Page 26: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/26.jpg)
1. Provides structural annotation for agriculturally important genomes
2. Provides functional annotation (GO)
3. Provides tools for functional modeling
4. Provides bioinformatics & modeling support for research community
![Page 27: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/27.jpg)
Avian Gene Nomenclature
![Page 28: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/28.jpg)
![Page 29: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/29.jpg)
GO & PO: literature annotation for rice, computational annotation for rice, maize, sorghum, Brachypodia
1. Literature annotation for Agrobacterium tumefaciens, Dickeya dadantii, Magnaporthe grisea, Oomycetes
2. Computational annotation for Pseudomonas syringae pv tomato, Phytophthora spp and the nematode Meloidogyne hapla.
Literature annotation for chicken, cow, maize, cotton;
Computational annotation for agricultural species & pathogens.
literature annotation for human; computational annotation for UniProtKB entries (237,201 taxa).
![Page 30: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/30.jpg)
![Page 31: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/31.jpg)
Comparing AgBase & EBI-GOA Annotations
[Figure: bar chart of gene products annotated (0 to 14,000) by project (AgBase Chick, EBI-GOA Chick, AgBase Cow, EBI-GOA Cow). Legend: computational; manual - sequence; manual - literature.]
Complementary to EBI-GOA: Genbank proteins not represented in UniProt & EST sequences on arrays
![Page 32: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/32.jpg)
Contribution to GO Literature Biocuration
[Figure: contributions for Chicken and Cow. Contributing groups: AgBase, EBI GOA, EBI-IntAct, Roslin, HGNC, UCL-Heart project, MGI, Reactome. Labelled values: < 0.50%, < 1.50%, 97.82%, 88.78%.]
![Page 33: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/33.jpg)
AgBase Quality Checks & Releases
[Flow diagram. Stages: AgBase Biocurators; AgBase biocuration interface; AgBase database; EBI GOA Project; GO Consortium database. Checks between stages: ‘sanity’ check; ‘sanity’ check & GOC QC.]
- AgBase database: GO analysis tools, Microarray developers
- EBI GOA Project: UniProt db, QuickGO browser, GO analysis tools, Microarray developers
- GO Consortium database: Public databases, AmiGO browser, GO analysis tools, Microarray developers
‘sanity’ check: checks to ensure all appropriate information is captured, no obsolete GO:IDs are used, etc.
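A ‘sanity’ check of this kind could look something like the sketch below. It is not AgBase's actual code: the field names, the `sanity_check` function, and the `EX:` identifiers are all assumptions chosen for illustration.

```python
# Hypothetical required fields for one annotation record.
REQUIRED_FIELDS = ("gene_product", "go_id", "evidence_code", "reference")


def sanity_check(annotations, obsolete_ids):
    """Return (row_number, problem) tuples for annotations that fail a check."""
    problems = []
    for row, annotation in enumerate(annotations, start=1):
        # Check 1: all appropriate information is captured.
        for name in REQUIRED_FIELDS:
            if not annotation.get(name):
                problems.append((row, f"missing {name}"))
        # Check 2: no obsolete term identifiers are used.
        if annotation.get("go_id") in obsolete_ids:
            problems.append((row, "obsolete GO ID"))
    return problems


# Invented records; "EX:" identifiers and the reference are placeholders.
records = [
    {"gene_product": "P1", "go_id": "EX:0000002",
     "evidence_code": "IDA", "reference": "PMID:0"},
    {"gene_product": "P2", "go_id": "EX:0000003",
     "evidence_code": "", "reference": "PMID:0"},
]
for row, problem in sanity_check(records, obsolete_ids={"EX:0000003"}):
    print(row, problem)
```

Returning every problem, rather than stopping at the first one, lets a biocurator fix a whole submission in one pass before it moves on to the next quality-control stage.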
![Page 34: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/34.jpg)
Quality improvement Microarray annotations
![Page 35: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/35.jpg)
![Page 36: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/36.jpg)
IITA Crops
- cowpea – “reduced representation” sequencing underway
- soybean - preliminary assembly
- banana - sequencing in progress
- yam - genome sequencing for Dioscorea alata – EST development (IITA & VSU)
- cassava - genome sequencing in progress
- maize - genome sequencing completed; other subspecies being sequenced
![Page 37: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/37.jpg)
Cowpea
- 54,123 genome sequences
- 187,483 ESTs
- Annotated via homology to Arabidopsis & other plants
- GO annotation via homology – availability?
![Page 38: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/38.jpg)
Soybean
- NCBI: 1,459,639 ESTs, 34,946 proteins, 2,882 genes
- UniProt: 12,837 proteins (EBI GOA automatic GO annotation)
- UniGene assemblies available
- multiple microarrays available
![Page 39: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/39.jpg)
![Page 40: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/40.jpg)
![Page 41: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/41.jpg)
Banana
- 7,102 genome sequences
- 14,864 ESTs
- 1,399 NCBI proteins; 680 UniProt
- Musa acuminata (sweet banana): 3,898 GO annotations to 491 proteins
- Musa acuminata AAA Group (Cavendish banana): 579 annotations to 96 proteins
![Page 42: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/42.jpg)
Plantain
- Musa ABB Group (taxon:214693) - cooking banana or plantain
- 11,070 ESTs, 112 proteins
- 173 GO annotations to 53 proteins
- functional genomics based on banana?
![Page 43: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/43.jpg)
Yams
- 55577 Dioscorea rotundata (white yam)
- 55571 Dioscorea alata (water yam)
- 29710 Dioscorea cayenensis (yellow yam)
- Dioscorea (taxon:4672) & subspecies
- NCBI: 31 ESTs, 623 proteins
- Genome sequencing for Dioscorea alata – EST development (IITA & VSU)
- 183 GO annotations to 25 proteins
![Page 44: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/44.jpg)
Cassava
- ESTs: 80,631
- NCBI proteins: 568, UniProt: 253
- 2,251 GO annotations assigned to 218 proteins
- 2 Euphorbia esula (leafy spurge)/cassava arrays
![Page 45: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/45.jpg)
Maize
- Zea mays (taxon:4577)
- Genome sequencing completed by Washington University – other subspecies being sequenced
- Active GO annotation project - 131,925 GO annotations to 20,288 proteins
![Page 46: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/46.jpg)
![Page 47: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/47.jpg)
![Page 48: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/48.jpg)
![Page 49: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/49.jpg)
AgBase Collaborative Model
- How can we help you?
- Can make GO annotations public via the GO Consortium
- Have computational pipelines to do rapid, first pass GO annotation (including transcript/EST sequences)
- Provide bioinformatics support for collaborators
- Developing new tools
- Training/support for modeling data
![Page 50: bioinformatics enabling knowledge generation from agricultural omics data](https://reader036.vdocument.in/reader036/viewer/2022062808/568152fb550346895dc11c0d/html5/thumbnails/50.jpg)
Dr Susan Bridges
Divya Pedinti
Dr Teresia Buza
Philippe Chouvarine
Cathy Grisham
Lakshmi Pillai
Hui Wang
Seval Ozkan