BITS: Overview of important biological databases beyond sequences
DESCRIPTION
Module 4: Other relevant biological data sources beyond sequences. Part of the training session "Basic Bioinformatics concepts, databases and tools" - http://www.bits.vib.be/training
TRANSCRIPT
![Page 1: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/1.jpg)
Basic bioinformatics concepts, databases and tools
Module 4
Beyond the sequences
Dr. Joachim Jacob
http://www.bits.vib.be
Updated Nov 2011
http://dl.dropbox.com/u/18352887/BITS_training_material/Link%20to%20mod4-intro_H1_2011_otherRelevantData.pdf
![Page 2: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/2.jpg)
Module 4 broadens our view
![Page 3: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/3.jpg)
To understand life, we need not only sequences, but many other concepts
Bioinformatics is also about storing and analyzing:
− Gene information: variations, isoforms, ...
− Expression data
− 3D protein structure data
− Interaction data
− Pathways and networks
“Storing all relevant biological data”
![Page 4: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/4.jpg)
Schematic view II
GeneA sequence annotations – gene expr – pathway – struct,...
GeneB sequence annotations – gene expr – pathway – struct,...
GeneC sequence annotations – gene expr – pathway – struct,...
analysis
Primary database
Other sequence databases
results
Additional information sources
results
![Page 5: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/5.jpg)
The indispensable databases
Gene Ontology – structuring
KEGG – biochemical pathways
PDB – structure of proteins
IntAct – interaction data
dbSNP – database of genomic variation
Expression sources – microarray data
![Page 6: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/6.jpg)
Gene Ontology structures the way we communicate about life
http://www.geneontology.org/teaching_resources/tutorials/2005-09_BiB-journal-tutorial_jlomax.pdf
http://www.arabidopsis.org/help/tutorials/go1.jsp
Gene translation
Protein synthesis
Protein production
![Page 7: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/7.jpg)
Gene Ontology structures life
http://www.geneontology.org/
Agreement on standardized keywords (often referred to as 'controlled vocabularies') that describe all natural processes in a hierarchical way (ontology).
Keywords are assigned to genes based on different kinds of evidence.
Keywords are ordered in a hierarchical, tree-like structure (a 'directed acyclic graph').
Three GO 'trees' exist, describing:
"Biological Process"
"Cellular Component"
"Molecular Function"
http://www.geneontology.org/teaching_resources/tutorials/2005-09_BiB-journal-tutorial_jlomax.pdf
http://www.arabidopsis.org/help/tutorials/go1.jsp
![Page 8: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/8.jpg)
A gene can be given different GO terms
Example, cytochrome c:
molecular function: oxidoreductase activity,
biological process: oxidative phosphorylation and induction of cell death,
cellular component: mitochondrial matrix and mitochondrial inner membrane.
In each tree, the terms are organised in a directed acyclic graph: a network in which parent and child terms are the nodes and the relationships between them are the edges.
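A minimal sketch of the directed acyclic graph idea: a few GO-like terms are stored with their parents, and all ancestors of one term are collected by walking upwards. The term names and parent links below are simplified, illustrative assumptions, not an extract of the real ontology.

```python
# Toy fragment of a GO-like graph: each term maps to its list of parent terms.
# Names and links are illustrative assumptions, not real GO data.
PARENTS = {
    "mitochondrial inner membrane": ["mitochondrial membrane", "organelle inner membrane"],
    "mitochondrial membrane": ["organelle membrane"],
    "organelle inner membrane": ["organelle membrane"],
    "organelle membrane": ["cellular component"],
    "cellular component": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links upwards."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    for name in sorted(ancestors("mitochondrial inner membrane")):
        print(name)
```

Note that a term may have more than one parent, which is what makes the structure a graph rather than a strict tree.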
![Page 9: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/9.jpg)
![Page 10: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/10.jpg)
Different evidence codes can assign a degree of confidence to the assignment
http://www.geneontology.org/GO.evidence.shtml
Evidence codes can be grouped by:
Experimental (e.g. IDA – Inferred from Direct Assay)
Computational analysis
Author statement
Curator statement
Inferred from Electronic Annotation (IEA)
If available, each annotation also has a reference
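As a sketch of how evidence codes can serve as a confidence filter, the snippet below groups codes as on the slide and keeps only experimentally supported annotations. The code lists are the commonly documented GO evidence codes and may be incomplete; the gene names and the tuple layout of an annotation are hypothetical.

```python
# Evidence codes per group. Assumed from the commonly documented GO codes;
# check the GO evidence code page for the current, complete list.
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "author": {"TAS", "NAS"},
    "curator": {"IC", "ND"},
    "electronic": {"IEA"},
}


def evidence_group(code):
    """Return the group name for an evidence code, or 'unknown'."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "unknown"


def keep_experimental(annotations):
    """Keep (gene, term, evidence_code) tuples backed by experimental evidence."""
    return [a for a in annotations if evidence_group(a[2]) == "experimental"]


if __name__ == "__main__":
    # Hypothetical annotations, for illustration only.
    example = [
        ("geneA", "oxidoreductase activity", "IDA"),
        ("geneB", "oxidoreductase activity", "IEA"),
        ("geneC", "oxidative phosphorylation", "TAS"),
    ]
    print(keep_experimental(example))
```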
![Page 11: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/11.jpg)
Different evidence codes can assign a degree of confidence to the assignment
![Page 12: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/12.jpg)
Gene Ontology structures all genes according to their biological significance
The GO structure and the terms can be browsed by a browser called AmiGO.
QuickGO from EBI offers some nice visualisations
Excellent GO-wiki for all your questions
![Page 13: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/13.jpg)
GO can be used to retrieve all gene (products) related to one specific term
You can search broadly, e.g. an AmiGO search for Diabetes leads to the following GO term
http://amigo.geneontology.org/
![Page 14: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/14.jpg)
GO can be used to retrieve all gene (products) related to one specific term
AmiGO search for Diabetes
![Page 15: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/15.jpg)
GO can be used to retrieve all gene (products) related to one specific term
AmiGO search for Diabetes
![Page 16: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/16.jpg)
GO is also useful to analyze and compare different gene lists
Many tools that work with GO are listed on the GO website.
http://www.geneontology.org/GO.tools.shtml
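One common way such tools compare a gene list against GO is an enrichment test. The sketch below uses a one-sided hypergeometric test, which is an assumption here (the slides do not name a specific statistic), and all counts in the example are made up.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the list of interest, k: list genes annotated with the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


if __name__ == "__main__":
    # Made-up counts: 8 of 50 list genes carry a term that annotates
    # 200 of 10000 background genes.
    print(enrichment_p_value(k=8, n=50, K=200, N=10000))
```

A small p-value suggests the term is over-represented in the list; real tools also correct for testing many terms at once.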
![Page 17: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/17.jpg)
Some things to know about GO
For analyses, one can make use of reduced GO sets, the so-called GO slims
– GO slims are a subset of biologically more relevant GO terms (available per species)
– GO ontologies can be downloaded in .obo format.
Not all information is captured by GO; some needs to be retrieved from other databases
Metabolic pathways: KEGG, …
Phenotype/diseases
• Mapping files exist, e.g. kegg2go
http://www.geneontology.org/GO.slims.shtml
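The .obo format mentioned above is plain text made of `[Term]` stanzas with `key: value` lines. A minimal sketch of reading it is shown here; the sample is a heavily abbreviated illustration, and the parser ignores most of the tags a real ontology file contains.

```python
# Abbreviated sample in OBO style, for illustration only.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0003674
name: molecular_function
namespace: molecular_function

[Term]
id: GO:0003824
name: catalytic activity
namespace: molecular_function
is_a: GO:0003674 ! molecular_function
"""


def parse_obo_terms(text):
    """Return {term_id: {'name': ..., 'namespace': ..., 'is_a': [...]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"is_a": []} if line == "[Term]" else None
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            if key == "is_a":
                # Drop the trailing "! name" comment, keep the parent id.
                current["is_a"].append(value.split(" ! ")[0])
            else:
                current[key] = value
            if key == "id":
                terms[value] = current
    return terms


if __name__ == "__main__":
    for term_id, term in parse_obo_terms(SAMPLE_OBO).items():
        print(term_id, term["name"], term["is_a"])
```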
![Page 18: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/18.jpg)
Biological pathways databases organise genes by molecular reactions
Three important databases on biological pathways:
http://www.kegg.jp/
http://www.reactome.org/ - EBI
http://metacyc.org
![Page 19: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/19.jpg)
Proteins with enzymatic function receive an Enzyme Commission (EC) number
http://www.chem.qmul.ac.uk/iubmb/enzyme/
EC 6 Ligases
EC 5 Isomerases
EC 4 Lyases
EC 3 Hydrolases
EC 2 Transferases
EC 1 Oxidoreductases
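The first digit of an EC number gives the enzyme class listed above. A small sketch of that lookup follows; the table only holds the six classes shown on this slide, so classes added to the nomenclature later are reported as unknown, and the function name is hypothetical.

```python
# Top-level EC classes as listed on the slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_class(ec_number):
    """Return the class name for an EC number such as 'EC 1.1.1.1'."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    first = text.split(".")[0]
    if not first.isdigit():
        return "unknown class"
    return EC_CLASSES.get(int(first), "unknown class")


if __name__ == "__main__":
    print(ec_class("EC 1.1.1.1"))  # alcohol dehydrogenase
    print(ec_class("3.4.21.4"))    # trypsin
```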
![Page 20: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/20.jpg)
IntAct database contains interaction information of proteins
http://www.ebi.ac.uk/intact
Three types of interactions are stored:
Protein-protein
Protein-DNA
Protein-small molecule
![Page 21: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/21.jpg)
IntAct database represents all interactions as binary: caution!
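One reason for caution: a complex detected with one bait protein is not inherently a set of pairs. The sketch below shows two common ways of flattening a complex into binary interactions, 'spoke' and 'matrix' expansion. Which model applies to a given record is not stated in this material, so treat this as an illustration of the general issue; the protein names are hypothetical.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Pair the bait with every prey: n preys give n binary interactions."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(members):
    """Pair every member with every other member of the complex."""
    return list(combinations(members, 2))


if __name__ == "__main__":
    # Hypothetical complex of four proteins, with A used as bait.
    print(spoke_expansion("A", ["B", "C", "D"]))
    print(matrix_expansion(["A", "B", "C", "D"]))
```

The same experiment yields three pairs under one model and six under the other, so a listed binary interaction does not always mean direct physical contact.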
![Page 22: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/22.jpg)
Interaction networks can be analysed on your computer using Cytoscape
Cytoscape training material on the BITS website
![Page 23: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/23.jpg)
PDB hosts 3-dimensional structural data on molecules
![Page 24: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/24.jpg)
PDB hosts 3-dimensional structural data on molecules
PDB = Protein Data Bank
http://www.pdb.org/pdb/home/home.do
Only structures solved by NMR, X-ray crystallography (or other accurate techniques)
Proteins, DNA, RNA, ligands
Understanding PDB data: tutorial
![Page 25: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/25.jpg)
PDB files can be read by a lot of different tools to display the structure
Every entry in PDB has its own PDB accession number (four characters: a digit followed by three letters or digits)
The PDB file contains the 3D coordinates of every single atom in the structure, together with the variability of that position (the last two numeric columns: occupancy and temperature factor)
http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203817:protein-structure-analysis-training&catid=81:training-pages&Itemid=190
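A PDB file is fixed-width text, so an ATOM line can be read by column position. The sketch below follows the column layout of the PDB file format for ATOM records and assumes the 'variability' values are the occupancy and temperature factor columns. The example line is built field by field, and its coordinates are illustrative values, not taken from a specific entry.

```python
# Example ATOM record assembled field by field so the column widths stay visible.
EXAMPLE_LINE = (
    "ATOM  "      # columns 1-6: record name
    "    1"       # columns 7-11: atom serial number
    " "
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location
    "MET"         # columns 18-20: residue name
    " "
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    " "           # column 27: insertion code
    "   "
    "  27.340"    # columns 31-38: x
    "  24.430"    # columns 39-46: y
    "   2.614"    # columns 47-54: z
    "  1.00"      # columns 55-60: occupancy
    "  9.67"      # columns 61-66: temperature factor
)


def parse_atom_line(line):
    """Split one ATOM/HETATM line into a dictionary of typed fields."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }


if __name__ == "__main__":
    print(parse_atom_line(EXAMPLE_LINE))
```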
![Page 26: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/26.jpg)
PDB files can be read by a lot of different tools to display the structure
Tools to visualize (and some to analyze structures) (see BITS wiki)
http://www.bits.vib.be/wiki/index.php/Protein_structure
![Page 27: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/27.jpg)
Finding a structure for your protein sequence means searching for similarity
Homology modeling
Similarity on sequence level, projected onto a structure
− BLAST your query against the PDB database with cblast, or at ExPASy
− PSI-BLAST can detect sequences with similar structures (twilight zone!)
− If still no success: 3D-Jury (a meta approach, including fold recognition and local structure prediction)
Similarity on structural level: aligning structures
− VAST (structure)
− DALI (Distance mAtrix aLIgnment)
http://www.ii.uib.no/~slars/bioinfocourse/PDFs/structpred_tutorial.pdf
http://consurf.tau.ac.il/pe/protexpl/psbiores.htm
BITS training on protein structure analysis
Tools at EBI
![Page 28: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/28.jpg)
Structural information is used to classify proteins
SCOP
Groups proteins based on evolutionary relationships, domain architecture and structural information.
CATH
Manually curated classification of protein domains
Database cross-references in PDB entry
http://scop.mrc-lmb.cam.ac.uk/scop/
http://www.cathdb.info/
![Page 29: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/29.jpg)
dbSNP is a public-domain archive for simple genetic polymorphisms
Single Nucleotide Polymorphism database (NCBI)
Each dbSNP entry has a code rsxx (RefSNP) or ssxx (submitted SNP)
Types of variation stored:
− single-base nucleotide substitutions (also known as single nucleotide polymorphisms or SNPs)
− small-scale multi-base deletions or insertions (also called deletion insertion polymorphisms or DIPs)
− retroposable element insertions and microsatellite repeat variations (also called short tandem repeats or STRs)
Synchronized with new genome builds
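A small sketch of telling the two kinds of dbSNP codes apart, assuming identifiers are simply the prefix rs or ss followed by digits. The function name and the example numbers are made up for illustration.

```python
import re

# Assumed identifier shape: 'rs' or 'ss' followed by digits.
_DBSNP_ID = re.compile(r"^(rs|ss)(\d+)$", re.IGNORECASE)

KINDS = {"rs": "RefSNP", "ss": "submitted SNP"}


def parse_dbsnp_id(identifier):
    """Return (kind, number) for a dbSNP code, or None if it does not match."""
    match = _DBSNP_ID.match(identifier.strip())
    if match is None:
        return None
    return KINDS[match.group(1).lower()], int(match.group(2))


if __name__ == "__main__":
    # Made-up identifiers, not real records.
    print(parse_dbsnp_id("rs12345"))
    print(parse_dbsnp_id("ss67890"))
```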
![Page 30: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/30.jpg)
Expression data can be sequence-based or hybridisation-based
Sequence-based (ESTs - RNA-seq - SAGE)
Digital gene expression/northern
Microarray databases – hybridisation-based:
GEO: Gene Expression Omnibus (NCBI)
− Platform: GPLxxxxxxx
− Experiment: GSExxxxxx (= several samples)
− Sample: GSMxxxxxxxx
− Some experiments are curated: GDSxxxxx (online analysis possible)
ArrayExpress (EBI)
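The GEO accession prefixes listed above can be recognised mechanically. A minimal sketch, assuming an accession is one of the four prefixes followed by digits; the accession numbers in the example are made up.

```python
import re

# Record types by accession prefix, as listed on the slide.
GEO_PREFIXES = {
    "GPL": "platform",
    "GSE": "experiment (several samples)",
    "GSM": "sample",
    "GDS": "curated dataset",
}


def classify_geo_accession(accession):
    """Return the record type for a GEO accession, or None if unrecognised."""
    match = re.fullmatch(r"(GPL|GSE|GSM|GDS)(\d+)", accession.strip().upper())
    if match is None:
        return None
    return GEO_PREFIXES[match.group(1)]


if __name__ == "__main__":
    # Made-up accession numbers, for illustration only.
    for acc in ["GPL123", "GSE1234", "GSM12345", "GDS12"]:
        print(acc, "->", classify_geo_accession(acc))
```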
![Page 31: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/31.jpg)
Example of expression data at GEO
![Page 32: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/32.jpg)
Example of expression data at GEO
![Page 33: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/33.jpg)
Example of expression data at GEO
![Page 34: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/34.jpg)
Example at ArrayExpress
![Page 35: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/35.jpg)
Example at ArrayExpress
![Page 36: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/36.jpg)
Entrez interconnects the databases at NCBI for easy querying
UniGene: sequences grouped by gene
PopSet: sequence alignments for population studies and phylogeny
Structure: 3D structures (PDB)
Genome: genomic maps of chromosomes and plasmids
UniSTS (Sequence Tagged Sites)
PubMed: literature abstracts (MEDLINE, …)
OMIM (Online Mendelian Inheritance in Man): literature reviews
MeSH (Medical Subject Headings): keywords
Taxonomy
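The Entrez databases can also be queried from a script through NCBI's E-utilities, which this material does not cover. The sketch below only builds an `esearch` URL and sends nothing over the network; the helper name, the chosen parameters and the query term are assumptions for illustration.

```python
from urllib.parse import urlencode

# Base address of the E-utilities esearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Build an esearch URL for one Entrez database (no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # The same term can be searched in different Entrez databases.
    for database in ["gene", "pubmed"]:
        print(esearch_url(database, "cytochrome c"))
```

Changing only the `db` value points the same query at another Entrez database, which mirrors how the web interface links them.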
![Page 37: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/37.jpg)
Finding relevant data
![Page 38: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/38.jpg)
Summarizing most important links to discover everything you need ...
Protein data
InterPro (heavily integrated with EBI resources)
http://www.interpro.org
Gene data
Entrez at NCBI: 'Entrez Gene'
http://www.ncbi.nlm.nih.gov/Entrez/
EB-eye Search at EBI: excellent for cross-species searches
http://www.ebi.ac.uk/ebisearch/
![Page 39: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/39.jpg)
Hold your horses!
Phew, where do I put all this?
![Page 40: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/40.jpg)
Bioinformatics is all about different data, as versatile as life itself
Due to the strong cross-references between different databases, new databases and relevant info are rapidly integrated in existing databases.
You can discover them by taking time to read the entries.
![Page 41: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/41.jpg)
New tools are emerging every day to enable you to browse all data sources...
BioGPS, all in one window!
![Page 42: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/42.jpg)
New tools are emerging every day to enable you to browse all data sources...
![Page 43: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/43.jpg)
Integrative resources are increasingly being organised on a species basis
EMAGE database of in situ gene expression in mouse
OMIM Database of diseases in man
Websites providing an interface to integrate all this data are increasingly important
Often organized on a species basis
− TAIR
− FlyBase
− WormBase
![Page 44: BITS: Overview of important biological databases beyond sequences](https://reader033.vdocument.in/reader033/viewer/2022051411/546e7548af795967298b5726/html5/thumbnails/44.jpg)
Organizing biological information by species
By species, why?
There is one biological information resource which stays more or less unchanged per species ...