iind sem class1

Upload: rajesh-thipparaboina

Post on 05-Apr-2018

  • 7/31/2019 IInd Sem Class1

    1/56

    Introduction to Bioinformatics

    Bioinformatics is a modern discipline integrating different branches of science, i.e. Biology, Chemistry and Information Technology.

    Informatics related to Biological and Medical sciences:

    Bioinformatics

    Structural Bioinformatics

    Medical Informatics

    Chemoinformatics

    Pharmacy Informatics

    Clinical Informatics

    Bioinformatics has a strong interdisciplinary character. It can be considered a confluence of Biology, Computer Science, Information Technology, Mathematics, Chemistry, Physics and Medicine, with the objectives of developing tools to analyze biological, biochemical and biophysical data and of generating new knowledge in these areas. Persons trained and skilled in all of these fields do not yet exist, and if this area is to develop in our country they will have to be trained.

    In other words, Bioinformatics is:

    The combination of biology and information technology.

    A branch of science that deals with the computer-based analysis of large biological data sets.

    It incorporates the development of databases to store and search data, and of statistical tools and algorithms to analyze and determine relationships between biological data sets, such as macromolecular sequences, structures, expression profiles and biochemical pathways.

    DNA → RNA → Protein synthesis
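
    The flow of information from DNA through RNA to protein can be sketched in a few lines of Python. This is only an illustration: the codon table below is a small subset of the standard genetic code, picked to cover the made-up example sequence, and the function names are not from the slides.

```python
# Minimal illustration of DNA -> RNA -> protein.
# CODON_TABLE is a deliberately small subset of the standard genetic code.
CODON_TABLE = {
    "AUG": "M",                          # start codon, methionine
    "GCU": "A", "GCC": "A",              # alanine
    "AAA": "K", "AAG": "K",              # lysine
    "UGG": "W",                          # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


mrna = transcribe("ATGGCCAAATGGTAA")  # invented example sequence
print(mrna)             # AUGGCCAAAUGGUAA
print(translate(mrna))  # MAKW
```

    A real implementation would use the full 64-codon table and handle reading frames and ambiguous bases.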

    Development of:

    New scientific methods

    Algorithms for managing large amounts of sequence and structural data

    As the full genome sequences of many species and data from structural genomics, micro-arrays and proteomics became available, integrating these data on a common platform requires sophisticated bioinformatics tools {Sequence-Structure-Function}.

    Organizing these data into knowledge bases and developing appropriate software tools for analyzing them are going to be major challenges.

    India, as a major player in the IT industry, has the potential to develop such resources at an affordable cost.

    COMPUTERS IN BIOLOGY

    Structural Bioinformatics in Drug Discovery

    [Figure: flowchart. Boxes: Target protein sequence; Homology modeling of target protein; Crystal structure of target protein; Large-scale docking; Virtual library of compounds or QSAR analysis; Lead identification & lead optimization; Confirm using crystallography, kinetic analysis; Compound development (Drug).]

    Fig: Schematic outline of the application of SB (homology modeling) and X-ray crystallography (structural molecular biology) in the drug discovery process.

    Table: Some important structural bioinformatics databases, resources and tools

    S.No. Database and its importance, followed by its URL

    1. National Center for Biotechnology Information (NCBI): Provides a general search for nucleotide sequences, protein sequences, biomolecule 3D structures, genomes, taxonomy or literature.
       http://www.ncbi.nlm.nih.gov/Entrez/

    2. Structural Genomics Target Database (sgtdb): 3-D models of all sequences under investigation by structural genomics centers.
       http://spam.sdsc.edu/

    3. Structure Comparison Database (CE): Pair-wise structure comparisons based on the Combinatorial Extension (CE) algorithm for both a representative set and the complete set of protein structures; includes alignments.
       http://cl.sdsc.edu/ce.html

    4. CKAAP DB: Database of structures with Conserved Key Amino Acid Positions.
       http://ckaaps.sdsc.edu/perl/browser.pl

    5. Protein Data Bank (PDB): The single worldwide source of primary structural data on biological macromolecules determined experimentally.
       http://www.rcsb.org/pdb

    6. Extended GO Annotation of PDB Chains: Use of structure comparison to extend the coverage of GO terms in the PDB.
       http://spdc.sdsc.edu/

    7. PDBbind: Designed to provide a collection of experimentally measured binding affinity data (Kd, Ki and IC50) exclusively for the protein-ligand complexes available in PDB.
       http://www.pdbbind.org/
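
    Binding affinities such as the Kd, Ki and IC50 values collected in PDBbind are usually compared on a logarithmic scale. The sketch below is an illustration rather than anything PDBbind itself provides: it converts a constant given in mol/L to its negative base-10 logarithm and, for a dissociation constant, to a standard binding free energy via ΔG = RT ln(Kd). The function names, the 1 nM example value and the 298.15 K default are assumptions. IC50 depends on assay conditions and is not an equilibrium constant, so the free-energy step is meant for Kd only.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol*K)


def p_affinity(k_molar: float) -> float:
    """Negative log10 of an affinity constant given in mol/L (pKd, pKi, pIC50)."""
    return -math.log10(k_molar)


def binding_free_energy_kj(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol from a dissociation constant.

    Uses dG = R*T*ln(Kd), with Kd relative to a 1 mol/L standard state.
    """
    return R * temperature_k * math.log(kd_molar) / 1000.0


kd = 1e-9  # hypothetical 1 nM binder
print(f"pKd = {p_affinity(kd):.2f}")
print(f"dG  = {binding_free_energy_kj(kd):.1f} kJ/mol")
```

    Tighter binders have smaller Kd, larger pKd and more negative ΔG.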

    Bioinformatics Information Resources And Networks

    Outline

    Bioinformatics Information Resources And Networks

    EMBnet (European Molecular Biology Network): DBs and Tools

    NCBI (National Center For Biotechnology Information): DBs and Tools

    Nucleic Acid Sequence Databases

    Protein Information Resources

    Metabolic Databases

    Mapping Databases

    Databases concerning Mutations

    Literature Databases

    EMBnet: European Molecular Biology Network

    Founded in 1988

    Network that links European laboratories that use biocomputing and bioinformatics in molecular biology research

    A science-based group of collaborating nodes throughout Europe, plus nodes outside Europe

    Provides information, services and training to its users

    Works to increase the availability and accessibility of data resources and computing tools

    Increases knowledge and proficiency in bioinformatics through education and training

    http://www.embnet.org/

    EMBnet - Nodes

    EMBnet (41 nodes):

    National Nodes (18): governmental

    Specialist Nodes (9): academic, industrial research centers

    Associate Nodes (11): biocomputing centers from non-European countries

    EMBnet - Nodes

    National Nodes

    Appointed by the governments

    Provide on-line services, user support and training

    Vienna Biocenter - Austria
    BEN - Belgium
    CSC - Finland
    INFOBIOGEN - France
    DKFZ - Germany
    HEN - Hungary
    INCBI - Ireland
    INN - Israel
    IEN-AdR - Italy
    CMBI - Netherlands
    Bio - Norway
    IBB - Poland
    PEN - Portugal
    GeneBee - Russia
    CNB-CSIC - Spain
    BMC - Sweden
    SIB - Switzerland
    SEQNET - UK

    EMBnet - Nodes

    Specialist Nodes

    Academic, industrial or research centers in specific areas of bioinformatics

    Largely responsible for maintenance of biological databases and software

    MIPS (Munich Information Center for Protein Sequences)
    ICGEB
    Pharmacia
    F. Hoffmann-La Roche
    EBI (Hinxton Hall, Cambridge, UK): important key specialist node and home of the EMBL, SWISS-PROT and TrEMBL databases
    HGMP-RC
    Sanger
    UCL

    EMBnet - Nodes

    Associate Nodes

    Centers from non-European countries

    IBBM - Argentina
    ANGIS - Australia
    CBI - China
    CIGB - Cuba
    CDFD - India
    SANBI - South Africa
    EMBnet - Brazil
    CBR - Canada
    EMBnet - Chile
    EMBnet - Colombia
    CIFN - Mexico

    EMBnet's Mission

    Assist in biotechnological and bioinformatics-related research

    Provide training and education

    Exploit network infrastructures

    Investigate and develop new technologies

    Bridge between commercial and academic sectors

    Who are EMBnet's Users?

    > 40,000 registered users from all over the world, as well as a larger number of Internet users

    All scientists working in Life Sciences, from undergraduate students to top-level scientists, in academia as well as industry, can get support from EMBnet

    EMBnet's SRS

    Sequence Retrieval System - SRS

    Result of a research project with EMBnet to interrogate all the resources gathered together

    SRS is a network browser for DBs in molecular biology

    SRS allows any flat-file DB to be indexed to any other

    Queries run across a range of different DB types via a single interface

    Independent of underlying data structures or query languages
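
    The idea of indexing one flat-file DB against another can be illustrated with a short Python sketch. It is not how SRS itself is implemented. It assumes a Swiss-Prot/EMBL-style flat file in which each line starts with a two-letter code (ID, AC, DR for cross-references) and each entry ends with "//". The two entries and all function names are invented for the example.

```python
from collections import defaultdict

# Two invented entries in a Swiss-Prot/EMBL-like flat-file layout.
FLAT_FILE = """\
ID   PROT_A
AC   P00001
DR   PDB; 1ABC
DR   PROSITE; PS00001
//
ID   PROT_B
AC   P00002
DR   PROSITE; PS00001
//
"""


def parse_entries(text):
    """Yield one dict per entry, mapping each line code to its list of values."""
    entry = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-entry marker
            if entry:
                yield dict(entry)
            entry = defaultdict(list)
        elif line.strip():
            code, value = line[:2], line[5:].strip()
            entry[code].append(value)


def build_indexes(text):
    """Index entries by accession and by (database, identifier) cross-reference."""
    by_accession = {}
    by_cross_ref = defaultdict(list)
    for entry in parse_entries(text):
        accession = entry["AC"][0]
        by_accession[accession] = entry
        for ref in entry.get("DR", []):
            database, identifier = [part.strip() for part in ref.split(";")[:2]]
            by_cross_ref[(database, identifier)].append(accession)
    return by_accession, by_cross_ref


by_accession, by_cross_ref = build_indexes(FLAT_FILE)
print(by_accession["P00002"]["ID"])          # ['PROT_B']
print(by_cross_ref[("PROSITE", "PS00001")])  # ['P00001', 'P00002']
```

    With indexes like these, a query phrased against one databank (here a PROSITE identifier) can be answered with entries from another, which is the kind of cross-database lookup the slide describes.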

    http://srs.embl-heidelberg.de:8000/srs5/

    Sequence Retrieval System: Network Browser for Databanks in Molecular Biology

    Data Bank        No. of Entries   Indexing Date   Group      Availability
    SWISSPROT        163235           10-Jun-2005     Sequence   ok
    SWISSNEW         81134            22-Mar-2006     Sequence   ok
    NRDB             2269647          29-Mar-2006     Sequence   ok
    SWALL            3022528          22-Mar-2006     Sequence   ok
    UNIPROT_SPROT    212425           22-Mar-2006     Sequence   ok
    UNIPROT_TREMBL   2666963          23-Mar-2006     Sequence   ok
    TREMBLNEW        624819           12-Dec-2005     Sequence   ok
    TREMBL           2576118          04-Oct-2005     Sequence   ok

    Data Bank        No. of Entries   Indexing Date   Group      Availability
    SPTREMBL         1449374          16-Jun-2005     Sequence   ok
    SPTREMBLNEW      143140           17-Jun-2005     Sequence   ok
    REMTREMBL        92182            20-Jun-2005     Sequence   ok
    PIR              283416           16-Jun-2005     Sequence   ok
    WORMPEP          19538            16-Jun-2005     Sequence   ok
    DROSOPHILA       14100            16-Jun-2005     Sequence   ok
    EMBLNEW          4035816          21-Nov-2005     Sequence   ok
    EMBL             20343598         30-Dec-2005     Sequence   ok
    EMBLEST          31990232         06-Jan-2006     Sequence   ok
    EMBLWGS          11106060         24-Sep-2005     Sequence   ok
    GENBANK          19233264         18-Nov-2005     Sequence   ok
    GENBANKEST       31008556         23-Feb-2006     Sequence   ok
    REFSEQP          8006             16-Jun-2005     Sequence   ok
    SUBTILIST        1                16-Jun-2005     Sequence   ok

    Data Bank        No. of Entries   Indexing Date   Group             Availability
    PROSITE          1935             22-Mar-2006     SeqRelated        ok
    PROSITEDOC       1407             22-Mar-2006     SeqRelated        ok
    BLOCKS           4034             16-Jun-2005     SeqRelated        ok
    EPD              1375             16-Jun-2005     SeqRelated        ok
    ENZYME           4173             16-Jun-2005     SeqRelated        ok
    PRINTS           865              16-Jun-2005     SeqRelated        ok
    TFSITE           4342             07-Apr-2003     TransFac          ok
    TFFACTOR         1799             07-Apr-2003     TransFac          ok
    TFCELL           816              07-Apr-2003     TransFac          ok
    TFCLASS          27               07-Apr-2003     TransFac          ok
    TFMATRIX         246              07-Apr-2003     TransFac          ok
    TFGENE           1035             07-Apr-2003     TransFac          ok
    PDB              34927            08-Feb-2006     Protein3DStruct   ok
    DSSP             30832            22-Nov-2005     Protein3DStruct   ok
    HSSP             30369            08-Feb-2006     Protein3DStruct   ok
    PDBFINDER        35701            28-Mar-2006     Protein3DStruct   ok
    NRL3D            6063             16-Jun-2005     Protein3DStruct   ok
    FLYGENES         7556             16-Jun-2005     Genome            ok
    FLYREFS          0                07-Apr-2003     Genome            ok
    OMIM             17004            18-Oct-2005     Mutations         ok
    REPTILIA         8364             18-Jan-2006     Others            ok

    NCBI: National Center For Biotechnology Information

    Leading American information provider

    Established in 1988 as a division of the National Library of Medicine (NLM)

    Located on the campus of the National Institutes of Health (NIH, Bethesda, Maryland)

    Mission:

    Development of new information technologies to aid our understanding of the molecular and genetic processes that underlie health and disease

    Creation of systems for storing and analysing biological information

    Development of advanced methods of computer-based information processing

    Facilitation of user access to DBs and software

    Co-ordination of efforts to gather biotechnology information worldwide

    NCBI

    Since 1992, NCBI has maintained GenBank and collaborated with the international nucleotide DBs EMBL and DDBJ (Japan)

    Provides Entrez, which facilitates access to biological DBs (similar to SRS, which is provided by EMBnet)
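
    Entrez can also be queried from a program through NCBI's E-utilities web interface, which the slides do not cover. As a hedged sketch, the Python below only builds the request URLs for the `esearch` and `efetch` endpoints and makes no network call. The helper names, the search term and the accession string are illustrative assumptions.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (the programmatic side of Entrez).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """URL that searches one Entrez database and returns matching record IDs."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def efetch_url(db: str, ids: list, rettype: str = "fasta") -> str:
    """URL that downloads the records with the given IDs as plain text."""
    query = urlencode(
        {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    )
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


# Illustrative query: yeast records in the nucleotide database.
print(esearch_url("nucleotide", "Saccharomyces cerevisiae[Organism]"))
# Illustrative fetch: the accession string is just an example identifier.
print(efetch_url("nucleotide", ["NM_000207"]))
```

    The same two-step pattern (search for IDs, then fetch records) applies to the other Entrez databases such as `protein` and `pubmed`.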

    NCBI - Responsibilities

    Administers research on biomedical problems at the molecular level using mathematical and computational methods

    Maintains collaborations with several NIH (National Institutes of Health) institutes, academia, industry and other governmental agencies

    Promotes scientific communication by sponsoring meetings, workshops and lecture series

    Supports training on basic and applied research in computational biology for postdoctoral fellows through the NIH Intramural Research Program

    Engages members of the international scientific community in informatics research and training through the Scientific Visitors Program

    Develops, distributes, supports and coordinates access to a variety of databases and software for the scientific and medical communities

    Develops and promotes standards for databases, data deposition and exchange, and biological nomenclature

    Nucleic Acid Sequence Databases

    EMBL (Europe)

    GenBank (USA)

    DDBJ (Japan)

    ENSEMBL (project between EMBL-EBI and the Sanger Institute)

    dbEST (division of GenBank)

    GSDB (division of GenBank)

    The principal nucleic acid sequence databases are GenBank, EMBL and DDBJ, which each collect a portion of the total sequence data reported world-wide, and exchange new and updated entries on a daily basis

    source: http://www3.ebi.ac.uk/Services/DBStats/

    Nucleic Acid Sequence Databases - EMBL

    This morning the EMBL Database contained 127,450,085,130 nucleotides in 69,666,551 entries.

    Breakdown by entry type:

    Entry Type                     Entries      Nucleotides
    Standard                       56,843,150   61,498,109,356
    Constructed (CON)              497,187      n/a
    Third Party Annotation (TPA)   4,884        334,827,880
    Whole Genome Shotgun (WGS)     12,318,618   64,837,183,592

    The EMBL Nucleotide Sequence Database (also known as EMBL-Bank) constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. The database is produced in an international collaboration with GenBank (USA) and the DNA Database of Japan (DDBJ). Each of the three groups collects a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged between the groups on a daily basis.
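
    The totals quoted for EMBL lend themselves to a quick back-of-the-envelope calculation. The Python sketch below uses only the figures stated on the slide. The variable names are illustrative, and the breakdown is taken as transcribed, so the listed rows are not assumed to add up to the totals.

```python
# Figures as quoted on the slide.
TOTAL_NUCLEOTIDES = 127_450_085_130
TOTAL_ENTRIES = 69_666_551

# entry type -> (entries, nucleotides); None where the slide gives "n/a".
breakdown = {
    "Standard": (56_843_150, 61_498_109_356),
    "Constructed (CON)": (497_187, None),
    "Third Party Annotation (TPA)": (4_884, 334_827_880),
    "Whole Genome Shotgun (WGS)": (12_318_618, 64_837_183_592),
}

# Average number of nucleotides per database entry.
average_length = TOTAL_NUCLEOTIDES / TOTAL_ENTRIES
print(f"average entry length: {average_length:.0f} nucleotides")

# Share of all entries and of all nucleotides held by each entry type.
nucleotide_share = {}
for entry_type, (entries, nucleotides) in breakdown.items():
    entry_pct = 100 * entries / TOTAL_ENTRIES
    if nucleotides is None:
        print(f"{entry_type}: {entry_pct:.1f}% of entries")
    else:
        nucleotide_share[entry_type] = 100 * nucleotides / TOTAL_NUCLEOTIDES
        print(f"{entry_type}: {entry_pct:.1f}% of entries, "
              f"{nucleotide_share[entry_type]:.1f}% of nucleotides")
```

    On these numbers, whole genome shotgun data account for under a fifth of the entries but roughly half of the nucleotides.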

    http://www3.ebi.ac.uk/Services/DBStats/
    http://www.ebi.ac.uk/embl/Submission/index.html
    http://www.ebi.ac.uk/embl/Contact/collaboration.html

    Nucleic Acid Sequence Databases - EMBL

    Total nucleotides (current 127,450,085,130)

    Number of entries (current 69,666,551)

    Ref: EMBL Nucleotide Sequence Database: developments in 2005, Nucleic Acids Research, 2006, Vol. 34, D10-D15

    Nucleic Acid Sequence Databases - EMBL

    By nucleotide count

    [Figure: chart of nucleotide count by species. Labels: Homo sapiens, Mus musculus, Rattus norvegicus, Pan troglodytes, Bos taurus, Canis familiaris, Monodelphis domestica, Danio rerio, Macaca mulatta, Loxodonta africana, Other.]

    Nucleic Acid Sequence Databases - GenBank

    GenBank, which is produced at NCBI, is split into smaller, discrete divisions.

    This facilitates fast, specific searches by restricting queries to particular database subsets.

    During 1992-1997, the level of EST and STS data within GenBank grew 10-fold.

    However, the overall sequence information contributed by such partial data was still less than that of the higher-quality sequences in the other major divisions.

    Specialised Genomic Resources

    In addition to the comprehensive DNA sequence DBs, there is a variety of more specialised genomic resources.

    These so-called boutique DBs bring focus to species-specific genomics and to particular sequencing techniques.

    SGD - Saccharomyces Genome Database

    UniGene - gene-oriented clusters from GenBank

    TIGR - Databases of The Institute for Genomic Research

    ACeDB - A C. elegans DataBase

    Specialised Genomic Databases

    SGD (Saccharomyces Genome Database): a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae. http://genome-www.stanford.edu/Saccharomyces

    AceDB (A C. elegans DataBase): http://www.acedb.org (C. elegans)

    FlyBase (A Database of Drosophila Genes & Genomes): http://flybase.bio.indiana.edu (fruit fly)

    MGD (Mouse Genome Database): http://www.informatics.jax.org (mouse)

    Protein Information Resources

    The primary structure of a protein is its amino acid sequence.

    The secondary structure of a protein corresponds to regions of local regularity (e.g., α-helices and β-strands).

    The tertiary structure of a protein arises from the packing of its secondary structure elements, which may form discrete domains within a fold.

    Levels of protein sequence and structural organisation: primary, secondary, tertiary

    ACDEFGHIKLMNPQRSTVWY

    primary structure

    Principles of Protein Structure

    Protein Information Resources

    Levels of protein sequence and structural organisation:

    Level       Unit             Example                        Stored in
    primary     sequence         AVILDRYFH                      primary database
    secondary   motif            [AS]-[IL]2-X[DE]-R-[FYW]2-H    secondary database
    tertiary    domain, module   @.*,#a,b,c                     structure database

    Primary Protein Databases

    Protein sequence Databases

    SWISS-PROT - Protein knowledgebase

    TrEMBL - Computer-annotated supplement to Swiss-Prot

    PIR - Protein Information Resource

    MIPS - Munich Information Centre for Protein Sequences

    NRL-3D - produced by PIR

    The primary structure of a protein is its amino acid sequence; these sequences are stored in primary databases as linear strings of letters that denote the constituent residues.
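
    Because a primary database stores each protein as a string over the 20 one-letter amino acid codes, simple checks and summaries are easy to write. The Python sketch below is illustrative only. The function names are invented, and the alphabet is restricted to the 20 standard residues, so ambiguity codes such as B, Z or X are reported as unexpected.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def unexpected_letters(sequence: str) -> set:
    """Return the characters in the sequence that are not standard residue codes."""
    return set(sequence.upper()) - set(AMINO_ACIDS)


def composition(sequence: str) -> dict:
    """Count how often each standard residue occurs, omitting absent ones."""
    counts = Counter(sequence.upper())
    return {aa: counts[aa] for aa in AMINO_ACIDS if counts[aa]}


sequence = "AVILDRYFH"  # short example peptide
print(unexpected_letters(sequence))  # set()
print(len(sequence))                 # 9
print(composition("AAC"))            # {'A': 2, 'C': 1}
```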

    SWISS-PROT, TrEMBL: http://www.expasy.org/sprot/
    PIR: http://pir.georgetown.edu/home.shtml
    MIPS: http://mips.gsf.de/
    NRL-3D: http://www-nbrf.georgetown.edu/pirwww/search/textnrl3d.html

    Protein Sequence Databases

    Swiss-Prot contains 197,228 sequence entries, comprising 71,501,181 amino acids abstracted from 135,257 references

    Total number of species represented in Swiss-Prot: 9,520

    The average sequence length in Swiss-Prot is 362 amino acids.

    Swiss-Prot is the most highly annotated protein sequence DB

    No.   Frequ.   Species
    1     13049    Homo sapiens (Human)
    2     10132    Mus musculus (Mouse)
    3     5189     Saccharomyces cerevisiae (Baker's yeast)
    4     4847     Escherichia coli
    5     4669     Rattus norvegicus (Rat)
    6     3665     Arabidopsis thaliana (Mouse-ear cress)
    8     2863     Schizosaccharomyces pombe (Fission yeast)
    7     2814     Bacillus subtilis
    9     2750     Caenorhabditis elegans
    10    2286     Drosophila melanogaster (Fruit fly)

    Table of the most represented species
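
    The Swiss-Prot figures can be cross-checked against one another with a few lines of Python. The sketch uses only the numbers quoted on this slide, and the variable names are illustrative.

```python
# Figures as quoted on the slide.
ENTRIES = 197_228
TOTAL_RESIDUES = 71_501_181

# Entry counts of the ten most represented species, in the order listed.
top_species_counts = [13049, 10132, 5189, 4847, 4669, 3665, 2863, 2814, 2750, 2286]

# Average sequence length: total residues divided by number of entries.
average_length = TOTAL_RESIDUES / ENTRIES
print(f"average length: {average_length:.1f} residues")

# Fraction of all entries that belong to the ten listed species.
top_ten_share = 100 * sum(top_species_counts) / ENTRIES
print(f"top ten species: {top_ten_share:.1f}% of entries")
```

    The computed average agrees with the stated 362 residues once the fraction is dropped, and the ten listed species together account for roughly a quarter of all entries.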

    Composite Protein Sequence Databases

    Composite databases amalgamate a variety of different primary databases

    They render sequence searching much more efficient, because they obviate the need to interrogate multiple resources

    Different composite databases use different primary sources and different redundancy criteria in their amalgamation procedures
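
    How a composite database might be amalgamated can be sketched as follows. This is a simplified illustration, not the procedure of any particular composite DB: the redundancy criterion assumed here is exact sequence identity, sources are merged in a fixed priority order, and the entries and identifiers are invented.

```python
def build_composite(sources: dict, priority: list) -> list:
    """Merge primary sources, keeping the first copy of each identical sequence.

    sources maps a database name to a list of (entry_id, sequence) pairs;
    priority lists the database names from most to least preferred.
    """
    seen = set()
    composite = []
    for name in priority:
        for entry_id, sequence in sources[name]:
            if sequence in seen:   # redundancy criterion: exact identity
                continue
            seen.add(sequence)
            composite.append((name, entry_id, sequence))
    return composite


# Invented example entries.
sources = {
    "SWISS-PROT": [("sp1", "MKTAYIAK"), ("sp2", "MAVILDRYFH")],
    "PIR": [("pir1", "MKTAYIAK"), ("pir2", "MGSSHHHH")],
}

composite = build_composite(sources, ["SWISS-PROT", "PIR"])
for record in composite:
    print(record)
```

    Changing the priority order or loosening the criterion (for example, treating near-identical sequences as duplicates) changes which entries survive, which is why different composite databases differ in content.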

    Composite Protein Sequence Databases

    Composite database and its primary sources:

    NRDB (Non-Redundant DataBase): PDB, SWISS-PROT, PIR, GenPept, SWISS-PROT update, GenPept update

    OWL: SWISS-PROT, PIR, GenBank, NRL-3D

    MIPSX: PIR1-4, MIPSOwn, MIPSTrn, MIPSH, PIRMOD, NRL-3D, SWISS-PROT, EMTrans, GBTrans, Kabat, PseqIP

    SP+TrEMBL: SWISS-PROT, TrEMBL

    NRDB: http://www.nrdb.co.uk/
    OWL: http://www.hgmp.mrc.ac.uk/Bioinformatics/Databases/owl-help.html
    MIPSX: http://mips.gsf.de/
    SP+TrEMBL: http://www.hgmp.mrc.ac.uk/Bioinformatics/Databases/trembl-help.html

    Secondary databases

    Secondary databases contain pattern data, i.e., diagnostic signatures for protein families. These signatures encode the most highly conserved features of multiply aligned sequences, which are often crucial to the structure or function of the protein.

    The secondary structure of a protein corresponds to regions of local regularity (e.g., α-helices and β-strands), which, in sequence alignments, are often apparent as well-conserved motifs.

    Patterns are stored as regular expressions, fingerprints, blocks, profiles, etc.
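
    To make the regular-expression kind of pattern concrete, the Python sketch below converts a simplified subset of the PROSITE pattern syntax into a Python regular expression and scans a sequence with it. The subset assumed here covers elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues and "(n)" or "(n,m)" for repetition. Anchors and other features of the full syntax are not handled, and the function names and test sequence are invented. The example pattern N-{P}-[ST]-{P} is the N-glycosylation site that PROSITE lists as PS00001.

```python
import re

# One pattern element: a residue, x, [allowed] or {excluded}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex string."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            piece = "."                        # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            piece = core                       # single residue or [allowed set]
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(piece)
    return "".join(pieces)


def find_sites(pattern: str, sequence: str) -> list:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))           # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}", "AANGSAANPSA"))  # [2]
```

    In the test sequence the first N is followed by G-S-A and matches, while the second N is followed by P and is rejected by the {P} element.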

    Secondary databases

    Secondary DB   Primary source   Stored information
    PROSITE        SWISS-PROT       Regular expressions (patterns)
    Profiles       SWISS-PROT       Weighted matrices (profiles)
    PRINTS         OWL              Aligned motifs (fingerprints)
    BLOCKS         PROSITE/PRINTS   Aligned motifs (blocks)
    IDENTIFY       BLOCKS/PRINTS    Fuzzy regular expressions (patterns)

    http://www.expasy.org/prosite/

    Secondary databases

    TRANSFAC: http://transfac.gbf.de
    EPD: http://www.epd.isb-sib.ch
    InterPro: http://www.ebi.ac.uk/interpro/
    PROSITE: http://www.expasy.ch/prosite
    BLOCKS: http://blocks.fhcrc.org
    PRINTS: ftp://ftp.seqnet.dl.ac.uk/pub/database/prints
    PFAM: http://www.sanger.ac.uk/Software/Pfam/index.shtml
    ProDom: http://www.toulouse.inra.fr/prodom.html
    GeneCards: http://bioinformatics.weizmann.ac.il/cards
    ENSEMBL: http://www.ensembl.org
    EcoCyc: http://ecocyc.panbio.com/ecocyc/ecocyc.html

    Secondary databases

    There is some overlap in content between the secondary databases

    PDBsum alone has 35,291 entries

    Pattern DB growth is slow because the addition of detailed family annotation is very time consuming.

    PROSITE and PRINTS are the only comprehensively, manually annotated secondary DBs

    To address the annotation bottleneck, the secondary database curators have together created a unified database of protein families known as InterPro


    Structure Classification DBs

    Contain 3D structures available from crystallographic and spectroscopic studies

    Structure Classification Databases

    PDBsum: summaries of Protein Data Bank (PDB) entries

    CATH: Class, Architecture, Topology, Homologous superfamily

    SCOP: Structural Classification of Proteins


    Structure Classification DBs

    PDB    http://www.rcsb.org

    SCOP   http://scop.mrc-lmb.cam.ac.uk/scop

    CATH   http://www.biochem.ucl.ac.uk/bsm/cath

    DSSP   http://www.sander.ebi.ac.uk/dssp

    FSSP   http://www.ebi.ac.uk/dali/fssp

    HSSP   http://www.sander.ebi.ac.uk/hssp


    Metabolic Databases

    KEGG (Kyoto Encyclopedia of Genes and Genomes)   http://www.genome.ad.jp/kegg

    ENZYME (Enzyme nomenclature database)   http://www.expasy.ch/enzyme

    BRENDA (Enzyme Information System)   http://www.brenda.uni-koeln.de

    EMP (Enzymes and Metabolic Pathways database)   http://www.empproject.com

    A number of metabolic databases are available electronically, some with features for querying and visualizing metabolic pathways and regulatory networks.


    Mapping Databases

    OMIM (Online Mendelian Inheritance in Man)   http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim

    GDB (The GDB Human Genome Database)   http://www.gdb.org

    RHDB   http://corba.ebi.ac.uk/RHdb


    Databases concerning Mutations

    dbSNP   http://www.ncbi.nlm.nih.gov/SNP

    HGBASE   http://hgbase.cgr.ki.se

    The SNP Consortium (TSC)   http://snp.cshl.org

    HAEMA   http://europium.csc.mrc.ac.uk/usr/WWW/WebPages/database.dir/quiz.dir/intrquiz.htm


    Literature Databases

    PubMed   http://www.ncbi.nlm.nih.gov/entrez/query

    Bioinformatics Online   http://www.bioinformatics.oupjournals.org

    Nature   http://www.nature.com

    Science   http://www.sciencemag.org


    Database tools for displaying and annotating genomic sequence data

    Viewer format    URL
    Artemis          www.sanger.ac.uk/Software/Artemis
    ACeDB            www.acedb.org/Tutorial/brief-tutorial/shtml
    Apollo           www.ensembl.org/apollo
    EnsEMBL          www.ensembl.org
    NCBI mapviewer   www.ncbi.nlm.nih.gov
    GoldenPath       genome.ucsc.edu


    Common formats

    There are several conventions for representing nucleic acid and protein sequences, of which the following are widely used

    NBRF/PIR

    FASTA

    GDE

    These formats have limited facilities for comments, which must include a unique identifier code and sequence accession number
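    As an illustration of the simplest of these formats, the sketch below reads FASTA text into a dictionary. It assumes the common convention that each record starts with a ">" header line whose first word is the identifier, followed by one or more sequence lines; the function name parse_fasta and the sample records are made up for this example.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {identifier: sequence}.

    The identifier is taken as the first whitespace-delimited word
    after '>'; the rest of the header line (free-text comment) is dropped.
    """
    records = {}
    identifier = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if identifier is not None:    # close the previous record
                records[identifier] = "".join(chunks)
            words = line[1:].split()
            identifier = words[0] if words else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)           # sequence may span several lines
    if identifier is not None:
        records[identifier] = "".join(chunks)
    return records

example = """>seq1 hypothetical demo record
ACGTAC
GTTA
>seq2
MKVLA
"""
print(parse_fasta(example))
```

    The single header line is the only place for a comment, which is why the identifier and accession number have to be packed into it.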


    Formats for multiple sequence alignment

    There are separate formats for multiple sequence alignment representation, of which the following are popular

    MSF

    PHYLIP

    ALN
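    To give a feel for one of these formats, here is a minimal sketch that writes an alignment in the strict sequential PHYLIP layout: a first line with the number of sequences and the number of alignment columns, then one line per sequence with the name padded (or truncated) to 10 characters. The function name to_phylip and the two toy sequences are assumptions made for this example; interleaved PHYLIP and the relaxed name rules used by some programs are not covered.

```python
def to_phylip(alignment: dict) -> str:
    """Serialise {name: aligned_sequence} as strict sequential PHYLIP text."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        # Covers both an empty alignment and rows of unequal length.
        raise ValueError("alignment must be non-empty with equal-length rows")
    n_columns = lengths.pop()
    lines = [f" {len(alignment)} {n_columns}"]
    for name, seq in alignment.items():
        # Strict PHYLIP reserves exactly 10 characters for the name.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"

print(to_phylip({"human": "ACG-T", "mouse": "ACGAT"}))
```

    Gaps are written as "-" so that every row has the same number of columns, which is what distinguishes an alignment file from a plain multi-sequence file.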


    Files of structural data

    Structural data are maintained as flat files using the PDB format

    Such files contain orthogonal atomic coordinates together with annotations, comments and experimental details

    http://www.pdb.org
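    Because the PDB format is a fixed-column flat file, coordinates can be read by slicing each ATOM or HETATM line at known character positions. The sketch below assumes the standard column layout of the legacy PDB format (serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-38, 39-46 and 47-54); the function name parse_atom_line and the sample record are illustrative only.

```python
def parse_atom_line(line: str):
    """Extract a few fields from one ATOM/HETATM record of a PDB flat file.

    Returns None for any other record type (HEADER, REMARK, ...).
    Slices are 0-based, so PDB column n corresponds to index n-1.
    """
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),   # orthogonal coordinates in angstroms
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }

# Made-up sample record laid out in the fixed columns described above.
sample = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"
print(parse_atom_line(sample))
```

    Splitting on whitespace is tempting but unreliable here, since adjacent fields may touch when values are wide; column slicing avoids that.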


    Submission of sequences

    Sequences may be submitted to any of the three primary databases using the tools provided by the database curators

    Such tools include WebIn and BankIt, which can be used over the Internet, and Sequin, a stand-alone application

    http://www.ebi.ac.uk/embl/Submission/webin.html

    http://www.ncbi.nlm.nih.gov/BankIt/


    Database interrogation

    All the databases discussed above can be searched by sequence similarity

    However, detailed text-based searches of the annotations are also possible using tools such as Entrez

    The simplest way to cross-reference between the primary nucleotide sequence databases and SWISS-PROT is to search by accession number, as this provides an unambiguous identifier of genes and their products
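    A text-based Entrez search by accession number can also be scripted. The sketch below only builds the query URL for NCBI's E-utilities ESearch endpoint and does not contact the server; the helper name esearch_url is hypothetical, and the accession P69905 with the [Accession] field tag is used purely as an illustrative query.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for a text query against one Entrez database.

    db     -- Entrez database name, e.g. "protein" or "nucleotide"
    term   -- query text, optionally with a field tag such as [Accession]
    retmax -- maximum number of record identifiers to return
    """
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"

# Illustrative query: look up one accession number in the protein database.
print(esearch_url("protein", "P69905[Accession]", retmax=5))
```

    Fetching the resulting URL returns the matching record identifiers, which can then be used to retrieve the cross-referenced entries.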