Bio information: An Overview

Upload: mukesh-kumar

Post on 05-Apr-2018


TRANSCRIPT

  • 8/2/2019 Bio information An Overview

    1/43

    What is Bioinformatics?

    Bioinformatics is an emerging scientific discipline representing the combined power of biology, mathematics, and computers.


    Bioinformatics includes:

    Sequence analysis, used by geneticists, cell biologists, molecular biologists, etc.

    Molecular modeling, used by crystallographers, cell biologists, biochemists, etc.

    Molecular phylogeny/evolution

    Ecology and population studies

    Medical informatics


    Three important sub-disciplines within bioinformatics involving computational biology would include:

    the development and implementation of tools that enable efficient access and management of different types of information

    the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures

    the development of new algorithms and statistics with which to assess relationships among members of large data sets

    http://www.library.csi.cuny.edu/~davis/Bioinformatics/Bioinformatics/dataanal.html
    http://www.ncbi.nlm.nih.gov/Education/Bioinformatics/dataanal.html
    http://www.library.csi.cuny.edu/~davis/Bioinformatics/Bioinformatics/datatypes.html
    http://www.ncbi.nlm.nih.gov/Education/Bioinformatics/datatypes.html

    GenBank Data

    Year    Base Pairs    Sequences
    1982        680338          606
    1983       2274029         2427
    1984       3368765         4175
    1985       5204420         5700
    1986       9615371         9978
    1987      15514776        14584
    1988      23800000        20579
    1989      34762585        28791
    1990      49179285        39533
    1991      71947426        55627
    1992     101008486        78608
    1993     157152442       143492
    1994     217102462       215273
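
    The growth in the GenBank figures can be summarized with a little arithmetic. The sketch below is illustrative only: it copies the rows from the table and assumes steady exponential growth when estimating a doubling time. The function names are made up for this example.

```python
import math

# (year, base pairs, sequences), copied from the GenBank table in this slide
GENBANK = [
    (1982, 680338, 606),
    (1983, 2274029, 2427),
    (1984, 3368765, 4175),
    (1985, 5204420, 5700),
    (1986, 9615371, 9978),
    (1987, 15514776, 14584),
    (1988, 23800000, 20579),
    (1989, 34762585, 28791),
    (1990, 49179285, 39533),
    (1991, 71947426, 55627),
    (1992, 101008486, 78608),
    (1993, 157152442, 143492),
    (1994, 217102462, 215273),
]


def yearly_growth(rows):
    """Return (year, growth factor in base pairs relative to the previous year)."""
    return [(cur[0], cur[1] / prev[1]) for prev, cur in zip(rows, rows[1:])]


def doubling_time(first, last):
    """Years per doubling of base pairs, assuming steady exponential growth."""
    years = last[0] - first[0]
    return years * math.log(2) / math.log(last[1] / first[1])


if __name__ == "__main__":
    for year, factor in yearly_growth(GENBANK):
        print(f"{year}: x{factor:.2f}")
    print(f"doubling time: {doubling_time(GENBANK[0], GENBANK[-1]):.2f} years")
```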


    Analysis of sequence information: computational biology

    Finding the genes in the DNA sequences of various organisms

    Developing methods to predict the structure and/or function of newly discovered proteins and structural RNA sequences

    Clustering protein sequences into families of related sequences and developing protein models

    Aligning similar proteins and generating phylogenetic trees to examine evolutionary relationships


    Goals of Bioinformatics and Sequence Analysis can be subdivided into:

    1. Sequence entry, assembly, and management

    2. Nucleotide sequence analysis

    3. Protein sequence analysis

    4. Multiple sequence analysis

    5. Additional and integrated analyses


    Sequence Entry and Editing


    Sequence Assembly


    Nucleotide Sequence Analysis

    Sequence Similarity Analysis

    Query: 298
    CCGGGGACCTGCGGCGGGTCGCCTGCCCAGCCCCCGAA
    || | || | | |||| | || |||| ||| | |||||||||||||
    CCCGGGAACCTGCGGTGGTCCGCCCGCCCAGCCCCAGTG
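
    To show how an alignment with a row of match bars can be produced, here is a minimal sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch approach). It is not the method used to generate the alignment on this slide, which is not stated; database search tools typically use faster heuristic local alignment. The scoring values (match, mismatch, gap) are arbitrary assumptions.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, match_line, aligned_b) for a global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    top, mid, bottom = [], [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            mid.append("|" if a[i - 1] == b[j - 1] else " ")
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            mid.append(" ")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            mid.append(" ")
            j -= 1
    return (
        score[-1][-1],
        "".join(reversed(top)),
        "".join(reversed(mid)),
        "".join(reversed(bottom)),
    )


if __name__ == "__main__":
    s, top, mid, bottom = global_alignment("ACGT", "AGT")
    print(s)
    print(top)
    print(mid)
    print(bottom)
```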


    Gene discovery: coding regions, exon, and gene prediction

    ORF M


    ORF List (C = complementary strand)

      1 to 1647 length = 1647
      3 to 80 length = 78
      326 to 409 length = 84
      1064 to 1174 length = 111
    C 1650 to 1249 length = 402
    C 1649 to 1509 length = 141
    C 660 to 511 length = 150
    C 584 to 507 length = 78
    C 510 to 283 length = 228
    C 452 to 321 length = 132
    C 149 to 39 length = 111
    C 135 to 4 length = 132

    ORF Map
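
    A list like the one above can be produced by scanning all six reading frames. The sketch below is one simple way to do it and rests on assumptions the slide does not confirm: an ORF runs from an ATG to the next in-frame stop codon (TAA, TAG, TGA), lengths include the stop codon, and coordinates are 1-based on the forward strand, so complementary-strand ORFs are reported with start greater than end. The tool behind the slide may define ORFs differently.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_length=6):
    """Return (strand, start, end, length) for ATG-to-stop ORFs in six frames.

    strand is "+" for the given strand and "C" for the complementary strand.
    start and end are 1-based forward-strand coordinates.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("C", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first ATG of the ORF being read
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length >= min_length:
                        if strand == "+":
                            orfs.append((strand, start + 1, i + 3, length))
                        else:
                            # Map reverse-strand indices back to the forward strand.
                            orfs.append((strand, n - start, n - (i + 3) + 1, length))
                    start = None
    return orfs


if __name__ == "__main__":
    for strand, start, end, length in find_orfs("CCATGAAATAGCC"):
        print(strand, start, "to", end, "length =", length)
```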


    Protein Sequence Analysis

    Sequence Similarity Analysis

    Protein sequences can be analyzed in ways similar to nucleotide sequences. Some common types of analyses are database similarity searching (to identify protein sequence database entries similar to a given protein) and sequence comparison (for example, to align two protein sequences and identify common regions).

    Compare sequence similarity of A and B below.

    Query = sequence used to search the database

    Sbjct = database sequence aligned to the query

    Letters between the two lines are identities; + = conservative amino acid substitution
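
    The middle line of such an alignment can be rebuilt from two already-aligned sequences. This is a simplified sketch: real search tools decide what counts as a conservative substitution from a substitution matrix, whereas the groups of similar amino acids below are a hypothetical stand-in chosen for illustration.

```python
# Hypothetical, simplified groups of similar amino acids (an assumption;
# real tools use a scoring matrix rather than fixed groups).
CONSERVATIVE_GROUPS = ["ILVM", "FWY", "KR", "DE", "ST"]


def midline(query, sbjct):
    """Build the middle line for two aligned sequences of equal length.

    Identical residues are repeated, conservative substitutions become "+",
    and everything else (including gap positions) becomes a space.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have the same length")
    out = []
    for q, s in zip(query, sbjct):
        if q == s and q != "-":
            out.append(q)
        elif any(q in group and s in group for group in CONSERVATIVE_GROUPS):
            out.append("+")
        else:
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    query = "MKVLA"
    sbjct = "MRILG"
    print("Query", query)
    print("     ", midline(query, sbjct))
    print("Sbjct", sbjct)
```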


    Prediction of protein properties

    Predict molecular weight

    Predict isoelectric point (pI)

    Predict extinction coefficient

    Protease recognition sites
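
    Two of these properties can be estimated directly from the amino acid composition. The sketch below assumes standard average residue masses (rounded) and the widely used approximation for the extinction coefficient at 280 nm based on tryptophan, tyrosine, and cystine counts. The number of disulfide-bonded cystines cannot be read from the sequence, so it is passed in as a parameter. Function names are illustrative.

```python
# Approximate average residue masses in daltons (amino acid minus water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free chain termini


def molecular_weight(sequence):
    """Estimate the average molecular weight (Da) of an unmodified peptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


def extinction_coefficient(sequence, cystines=0):
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1)."""
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


if __name__ == "__main__":
    peptide = "MKWVTFISLLFLFSSAYS"  # arbitrary example sequence
    print(f"molecular weight: {molecular_weight(peptide):.1f} Da")
    print(f"extinction coefficient: {extinction_coefficient(peptide)}")
```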


    Search for Known Motifs

    Motif searching is also very useful in protein sequences, to recognize specific amino acid patterns with functional significance. A number of databases of protein motifs, such as the PROSITE database, have been created either from literature surveys or directly from sequence databases, for the purpose of identifying proteins and domains or particular functional sites.
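
    PROSITE-style patterns map naturally onto regular expressions. The sketch below handles only a subset of the pattern syntax (single residues, x for any residue, [..] for allowed sets, {..} for excluded sets, and repeat counts); anchors and other features are left out. The example pattern N-{P}-[ST]-{P} is the N-glycosylation site pattern; the helper names are made up here.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off an optional repeat count such as "(2)" or "(2,4)".
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # Bracketed sets like [ST] and single letters are already valid regex.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return 1-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_motif("MNGTANPSLNKS", "N-{P}-[ST]-{P}."))
```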


    Predict Secondary Structure

    The function of a protein is strongly dependent on its three-dimensional structure.

    Secondary structure prediction relies on the propensities of various amino acids (or stretches of amino acids) to form or break particular secondary structure elements.
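
    The idea of residue propensities can be illustrated with a sliding-window average. This is a crude sketch, not a full prediction algorithm: the helix propensity values are those commonly tabulated for the Chou-Fasman method and should be checked against a primary source before being relied on, and the window size is an arbitrary assumption. Values above 1 suggest helix formers, values below 1 suggest helix breakers.

```python
# Helix propensities as commonly tabulated for the Chou-Fasman method
# (assumed values for illustration; verify before relying on them).
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
    "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
    "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
    "P": 0.57, "G": 0.57,
}


def helix_profile(sequence, window=6):
    """Return the mean helix propensity for each window along the sequence."""
    seq = sequence.upper()
    return [
        sum(HELIX_PROPENSITY[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    # Arbitrary example: an alanine/glutamate-rich stretch followed by glycines.
    for position, value in enumerate(helix_profile("AEAALKAGGPGG"), start=1):
        print(position, f"{value:.2f}")
```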


    Protein tertiary structure prediction

    Predicting the tertiary (three-dimensional) structure of a protein from its sequence is still far from a trivial task. It usually involves combining information from a range of sources: database searches, comparisons with similar sequences whose structure is known, motifs known to correspond to particular structural elements, and secondary structure information.

    PDB files: 1tim, 2act, 3rn3, 1mbn, 1est

    There are several approaches to building a three-dimensional model for a protein, including homology modeling, profiling, and threading (see supplement).

    PDB file links:
    http://www.library.csi.cuny.edu/~davis/Bioinformatics/1timMono.pdb
    http://www.library.csi.cuny.edu/~davis/Bioinformatics/2act.pdb
    http://www.library.csi.cuny.edu/~davis/Bioinformatics/3rn3.pdb
    http://www.library.csi.cuny.edu/~davis/Bioinformatics/1mbn.pdb
    http://www.library.csi.cuny.edu/~davis/Bioinformatics/1est.pdb

    Supplement:
    http://www.library.csi.cuny.edu/~davis/Bioinformatics/suppl1.htm

    Multiple Sequence Analysis

    A whole new set of questions can be asked when the sequences of related genes from different organisms are available. For example, conserved regions can be identified, either as an indication of their functionality, or as targets for PCR experiments, or for designing probes for diagnostic tests.

    The first step in multiple sequence analysis is to align the related sequences together into a multiple sequence alignment, that is, an alignment of more than 2 sequences.
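
    Once a multiple sequence alignment exists, conserved regions can be located by scanning its columns. The sketch below assumes the alignment is already given as equal-length strings with "-" for gaps, and it uses the strictest possible definition of conservation (every sequence has the same residue in the column). Real analyses usually tolerate some variation; the minimum run length is an arbitrary parameter.

```python
def conserved_regions(alignment, min_length=3):
    """Return (start, end) of fully conserved runs, 1-based and inclusive.

    alignment is a list of equal-length aligned sequences using "-" for gaps.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    regions = []
    run_start = None
    # Iterate one past the last column so a run ending at the edge is closed.
    for col in range(width + 1):
        conserved = False
        if col < width:
            residues = {row[col] for row in alignment}
            conserved = len(residues) == 1 and "-" not in residues
        if conserved:
            if run_start is None:
                run_start = col
        elif run_start is not None:
            if col - run_start >= min_length:
                regions.append((run_start + 1, col))
            run_start = None
    return regions


if __name__ == "__main__":
    alignment = [
        "ATGCCGTA",
        "ATGACGTA",
        "ATG-CGTA",
    ]
    print(conserved_regions(alignment))
```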


    Challenges in bioinformatics

    Explosion of information

    Need for faster, automated analysis to process large amounts of data

    Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels, etc.)

    Need for "smarter" software to identify interesting relationships in very large data sets

    Lack of "bioinformaticians"

    Software needs to be easier to access, use, and understand

    Biologists need to learn about the software, its limitations, and how to interpret its results


    Diagnostics

    DNA probes for infectious disease

    DNA probes for inherited disease

    Analysis of gene expression

    Analysis of protein expression


    Therapeutics

    Recombinant gene products

    Novel drug targets

    Rational drug design

    Gene therapy

    http://www.scitech.com.au/software/Scanalytics/IPLab.pdf