Bioinformatics: an overview

8/2/2019 Bio information An Overview


What is Bioinformatics?

Bioinformatics is an emerging scientific discipline representing the combined power of biology, mathematics, and computers.



Bioinformatics includes:

Sequence analysis, used by geneticists, cell biologists, molecular biologists, etc.

Molecular modeling, used by crystallographers, cell biologists, biochemists, etc.

Molecular phylogeny/evolution

Ecology and population studies

Medical informatics



Three important sub-disciplines within bioinformatics involving computational biology would include:

1. The development and implementation of tools that enable efficient access to, and management of, different types of information

2. The analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures

3. The development of new algorithms and statistics with which to assess relationships among members of large data sets
http://www.library.csi.cuny.edu/~davis/Bioinformatics/Bioinformatics/dataanal.html
http://www.ncbi.nlm.nih.gov/Education/Bioinformatics/dataanal.html
http://www.library.csi.cuny.edu/~davis/Bioinformatics/Bioinformatics/datatypes.html
http://www.ncbi.nlm.nih.gov/Education/Bioinformatics/datatypes.html



GenBank Data

Year  Base Pairs  Sequences
1982      680338        606
1983     2274029       2427
1984     3368765       4175
1985     5204420       5700
1986     9615371       9978
1987    15514776      14584
1988    23800000      20579
1989    34762585      28791
1990    49179285      39533
1991    71947426      55627
1992   101008486      78608
1993   157152442     143492
1994   217102462     215273
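The growth in the GenBank table can be summarised numerically. The sketch below is only an illustration: it copies the figures from the table and estimates an average doubling time, assuming roughly exponential growth over the whole period. The function name `doubling_time_years` is introduced here for the example.

```python
import math

# (year, base pairs, sequences), copied from the GenBank table above.
GENBANK = [
    (1982, 680338, 606), (1983, 2274029, 2427), (1984, 3368765, 4175),
    (1985, 5204420, 5700), (1986, 9615371, 9978), (1987, 15514776, 14584),
    (1988, 23800000, 20579), (1989, 34762585, 28791), (1990, 49179285, 39533),
    (1991, 71947426, 55627), (1992, 101008486, 78608),
    (1993, 157152442, 143492), (1994, 217102462, 215273),
]


def doubling_time_years(first, last):
    """Average doubling time of base pairs between two table rows,
    assuming exponential growth between them (an assumption, not a fit)."""
    (year0, bp0, _), (year1, bp1, _) = first, last
    return (year1 - year0) * math.log(2) / math.log(bp1 / bp0)


# Year-over-year fold increase in base pairs.
for (y0, bp0, _), (y1, bp1, _) in zip(GENBANK, GENBANK[1:]):
    print(f"{y0}->{y1}: x{bp1 / bp0:.2f}")

print(f"average doubling time: {doubling_time_years(GENBANK[0], GENBANK[-1]):.2f} years")
```

Under that assumption, the number of stored base pairs doubled in well under two years on average across 1982 to 1994.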



Analysis of sequence information: computational biology

Finding the genes in the DNA sequences of various organisms.

Developing methods to predict the structure and/or function of newly discovered proteins and structural RNA sequences.

Clustering protein sequences into families of related sequences and developing protein models.

Aligning similar proteins and generating phylogenetic trees to examine evolutionary relationships.



Goals of Bioinformatics and Sequence Analysis can be subdivided into:

1. Sequence entry, assembly, and management

2. Nucleotide sequence analysis

3. Protein sequence analysis

4. Multiple sequence analysis

5. Additional and integrated analyses



Sequence Entry and Editing



Sequence Assembly



Nucleotide Sequence Analysis

Sequence Similarity Analysis

Query: 298 CCGGGGACCTGCGGCGGGTCGCCTGCCCAGCCCCCGAA
|| | || | | |||| | || |||| ||| | |||||||||||||
CCCGGGAACCTGCGGTGGTCCGCCCGCCCAGCCCCAGTG
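A pairwise comparison like the one above can be sketched with a small dynamic-programming alignment. The example below uses Needleman-Wunsch global alignment, which is not necessarily the method that produced the alignment on the slide (database searches typically use local, heuristic methods). The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a linear gap penalty.
    Returns (score, aligned_a, match_line, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, mid, bot = [], [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bot.append(b[j - 1])
            mid.append("|" if a[i - 1] == b[j - 1] else " ")
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bot.append("-"); mid.append(" ")
            i -= 1
        else:
            top.append("-"); bot.append(b[j - 1]); mid.append(" ")
            j -= 1
    return (score[n][m], "".join(reversed(top)),
            "".join(reversed(mid)), "".join(reversed(bot)))


# The two sequences shown on the slide.
query = "CCGGGGACCTGCGGCGGGTCGCCTGCCCAGCCCCCGAA"
sbjct = "CCCGGGAACCTGCGGTGGTCCGCCCGCCCAGCCCCAGTG"
total, top, mid, bot = needleman_wunsch(query, sbjct)
print(total)
print(top)
print(mid)
print(bot)
```

Each `|` in the middle line marks an identical position, as in the slide's alignment. Different scoring values can give a different alignment.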



Gene discovery: coding regions, exon, and gene prediction

ORF Map

ORF List (C = complementary strand)

     1 to 1647  length = 1647
     3 to   80  length =   78
   326 to  409  length =   84
  1064 to 1174  length =  111
C 1650 to 1249  length =  402
C 1649 to 1509  length =  141
C  660 to  511  length =  150
C  584 to  507  length =   78
C  510 to  283  length =  228
C  452 to  321  length =  132
C  149 to   39  length =  111
C  135 to    4  length =  132
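An ORF list of this kind can be produced by scanning all six reading frames. The sketch below is a simplified illustration, not the tool that generated the list above. It assumes an ORF runs from an ATG to the first in-frame stop codon and that the reported length includes the stop codon. Both conventions, and the `min_length` cutoff, are assumptions. Complementary-strand ORFs are reported with start greater than end, as in the list.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_length):
    """Yield 0-based half-open (start, end) spans of ATG...stop ORFs
    found in the three reading frames of one strand."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_length:
                    yield start, i + 3
                start = None


def find_orfs(seq, min_length=75):
    """Return (strand, start, end, length) tuples with 1-based coordinates
    on the given sequence; strand is '+' or 'C' (complementary)."""
    seq = seq.upper()
    n = len(seq)
    results = []
    for start, end in orfs_in_strand(seq, min_length):
        results.append(("+", start + 1, end, end - start))
    for start, end in orfs_in_strand(reverse_complement(seq), min_length):
        # Map positions on the reverse complement back to the input strand.
        results.append(("C", n - start, n - end + 1, end - start))
    return results


for strand, start, end, length in find_orfs("CCATGAAATAGCC", min_length=9):
    print(f"{strand} {start} to {end} length = {length}")
```

Real gene finders go further, for example by scoring codon usage and handling introns; this only enumerates candidate open reading frames.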



Protein Sequence Analysis

Sequence Similarity Analysis

Protein sequences can be analyzed in ways similar to nucleotide sequences. Some common types of analyses are database similarity searching (to identify protein sequence database entries similar to a given protein) and sequence comparison (for example, to align two protein sequences and identify common regions).

Compare sequence similarity of A and B below.

Query = sequence used to search the database

Sbjct = database sequence that the query is aligned to

Letters between the two lines are identities, and + marks a conservative amino acid substitution
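The Query/Sbjct notation described above can be illustrated with a short sketch that builds the middle line for two already-aligned protein sequences. Search programs usually decide what counts as conservative from a substitution matrix. The residue groups below are a simplified stand-in chosen for this example, and `CONSERVATIVE_GROUPS` and `midline` are hypothetical names.

```python
# Simplified residue groups standing in for a substitution matrix
# (an assumption for illustration only).
CONSERVATIVE_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]


def midline(query, sbjct):
    """Build the line shown between Query and Sbjct: the residue letter
    for an identity, '+' for a conservative substitution, else a space."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for q, s in zip(query, sbjct):
        if q == s and q != "-":
            out.append(q)
        elif any(q in group and s in group for group in CONSERVATIVE_GROUPS):
            out.append("+")
        else:
            out.append(" ")
    return "".join(out)


query = "KLVDE"
sbjct = "RLIDA"
print("Query", query)
print("     ", midline(query, sbjct))
print("Sbjct", sbjct)
```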



Prediction of protein properties

Predict molecular weight

Predict isoelectric point (pI)

Predict extinction coefficient

Protease recognition sites
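Two of the properties listed above can be estimated directly from the sequence. The sketch below sums average residue masses for the molecular weight and uses the widely used 280 nm approximation (5500 per Trp, 1490 per Tyr, 125 per disulfide bond) for the extinction coefficient. The mass table and function names are supplied for this example and should be checked against a reference tool before being relied on. The isoelectric point is not covered because it depends on the pKa set chosen.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def molecular_weight(protein):
    """Approximate molecular weight: residue masses plus one water
    for the free N- and C-termini."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER_MASS


def extinction_coefficient(protein, disulfide_bonds=0):
    """Approximate molar extinction coefficient at 280 nm (M^-1 cm^-1)."""
    protein = protein.upper()
    return (5500 * protein.count("W")
            + 1490 * protein.count("Y")
            + 125 * disulfide_bonds)


peptide = "MKWVTFISLLY"  # arbitrary example sequence, not from the slides
print(f"{molecular_weight(peptide):.1f} Da")
print(extinction_coefficient(peptide), "M^-1 cm^-1")
```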



Search for Known Motifs

Motif searching is also very useful in protein sequences, to recognize specific amino acid patterns with functional significance. A number of databases of protein motifs, such as the PROSITE database, have been created either from literature surveys or directly from sequence databases, for the purpose of identifying proteins and domains or particular functional sites.
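A motif search of this kind can be sketched by translating a PROSITE-style pattern into a regular expression. The converter below handles only the basic syntax (`x`, `[..]`, `{..}`, repeat counts, and the `<`/`>` anchors) and is a minimal illustration rather than a full PROSITE parser. The pattern used is the N-glycosylation site pattern `N-{P}-[ST]-{P}`, and the test sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motif(pattern, protein):
    """Return 1-based start positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() + 1 for m in regex.finditer(protein)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("N-{P}-[ST]-{P}", "MNGTAAANPSA"))
```

Here `re.finditer` reports non-overlapping matches only. A lookahead-based search would be needed to report overlapping sites.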



Predict Secondary Structure

The function of a protein is strongly dependent on its three-dimensional structure.

Secondary structure prediction uses the propensities of various amino acids (or stretches of amino acids) to form or break particular secondary structure elements.
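The idea of residue propensities can be sketched with a sliding-window average. The values below are the commonly tabulated Chou-Fasman alpha-helix propensities and should be checked against a published table before use. The window size and the threshold of 1.0 are simplifying assumptions; the full Chou-Fasman procedure has additional nucleation and extension rules that are not implemented here.

```python
# Commonly tabulated Chou-Fasman helix propensities (verify before use).
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
    "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
    "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
    "P": 0.57, "G": 0.57,
}


def helix_window_scores(protein, window=6):
    """Average helix propensity of every window of the given size."""
    values = [HELIX_PROPENSITY[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def helix_favoring_windows(protein, window=6, threshold=1.0):
    """1-based start positions of windows whose average exceeds the threshold."""
    scores = helix_window_scores(protein, window)
    return [i + 1 for i, score in enumerate(scores) if score > threshold]


protein = "AEELLKKGPGSGNG"  # arbitrary example sequence, not from the slides
print([round(s, 2) for s in helix_window_scores(protein)])
print(helix_favoring_windows(protein))
```

Stretches rich in residues such as Glu, Ala, and Leu score as helix formers, while Pro and Gly pull the average down and act as helix breakers.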



Protein tertiary structure prediction

Predicting the tertiary (three-dimensional) structure of a protein from its sequence is still far from a trivial task, and usually involves combining information from a range of sources: database searches, comparisons with similar sequences whose structure is known, motifs known to correspond to particular structural elements, and secondary structure information.

PDB files: 1tim, 2act, 3rn3, 1mbn, 1est

There are several approaches to building a three-dimensional model for a protein, including homology modeling, profiling, and threading (see supplement).
http://www.library.csi.cuny.edu/~davis/Bioinformatics/1timMono.pdb
http://www.library.csi.cuny.edu/~davis/Bioinformatics/2act.pdb
http://www.library.csi.cuny.edu/~davis/Bioinformatics/3rn3.pdb
http://www.library.csi.cuny.edu/~davis/Bioinformatics/1mbn.pdb
http://www.library.csi.cuny.edu/~davis/Bioinformatics/1est.pdb
http://www.library.csi.cuny.edu/~davis/Bioinformatics/suppl1.htm



Multiple Sequence Analysis

A whole new set of questions can be asked when the sequences of related genes from different organisms are available. For example, conserved regions can be identified, either as an indication of their functionality, or as targets for PCR experiments, or for designing probes for diagnostic tests.

The first step in multiple sequence analysis is to align the related sequences together into a multiple sequence alignment, that is, an alignment of more than two sequences.
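Once a multiple sequence alignment exists, conserved regions can be located column by column. The sketch below assumes the alignment has already been computed by some other tool and is given as equal-length strings with `-` for gaps. It treats a column as conserved only when every sequence has the same residue, which is a deliberately strict criterion chosen for illustration.

```python
def conserved_columns(alignment):
    """0-based indices of columns where all sequences share the same
    non-gap character."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    first = alignment[0]
    return [i for i in range(length)
            if first[i] != "-" and all(row[i] == first[i] for row in alignment)]


def conserved_regions(alignment, min_length=2):
    """Runs of consecutive conserved columns as 0-based half-open
    (start, end) spans that are at least min_length columns long."""
    conserved = set(conserved_columns(alignment))
    regions, run_start = [], None
    for i in range(len(alignment[0]) + 1):
        if i in conserved:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_length:
                regions.append((run_start, i))
            run_start = None
    return regions


# Toy alignment made up for the example.
alignment = ["ACGT-A", "ACGTTA", "ACCTTA"]
print(conserved_columns(alignment))
print(conserved_regions(alignment))
```

Longer conserved runs are the natural candidates for PCR primer or probe design mentioned above.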



Challenges in bioinformatics

Explosion of information

Need for faster, automated analysis to process large amounts of data

Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels, etc.)

Need for "smarter" software to identify interesting relationships in very large data sets

Lack of "bioinformaticians"

Software needs to be easier to access, use, and understand

Biologists need to learn about the software, its limitations, and how to interpret its results



Diagnostics

DNA probes for infectious disease

DNA probes for inherited disease

Analysis of gene expression

Analysis of protein expression



Therapeutics

Recombinant gene products

Novel drug targets

Rational drug design

Gene therapy
http://www.scitech.com.au/software/Scanalytics/IPLab.pdf

