8024 bio info

8/3/2019 8024 Bio Info


Introduction to Bioinformatics


What is Bioinformatics?

Bioinformatics is a relatively new interdisciplinary field that integrates computer science, mathematics, biology, and information technology to manage, analyze, and understand biological, biochemical, and biophysical information.

Bioinformatics is a computational science and a subset of the larger field of computational biology.


What is Bioinformatics?

Bioinformatics is the use of computers to study biology

Bioinformatics is the science of using information to understand biology

Bioinformatics is the integration of information technology (IT) and biology

Bioinformatics is the development of computational methods for studying the structure, function, and evolution of genes, proteins, and whole genomes


Some Terminology

The cell is the primary unit of life

A cell consists of molecules, chemical reactions, and a copy of the genome for that organism

All life on this planet depends on three types of molecules: DNA, RNA, and proteins


Some Terminology

DNA

Holds information on how the cell works

RNA

Acts to transfer short pieces of information to different parts of the cell

Provides templates for protein synthesis

Proteins

Form enzymes that send signals to other cells and regulate gene activity

Form the body's major components (e.g. hair, skin, etc.)


DNA - Deoxyribonucleic Acid

Genetic material

Consists of two long strands

Each strand is made of:

Phosphates

Sugar

Nucleotide bases:

A (adenine)

G (guanine)

C (cytosine)

T (thymine)


DNA: Double Helix Structure


The Central Dogma of Molecular Biology

Information is transferred from DNA (information storage molecule) to RNA (information transfer molecule) to a specific protein (a functional product)

DNA --(transcription)--> RNA --(translation)--> Protein


More Terminology

Transcription of DNA:

DNA is transcribed into RNA

RNA exists as a single-stranded unit and can also form a double helix

RNA consists of A, C, G, and U (uracil)
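The transcription step above can be sketched in a few lines of Python. This is a minimal illustration, not a model of the cellular machinery: it assumes the input is the coding (sense) strand written 5' to 3', so the mRNA is the same sequence with U in place of T. The function name `transcribe` is chosen here for illustration.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand by replacing T with U.

    Assumption: the input is the coding (sense) strand, so no
    complementing is needed.
    """
    dna = coding_strand.upper()
    invalid = set(dna) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected DNA symbols: {sorted(invalid)}")
    return dna.replace("T", "U")


print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```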

Types of RNA

Messenger RNA (mRNA)

Transfer RNA (tRNA)

Ribosomal RNA (rRNA)


More Terminology

Translation of Messenger RNA (mRNA):

mRNA is translated into protein

Proteins:

linear polymers built from amino acids

The transfer of information from DNA to a specific protein via RNA takes place according to the genetic code.

The RNA sequence is divided into blocks of three letters

Each block is called a CODON

Each codon corresponds to a specific amino acid
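The codon-by-codon reading described above can be sketched as follows. This is a minimal example that assumes the standard genetic code, written compactly as a 64-letter string in U, C, A, G order (one-letter amino acid codes, `*` for a stop codon). The names `CODON_TABLE` and `translate` are illustrative, and the function simply reads from the first base without searching for a start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UUU, UUC, UUA, UUG, UCU, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon until a stop codon is reached."""
    protein = []
    # Step through the sequence in blocks of three; ignore a trailing partial codon.
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon terminates translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCUAA"))  # MA
```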


More Terminology

Four different nucleotides are used to build DNA and RNA molecules: A, G, C, T for DNA and A, G, C, U for RNA

20 different amino acids are used in protein synthesis

Four nucleotides can be arranged in 64 different combinations of three.

There are 64 = 4*4*4 different codons

Some codons are redundant and some have a special function: to terminate the translation process
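The counting argument above can be checked directly. The sketch below enumerates every three-letter combination of the four RNA bases and, assuming the standard genetic code (encoded as a 64-letter string in U, C, A, G order, with `*` marking a stop codon), tallies how many codons map to each amino acid. Variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UUU, UUC, UUA, UUG, UCU, ... order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(c) for c in product(BASES, repeat=3)]
table = dict(zip(codons, AMINO_ACIDS))

stop_codons = sorted(c for c, aa in table.items() if aa == "*")
codons_per_amino_acid = Counter(aa for aa in table.values() if aa != "*")

print(len(codons))                 # 64
print(stop_codons)                 # ['UAA', 'UAG', 'UGA']
print(len(codons_per_amino_acid))  # 20
print(codons_per_amino_acid["L"])  # 6
```

Because 61 coding codons are shared among 20 amino acids, most amino acids are specified by more than one codon, which is the redundancy the slide refers to.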


Why is bioinformatics important?

Traditionally, research was carried out entirely in the experimental laboratory, but the huge increase in data in the genomic era has created a need to incorporate computers into the research process

There are three central biological processes around which bioinformatics tools must be developed:

DNA sequence determines protein sequence

Protein sequence determines protein structure

Protein structure determines protein function


Major research areas

Sequence analysis: A comparison of genes within a species or between different species can show similarities between protein functions, or relations between species

The comparison of sequences in order to find similar and dissimilar regions in the compared sequences (sequence alignment)

Identification of gene structures, reading frames, distributions of introns and exons, and regulatory elements

Revealing the evolution and genetic diversity of organisms.
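As an illustration of sequence alignment, the sketch below computes a global alignment score with the Needleman-Wunsch dynamic-programming recurrence. The slides do not name a specific algorithm, so this is one classic choice picked for illustration, and the scoring values (match +1, mismatch -1, gap -1) are assumptions rather than recommended settings.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the best global alignment score of a and b (Needleman-Wunsch)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j, char_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if char_a == char_b else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "ACGT"))  # 4
print(global_alignment_score("ACGT", "AGT"))   # 2
```

Only two rows of the scoring matrix are kept, which is enough for the score; recovering the alignment itself would need the full matrix and a traceback.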


Computational evolutionary biology: Evolutionary biology is the study of the origin and descent of species, as well as their change over time. Informatics has assisted evolutionary biologists in several key ways; it has enabled researchers to:

trace the evolution of a large number of organisms by measuring changes in their DNA, rather than through physical taxonomy or physiological observations alone

build complex computational models of populations to predict the outcome of the system over time

track and share information on an increasingly large number of species and organisms


Prediction of protein structure: Protein structure prediction is another important application of bioinformatics.

In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A's function.
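The inference described above can be sketched very roughly in code. The example below uses percent identity between two pre-aligned sequences as a crude stand-in for homology, which is a simplification: real pipelines rely on alignment statistics rather than a fixed cutoff. The function names, the gene records, and the 0.8 threshold are all hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of identical positions, ignoring columns that contain a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def infer_function(query: str, known_genes: dict, threshold: float = 0.8):
    """Return the function of the most similar known gene, or None if none passes.

    known_genes maps a gene name to (aligned_sequence, function_label).
    The threshold is an illustrative assumption, not an accepted standard.
    """
    best_function, best_identity = None, threshold
    for sequence, function in known_genes.values():
        identity = percent_identity(query, sequence)
        if identity >= best_identity:
            best_function, best_identity = function, identity
    return best_function


known = {"geneA": ("ATGGCCAAG", "kinase")}  # hypothetical annotated gene
print(infer_function("ATGGCCAAA", known))   # kinase
print(infer_function("TTTTTTTTT", known))   # None
```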

MODELLER is a widely used program for homology modelling. The Protein Data Bank is the database for 3D coordinates of proteins.


Drug Designing: Drug design is the approach of finding drugs by design, based on their biological targets. Computer-assisted drug design uses computational chemistry to discover, enhance, or study drugs and related biologically active molecules

Phylogenetics: Predicting the genetic or evolutionary relationships among a set of organisms. Mitochondrial SNPs and microsatellites (DNA repeats) are commonly used in phylogenetics. MEGA and PAUP* are some of the important software packages. Maximum Parsimony and Maximum Likelihood are the most commonly used methods.

http://www.megasoftware.net/

http://paup.csit.fsu.edu/
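To make the idea of Maximum Parsimony concrete, the sketch below scores a single character (for example one alignment column) on a fixed rooted binary tree using Fitch's algorithm, which counts the minimum number of state changes the tree requires. It is a toy illustration under stated assumptions, not how MEGA or PAUP* are implemented; the tree encoding (nested pairs with leaf states as strings) and function names are chosen for this example.

```python
def fitch(tree):
    """Return (possible_states, changes) for a tree given as a leaf or a (left, right) pair."""
    if isinstance(tree, str):  # a leaf carries its observed state
        return {tree}, 0
    left, right = tree
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_changes + right_changes
    # Children disagree: take the union and count one change.
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree) -> int:
    """Minimum number of state changes needed to explain the leaf states."""
    return fitch(tree)[1]


# Two candidate groupings of four organisms with states A, A, C, C:
print(parsimony_score((("A", "A"), ("C", "C"))))  # 1
print(parsimony_score((("A", "C"), ("A", "C"))))  # 2
```

Under parsimony the first grouping is preferred because it explains the same observations with fewer changes; a full analysis would sum this score over all characters and search over tree shapes.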


Biological databases: why?

The need for storing and communicating large datasets has grown

To make biological data available to scientists.

To make biological data available in computer-readable form.


Types of data

nucleotide sequences

protein sequences

protein sequence patterns or motifs

macromolecular 3D structure

gene expression data

metabolic pathways


Different classifications of databases

Primary or derived databases

Primary databases: experimental results entered directly into the database

Secondary databases: results of analysis of primary databases

Aggregate of many databases

Links to other data items

Combination of data

Consolidation of data


Nucleotide sequence databases

EMBL, GenBank, and DDBJ are the three primary nucleotide sequence databases

EMBL www.ebi.ac.uk/embl/

GenBank www.ncbi.nlm.nih.gov/Genbank/

DDBJ www.ddbj.nig.ac.jp


GenBank

An annotated collection of all publicly available nucleotide and protein sequences

Set up in 1979 at LANL (Los Alamos).

Maintained since 1992 by NCBI (Bethesda).

http://www.ncbi.nlm.nih.gov


EMBL Nucleotide Sequence DB

An annotated collection of all publicly available nucleotide and protein sequences

Created in 1980 at the European Molecular Biology Laboratory in Heidelberg.

Maintained since 1994 by EBI-Cambridge.

http://www.ebi.ac.uk/embl.html


DDBJ: DNA Data Bank of Japan

An annotated collection of all publicly available nucleotide and protein sequences

Started in 1984 at the National Institute of Genetics (NIG) in Mishima.

Still maintained at this institute by a team led by Takashi Gojobori.

http://www.ddbj.nig.ac.jp


Other NCBI nucleic acids DBs

EST database: A collection of expressed sequence tags, or short, single-pass sequence reads from mRNA (cDNA).

GSS database: A database of genome survey sequences, or short, single-pass genomic sequences.

HomoloGene: A gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs.

HTG database: A collection of high-throughput genome sequences from large-scale genome sequencing centers, including unfinished and finished sequences.

SNPs database: A central repository for both single-base nucleotide substitutions and short deletion and insertion polymorphisms.

RefSeq: A database of non-redundant reference sequence standards, including genomic DNA contigs, mRNAs, and proteins for known genes. Multiple collaborations, both within NCBI and with external groups, support data-gathering efforts.

STS database: A database of sequence tagged sites, or short sequences that are operationally unique in the genome.

UniSTS: A unified, non-redundant view of sequence tagged sites (STSs).

UniGene: A collection of ESTs and full-length mRNA sequences organized into clusters, each representing a unique known or putative human gene annotated with mapping and expression information and cross-references to other sources.
http://www.ncbi.nlm.nih.gov/dbEST/index.html
http://www.ncbi.nlm.nih.gov/dbGSS/index.html
http://www.ncbi.nlm.nih.gov/HomoloGene/
http://www.ncbi.nlm.nih.gov/HTGS/
http://www.ncbi.nlm.nih.gov/SNP/
http://www.ncbi.nlm.nih.gov/RefSeq/
http://www.ncbi.nlm.nih.gov/dbSTS/index.html
http://www.ncbi.nlm.nih.gov/genome/sts/
http://www.ncbi.nlm.nih.gov/UniGene/


Bioinformatics Tools

BLAST: The Basic Local Alignment Search Tool (BLAST), for comparing gene and protein sequences against others in public databases, now comes in several types including PSI-BLAST, PHI-BLAST, and BLAST 2 Sequences. Specialized BLASTs are also available.

FASTA: A database search tool used to compare a nucleotide or peptide sequence to a sequence database. It was the first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning the sequence for small matches called "words".
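The "word" scanning idea described above can be sketched as follows. This toy example indexes every length-k word of a subject sequence, looks up each word of the query, and groups the exact hits by diagonal (subject position minus query position), since hits on the same diagonal are candidates for one ungapped local match. It illustrates only the seeding step under simplifying assumptions and is not the actual FASTA or BLAST implementation; the function names and the word size k=3 are chosen for illustration.

```python
from collections import Counter, defaultdict


def word_hits(query: str, subject: str, k: int = 3):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def best_diagonal(hits):
    """Return (diagonal, hit_count) for the diagonal with the most word hits, or None."""
    if not hits:
        return None
    return Counter(j - i for i, j in hits).most_common(1)[0]


hits = word_hits("ACGTAC", "TTACGT")
print(hits)                 # [(0, 2), (1, 3), (3, 1)]
print(best_diagonal(hits))  # (2, 2)
```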


ClustalW: ClustalW is a general-purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences, calculates the best match for the selected sequences, and lines them up so that the identities, similarities, and differences can be seen.
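Once sequences are lined up, identities can be read off column by column. The sketch below takes sequences that are already aligned (same length, `-` for gaps) and builds a marker line with `*` under every fully conserved column, in the spirit of the annotation line printed beneath Clustal alignments. It does not perform the alignment itself and marks only exact identities; the function name is chosen for illustration.

```python
def conservation_line(aligned_sequences) -> str:
    """Return a string with '*' under columns where every sequence has the same residue."""
    if not aligned_sequences:
        return ""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned_sequences):
        # A column is conserved if it holds one distinct symbol that is not a gap.
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


alignment = ["ACGT", "ACGA", "ACCT"]
for seq in alignment:
    print(seq)
print(conservation_line(alignment))  # '**  ' (stars under the first two columns)
```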

RasMol: A powerful research tool to display the structure of DNA, proteins, and smaller molecules. Protein Explorer, a derivative of RasMol, is an easier-to-use program.

DeepView (also known as Swiss-PdbViewer): For viewing and exploring macromolecular models in three dimensions, and for manual and semi-automated homology modeling


Conclusion

Bioinformatics in India is at an early stage of development. But at 4 to 5 centers in the country, one sees a mature understanding of the needs of this sector and world-class development of tools and applications. These centers will ensure that India's traditional strengths in IT are leveraged to place us on par with the developed countries.
