8024 bio info
-
8/3/2019 8024 Bio Info
1/28
Introduction to Bioinformatics
-
What is Bioinformatics?
Bioinformatics is a relatively new interdisciplinary field that integrates computer science, mathematics, biology, and information technology to manage, analyze, and understand biological, biochemical, and biophysical information.
Bioinformatics is a computational science and a subset of the larger field of computational biology.
-
What is Bioinformatics?
Bioinformatics is the use of computers to study biology.
Bioinformatics is the science of using information to understand biology.
Bioinformatics is the integration of information technology (IT) and biology.
Bioinformatics is the development of computational methods for studying the structure, function, and evolution of genes, proteins, and whole genomes.
-
Some Terminology
The cell is the basic unit of life.
A cell consists of molecules, chemical reactions, and a copy of the genome for that organism.
All life on this planet depends on three types of molecules: DNA, RNA, and proteins.
-
Some Terminology
DNA
Holds the information on how the cell works
RNA
Transfers short pieces of information to different parts of the cell
Provides templates that are translated into protein
Proteins
Form enzymes, send signals to other cells, and regulate gene activity
Form the body's major components (e.g. hair, skin)
-
DNA - Deoxyribonucleic Acid
Genetic material
Consists of two long strands
Each strand is a chain of nucleotides, each made of:
A phosphate group
A sugar (deoxyribose)
A base:
A (adenine)
G (guanine)
C (cytosine)
T (thymine)
-
DNA Double Helix Structure
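The two strands of the double helix are complementary (A pairs with T, G pairs with C) and run in opposite directions, so either strand determines the other. A minimal Python sketch of deriving the partner strand, read 5'→3'; `reverse_complement` is a hypothetical helper name, not a library function, and the input is assumed to contain only A, C, G and T:

```python
# Watson-Crick base pairing in DNA: A-T and G-C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand read 5'->3' (hypothetical helper, A/C/G/T only)."""
    # Complement each base, then reverse the order, because the strands are antiparallel.
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original strand, which is a quick sanity check of the pairing table.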
-
The Central Dogma of Molecular Biology
Information is transferred from DNA (the information storage molecule) to RNA (the information transfer molecule) to a specific protein (a functional, non-coding product).
DNA → (transcription) → RNA → (translation) → Protein
-
More Terminology
Transcription of DNA
DNA is transcribed into RNA.
RNA usually exists as a single strand, but it can also form double-helical structures.
RNA consists of A, C, G and U (uracil); U takes the place of T.
Types of RNA
Messenger RNA (mRNA)
Transfer RNA (tRNA)
Ribosomal RNA (rRNA)
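Transcription can be sketched as simple string processing, assuming an idealized gene with no promoters, introns, or splicing. Both helpers below are hypothetical names and return the mRNA read 5'→3': starting from the coding strand, the mRNA is the same sequence with U in place of T; starting from the template strand, it is the reverse complement with U in place of T.

```python
def transcribe_coding_strand(coding: str) -> str:
    """mRNA from the coding (sense) strand: same sequence, U instead of T (hypothetical helper)."""
    return coding.upper().replace("T", "U")


def transcribe_template_strand(template: str) -> str:
    """mRNA from the template strand (given 5'->3'): reverse complement with U instead of T."""
    pair = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA base -> complementary RNA base
    return "".join(pair[base] for base in reversed(template.upper()))


print(transcribe_coding_strand("ATGGCC"))    # AUGGCC
print(transcribe_template_strand("GGCCAT"))  # AUGGCC
```

The two calls produce the same mRNA because "GGCCAT" is the template partner of the coding strand "ATGGCC".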
-
More Terminology
Translation of messenger RNA (mRNA):
mRNA is translated into protein.
Proteins:
Linear polymers built from amino acids
The transfer of information from DNA to a specific protein via RNA takes place according to the genetic code.
The RNA sequence is read in blocks of three letters; each block is called a codon.
Each codon corresponds to a specific amino acid (or, for a few codons, to a stop signal).
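To make codons concrete, the sketch below reads an mRNA in blocks of three from a chosen starting offset and looks each codon up in the standard genetic code (NCBI translation table 1), written compactly as a 64-letter string of one-letter amino-acid codes in UCAG order, with `*` marking stop codons. The function names are our own; the sketch assumes an RNA string of A, C, G and U only, and real tools also handle alternative genetic codes and ambiguous bases.

```python
# Standard genetic code, codons in UCAG order: UUU, UUC, UUA, UUG, UCU, ..., GGG.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def split_codons(rna: str, frame: int = 0) -> list[str]:
    """Read the sequence in consecutive blocks of three, starting at offset `frame`."""
    # An incomplete trailing block (fewer than 3 letters) is dropped.
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]


def translate(rna: str, frame: int = 0) -> str:
    """Translate codon by codon until the first stop codon."""
    protein = []
    for codon in split_codons(rna.upper(), frame):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon: translation terminates here
            break
        protein.append(amino_acid)
    return "".join(protein)


print(split_codons("AUGGCCUAA"))     # ['AUG', 'GCC', 'UAA']
print(translate("AUGGCCAUUUAAGGG"))  # MAI
```

Changing `frame` to 1 or 2 reads the same sequence in a different reading frame, which usually yields a completely different protein.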
-
More Terminology
Four different nucleotides are used to build DNA and RNA molecules: A, G, C, T in DNA and A, G, C, U in RNA.
20 different amino acids are used in protein synthesis.
Four nucleotides can be arranged in 64 different combinations of three.
There are 4 × 4 × 4 = 64 different codons.
Some codons are redundant (several codons encode the same amino acid), and some have the special function of terminating the translation process (stop codons).
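The counting argument behind triplet codons can be checked directly: words of length k over a 4-letter alphabet give 4^k combinations, so pairs (16) are too few for 20 amino acids, while triplets (64) are enough and leave room for redundancy and stop codons. A small sketch using `itertools.product` from the Python standard library:

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # RNA alphabet
N_AMINO_ACIDS = 20

# Smallest codon length k such that 4**k distinct codons can cover all 20 amino acids.
k = 1
while len(NUCLEOTIDES) ** k < N_AMINO_ACIDS:
    k += 1

# Enumerate every codon of that length: 4 * 4 * 4 combinations.
all_codons = ["".join(letters) for letters in product(NUCLEOTIDES, repeat=k)]
print(k, len(all_codons))  # 3 64
```

Since 64 codons map onto only 20 amino acids plus a stop signal, several codons must share the same meaning.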
-
Why is bioinformatics important?
Traditionally, research was carried out entirely in the experimental laboratory, but the huge increase in data in the genomic era has created a need to incorporate computers into the research process.
There are three central biological processes around which bioinformatics tools must be developed:
DNA sequence determines protein sequence
Protein sequence determines protein structure
Protein structure determines protein function
-
Major research areas
Sequence analysis: comparing genes within a species or between different species can reveal similarities between protein functions, or relationships between species.
Comparison of sequences to find similar and dissimilar regions (sequence alignment)
Identification of gene structures, reading frames, distributions of introns and exons, and regulatory elements
Revealing the evolution and genetic diversity of organisms
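Sequence alignment is usually computed by dynamic programming. As a minimal sketch, here is the Needleman-Wunsch global alignment score under an assumed toy scoring scheme (match +1, mismatch -1, gap -1); real tools use substitution matrices such as BLOSUM and affine gap penalties, and also recover the alignment itself by traceback, which this sketch omits.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch: best score over all global alignments of a and b (toy scoring)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = score[i][j - 1] + gap  # b[j-1] aligned to a gap
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("GATTACA", "GATACA"))   # 5
```

The second pair differs by one deleted T, so the best alignment has six matches and one gap: 6 - 1 = 5.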
-
Computational evolutionary biology: evolutionary biology is the study of the origin and descent of species, as well as their change over time. Informatics has assisted evolutionary biologists in several key ways; it has enabled researchers to:
trace the evolution of a large number of organisms by measuring changes in their DNA, rather than through physical taxonomy or physiological observations alone,
build complex computational models of populations to predict the outcome of the system over time,
track and share information on an increasingly large number of species and organisms.
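Measuring changes in DNA typically starts from a distance between aligned sequences. The simplest is the p-distance: the fraction of aligned sites at which two sequences differ. This sketch assumes the sequences are already aligned, of equal length, and gap-free; the taxon names and sequences are hypothetical, and real analyses usually apply a substitution-model correction such as Jukes-Cantor on top of this.

```python
from itertools import combinations


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(x != y for x, y in zip(seq1, seq2))
    return differences / len(seq1)


# Hypothetical aligned sequences, for illustration only.
aligned = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGAAC",
    "taxon_C": "ACTTACGAAG",
}

# Pairwise distance matrix (upper triangle), the usual input to distance-based tree building.
for (name1, s1), (name2, s2) in combinations(aligned.items(), 2):
    print(name1, name2, p_distance(s1, s2))
```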
-
Prediction of protein structure: protein structure prediction is another important application of bioinformatics.
In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A's function.
MODELLER is one of the most widely used programs for homology modelling. The Protein Data Bank (PDB) is the database of 3D coordinates of proteins.
-
Drug designing: drug design is the approach of finding drugs by design, based on their biological targets. Computer-assisted drug design uses computational chemistry to discover, enhance, or study drugs and related biologically active molecules.
Phylogenetics: predicting the genetic or evolutionary relationships among a set of organisms. Mitochondrial SNPs and microsatellites (DNA repeats) are commonly used in phylogenetics. MEGA and PAUP* are among the important software packages. Maximum parsimony and maximum likelihood are the most commonly used methods.
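Maximum parsimony prefers the tree that explains the data with the fewest changes. One standard way to count the minimum number of changes for one site on a fixed rooted binary tree is Fitch's algorithm, sketched below and summed over all alignment columns. The nested-tuple tree encoding, the taxa and the sequences are hypothetical; programs such as PAUP* additionally search over many candidate topologies, which this sketch does not attempt.

```python
from typing import Dict, Set, Tuple, Union

# A tree is a leaf name or a (left, right) pair of subtrees (hypothetical encoding).
Tree = Union[str, tuple]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one site: (candidate states here, minimum changes below)."""
    if isinstance(tree, str):  # leaf: the base observed in that taxon
        return {states[tree]}, 0
    left_states, left_cost = fitch(tree[0], states)
    right_states, right_cost = fitch(tree[1], states)
    shared = left_states & right_states
    if shared:  # children can agree, so no extra change is needed
        return shared, left_cost + right_cost
    # No common state: take the union and count one change.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum the Fitch counts over all columns (alignment maps taxon -> aligned sequence)."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical four-taxon data and two candidate topologies.
alignment = {"A": "GA", "B": "GA", "C": "TC", "D": "TC"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))  # 2 4
```

The lower score for `tree_1` means it needs fewer changes, so parsimony would prefer grouping A with B and C with D.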
MEGA: http://www.megasoftware.net/
PAUP*: http://paup.csit.fsu.edu/
-
Biological databases: why?
The need for storing and communicating large datasets has grown.
To make biological data available to scientists.
To make biological data available in computer-readable form.
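A widely used computer-readable format for sequence data is FASTA: each record starts with a `>` header line, followed by one or more lines of sequence. A minimal reader is sketched below; `read_fasta` is a hypothetical helper that assumes well-formed input, and it is not a substitute for a maintained parser such as Biopython's `SeqIO`.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence} (minimal, assumes well-formed input)."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # a sequence may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical two-record example.
example = """>seq1 hypothetical example
ATGGCCATT
GTAATG
>seq2
GGCT
"""
print(read_fasta(example))  # {'seq1 hypothetical example': 'ATGGCCATTGTAATG', 'seq2': 'GGCT'}
```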
-
Types of data
nucleotide sequences
protein sequences
protein sequence patterns or motifs
macromolecular 3D structures
gene expression data
metabolic pathways
-
Different classifications of databases
Primary or derived databases
Primary databases: experimental results entered directly into the database
Secondary databases: results of analyses of primary databases
Aggregate of many databases
Links to other data items
Combination of data
Consolidation of data
-
Nucleotide sequence databases
EMBL, GenBank, and DDBJ are the three primary nucleotide sequence databases.
EMBL www.ebi.ac.uk/embl/
GenBank www.ncbi.nlm.nih.gov/Genbank/
DDBJ www.ddbj.nig.ac.jp
-
GenBank
An annotated collection of all publicly available nucleotide and protein sequences
Set up in 1979 at LANL (Los Alamos).
Maintained since 1992 by NCBI (Bethesda).
http://www.ncbi.nlm.nih.gov
-
EMBL Nucleotide Sequence DB
An annotated collection of all publicly available nucleotide and protein sequences
Created in 1980 at the European Molecular Biology Laboratory in Heidelberg.
Maintained since 1994 by the EBI in Cambridge.
http://www.ebi.ac.uk/embl.html
-
DDBJ: DNA Data Bank of Japan
An annotated collection of all publicly available nucleotide and protein sequences
Started in 1984 at the National Institute of Genetics (NIG) in Mishima.
Still maintained at this institute by a team led by Takashi Gojobori.
http://www.ddbj.nig.ac.jp
-
Other NCBI nucleic acids DBs
EST database: A collection of expressed sequence tags, or short, single-pass sequence reads from mRNA (cDNA).
GSS database: A database of genome survey sequences, or short, single-pass genomic sequences.
HomoloGene: A gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs.
HTG database: A collection of high-throughput genome sequences from large-scale genome sequencing centers, including unfinished and finished sequences.
SNPs database: A central repository for both single-base nucleotide substitutions and short deletion and insertion polymorphisms.
RefSeq: A database of non-redundant reference sequence standards, including genomic DNA contigs, mRNAs, and proteins for known genes. Multiple collaborations, both within NCBI and with external groups, support data-gathering efforts.
STS database: A database of sequence tagged sites, or short sequences that are operationally unique in the genome.
UniSTS: A unified, non-redundant view of sequence tagged sites (STSs).
UniGene: A collection of ESTs and full-length mRNA sequences organized into clusters, each representing a unique known or putative human gene annotated with mapping and expression information and cross-references to other sources.
EST: http://www.ncbi.nlm.nih.gov/dbEST/index.html
GSS: http://www.ncbi.nlm.nih.gov/dbGSS/index.html
HomoloGene: http://www.ncbi.nlm.nih.gov/HomoloGene/
HTG: http://www.ncbi.nlm.nih.gov/HTGS/
SNPs: http://www.ncbi.nlm.nih.gov/SNP/
RefSeq: http://www.ncbi.nlm.nih.gov/RefSeq/
STS: http://www.ncbi.nlm.nih.gov/dbSTS/index.html
UniSTS: http://www.ncbi.nlm.nih.gov/genome/sts/
UniGene: http://www.ncbi.nlm.nih.gov/UniGene/
-
Bioinformatics Tools
BLAST: The Basic Local Alignment Search Tool (BLAST), for comparing gene and protein sequences against others in public databases, now comes in several types, including PSI-BLAST, PHI-BLAST, and BLAST 2 Sequences. Specialized BLASTs are also available.
FASTA: A database search tool used to compare a nucleotide or peptide sequence to a sequence database. It was the first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning the sequence for small matches called "words".
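The "words" idea can be illustrated with a k-mer index: record every position of each length-k word in the database sequence, then scan the query for words that also occur there. These exact hits correspond only to the seeding step; FASTA and BLAST then extend and score promising hits, which this sketch does not attempt. The function names and the word size k = 3 are arbitrary choices for illustration.

```python
from collections import defaultdict


def build_word_index(sequence: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in the sequence to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return dict(index)


def word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where the same k-letter word occurs."""
    index = build_word_index(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


# Hits on the same diagonal (subject_pos - query_pos) hint at an ungapped matching region.
print(word_hits("GATTACA", "CCGATTAC"))  # [(0, 2), (1, 3), (2, 4), (3, 5)]
```

All four hits here lie on diagonal 2, pointing to the shared region "GATTAC"; a full search tool would extend such a region into a scored local alignment.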
-
ClustalW: ClustalW is a general-purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences, calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen.
RasMol: A powerful research tool to display the structure of DNA, proteins, and smaller molecules. Protein Explorer, a derivative of RasMol, is an easier-to-use program.
DeepView (also known as Swiss-PdbViewer): A program for viewing and exploring macromolecular models in three dimensions, and for manual and semi-automated homology modeling.
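Once ClustalW has lined sequences up, identities can be read off column by column. Clustal output marks fully conserved columns with `*` beneath the alignment (it also uses `:` and `.` for strongly and weakly similar residues, which this simplified sketch omits). The hypothetical helper below computes only the identity markers for an already aligned set of sequences; the protein fragments are made up.

```python
def conservation_line(aligned: list[str]) -> str:
    """Return '*' under columns where every sequence has the same residue, else a space."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        # Identical only if there is a single residue and it is not a gap ('-').
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


# Hypothetical aligned protein fragments.
alignment = ["MKV-LLA", "MKVALLS", "MRV-LLA"]
for row in alignment:
    print(row)
print(conservation_line(alignment))
```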
-
Conclusion
Bioinformatics in India is at an early stage of development. But at four to five centers in the country, one sees a mature understanding of the needs of this sector and world-class development of tools and applications. These centers will ensure that India's traditional strengths in IT are leveraged to place us on par with the developed countries.