BIOINFORMATICS
CHIRAG THAKKAR (MCA-37)
IIND SEM
CONTENTS
• Introduction
• History
• Need for bioinformatics
• Computational evolutionary biology
• Success
• Software and tools
INTRODUCTION
• Bioinformatics is the application of information technology to store, organize and analyse the vast amount of biological data.
• The stored data is available in the form of sequences and structures of proteins and nucleic acids (the information carriers).
• The biological information of nucleic acids is available as sequences, while the data of proteins is available as both sequences and structures.
• Sequences are represented in a single dimension, whereas structures contain three-dimensional data.
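The difference between one-dimensional sequence data and three-dimensional structure data can be illustrated with a minimal Python sketch. The three residues and their coordinates are invented for illustration (a hypothetical C-alpha trace in angstroms), not taken from a real structure file.

```python
import math

# One-dimensional data: a protein sequence is an ordered string of residue letters.
sequence = "MAL"

# Three-dimensional data: one (x, y, z) position per residue, in angstroms.
# These coordinates are hypothetical, chosen only to illustrate the idea.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
]

def residue_distance(coords, i, j):
    """Euclidean distance in space between residues i and j."""
    return math.dist(coords[i], coords[j])

# A sequence is indexed by a single position; a structure also gives spatial distances.
print(sequence[1])                               # prints "A"
print(round(residue_distance(coords, 0, 2), 2))  # distance between residues 0 and 2
```

A sequence alone only tells us which residue comes next; the coordinates let us ask how far apart two residues are in space.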
• Biologists collect molecular data: DNA and protein sequences, gene expression, etc.
• Computer scientists (plus mathematicians, statisticians, etc.) develop tools, software and algorithms to store and analyze the data.
• Bioinformaticians study biological questions by analyzing molecular data.
• Bioinformatics is the field of science in which biology, computer science and information technology merge into a single discipline.
HISTORY
• Over the decade starting in 1981, the following events occurred:
• By 1981, 579 human genes had been mapped.
• A method for automated DNA sequencing was invented.
• The Human Genome Organisation (HUGO) was founded. It is an international organization of scientists involved in the Human Genome Project.
• By 1991, a total of 1879 human genes had been mapped.
• In 1993, Genethon, a human genome research center in France, produced a physical map of the human genome.
• In 1995, the first complete genome sequence of a free-living organism, the bacterium Haemophilus influenzae, was published.
• In 1996, Genethon published the final version of the Human Genetic Map. This marked the end of the first phase of the Human Genome Project.
• Bioinformatics was fuelled by the need to create huge databases, such as GenBank, EMBL and the DNA Data Bank of Japan (DDBJ).
• These databases store and compare the DNA sequence data coming from the human genome and other genome sequencing projects.
• Today, bioinformatics supports protein structure analysis, the study of gene and protein function, and the analysis of patient data, pre-clinical and clinical trial data, and the metabolic pathways of numerous species.
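Comparing two DNA sequences usually starts with an alignment score. The sketch below is the classic Needleman-Wunsch global alignment recurrence, with assumed scores of +1 for a match, -1 for a mismatch and -1 per gap; database search tools such as BLAST use faster heuristics and more elaborate scoring, so treat this only as an illustration of the idea.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch score for the best global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        dp[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diagonal,            # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,  # gap in b
                           dp[i][j - 1] + gap)  # gap in a
    return dp[-1][-1]

print(global_alignment_score("ACGT", "AGT"))
```

Here "AGT" aligns to "ACGT" with one gap opposite the C, giving three matches and one gap.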
• The first bioinformatics databases were constructed
a few years after the first protein sequences began
to become available.
• Today, a huge variety of data resources of different types and sizes is available in the public domain through the Internet (e.g. www.ncbi.nlm.nih.gov).
• All of the original databases were organized in a very simple way, with data entries stored in flat files, i.e. a single large text file.
• Later on, lookup indexes were added to allow convenient keyword searching of header information.
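The flat-file organization and the later header indexes can be sketched in Python. This is a minimal illustration, assuming FASTA-style ">" header lines and toy entries invented for the example; the original databases each used their own entry formats.

```python
from collections import defaultdict

# Toy flat file: every entry lives in one large text file (invented data).
FLAT_FILE = """>entry1 human hemoglobin beta
MKTAYIAKQR
>entry2 mouse hemoglobin alpha
MVLSPADKTN
>entry3 human insulin
MALWMRLLPL
"""

def parse_entries(text):
    """Split the flat file into (header, sequence) pairs."""
    entries, header, seq_lines = [], None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        elif line.strip():
            seq_lines.append(line.strip())
    if header is not None:
        entries.append((header, "".join(seq_lines)))
    return entries

def build_keyword_index(entries):
    """Map each lower-cased header word to the positions of entries containing it."""
    index = defaultdict(set)
    for pos, (header, _) in enumerate(entries):
        for word in header.lower().split():
            index[word].add(pos)
    return index

entries = parse_entries(FLAT_FILE)
index = build_keyword_index(entries)
# With the index, a keyword lookup avoids rescanning the whole file.
print([entries[i][0] for i in sorted(index["hemoglobin"])])
```

The index is built once; each later keyword search is a dictionary lookup instead of a full scan of the flat file.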
NEED FOR BIOINFORMATICS
• Bioinformatics uses many areas of computer science, statistics, mathematics and engineering to process biological data.
• High-throughput instruments, such as automated DNA sequencers, read biological data at a much faster rate than before.
• Analyzing biological data may involve algorithms in artificial intelligence, soft computing, data mining, image processing and simulation.
• The algorithms in turn depend on theoretical foundations such as discrete mathematics, control theory, system theory, information theory and statistics.
• Commonly used software tools and technologies in the field include Java, C#, XML, Perl, C, C++, Python, R, SQL, CUDA, MATLAB and spreadsheets.
• The development of new algorithms (mathematical formulas) and statistics with which to assess relationships among members of
COMPUTATIONAL EVOLUTIONARY BIOLOGY
• Evolutionary biology is the study of the origin and descent of species, as well as their change over time. Informatics has assisted evolutionary biologists by enabling researchers to trace the evolution of large numbers of organisms by measuring changes in their DNA.
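Measuring changes in DNA often begins with a pairwise distance between aligned sequences. The sketch below uses the simple p-distance (the fraction of differing sites); the three aligned sequences and organism names are hypothetical, and real phylogenetic analyses typically apply substitution models and tree-building methods such as neighbor-joining on top of distances like these.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)

# Hypothetical aligned DNA sequences for three organisms.
aligned = {
    "org_A": "ACGTACGTAC",
    "org_B": "ACGTACGTTC",
    "org_C": "TCGAACGTTA",
}

# Print the pairwise distance matrix (upper triangle).
names = sorted(aligned)
for i, x in enumerate(names):
    for y in names[i + 1:]:
        print(x, y, p_distance(aligned[x], aligned[y]))
```

Smaller distances suggest more closely related sequences; here org_A and org_B differ at a single site.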
SUCCESS
• 1) Analysis of gene expression
• 2) Analysis of regulation
• Clustering algorithms can then be applied to expression data to determine which genes are co-expressed.
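A minimal co-expression sketch: genes whose expression profiles have a Pearson correlation at or above an assumed threshold (0.9 here) are joined into the same group, i.e. single-linkage grouping at a fixed cutoff. The gene names and profiles are invented; published analyses more often use hierarchical clustering or k-means on much larger matrices.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sx * sy)

def coexpression_clusters(profiles, threshold=0.9):
    """Group genes connected by correlation >= threshold (single linkage)."""
    genes = list(profiles)
    parent = {g: g for g in genes}  # union-find forest

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if pearson(profiles[g1], profiles[g2]) >= threshold:
                parent[find(g1)] = find(g2)  # merge the two groups

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical expression levels across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 3.9, 2.0],
}
print(coexpression_clusters(profiles))
```

geneA and geneB rise together, while geneC and geneD fall together, so each pair forms one co-expressed group.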
• 3) Analysis of protein expression
• Bioinformatics is heavily involved in making sense of protein microarray and high-throughput mass spectrometry (HT MS) data.
• This involves matching large amounts of mass data against masses predicted from protein sequence databases.
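The matching step can be sketched in the style of peptide mass fingerprinting: digest each database protein in silico, compute the predicted peptide masses, and report observed masses that fall within a tolerance. Assumptions here: a simplified trypsin rule (cleave after K or R unless followed by P), standard monoisotopic residue masses, a 0.02 Da tolerance, and a toy one-protein database; real search engines add modifications, missed cleavages and statistical scoring.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def match_masses(observed, database, tolerance=0.02):
    """Return (observed mass, protein id, peptide) for every match within tolerance."""
    hits = []
    for protein_id, sequence in database.items():
        for pep in tryptic_peptides(sequence):
            predicted = peptide_mass(pep)
            for mass in observed:
                if abs(mass - predicted) <= tolerance:
                    hits.append((mass, protein_id, pep))
    return hits

toy_database = {"prot1": "GGKAAR"}  # hypothetical protein entry
print(match_masses([260.15, 500.0], toy_database))
```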
• 4) Analysis of mutations in cancer
• Bioinformaticians continue to produce specialized automated systems to manage the sheer volume of sequence data produced, and they create new algorithms and software to compare the sequencing results with the growing collection of human genome sequences and germline polymorphisms.
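The comparison against a reference and known germline polymorphisms can be reduced to a toy Python sketch. It assumes the tumor sequence is already aligned base-for-base to the reference and that known polymorphisms are given as hypothetical (0-based position, base) pairs; real pipelines align millions of reads and use statistical variant callers.

```python
def find_candidate_mutations(reference, sample, known_polymorphisms):
    """Positions where an aligned sample differs from the reference,
    skipping (position, base) pairs listed as known germline polymorphisms."""
    if len(reference) != len(sample):
        raise ValueError("this sketch assumes pre-aligned sequences of equal length")
    candidates = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference, sample)):
        if ref_base != alt_base and (pos, alt_base) not in known_polymorphisms:
            candidates.append((pos, ref_base, alt_base))
    return candidates

reference = "ACGTTGCAAC"
tumor     = "ACGATGCAGC"
known = {(3, "A")}  # hypothetical common germline variant at position 3
print(find_candidate_mutations(reference, tumor, known))
```

The tumor differs from the reference at positions 3 and 8, but position 3 matches a known polymorphism, so only position 8 is reported as a candidate mutation.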
• 5) Comparative genomics
• 6) High-throughput image analysis
• Computational technologies are used to
accelerate or fully automate the processing,
quantification and analysis of large amounts of
high-information-content biomedical imagery.
• Goals: improved accuracy, objectivity and speed.
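A basic quantification step in image analysis is counting objects (for example, cells) after thresholding. The sketch below uses a hypothetical grayscale image stored as nested lists and an assumed threshold; it counts 4-connected bright regions with a breadth-first flood fill. Production systems use dedicated imaging libraries and more robust segmentation.

```python
from collections import deque

def count_objects(image, threshold):
    """Return the sizes of 4-connected regions of pixels brighter than threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and not seen[r][c]:
                # Flood-fill one new object starting from this pixel.
                queue = deque([(r, c)])
                seen[r][c] = True
                size = 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and image[ny][nx] > threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    return sizes

# Hypothetical 4x5 intensity image.
image = [
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 7],
    [0, 0, 0, 8, 7],
    [5, 0, 0, 0, 0],
]
sizes = count_objects(image, threshold=4)
print(len(sizes), sizes)
```

The same loop runs unchanged over thousands of images, which is where automation gives its speed and objectivity.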
SOFTWARE AND TOOLS
• Open-source bioinformatics software
• Many free and open-source software tools have existed and continue to grow in number.
• The range of open-source software packages includes titles such as Bioconductor, BioPerl, Biopython, BioJava, BioRuby, Bioclipse, EMBOSS, .NET Bio, Taverna workbench and UGENE.
• In order to maintain this tradition and create further opportunities, the non-profit Open Bioinformatics Foundation has supported the annual Bioinformatics Open Source Conference (BOSC) since 2000.
• Web services in bioinformatics
• The main advantage is that end users do not have to deal with software and database maintenance overheads.
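NCBI's E-utilities are one example of a bioinformatics web service: the database is queried over HTTP, so the user maintains no local copy. The sketch below builds an efetch request URL; the accession NM_000518 is used only as an example id, and actually calling the service needs network access and should follow NCBI's usage guidelines.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def efetch_url(db, ids, rettype="fasta", retmode="text"):
    """Build an E-utilities efetch URL for the given record ids."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)

def fetch(url, timeout=30):
    """Download the response as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")

url = efetch_url("nucleotide", ["NM_000518"])
print(url)
# print(fetch(url)[:200])  # uncomment to call the live service
```

All the database maintenance stays on the server side; the client only formats a request and reads the reply.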
• Bioinformatics workflow management
systems
• A Bioinformatics workflow management
system is a specialized form of a workflow
management system designed specifically to
compose and execute a series of computational
or data manipulation steps, or a workflow, in a
Bioinformatics application.
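The core job of a workflow management system, running each step only after the steps it depends on, can be sketched with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The three steps (clean a raw read, then compute GC content and length) are hypothetical; real systems add scheduling, caching, provenance and distributed execution.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow steps; each reads earlier results from a shared dict.
def clean(results):
    return results["raw"].strip().upper()

def gc_content(results):
    seq = results["clean"]
    return (seq.count("G") + seq.count("C")) / len(seq)

def length(results):
    return len(results["clean"])

STEPS = {"clean": clean, "gc_content": gc_content, "length": length}
# Each step maps to the set of steps that must finish before it runs.
DEPENDENCIES = {"clean": set(), "gc_content": {"clean"}, "length": {"clean"}}

def run_workflow(steps, dependencies, inputs):
    """Execute steps in a dependency-respecting (topological) order."""
    results = dict(inputs)
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = steps[name](results)
    return results

results = run_workflow(STEPS, DEPENDENCIES, {"raw": "  acggtcaa \n"})
print(results["gc_content"], results["length"])
```

Declaring dependencies instead of a fixed order lets the manager decide the execution order, and independent steps such as `gc_content` and `length` could in principle run in parallel.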
• Rosalind
• Rosalind is an educational resource and web
project for learning bioinformatics
through problem solving and computer
programming.
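One of Rosalind's introductory problems asks for the number of times each nucleotide occurs in a DNA string, reported in the order A, C, G, T. A short Python solution sketch (the input string is an arbitrary example):

```python
from collections import Counter

def count_nucleotides(dna: str) -> str:
    """Counts of A, C, G and T, space-separated in that order."""
    counts = Counter(dna.upper())
    # Counter returns 0 for bases that never occur.
    return " ".join(str(counts[base]) for base in "ACGT")

print(count_nucleotides("AGCTTTTCA"))  # prints "2 2 1 4"
```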