Post on 21-Dec-2015

Page 1: Bioinformatics Essentials Stephanie Tatem Murphy smurphy@bcc.ctc.edu

Bioinformatics Essentials

Stephanie Tatem Murphy

smurphy@bcc.ctc.edu

Page 2

ATGCATTTCGGTTTACGCCATATAGCTCGGGAATCATGCATCGATCGAGTAGCTAGCTAG

PNSADADNDFEDRLRAGLCDHDKEVQGLQVRCAVUEEHMHKKQQEFENIRLDAQRLEFFAYIFQKEHMKR

DNA

Protein

Model organisms

Page 3

TGT AAT AGT TAT ATT TTC ATT ATA AAT TGT GTT TGT AGA CAT CAT AAA TTT AAA ACA TGG CTT TTT AAC CTG ATA AAT CCT ACG AAT ATT TGT AAT AGT TAT GTT ATT GCA GTA AGT ACC GTT TGT ATT ATA AAT TGT GTT CTG

What is Bioinformatics?

Which genes are turned off, then on?

Courtesy of Dr. Young Moo Lee, UC Davis

Page 4

Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001

Page 5

Genome Transcriptome Proteome

Page 6

Fundamental Dogma

DNA

RNA

Proteins

Pathways

Phenotypes

Populations

GenBank, EMBL, DDBJ

Map Databases

SwissPROT, PIR

PDB

Gene Expression?

Clinical Data?

Regulatory Pathways? Metabolism?

Biodiversity?

Neuroanatomy?

Development?

Molecular Epidemiology?

Comparative Genomics?

Although a few databases already exist to distribute molecular information, the post-genomic era will need many more to collect, manage, and publish the coming flood of new findings.

Bob Robbins http://www.esp.org/rjr/canberra.pdf

Page 7

Gene a b c d e

Art by Yelena Ponirovskaya

…ATGGCCCTGTGGATGCGCCTCCTGCCCCTG…..

DNA base sequence = recipe for amino acids

Met - Ala - Leu - Trp - Met - Arg - Leu - Leu - Pro - Leu

Amino acid sequence = protein = trait
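
The step from base sequence to amino acids on this slide can be checked in code. Below is a minimal Python sketch, assuming the standard genetic code and reading the DNA in frame from its first base; the names `CODON_TABLE` and `translate` are illustrative and do not come from the slides.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reads codons from the first base and stops at the first stop codon.
    Trailing bases that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    protein = []
    usable_length = len(dna) - len(dna) % 3
    for start in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# The sequence shown on the slide
print(translate("ATGGCCCTGTGGATGCGCCTCCTGCCCCTG"))  # MALWMRLLPL
```

In one-letter codes, MALWMRLLPL is Met, Ala, Leu, Trp, Met, Arg, Leu, Leu, Pro, Leu, which matches the amino acid sequence on the slide.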

Page 8

The Biology Project, University of Arizona

http://www.biology.arizona.edu

DNA activity – RFLP, Inheritance http://www.biology.arizona.edu/human_bio/activities/blackett/introduction.html

DNA replication fork http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/03t.html

DNA base pairing http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/08t.html

DNA translation http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/10t.html

The Genetic Code http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/12t.html http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/13t.html

DNA transcription http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/15t.html

Page 9

Bioinformatics – a Definition

bio – informatics: bioinformatics is conceptualizing biology in terms of molecules and applying “informatics techniques” to understand and organise the information associated with these molecules, on a large scale. In short, bioinformatics is a management information system for molecular biology and has many practical applications.

As submitted to the Oxford English Dictionary.

What is Bioinformatics? N. M. Luscombe, et al. Yale University, Method Inform Med 4/2001

Page 10

The field of science in which biology, computer science, and information technology merge into a single discipline. NCBI, Aug 2001

BIOINFORMATICS

BIOLOGY

COMPUTER SCIENCE

INFORMATION TECHNOLOGY

Bioinformatics – a Definition

Page 11

What’s in a name?

Sequence Analysis

Database Homology Searching

Multiple Sequence Alignment

Homology Modeling

Docking

Protein Analysis

Proteomics

3D Modeling

Sample Registration & Tracking

Integrated Data Repositories

Common Visual Interfaces

Intellectual Property Auditing

Life Science Informatics

Genome Mapping

Page 12

Bioinformatics Needs

• Multidisciplinary teams
– biologists, mathematicians, computer scientists, laboratory technicians

• Users and developers to use / create
– scalable database infrastructure
– standards to control vocabulary and annotation
– new ways of visualizing, analyzing and searching data
– new ways of delivering information, tools and results

• Faster and larger computer systems

Page 13

Demo Bioinformatics Company

Onconomics Corporation http://www.bscs.org/onco/default.htm

From the nonprofit BSCS (Biological Sciences Curriculum Study)

Page 14

Computer Programming | 50 yrs ago | DNA & Protein Structure

Personal Computers / Internet | 20 yrs ago | PCR

WWW | Last 10 yrs | Human Genome Project

All fields use computers (art, law, communication) | Now | Biological Research

Bioinformatics Computer Skills

Growth of Bioinformatics

www.oreilly.com

Page 15

Why informatics?

• Large size of data sets
• Allow students to ask questions of data
• Integrate current research into classroom

http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html

Page 16

>100,000 species are represented in GenBank

all species 128,941

viruses 6,137

bacteria 31,262

archaea 2,100

eukaryota 87,147

Page 17

The most sequenced organisms in GenBank

Homo sapiens 10.7 billion bases
Mus musculus 6.5 billion
Rattus norvegicus 5.6 billion
Danio rerio 1.7 billion
Zea mays 1.4 billion
Oryza sativa 0.8 billion
Drosophila melanogaster 0.7 billion
Gallus gallus 0.5 billion
Arabidopsis thaliana 0.5 billion

Updated 8-12-04, GenBank release 142.0

Table 2-2, Page 18

Page 18

Online datasets for all the Life Sciences

Environment and Ecology

Population http://www.prb.org

Water http://www.waterontheweb.org/ http://www.neptune.washington.edu/

Geography http://nhd.usgs.gov/ http://data.geocomm.com/

Chemistry

Physics

Biology

Anatomy & Physiology

Earth http://www.dlese.org/educators/usingdata.html

Agriculture

Nutrition

Plant http://allometra.com/ath_fasta_mpss.shtml

Page 19

Data mining requires a testable hypothesis about the function or structure of a gene or protein, generated by identifying similar sequences in better-characterized organisms.

It also helps uncover phylogenetic relationships and evolutionary patterns.
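
As a rough illustration of "identifying similar sequences", the sketch below scores a query against a small set of annotated sequences by percent identity and reports the closest one. It is a toy under stated assumptions: the sequences and gene names are made up, the sequences are assumed to be already aligned and of equal length, and real searches use alignment tools such as BLAST rather than this position-by-position comparison.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two pre-aligned sequences.

    Assumes both sequences have the same length (no gaps are handled).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (already aligned)")
    if not seq_a:
        return 0.0
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def best_match(query, annotated):
    """Return the name of the annotated sequence most similar to the query."""
    return max(annotated, key=lambda name: percent_identity(query, annotated[name]))


# Hypothetical, made-up sequences standing in for a well-characterized organism
annotated = {
    "hypothetical_kinase": "ATGGCCCTGTGG",
    "hypothetical_transporter": "ATGTTTCGATAA",
}
query = "ATGGCCCTGTGA"  # made-up sequence of unknown function

closest = best_match(query, annotated)
print(closest, round(percent_identity(query, annotated[closest]), 1))
```

A high-scoring match gives the hypothesis to test: the unknown sequence may share the function of its closest annotated relative.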

www.tigr.org

Why use Bioinformatics?

Page 20

What is Bioinformatics? N. M. Luscombe, et al. Yale University Method Inform Med 4/2001

Page 21

Biotechnology: Did You or Will You Ever?

Ride in a car? Genetically engineered micro-organisms will someday be used to extract oil from rocks. Micro-organisms that break down oil spills are already in use.

Drink tap water? Genetically engineered micro-organisms will someday be used to attract and filter out harmful substances from drinking water.

Have a dog or cat? Vaccines for a number of pet diseases such as rabies will be improved by genetic engineering.

Wear brightly colored clothes? Many clothing dyes can be made less expensively with biotechnology, and will last longer.

Take vitamins? Vitamins can be made more potent and less expensively with biotechnology.

Go to the bathroom? Micro-organisms are already an important part of sewage treatment; genetic engineering will produce bacteria that are more efficient at breaking down wastes.

Page 22

What Good is Recombinant DNA?

People with diabetes need to take a drug called insulin. In the past, this drug was extracted and purified from ground-up animal glands. It takes several pounds of cow or pig glands to produce a fraction of an ounce of insulin.

Today, the DNA with the instructions for making insulin can be spliced into a plasmid, and the insulin is then produced by bacteria. It’s faster, easier, and cheaper this way.

There are still many technical problems to be solved. Not all gene splices work, and some that do may fail over time.

There are also social and environmental concerns about biotechnology. Some people fear we will upset the balance of nature if “genetically engineered” organisms escape. Others fear that recombinant DNA will be used to influence human size, race, or intelligence.

The best way for people to enjoy the benefits and avoid the problems is to stay informed and up to date about what’s happening in biotechnology.

http://www.chourave.ch/init/kid/cartoon-00.html

Page 23

How Do You Make Recombinant DNA?

First, you need to isolate a specific bit of DNA with the instructions you want. To do this, you use restriction enzymes that break up DNA strands in specific places.

After you have DNA fragments, you sort them by size, using a gel. DNA is loaded onto the top of the gel, and then electricity is passed through it. This causes the DNA pieces to migrate down, and the small pieces travel further than the large pieces.

Next, you need to add the DNA fragment into a host. In most research, the host is a plasmid, a ring of DNA found in some bacteria.

The host DNA has to be exposed to the same restriction enzymes to make matching “sticky” ends that will attach to the fragment. After you mix the new and host DNA fragments, you need to add enzymes (DNA ligase) that will glue them together.
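
The cutting and sorting steps above can be sketched in Python. This is a simplified model, assuming a linear piece of DNA and the enzyme EcoRI, which recognizes GAATTC and cuts after the G; the function name `digest` and the example sequence are made up for illustration.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut linear DNA at every occurrence of a recognition site.

    cut_offset is how many bases into the site the cut falls.
    The defaults model EcoRI, which cuts G^AATTC.
    """
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])  # whatever remains after the last cut
    return fragments


# Made-up sequence containing two EcoRI sites
dna = "AAA" + "GAATTC" + "TTTTTTTT" + "GAATTC" + "CC"
fragments = digest(dna)

# On a gel, small pieces travel further than large ones,
# so sorting by length lists the fragments from furthest-travelled to nearest.
gel_order = sorted(fragments, key=len)
for fragment in gel_order:
    print(len(fragment), fragment)
```

Each fragment cut this way starts or ends with part of the recognition site, which is why host DNA cut with the same enzyme has ends that fit it.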

Page 24

If you used a plasmid as a host, you need to put it back into a bacterium. When the bacterium replicates itself, it will copy the new DNA too. A small population of “gene-spliced” bacteria can develop into a large population in just a few days.

How Do You Make Recombinant DNA?

http://www.gene.com/gene/research/ biotechnology

Page 25

What is an Enzyme?

Enzymes are molecules that speed up biological reactions.

Some characteristics of enzymes:

Enzymes increase the rate of a chemical reaction. For example, the enzyme carbonic anhydrase enables red blood cells to pick up and dump carbon dioxide 1 million times faster than they could without it.

Enzymes aren’t used up by the reaction. They’re not permanently changed as a result of it, so a single enzyme can act thousands of times.

Enzymes are highly specific. Like a wrench that will only fit a 5/16-inch bolt, each enzyme generally works with only a particular kind of molecule.

An enzyme increases the odds that two molecules will meet, so an enzyme is a “matchmaker”.

Page 26

Why try to Design Better Enzymes?

Enzymes are fragile: they lose their shape (denature) if the temperature or acidity goes up even a little. They also denature in alcohol or oils.

This is a drag! If you’re adding an enzyme to a laundry detergent you’d like it to function in hot water, with bleach!

As we understand more and more about DNA and how it is decoded, we can rewrite the instructions for making some enzymes. By altering their shapes, we may be able to make enzymes that are sturdier and able to function under harsher conditions. We may even be able to invent some completely new enzymes!

Page 27

Examples of Enzymes

Subtilisin: This enzyme is added to laundry detergent. It breaks down proteins (like yucky egg yolk stains or gross dried blood) into tiny fragments that can be rinsed away from the fibers of the cloth.

Papain: This enzyme breaks up proteins, and is extracted from the papaya fruit. It’s now added to contact lens cleaner solution to help dissolve away gross crusty things from soft contact lenses.

Ceredase: Several thousand people in the United States have Gaucher disease (low levels of a crucial enzyme that dissolves fatty deposits in the liver, spleen and bone marrow). They suffer from bone pain, fractures, swelling and bleeding. Ceredase is a variation of the enzyme, produced in the laboratory, which can be used to treat the disease.

Vianain: Originally derived from pineapples, this enzyme offers hope to burn victims. It helps prepare burned areas for skin grafts by safely dissolving damaged skin layers that would otherwise have to be removed surgically.

Page 28

Journals & Books

Public Library of Science - Open Access Journals http://www.plosbiology.org

International Society for Computational Biology – Book Reviews http://www.iscb.org/bioinformaticsBooks.shtml

Free Journals:

Biotechniques http://www.BioTechniques.com

Genomeweb http://www.genomeweb.com

Books:

The Cartoon Guide to Genetics, Larry Gonick & Mark Wheelis, ISBN 0062730991, Harper 1983

Introduction to Bioinformatics, Arthur Lesk, http://www.oup.com/uk/lesk/bioinf, ISBN 0199251967, Oxford 2002

Fundamental Concepts of Bioinformatics, Dan Krane & Michael Raymer, ISBN 0805346333, Benjamin Cummings 2003

Discovering Genomics, Proteomics, & Bioinformatics, A. Campbell & L. Heyer, ISBN 0805347224, Benjamin Cummings 2002

Understanding Biotechnology, George Acquaah, ISBN 0130945005, Pearson Prentice Hall 2004

Understanding Biotechnology, A. Borem, F. Santos, D. Bowen, ISBN 0131010115, Pearson Prentice Hall 2003

Page 29

Human Genome Project

http://www.ornl.gov/sci/techresources/Human_Genome/publicat/primer2001/index.shtml

Genomics and Its Impact on Science and Society: The Human Genome Project and Beyond

U.S. Department of Energy Genome Programs http://doegenomes.org

Page 30

www.ncbi.nlm.nih.gov

National Center for Biotechnology Information

Page 31
Page 32

A user’s guide to the human genome

Nature Genetics www.nature.com/ng/ vol 32, pg 1-79, 01 Sep 2002

Introduction: putting it together  

Question 8: How can one find all the members of a human gene family?  

Question 12: How does a user find characterized mouse mutants corresponding to human genes?  

Web resources: Internet resources featured in this guide

Page 33

Get Schooled for Bioinformatics

• Biology
– Know basics & have sense of biological experimentation

• Computer Science
– Programming: C, C++, Perl, JAVA, SAS, CGI
– Database construction: UNIX, LINUX
– Algorithm design

• Math/Statistics
– Probability, Experimental design

• Ethics

• “Core Bioinformatics”
– LIMS
– EST clustering
– Sequence analysis & annotation

Page 34

Fundamental Dogma

DNA

RNA

Proteins

Circuits

Phenotypes

Populations

GenBank, EMBL, DDBJ

Map Databases

SwissPROT, PIR

PDB

Gene Expression?

Clinical Data?

Regulatory Pathways? Metabolism?

Biodiversity?

Neuroanatomy?

Development?

Molecular Epidemiology?

Comparative Genomics?

Although a few databases already exist to distribute molecular information, the post-genomic era will need many more to collect, manage, and publish the coming flood of new findings.

Biological Research = To enable the discovery of new biological insights as well as create a global perspective from which unifying principles in biology can be discerned. NCBI, Aug 2001

Page 35

Ultra-conserved element: only 6 SNPs across mouse, rat, human

TGATCCCGGACTCTATGAATTATTGATGAGATATGAGCGTTGATTTCCCCTTTCAGGATGCAAACTCCATTATATTGTTAAAATGGCGATTTAATCGTTGAGAATAGCTTTGGTGTGGGTTTTTTCCCCCAACTCATTTGCGCCTCCTTCCTTTTCATTTAACTCTCTTAATTAAATCCTTTAACAGATTTTAATCACTTTTTGGAG