8/22/07 BCB 444/544 F07 ISU Dobbs #2 - Biological Databases 1 Finish: Lecture 1- What is Bioinformatics? Lecture 2 Biological Databases & ISU Resources #2_Aug22 BCB 444/544



Finish: Lecture 1- What is Bioinformatics?

Lecture 2

Biological Databases & ISU Resources

#2_Aug22

BCB 444/544


BCB 444/544 - Website

http://bindr.gdcb.iastate.edu/bcb544

• Updated Syllabus
• Lecture & Lab Schedules (with Homework Assignments)
• Lecture PPTs & PDFs
• Lab Exercises
• Practice Exams
• Grading Policy
• Project Guidelines, etc.
• Links
• Check regularly for updates!



BCB 444/544 - Computer Lab

Meets in 1304 MBB every week EXCEPT this week:
1st Lab meets in Library Rm 32

Current schedule: Thurs 1-3 PM
Conflicts? See Drena


Assignment #1: Tell us about you

Due: Today - Wed, Aug 22

1- Complete HW1_Aug20 for Drena


Required Reading (must read before lecture)

Wed Aug 22 - for Lecture #2
• Xiong Textbook:
  • Chp 1 - Introduction
  • Chp 2 - Biological Databases

Thurs Aug 23 - for Lab #1:
• Literature Resources for Bioinformatics (Andrea Dinkelman); see Lab Schedule for URL

Fri Aug 24
• Genomics & Its Impact on Science & Society: Genomics & Human Genome Project Primer; see Lecture Schedule for URL


Assignment #2 (& for Fun): DNA Interactive

"Genomes"
http://www.dnai.org/c/index.html

A tutorial on genomic sequencing, gene structure, gene prediction
Howard Hughes Medical Institute (HHMI)
Cold Spring Harbor Laboratory (CSHL)

1. Take the Tour
2. Read about the Project
3. Do some Genome Mining with:

Nothing to turn in - just do it!


#1- What is Bioinformatics? (cont.)

Xiong: Chp 1

1 Introduction
• What Is Bioinformatics?
• Goal
• Scope
• Applications
• Limitations
• New Themes
• Further Reading


1st Draft Human Genome: "Finished" in 2001


Modified from Eric Green


Human Genome Sequencing

Two approaches:

• Public (government) - International Consortium (mainly 6 countries, NIH-funded in US)
  • Hierarchical cloning & BAC-to-BAC sequencing
  • Map-based assembly

• Private (industry) - Celera, Craig Venter, CEO
  • Whole genome random "shotgun" sequencing
  • Computational assembly (took advantage of public maps & sequences, too)

Guess which human genome they sequenced? Craig's

How many genes? ~ 20,000 (Science, May 2007)


Public Sequencing: International Consortium


Modified from Eric Green


Comparison of Sequenced Genome Sizes

Plants? Many have much larger genomes than human!

Modified from Eric Green


"Complete" Human Genome Sequence: What next?


from Eric Green


Next Step after the Complete Sequence?

Understanding Gene Function on a Genomic Scale:
• Expression Analysis
• Structural Genomics
• Protein Interactions
• Network Analysis
• Systems Biology

Evolutionary Implications of:
• Intergenic Regions as "Gene Graveyard"
• Introns & Exons

Modified from Mark Gerstein


How can we begin to understand the complete Human Genome Sequence?


from Eric Green


Comparative Genomics: Compare entire genomes


from Eric Green


Comparing Genomes: Identifying functional elements


from Eric Green


Gene Expression Data: the Transcriptome

MicroArray Data

Yeast Expression Data:

• Levels for all 6,000 genes!

• Investigate how all genes respond to changes in environment or, in humans, e.g., how patterns of RNA expression change in normal vs cancerous tissue

Modified from Mark Gerstein

ISU's Biotechnology Facilities include state-of-the-art Microarray Instrumentation
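
One common way to summarize "how expression changes in normal vs cancerous tissue" is a log2 fold change per gene. The sketch below is only an illustration: the gene names, intensity values, pseudocount, and cutoff are all made up for the example and do not come from the slide or from any real array.

```python
from math import log2

def log2_fold_changes(normal, tumor, pseudocount=1.0):
    """Return log2(tumor / normal) per gene; the pseudocount avoids division by zero."""
    return {
        gene: log2((tumor[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in normal
        if gene in tumor
    }

def changed_genes(fold_changes, cutoff=1.0):
    """Genes whose expression changed at least 2-fold (|log2 FC| >= 1) in either direction."""
    return sorted(g for g, fc in fold_changes.items() if abs(fc) >= cutoff)

# Hypothetical intensities, chosen so the ratios are easy to check by hand.
normal = {"geneA": 99.0, "geneB": 49.0, "geneC": 199.0}
tumor = {"geneA": 399.0, "geneB": 49.0, "geneC": 49.0}

fc = log2_fold_changes(normal, tumor)
print(changed_genes(fc))
```

With real microarray data the same calculation would run over thousands of genes, typically after normalization steps that are not shown here.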


Other "Omes" Proteome, Metabolome, Glycome, etc.

ISU has state-of-the-art Proteomics Instrumentation

ISU has state-of-the-art Metabolomics Instrumentation


Systems Biology seeks to integrate all of these to explain the complex behaviors of whole systems (cells, organisms, ecosystems)

How are "Omes" related?


Molecular Biology Information: Integrating Data

Understanding the function of genomes requires integration of many diverse and complex types of information:

• Metabolic pathways
• Regulatory networks
• Whole organism physiology
• Evolution, phylogeny
• Environment, ecology
• Literature (MEDLINE)

Modified from Mark Gerstein


Other Genome-Scale Experiments

Systematic Knockouts:

Make "knockout" (null) mutations in every gene - one at a time - and analyze the resulting phenotypes!

For yeast: 6,000 KO mutants!

2-hybrid Experiments:

For each (and every) protein, identify every other protein with which it interacts!

For yeast: 6000 x 6000 / 2 ~ 18M interactions!!

Modified from Mark Gerstein
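
A quick check of the 2-hybrid arithmetic, as a sketch. The slide's 6000 x 6000 / 2 is a round-number estimate; counting each unordered pair of distinct proteins exactly once gives n(n-1)/2, which is very close. The function name and the `include_self` option are ours, added only for illustration.

```python
from math import comb

def pairwise_interactions(n_proteins, include_self=False):
    """Count unordered protein pairs to test in an all-vs-all interaction screen."""
    pairs = comb(n_proteins, 2)      # n * (n - 1) / 2 pairs of distinct proteins
    if include_self:
        pairs += n_proteins          # optionally count each protein against itself
    return pairs

n = 6000
approx = n * n // 2                  # the slide's estimate
exact = pairwise_interactions(n)

print(approx)  # 18000000
print(exact)   # 17997000
```

Either way the number of candidate pairs grows with the square of the number of proteins, which is why these experiments are "genome-scale".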


Storing & Analyzing Genomic Information:

Exponential Growth of Data Coupled with Development of Fast Computer Technology

• Increases in computer speed & storage capacity have been dramatic

• Improved computing resources & more efficient algorithms have been driving forces in Bioinformatics & Computational Biology

Modified from Mark Gerstein

ISU's supercomputer "CyBlue" is among 100 most powerful computers in the world!


Bioinformatics is born! & more Bioinformaticists are needed!

(Internet picture adapted from D Brutlag, Stanford)

Modified from Mark Gerstein


“Informatics” techniques used in Bioinformatics

• Databases
  • Building & querying object-oriented & relational DBs

• String Comparison
  • Text search
  • Alignment
  • Significance statistics

• Pattern Finding
  • Machine Learning
  • Data Mining
  • Statistics
  • Linguistics

• Computational Geometry
  • Robotics
  • Graphics (surfaces, volumes)
  • Comparison & 3D matching

• Simulation & Modeling
  • Newtonian mechanics
  • Electrostatics
  • Numerical algorithms
  • Simulation
  • Network modeling
  • Population modeling


Challenges in Organizing Information:

Redundancy and Multiplicity

• Different protein sequences can assume the same 3-D structure

• Organisms have many similar genes with redundant functions

• A single gene may have several different functions

• Genes & proteins function in complex genetic & regulatory pathways

• How do we organize all this information so that we can make sense of it?

Functional Genomics & Systems Biology:
sequences <> motifs <> genes <> RNAs <> proteins <> structures <> functions <> expression levels <> pathways <> regulatory networks <> functional systems

Modified from Mark Gerstein


One Strategy: Molecular Parts = Conserved Domains

Modified from Mark Gerstein


"Parts List" approach to bike maintenance:

Which are the common parts (bolt, nut, washer, spring, bearing)?

Which are unique parts (cogs, levers)?

How flexible and adaptable are parts mechanically?

Where are the parts located?

Modified from Mark Gerstein


[Figure: numbered parts lists (1 2 3 …) for H. sapiens (~ 20,000 genes) and T. pallidum (~ 2,000 genes), drawn from ~ 2,000 folds]

World of macromolecular structures is also finite, providing a valuable simplification

Global surveys of a finite set of parts from different perspectives

Same logic for pathways, functions, sequence families, blocks, motifs....

Modified from Mark Gerstein


BUT, what actually happens inside cells or within whole organisms is very complex - providing a challenging complication !

Exploring the Virtual Cell at ISU

Virtual Cell projects elsewhere...

NCBI's Bookshelf - a great resource!


So, having a list of parts is not enough!

BIG QUESTION?

SYSTEMS BIOLOGY

How do parts work together to form a functional system?

What is a system? Macromolecular complex, pathway, network, cell, tissue, organism, ecosystem…


So, this is Bioinformatics

What is it good for?

Just a few examples…


Designing drugs

• Understanding how proteins bind other molecules
• Structural modeling & ligand docking
• Designing inhibitors or modulators of key proteins

(Figures adapted from Olsen Group Docking Page at Scripps, Dyson NMR Group Web page at Scripps, and from Computational Chemistry Page at Cornell Theory Center).

Modified from Mark Gerstein


Finding homologs of "new" human genes

Modified from Mark Gerstein


Finding WHAT? Homologs - "same genes" in different organisms

• Human vs Mouse vs Yeast
  • Much easier to do experiments on yeast to determine function
  • Often, function of an ortholog in at least one organism is known

Best Sequence Similarity Matches to Date Between Positionally Cloned Human Genes and S. cerevisiae Proteins (Acc# = GenBank accession number)

Human Disease                          MIM #   Human Gene  Human cDNA Acc#  BLASTX P-value  Yeast Gene  Yeast cDNA Acc#  Yeast Gene Description
Hereditary Non-polyposis Colon Cancer  120436  MSH2        U03911           9.2e-261        MSH2        M84170           DNA repair protein
Hereditary Non-polyposis Colon Cancer  120436  MLH1        U07418           6.3e-196        MLH1        U07187           DNA repair protein
Cystic Fibrosis                        219700  CFTR        M28668           1.3e-167        YCF1        L35237           Metal resistance protein
Wilson Disease                         277900  WND         U11700           5.9e-161        CCC2        L36317           Probable copper transporter
Glycerol Kinase Deficiency             307030  GK          L13943           1.8e-129        GUT1        X69049           Glycerol kinase
Bloom Syndrome                         210900  BLM         U39817           2.6e-119        SGS1        U22341           Helicase
Adrenoleukodystrophy, X-linked         300100  ALD         Z21876           3.4e-107        PXA1        U17065           Peroxisomal ABC transporter
Ataxia Telangiectasia                  208900  ATM         U26455           2.8e-90         TEL1        U31331           PI3 kinase
Amyotrophic Lateral Sclerosis          105400  SOD1        K00065           2.0e-58         SOD1        J03279           Superoxide dismutase
Myotonic Dystrophy                     160900  DM          L19268           5.4e-53         YPK1        M21307           Serine/threonine protein kinase
Lowe Syndrome                          309000  OCRL        M88162           1.2e-47         YIL002C     Z47047           Putative IPP-5-phosphatase
Neurofibromatosis, Type 1              162200  NF1         M89914           2.0e-46         IRA2        M33779           Inhibitory regulator protein
Choroideremia                          303100  CHM         X78121           2.1e-42         GDI1        S69371           GDP dissociation inhibitor
Diastrophic Dysplasia                  222600  DTD         U14528           7.2e-38         SUL1        X82013           Sulfate permease
Lissencephaly                          247200  LIS1        L13385           1.7e-34         MET30       L26505           Methionine metabolism
Thomsen Disease                        160800  CLC1        Z25884           7.9e-31         GEF1        Z23117           Voltage-gated chloride channel
Wilms Tumor                            194070  WT1         X51630           1.1e-20         FZF1        X67787           Sulphite resistance protein
Achondroplasia                         100800  FGFR3       M58051           2.0e-18         IPL1        U07163           Serine/threonine protein kinase
Menkes Syndrome                        309400  MNK         X69208           2.1e-17         CCC2        L36317           Probable copper transporter

Modified from Mark Gerstein
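
A table like the one on this slide is easy to work with programmatically. The sketch below takes four rows copied from the table, keeps the matches whose BLASTX P-value is at or below a cutoff, and sorts them so the strongest match comes first. The cutoff of 1e-50 and the function name are arbitrary choices for illustration, not part of the slide.

```python
# (disease, human gene, BLASTX P-value as text, yeast gene), copied from the table
ROWS = [
    ("Cystic Fibrosis", "CFTR", "1.3e-167", "YCF1"),
    ("Menkes Syndrome", "MNK", "2.1e-17", "CCC2"),
    ("Bloom Syndrome", "BLM", "2.6e-119", "SGS1"),
    ("Wilms Tumor", "WT1", "1.1e-20", "FZF1"),
]

def matches_below(rows, threshold):
    """Keep rows with P-value <= threshold, sorted from smallest P-value up."""
    hits = [
        (disease, human, float(p_value), yeast)
        for disease, human, p_value, yeast in rows
        if float(p_value) <= threshold
    ]
    return sorted(hits, key=lambda row: row[2])

strong = matches_below(ROWS, 1e-50)
for disease, human, p_value, yeast in strong:
    print(f"{human} -> {yeast}  P = {p_value:g}  ({disease})")
```

A smaller P-value means the match is less likely to have arisen by chance, so the first rows printed are the most convincing candidate homologs.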


Comparative Genomics: Genome/Transcriptome/Proteome/Metabolome

Databases, statistics
• Occurrence of specific genes or features in a genome
  • How many kinases in yeast?

• Compare Tissues
  • Which proteins are expressed in cancer vs normal tissues?
  • Diagnostic tools
  • Drug target discovery

Modified from Mark Gerstein


Molecular Recognition: Analyzing & Predicting Macromolecular Interfaces (in DNA, RNA & protein complexes)

Drena Dobbs, GDCB
  Jae-Hyung Lee
  Michael Terribilini
  Jeff Sander
  Pete Zaback

Vasant Honavar, Com S
  Feihong Wu
  Cornelia Caragea
  Fadi Towfic
  Jivo Sinapov

Robert Jernigan, BBMB
  Taner Sen
  Andrzej Kloczkowski

Kai-Ming Ho, Physics


Designing Zinc Finger DNA-binding Proteins to Recognize Specific Sites in Genomic DNA

Drena Dobbs, GDCB
  Jeff Sander
  Pete Zaback

Dan Voytas, GDCB
  Fengli Fu

Les Miller, ComS
Vasant Honavar, ComS

Keith Joung, Harvard


Structure & Function of Human Telomerase:

Predicting structure & functional sites in a clinically important but "recalcitrant" RNP

www.intl-pag.org/

Cell Biologist: Biochemist: Imagined structure:

Lingner et al (1997) Science 276: 561-567.
www.chemicon.com

How would a systems biologist study telomerase?


SUMMARY: #1- What is Bioinformatics?


#2- Biological Databases

Xiong: Chp 2

2 Introduction to Biological Databases
• What Is a Database?
• Types of Databases
• Biological Databases
• Pitfalls of Biological Databases
• Information Retrieval from Biological Databases
• Summary
• Further Reading


What is a Database?

Duh!!

OK: we'll skip that!


Types of Databases

3 Major types of electronic databases:

1- Flat files - simple text files
• no organization to facilitate retrieval

2- Relational - data organized as tables ("relations")
• shared features among tables allow rapid search

3- Object-oriented - data organized as "objects"
• objects associated hierarchically
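
To make the flat-file vs relational distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The three records, the "|" delimiter, and the table layout are toy choices for illustration only; real flat-file formats (e.g. GenBank records) and real database schemas are far richer.

```python
import sqlite3

# Flat file: plain text, one record per line. Finding anything means scanning every line.
FLAT_FILE = """\
MSH2|Homo sapiens|DNA repair protein
MLH1|Homo sapiens|DNA repair protein
SOD1|Saccharomyces cerevisiae|Superoxide dismutase
"""

def flat_lookup(text, gene):
    """Linear scan of the flat file for one gene name."""
    for line in text.splitlines():
        name, organism, description = line.split("|")
        if name == gene:
            return organism, description
    return None

def build_relational(text):
    """Load the same records into two tables linked by a shared organism id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE gene (name TEXT, "
        "organism_id INTEGER REFERENCES organism(id), description TEXT)"
    )
    for line in text.splitlines():
        name, organism, description = line.split("|")
        conn.execute("INSERT OR IGNORE INTO organism (name) VALUES (?)", (organism,))
        (org_id,) = conn.execute(
            "SELECT id FROM organism WHERE name = ?", (organism,)
        ).fetchone()
        conn.execute("INSERT INTO gene VALUES (?, ?, ?)", (name, org_id, description))
    return conn

def genes_for_organism(conn, organism):
    """Use the shared organism_id column to join the two tables."""
    rows = conn.execute(
        "SELECT gene.name FROM gene "
        "JOIN organism ON gene.organism_id = organism.id "
        "WHERE organism.name = ? ORDER BY gene.name",
        (organism,),
    )
    return [row[0] for row in rows]

conn = build_relational(FLAT_FILE)
print(flat_lookup(FLAT_FILE, "SOD1"))
print(genes_for_organism(conn, "Homo sapiens"))
```

The shared `organism_id` column is the "shared feature among tables": the database can follow it (and index it) instead of re-reading every record.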


Biological Databases

Currently - all 3 types, but MANY flat files

What are goals of biological databases?

1- Information retrieval

2- Knowledge discovery

Important issue: Interconnectivity


Types of Biological Databases

1- Primary
• "simple" archives of sequences, structures, images, etc.
• raw data, minimal annotations, not always well curated!

2- Secondary
• enhanced with more complete annotation of sequences, structures, images, etc.
• usually curated!

3- Specialized
• focused on a particular research interest or organism
• usually - not always - highly curated


Examples of Biological Databases

1- Primary

• DNA sequences
  • GenBank - US
  • EMBL - European Molecular Biology Lab
  • DDBJ - DNA Data Bank of Japan

• Structures (Protein, DNA, RNA)
  • PDB - Protein Data Bank
  • NDB - Nucleic Acid Database


Examples of Biological Databases

2- Secondary

• Protein sequences
  • Swiss-Prot, TrEMBL, PIR
  • these recently combined into UniProt

3- Specialized

• Species-specific (or "taxonomic" specific)
  • FlyBase, WormBase, AceDB, PlantDB

• Molecule-specific, disease-specific


Pitfalls of Biological Databases

• Errors! &
• Lack of documentation re: quality or reliability of data
• Limited mechanisms for "data checking" or preventing propagation of errors (esp. annotation errors!!)
• Redundancy
• Inconsistency
• Incompatibility (format, terminology, data types, etc.)
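
As one small illustration of the redundancy pitfall, the sketch below groups records whose sequences are identical once case and whitespace are ignored. The accession IDs and sequences are invented for the example, and exact-match grouping is only the simplest possible check; it would miss near-duplicates and says nothing about annotation errors.

```python
from collections import defaultdict

def find_redundant(records):
    """Group accession IDs whose sequences are identical after normalization."""
    by_sequence = defaultdict(list)
    for accession, sequence in records:
        # Normalize: strip all whitespace and ignore case differences.
        normalized = "".join(sequence.split()).upper()
        by_sequence[normalized].append(accession)
    return [sorted(ids) for ids in by_sequence.values() if len(ids) > 1]

# Hypothetical records: REC001 and REC002 differ only in formatting.
records = [
    ("REC001", "ATGGCC TTA"),
    ("REC002", "atggccTTA"),
    ("REC003", "ATGGCCTTG"),
]

print(find_redundant(records))  # [['REC001', 'REC002']]
```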


Information Retrieval from Biological Databases

2 most popular retrieval systems:

• ENTREZ - NCBI

• will use a LOT - Introduced in Lab 1

• SRS - Sequence Retrieval System - EBI

• will use less, similar to ENTREZ

Both:

• Provide access to multiple databases

• Allow complex queries
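
A "complex query" in Entrez is built by tagging each term with a search field and joining the terms with Boolean operators. The sketch below only constructs such a query as a URL for NCBI's E-utilities ESearch service; it does not contact NCBI. The slides cover the Entrez web interface, so using the programmatic E-utilities endpoint here is our assumption, and the helper name and example terms are illustrative.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(db, terms, retmax=20):
    """Combine (field, value) pairs with AND into one Entrez query URL."""
    query = " AND ".join(f"{value}[{field}]" for field, value in terms)
    params = urlencode({"db": db, "term": query, "retmax": retmax})
    return f"{EUTILS_ESEARCH}?{params}"

url = build_esearch_url(
    "nucleotide",
    [("Gene Name", "MSH2"), ("Organism", "Homo sapiens")],
)

# Decode the URL again to show the query string Entrez would receive.
print(parse_qs(urlsplit(url).query)["term"][0])
```

The same field-tagged query text can be typed directly into the Entrez search box on the NCBI website.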


Web Resources: Bioinformatics & Computational Biology

• Wikipedia: Bioinformatics

• NCBI - National Center for Biotechnology Information
• ISCB - International Society for Computational Biology
• JCB - Jena Center for Bioinformatics
• UBC - Bioinformatics Links Directory
• UWa - BioMolecules
• Pitt - OBRC Online Bioinformatics Resources Collection

• ISU - Bioinformatics Resources - Andrea Dinkelman
• ISU - YABI = "Yet Another Bioinformatics Index" (from BCB Lab at ISU)


ISU Resources & Experts

ISU Research Centers & Graduate Training Programs:

• BCB Lab - (Student-Led Consulting & Resources)
• BCB - Bioinformatics & Computational Biology
• LH Baker Center - Bioinformatics & Biological Statistics
• CIAG - Center for Integrated Animal Genomics
• CILD - Computational Intelligence, Learning & Discovery
• NSF IGERT Training Grant - Computational Molecular Biology

ISU Facilities:

• Biotechnology - Instrumentation Facilities
• PSI - Plant Sciences Institute
• PSI Centers


SUMMARY: #2- Biological Databases

BEWARE!