bioinformatics databases and applications

Eitan Rubin, Bioinformatics & Biological Computing Unit, Department of Biological Services. Bioinformatics databases and applications. Eitan Rubin, December 2002


Page 1: Bioinformatics databases and applications


Bioinformatics databases and applications

Eitan Rubin, December 2002

Page 2: Bioinformatics databases and applications


Outline

• Introduction

• A day in the life of a biologist

• Major databases

• Major tools

Page 3: Bioinformatics databases and applications


Outline

• Introduction

• A day in the life of a biologist

• Major databases

• Major tools

Page 4: Bioinformatics databases and applications


Life as a simple CS problem

[Diagram: Input1 and Input2 feed a single Algorithm, which produces Output]

Page 5: Bioinformatics databases and applications


A more realistic view

[Diagram labels: Input1, Input2, Algorithm1, Algorithm2, Algorithm3, decision, Output]

Page 6: Bioinformatics databases and applications


A typical real-life view

[Diagram: five copies of a flowchart, each labelled Input1, Input2, Algorithm1, Algorithm2, Algorithm3, decision, Output]

Page 7: Bioinformatics databases and applications


The life cycle of a bioinformatics project

• Clearly define the goals

• Define a strategy

• Run the process

• QA & optimize

– Controls

– External knowledge

– Re-sampling

– Correlation
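The re-sampling item in the QA list can be illustrated with a percentile bootstrap. This is a minimal sketch, not a method taken from the slides: the function name, the choice of the mean as the statistic, and the `scores` values are all hypothetical.

```python
import random
import statistics


def bootstrap_mean_interval(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(n_resamples):
        # Draw a resample of the same size, with replacement.
        resample = [rng.choice(values) for _ in values]
        means.append(statistics.fmean(resample))
    means.sort()
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Hypothetical scores from repeated runs of some pipeline step.
scores = [52.0, 48.5, 50.1, 61.3, 47.9, 55.4, 49.8, 53.2]
low, high = bootstrap_mean_interval(scores)
print(f"95% bootstrap interval for the mean: {low:.2f} to {high:.2f}")
```

A wide interval suggests the result depends heavily on which samples happened to be included, which is the kind of instability the QA step is meant to expose.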

Page 8: Bioinformatics databases and applications


Outline

• Introduction

• A day in the life of a biologist

• Major databases

• Major tools

Page 9: Bioinformatics databases and applications


Positional cloning of disease X

XM-417-L15

XM-417-L16

Page 10: Bioinformatics databases and applications


Genome browser @ UCSC

Looking at the region of interest

Gene prediction programs suggest there are 6-8 genes in the region

chrX:98100000-98500000
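A region string like the one above is easy to handle programmatically. The sketch below parses it and reports its size. The `Region` type and `parse_region` helper are hypothetical names, and the length calculation assumes both coordinates are inclusive, as in a browser position box.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumption: start and end are both inclusive (1-based).
        return self.end - self.start + 1


_REGION_RE = re.compile(r"^(chr[\w.]+):([\d,]+)-([\d,]+)$")


def parse_region(text: str) -> Region:
    """Parse 'chrom:start-end', allowing thousands separators in the numbers."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {text!r}")
    chrom, start, end = match.groups()
    start_i = int(start.replace(",", ""))
    end_i = int(end.replace(",", ""))
    if start_i > end_i:
        raise ValueError(f"start is after end in {text!r}")
    return Region(chrom, start_i, end_i)


region = parse_region("chrX:98100000-98500000")
print(region.chrom, region.length)
```

Under the inclusive-coordinate assumption the region spans 400,001 bases, so 6-8 predicted genes correspond to roughly one gene per 50-67 kb.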

Page 11: Bioinformatics databases and applications


Get mRNA @ NCBI

>unkown_protein
MRLTEKSEGEQQLKPNNSNAPNEDQEEEIQQSEQHTPARQRTQRADTQPSRCRLPSRRTPTTSSDRTINLLEVLPWPTEWIFNPYRLPALFELYPEFLLVFKEAFHDISHCLKAQMEKIGLPIILHLFALSTLYFYKFFLPTILSLSFFILLVLLLFIIVFILIFF
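Sequences retrieved this way usually arrive in FASTA format: a header line starting with `>` followed by one or more sequence lines. A minimal parser sketch follows. `parse_fasta` is a hypothetical helper, and the `query` record holds only the first 80 residues of the sequence on this slide, wrapped over two lines.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            # Sequence lines may be wrapped; collect and join them later.
            records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


# "query" is a placeholder header; the residues are a prefix of the slide's sequence.
example = """>query
MRLTEKSEGEQQLKPNNSNAPNEDQEEEIQQSEQHTPARQ
RTQRADTQPSRCRLPSRRTPTTSSDRTINLLEVLPWPTEW
"""

records = parse_fasta(example)
print(len(records["query"]))
```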

Page 12: Bioinformatics databases and applications


BLAST @ NCBI


Page 14: Bioinformatics databases and applications


Search for domains @ InterPro

Page 15: Bioinformatics databases and applications


Search for domains @ InterPro


Page 17: Bioinformatics databases and applications


Get predicted protein @ UCSC

>naharu.b
MSSRKQGSQPRGQQSAEEENFKKPTRSNMQRSKMRGASSGKKTAGPQQKNLEPALPGRWGGRSAENPPSGSVRKTRKNKQKTPGNGDGGSTSEAPQPPRKKRARADPTVESEEAFKNRMEVKVKIPEELKPWLVEDWDLVTRQKQLFQLPAKKNVDAILEEYANCKKSQGNVDNKEYAVNEVVAGIKEYFNVMLGTQLLYKFERPQYAEILLAHPDAPMSQVYGAPHLLRLFVRIGAMLAYTPLDEKSLALLLGYLHDFLKYLAKNSASLFTASDYKVASAEYHRKAL


Page 25: Bioinformatics databases and applications


Outline

• Introduction

• A day in the life of a biologist

• Major databases

• Major tools

Page 26: Bioinformatics databases and applications


[Figure: overview of major databases; the labels recovered from the figure are listed below]

INSD (GenBank, EMBL, DDBJ)

Specialized databases: FlyBase, YPD, UCSC, TAIR

EPD

???

StackDB; Gencarta; Ensembl

HSSP

PDB

BIND; MINT; BRITE …

Swiss-Prot; InterPro; LAMA; GO

Page 27: Bioinformatics databases and applications


INSD

• GenBank, EMBL, DDBJ

• CleanBank

• Divisions (EST, HTG)

• Specialized databases

Page 28: Bioinformatics databases and applications


Major tools

• Transcript modelling from ESTs

– Sequencher, Staden, StackPACK

• Database searching

– BLAST

– BLAT

– FASTA

• Multiple Sequence Alignment

– ClustalX

– MACAW

Page 29: Bioinformatics databases and applications


Major tools

• Gene prediction

• (EST) assembly

• Promoter Finding

• ORF identification

• Similarity searching

• MSA

• Phylogenetic analysis

• Structure prediction

• Docking
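Of the tasks listed above, ORF identification is the simplest to show in a few lines. The sketch below is a naive forward-strand scan for ATG-to-stop reading frames, not the algorithm of any particular tool. `find_orfs`, its `min_codons` threshold, and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    `min_codons` counts the codons before the stop, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


toy = "CCATGAAATTTGGGTAACC"  # made-up sequence with one short ORF
print(find_orfs(toy))
```

A real ORF finder would also scan the reverse complement and typically apply a much larger minimum length.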

Page 30: Bioinformatics databases and applications


ClustalX

• Stepwise tree-guided alignment

• “Bag full of tricks”

• Demo
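The "tree-guided" part can be sketched as follows: estimate pairwise distances, cluster the sequences into a guide tree, then align in the order the tree dictates, closest pairs first. This is a simplified stand-in and not ClustalX's actual procedure. The 2-mer Jaccard distance, the average-linkage clustering, and the four toy sequences are assumptions made for illustration.

```python
from itertools import combinations


def kmer_set(seq, k=2):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """1 - Jaccard similarity of k-mer sets (crude proxy for an alignment distance)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def guide_tree(seqs):
    """Average-linkage clustering; returns nested tuples of sequence names."""
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]
    dist = {frozenset(pair): kmer_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}

    def cluster_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters; this is the next alignment step.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters[0][0]


# Toy sequences: a and b are near-identical, as are c and d.
seqs = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "GGSGGSGGSG", "d": "GGSGGSGGTG"}
tree = guide_tree(seqs)
print(tree)
```

Read from the leaves inward, the nested tuples give the stepwise order: align a with b, align c with d, then align the two resulting groups with each other.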

Page 31: Bioinformatics databases and applications


The effect of parameters

Modified parameters

Default parameters

Page 32: Bioinformatics databases and applications


The effect of parameters

Page 33: Bioinformatics databases and applications


Major tools

• Gene prediction

• (EST) assembly

• Promoter Finding

• ORF identification

• Similarity searching

• MSA

• Phylogenetic analysis

• Structure prediction

• Docking

Page 34: Bioinformatics databases and applications


Similarity searching

• SW (accelerated)

• BLAST

+ The NCBI environment, fast, wide dynamic range, availability

- DNA: very bad stats; poor for proteins? Highly local

• FASTA

• BLAT

+ Lightning fast, focused

- Limited dynamic range
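SW in the list above refers to Smith-Waterman local alignment, the exhaustive dynamic-programming search that BLAST, FASTA and BLAT approximate for speed. The sketch below computes only the best local score. The scoring values (match 2, mismatch -1, linear gap -2) are arbitrary assumptions; real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Flooring at 0 lets an alignment restart anywhere: the "local" part.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(smith_waterman_score("TTTACGTTTT", "GGACGGG"))
```

The running time is proportional to the product of the two sequence lengths, which is why heuristic tools dominate database-scale searching.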

Page 35: Bioinformatics databases and applications


MSA

• ClustalX

+ Fast; familiar

- Global; one, not very accurate algorithm

• MACAW

+ Very interactive; outstanding GUI; multiple algorithms

- Immature; runs on PCs; incompatible

• BLOCKS maker

+ Fully automated; fast

- Poor control; many mistakes