Eitan Rubin
Bioinformatics & Biological Computing Unit
Department of Biological Services

Bioinformatics databases and applications

Eitan Rubin, December 2002

Outline

• Introduction

• A day in the life of a biologist

• Major databases

• Major tools

Outline

• Introduction

• A day in the life of a biologist

• Major databases

• Major tools

Life as a simple CS problem

[Diagram: Input1 and Input2 feed one Algorithm, which yields Output]

A more realistic view

[Diagram: Input1 and Input2 pass through Algorithm1, Algorithm2 and Algorithm3, with a decision step, before reaching Output]

A typical real-life view

[Diagram: five copies of the previous workflow (Input1, Input2, Algorithm1, Algorithm2, Algorithm3, decision, Output) combined on one slide]

The life cycle of a bioinformatics project

• Clearly define the goals

• Define a strategy

• Run the process

• QA & optimize

– Controls

– External knowledge

– Re-sampling

– Correlation

Outline

• Introduction

• A day in the life of a biologist

• Major databases

• Major tools

Positional cloning of disease X

XM-417-L15

XM-417-L16

Genome browser @ UCSC

Looking at the region of interest

Gene prediction programs suggest there are 6-8 genes in the region

chrX:98100000-98500000
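A position string like the one above can be handled programmatically before it is pasted into a browser or passed to another tool. The following is a minimal sketch, assuming the string uses 1-based, inclusive coordinates, as is usual for positions shown in the UCSC browser. The names `Region` and `parse_region` are made up for this illustration and do not belong to any UCSC software.

```python
import re
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    """A genomic interval; start and end are assumed 1-based and inclusive."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Both ends are included, hence the +1.
        return self.end - self.start + 1

    def to_bed(self) -> Tuple[str, int, int]:
        # BED-style intervals are 0-based and half-open.
        return (self.chrom, self.start - 1, self.end)


# Accepts "chrX:98100000-98500000" and also "chrX:98,100,000-98,500,000".
_REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_region(text: str) -> Region:
    """Parse a 'chrom:start-end' string into a Region."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end string: {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {text!r}")
    return Region(match["chrom"], start, end)


if __name__ == "__main__":
    region = parse_region("chrX:98100000-98500000")
    print(region.chrom, region.length, region.to_bed())
```

Under the inclusive-coordinate assumption, the region of interest spans about 400 kb, which is a useful sanity check against the 6-8 predicted genes.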

Get mRNA @ NCBI

>unkown_protein
MRLTEKSEGEQQLKPNNSNAPNEDQEEEIQQSEQHTPARQRTQRADTQPSRCRLPSRRTPTTSSDRTINLLEVLPWPTEWIFNPYRLPALFELYPEFLLVFKEAFHDISHCLKAQMEKIGLPIILHLFALSTLYFYKFFLPTILSLSFFILLVLLLFIIVFILIFF
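Sequences retrieved this way usually arrive in FASTA format: a `>` header line followed by one or more sequence lines. The sketch below reads such text into a dictionary so that the sequence can be passed on to the next tool. `parse_fasta` is a hypothetical helper written for this illustration, not part of any NCBI toolkit. The example record is the sequence from the slide, wrapped over several lines, with the header spelled as on the slide.

```python
from collections import Counter
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without the '>') to its joined sequence."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# The record shown on the slide, wrapped for readability.
SLIDE_RECORD = """\
>unkown_protein
MRLTEKSEGEQQLKPNNSNAPNEDQEEEIQQSEQHTPARQRTQRADTQPSRCRLPSRRTP
TTSSDRTINLLEVLPWPTEWIFNPYRLPALFELYPEFLLVFKEAFHDISHCLKAQMEKIG
LPIILHLFALSTLYFYKFFLPTILSLSFFILLVLLLFIIVFILIFF
"""

if __name__ == "__main__":
    for name, seq in parse_fasta(SLIDE_RECORD).items():
        # Report the length and the three most frequent residues.
        print(name, len(seq), Counter(seq).most_common(3))
```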

BLAST @ NCBI

Search for domains @ InterPro

Get predicted protein @ UCSC

>naharu.b
MSSRKQGSQPRGQQSAEEENFKKPTRSNMQRSKMRGASSGKKTAGPQQKNLEPALPGRWGGRSAENPPSGSVRKTRKNKQKTPGNGDGGSTSEAPQPPRKKRARADPTVESEEAFKNRMEVKVKIPEELKPWLVEDWDLVTRQKQLFQLPAKKNVDAILEEYANCKKSQGNVDNKEYAVNEVVAGIKEYFNVMLGTQLLYKFERPQYAEILLAHPDAPMSQVYGAPHLLRLFVRIGAMLAYTPLDEKSLALLLGYLHDFLKYLAKNSASLFTASDYKVASAEYHRKAL

Outline

• Introduction

• A day in the life of a biologist

• Major databases

• Major tools

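[Diagram: major databases placed around a schematic; the labels follow. "AAAAA" is a label from the figure itself.]

INSD (GenBank, EMBL, DDBJ)

Specialized databases: FlyBase, YPD, UCSC, TAIR

EPD

???

StackDB; Gencarta; Ensembl

HSSP

PDB

BIND; MINT; BRITE …

Swiss-Prot; InterPro; LAMA; GO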

INSD

• GenBank, EMBL, DDBJ

• CleanBank

• Divisions (EST, HTG)

• Specialized databases

Major tools

• Transcript modelling from ESTs

– Sequencher, Staden, StackPACK

• Database searching

– BLAST

– BLAT

– FASTA

• Multiple Sequence Alignment

– ClustalX

– MACAW

Major tools

• Gene prediction

• (EST) assembly

• Promoter finding

• ORF identification

• Similarity searching

• MSA

• Phylogenetic analysis

• Structure prediction

• Docking

ClustalX

• Stepwise tree-guided alignment

• “Bag full of tricks”

• Demo
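"Stepwise tree-guided" means the sequences are not aligned all at once: a guide tree is built from pairwise distances, and the closest sequences or groups are aligned first. The sketch below shows only the ordering step. It is not ClustalX's implementation. The k-mer (Jaccard) distance and the average-linkage clustering are simplifications chosen for brevity, and the names `kmer_distance` and `guide_tree_order` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Cluster = Tuple[str, ...]


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """1 - Jaccard similarity of the k-mer sets of two sequences."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = kmers_a | kmers_b
    if not union:
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)


def guide_tree_order(seqs: Dict[str, str], k: int = 2) -> List[Tuple[Cluster, Cluster]]:
    """Return the merges, in order, of an average-linkage clustering.

    Each merge is a pair of clusters (tuples of sequence names); a
    progressive aligner would align the two groups at each step.
    """
    # Distances between individual sequences, computed once.
    leaf_dist = {
        frozenset((x, y)): kmer_distance(seqs[x], seqs[y], k)
        for x, y in combinations(seqs, 2)
    }

    def average_distance(c1: Cluster, c2: Cluster) -> float:
        total = sum(leaf_dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    active: List[Cluster] = [(name,) for name in seqs]
    merges: List[Tuple[Cluster, Cluster]] = []
    while len(active) > 1:
        # Pick the closest pair of clusters and join them.
        c1, c2 = min(combinations(active, 2),
                     key=lambda pair: average_distance(pair[0], pair[1]))
        merges.append((c1, c2))
        active = [c for c in active if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    toy = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTGGGG"}  # toy data
    for step, (left, right) in enumerate(guide_tree_order(toy), start=1):
        print(step, left, right)
```

Because early merges are never revisited, an error made when aligning the first pair propagates to every later step, which is one reason parameter choices matter for this kind of method.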

The effect of parameters

[Figure: results with modified parameters compared with results with default parameters]

Major tools

• Gene prediction

• (EST) assembly

• Promoter finding

• ORF identification

• Similarity searching

• MSA

• Phylogenetic analysis

• Structure prediction

• Docking

Similarity searching

• SW (accelerated)

• BLAST

+ The NCBI environment; fast; wide dynamic range; availability

- DNA: very bad stats; poor for proteins? Highly local

• FASTA

• BLAT

+ Lightning fast; focused

- Limited dynamic range
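"SW" in the list above is taken here to mean Smith-Waterman local alignment, the exact dynamic-programming method that BLAST, FASTA and BLAT approximate with faster heuristics. The following is a minimal sketch of it. The scoring values (match +2, mismatch -1, linear gap -2) are arbitrary assumptions for illustration; real searches use substitution matrices and affine gap penalties.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Return (best local score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a new alignment
                          H[i - 1][j - 1] + sub,  # match or mismatch
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy sequences sharing the local match "ACGT".
    print(smith_waterman("GGGACGTCCC", "TTACGTAA"))
```

The table has one cell per pair of positions, so the cost grows with the product of the two sequence lengths. That cost is why exact SW is usually run in accelerated form, and why heuristic tools dominate database searches.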

MSA

• ClustalX

+ Fast; familiar

- Global; one, not very accurate, algorithm

• MACAW

+ Very interactive; outstanding GUI; multiple algorithms

- Immature; runs on PCs; incompatible

• BLOCKS maker

+ Fully automated; fast

- Poor control; many mistakes