Comparing Protein Sequences

Uploaded by zeheb on 08-Jan-2016

Page 1: Comparing Protein Sequences

Comparing Protein Sequences

Tutorial 4

Today’s menu:

• PAM and BLOSUM score matrices

• PSI-BLAST

• PHI-BLAST

Page 2

PAM & BLOSUM

• PAM matrices are based on global alignments of closely related proteins.

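• PAM1 corresponds to an evolutionary distance of one accepted point mutation per 100 residues (1% divergence). It was derived from alignments of very closely related sequences (at least 85% identical).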

• Other PAM matrices are extrapolated from PAM1.

• BLOSUM matrices are based on local alignments.

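• BLOSUM62 is calculated from blocks in which sequences sharing more than 62% identity are clustered and counted as a single sequence. It therefore reflects comparisons of sequences with at most 62% identity.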

• All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.
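The two construction ideas on this slide can be sketched in a few lines of Python. This is a toy illustration, not the published matrices. It assumes a hypothetical three-letter alphabet, made-up PAM1 mutation probabilities, and made-up pair frequencies. `extrapolate_pam` obtains PAMn by multiplying the PAM1 mutation-probability matrix by itself n times, which treats evolution as a Markov process. The final PAM score matrix would then be the log-odds of these probabilities against background frequencies, a step omitted here. `blosum_score` compares an observed pair frequency in the blocks with the frequency expected by chance. The scores are scaled to half-bit units, as is commonly described for BLOSUM62.

```python
import math

# Hypothetical three-letter alphabet and toy values; real PAM and BLOSUM
# matrices cover the 20 amino acids and are estimated from real alignments.
ALPHABET = "ABC"

# Toy PAM1 mutation-probability matrix: PAM1[i][j] is the probability that
# residue ALPHABET[i] is replaced by ALPHABET[j] after 1 PAM. Rows sum to 1.
PAM1 = [
    [0.990, 0.006, 0.004],
    [0.005, 0.990, 0.005],
    [0.004, 0.006, 0.990],
]


def mat_mult(x, y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def extrapolate_pam(pam1, n):
    """Return PAMn: the PAM1 matrix multiplied by itself n times."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result


def blosum_score(q_ab, p_a, p_b, same_residue, scale=2.0):
    """Rounded log-odds score for one residue pair, BLOSUM style.

    q_ab: observed frequency of the (unordered) pair a-b in the blocks.
    p_a, p_b: background frequencies of residues a and b.
    The expected pair frequency is p_a * p_b when a == b and 2 * p_a * p_b
    otherwise; scale=2.0 expresses the score in half-bit units.
    """
    expected = p_a * p_b if same_residue else 2.0 * p_a * p_b
    return round(scale * math.log2(q_ab / expected))


pam250 = extrapolate_pam(PAM1, 250)
print([round(v, 3) for v in pam250[0]])  # row for "A" after 250 PAMs
print(blosum_score(0.04, 0.1, 0.1, same_residue=True))  # 4
```

The PAM250 row printed here spreads probability away from the diagonal. This shows why matrices extrapolated to larger PAM distances suit more divergent sequences.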

Page 3

Approximately equivalent matrices, from closely related to highly divergent sequences:

PAM100 ~ BLOSUM90 (closely related)
PAM120 ~ BLOSUM80
PAM160 ~ BLOSUM60
PAM200 ~ BLOSUM52
PAM250 ~ BLOSUM45 (highly divergent)

Use recommendations:

Query length (residues)   Matrix     Gap costs (existence, extension)
<35                       PAM30      9, 1
35-50                     PAM70      10, 1
50-85                     BLOSUM80   10, 1
>85                       BLOSUM62   11, 1
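The recommendation table amounts to a lookup on query length. The sketch below encodes it in Python. `recommend_settings` and `gap_cost` are hypothetical helper names, not part of any BLAST interface. The gap-cost formula assumes the NCBI convention, in which a gap of length k costs existence + k * extension.

```python
def recommend_settings(query_length):
    """Return (matrix, gap_existence, gap_extension) for a protein query.

    Follows the recommendation table. A length of exactly 50 falls in two
    rows; assigning it to PAM70 is an assumption of this sketch.
    """
    if query_length < 35:
        return ("PAM30", 9, 1)
    if query_length <= 50:
        return ("PAM70", 10, 1)
    if query_length <= 85:
        return ("BLOSUM80", 10, 1)
    return ("BLOSUM62", 11, 1)


def gap_cost(gap_length, existence, extension):
    """Affine gap cost, assuming a gap of length k costs existence + k * extension."""
    return existence + extension * gap_length


matrix, existence, extension = recommend_settings(120)
print(matrix, gap_cost(3, existence, extension))  # BLOSUM62 14
```

Short queries get matrices for closely related sequences (PAM30, PAM70). A short alignment cannot accumulate enough score to stand out if mismatches are scored leniently.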

Page 4

Example

• Query: >ADRM1_HUMAN (a glycosylated plasma membrane protein which promotes cell adhesion)

• Database: nr on Human genome

• BLAST program: BLASTP

• Matrices: PAM30, BLOSUM45

Page 5

What differences do we observe? (PAM30 vs. BLOSUM45)

• With BLOSUM45 we found both closely related and divergent sequences.

• With PAM30 we found only closely related sequences.

Page 6

[Figure: search results obtained with PAM30 and with BLOSUM45]

With BLOSUM45 we can discover interesting relationships with more distantly related proteins, for example:

...

Mucin-13: a glycosylated membrane protein that protects the cell by binding to pathogens

Page 7

Using different scoring matrices can produce slightly different alignments:

[Figure: alignments obtained with PAM30 and with BLOSUM45]

Page 8

The same pair of sequences can often be aligned in several alternative ways, especially when using a matrix designed for highly divergent sequences (BLOSUM45):
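A small dynamic-programming sketch shows why the scoring scheme matters and why several alignments can tie. The function below is a Needleman-Wunsch-style global aligner with a linear gap penalty. It returns the best score together with the number of co-optimal alignments. It uses a hypothetical toy scoring function (match +1, mismatch -1), not real PAM30 or BLOSUM45 values. `count_optimal_alignments` and `toy_score` are illustrative names.

```python
def count_optimal_alignments(x, y, score, gap):
    """Global alignment (Needleman-Wunsch) with a linear gap penalty.

    Returns (best_score, number_of_co_optimal_alignments).
    score(a, b) gives the substitution score; gap is the (negative)
    score charged for each gap position.
    """
    n, m = len(x), len(y)
    best = [[0] * (m + 1) for _ in range(n + 1)]
    count = [[0] * (m + 1) for _ in range(n + 1)]
    count[0][0] = 1
    for i in range(1, n + 1):  # first column: x aligned against gaps only
        best[i][0], count[i][0] = i * gap, 1
    for j in range(1, m + 1):  # first row: y aligned against gaps only
        best[0][j], count[0][j] = j * gap, 1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (best[i - 1][j - 1] + score(x[i - 1], y[j - 1]), count[i - 1][j - 1]),
                (best[i - 1][j] + gap, count[i - 1][j]),
                (best[i][j - 1] + gap, count[i][j - 1]),
            ]
            top = max(s for s, _ in options)
            best[i][j] = top
            # Every predecessor that reaches the optimum contributes its paths.
            count[i][j] = sum(c for s, c in options if s == top)
    return best[n][m], count[n][m]


def toy_score(a, b):
    # Hypothetical scoring: +1 for identical residues, -1 otherwise.
    return 1 if a == b else -1


print(count_optimal_alignments("AB", "BA", toy_score, gap=-1))  # (-1, 2)
print(count_optimal_alignments("AB", "BA", toy_score, gap=-2))  # (-2, 1)
```

With a lenient gap penalty, two gapped alignments tie for the best score. A harsher gap penalty changes both the optimal alignment and the number of ties. Changing the substitution matrix has the same kind of effect.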

Page 9

PSI-BLAST

Position-Specific Iterated BLAST

We will analyze the following uncharacterized archaeal protein:

>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLGSVTENVIKKSNKPVLVVKRKNS
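What makes PSI-BLAST position-specific is the profile it builds after each round. Sequences that pass the inclusion threshold are turned into a position-specific scoring matrix (PSSM), and the search is repeated with that matrix. The core of that step can be sketched as column-wise log-odds scores. This is a simplified illustration under stated assumptions: the input is a hypothetical gap-free block of aligned segments, all sequences have equal weight, background frequencies are uniform, and a fixed pseudocount is used. Real PSI-BLAST additionally weights sequences and uses substitution-matrix-based pseudocounts.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block, pseudocount=1.0):
    """Build a log2-odds PSSM (one dict per column) from aligned sequences.

    Assumptions of this sketch: block is a list of equal-length, gap-free
    aligned sequences, every sequence has equal weight, a fixed pseudocount
    is added to every amino acid, and the background frequency is 1/20.
    """
    length = len(block[0])
    if any(len(seq) != length for seq in block):
        raise ValueError("all sequences in the block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        column = [seq[col] for seq in block]
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Sum the position-specific scores of a segment of the same length."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Hypothetical gap-free block: the query's first five residues plus two
# invented homologous segments.
BLOCK = ["MSVMY", "MSIMY", "MTVLY"]

pssm = build_pssm(BLOCK)
print(round(score_segment(pssm, "MSVMY"), 2))
print(round(score_segment(pssm, "WWWWW"), 2))
```

Unlike a fixed matrix such as BLOSUM62, the same residue gets a different score in each column. Conserved positions reward the residues actually seen there and penalize the others.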

Page 10
Page 11

E-value threshold for the initial BLAST search (default: 10)

E-value threshold for inclusion in PSI-BLAST iterations (default: 0.005)
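The two thresholds play different roles. The first decides which hits are reported. The second, much stricter one decides which hits are allowed into the profile for the next iteration. PSI-BLAST then repeats the search until no new sequences pass the inclusion threshold or an iteration limit is reached. The sketch below shows only the filtering step. The hit identifiers and E-values are invented, `split_hits` is a hypothetical helper, and treating "below the threshold" as a strict comparison is an assumption.

```python
def split_hits(hits, report_evalue=10.0, inclusion_evalue=0.005):
    """Split (sequence_id, e_value) hits using the two thresholds.

    Hits below report_evalue are reported; hits below inclusion_evalue
    are also used to build the profile (PSSM) for the next iteration.
    """
    reported = [hit for hit in hits if hit[1] < report_evalue]
    included = [hit for hit in reported if hit[1] < inclusion_evalue]
    return reported, included


# Hypothetical hits from one search round (identifiers and E-values invented).
HITS = [("hit_1", 1e-40), ("hit_2", 0.003), ("hit_3", 0.2), ("hit_4", 25.0)]

reported, included = split_hits(HITS)
print([name for name, _ in reported])  # ['hit_1', 'hit_2', 'hit_3']
print([name for name, _ in included])  # ['hit_1', 'hit_2']
```

Keeping the inclusion threshold strict limits the risk that an unrelated sequence enters the profile and pulls later iterations toward false positives.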

Page 12

The query itself

Orthologous sequences in two other archaeal species

Other homologous sequences

Page 13
Page 14

...

...

...

Is MJ0577 a filament protein?

Is MJ0577 a cationic amino acid transporter?

Is MJ0577 a universal stress protein?

Page 15

Pattern Hit Initiated BLAST

PHI-BLAST

A-T-x-[AVG]-R-S

Page 16

Pattern symbols

[ ] = groups the amino acids allowed at a given position

( ) = gives the number of times a position (a residue or a group of residues) is repeated

- = separates positions

x = any amino acid
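Because these symbols map directly onto regular-expression syntax, a pattern can be turned into a regex and used to scan sequences. The sketch below assumes Python's `re` module as the target. `prosite_to_regex` is a hypothetical helper handling only the symbols listed here plus a range form such as (2,4). Other PROSITE features, such as {...} exclusions and the < and > anchors, are not handled, and uppercase X is also treated as "any residue". The test sequence is invented.

```python
import re

# One pattern element: a single residue, 'x', or a [group], optionally
# followed by a repeat count such as (2) or a range such as (2,4).
ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):  # '-' separates positions
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residues, low, high = match.groups()
        # 'x' (or 'X') means any residue; residues and [groups] are kept as-is.
        symbol = "." if residues in ("x", "X") else residues
        if low is None:
            parts.append(symbol)
        elif high is None:
            parts.append(f"{symbol}{{{low}}}")
        else:
            parts.append(f"{symbol}{{{low},{high}}}")
    return "".join(parts)


print(prosite_to_regex("A-T-x-[AVG]-R-S"))                  # AT.[AVG]RS
print(prosite_to_regex("[LIVM](2)-D-E-A-D-[RKEN]-x-[LI]"))  # [LIVM]{2}DEAD[RKEN].[LI]

# Search a hypothetical sequence for the PHI-BLAST example pattern.
hit = re.search(prosite_to_regex("A-T-x-[AVG]-R-S"), "MKATCGRSLL")
print(hit.group(), hit.start())  # ATCGRS 2
```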

Page 17

Making a pattern

[LIVM](2)-D-E-A-D-[RKEN]-x-[LI]

…LIDEADKTT…
…IMDEADEFL…
…LLDEADKCL…
…ILDEADRIL…
…VVDEADNFI…
…LVDEADKGI…
…LMDEADEFL…
…MLDEADRSI…
…LIDEADKML…
…MLDEADNWI…
…LVDEADRFL…
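Checking the listed segments against the pattern is a quick sanity test when building a pattern by hand. The sketch below uses `[LIVM]{2}DEAD[RKEN].[LI]`, a hand translation of the pattern into Python `re` syntax. It tests each segment with the surrounding "…" removed. Taken as written, the final [LI] position does not accept the T at the end of LIDEADKTT, so that segment is reported as a non-match. A pattern meant to cover every example would need a broader last position.

```python
import re

# Hand translation of [LIVM](2)-D-E-A-D-[RKEN]-x-[LI] into Python regex syntax.
DEAD_BOX = re.compile(r"[LIVM]{2}DEAD[RKEN].[LI]")

# Segments copied from the slide, with the surrounding "..." removed.
SEGMENTS = [
    "LIDEADKTT", "IMDEADEFL", "LLDEADKCL", "ILDEADRIL", "VVDEADNFI",
    "LVDEADKGI", "LMDEADEFL", "MLDEADRSI", "LIDEADKML", "MLDEADNWI",
    "LVDEADRFL",
]

for segment in SEGMENTS:
    # fullmatch: the whole 9-residue segment must fit the 9-position pattern.
    status = "matches" if DEAD_BOX.fullmatch(segment) else "does not match"
    print(f"{segment}: {status}")
```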

Page 18

Example:

>gi|71154193|sp|P0A9P6|DEAD_ECOLI Cold-shock DEAD box protein A (ATP-dependent RNA helicase deaD)
MAEFETTFADLGLKAPILEALNDLGYEKPSPIQAECIPHLLNGRDVLGMAQTGSGKTAAFSLPLLQNLDP
ELKAPQILVLAPTRELAVQVAEAMTDFSKHMRGVNVVALYGGQRYDVQLRALRQGPQIVVGTPGRLLDHL
KRGTLDLSKLSGLVLDEADEMLRMGFIEDVETIMAQIPEGHQTALFSATMPEAIRRITRRFMKEPQEVRI
QSSVTTRPDISQSYWTVWGMRKNEALVRFLEAEDFDAAIIFVRTKNATLEVAEALERNGYNSAALNGDMN
QALREQTLERLKDGRLDILIATDVAARGLDVERISLVVNYDIPMDSESYVHRIGRTGRAGRAGRALLFVE
NRERRLLRNIERTMKLTIPEVELPNAELLGKRRLEKFAAKVQQQLESSDLDQYRALLSKIQPTAEGEELD
LETLAAALLKMAQGERTLIVPPDAPMRPKREFRDRDDRGPRDRNDRGPRGDREDRPRRERRDVGDMQLYR
IEVGRDDGVEVRHIVGAIANEGDISSRYIGNIKLFASHSTIELPKGMPGEVLQHFTRTRILNKPMNMQLL
GDAQPHTGGERRGGGRGFGGERREGGRNFSGERREGGRGDGRRFSGERREGRAPRRDDSTGRRRFGGDA

The DEAD box pattern: [LIVM](2)-D-E-A-D-[RKEN]-x-[LI]
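Roughly speaking, PHI-BLAST first restricts attention to database sequences that contain the pattern. It then looks for local alignments to the query around the pattern hit. The sketch below covers only the first step, locating pattern occurrences. To keep it short, it scans a stretch copied from the DEAD_ECOLI sequence rather than the full entry. `find_pattern` is a hypothetical helper, and positions are reported 1-based within the fragment.

```python
import re

# Hand translation of [LIVM](2)-D-E-A-D-[RKEN]-x-[LI] into Python regex syntax.
DEAD_BOX = re.compile(r"[LIVM]{2}DEAD[RKEN].[LI]")

# A stretch copied from the DEAD_ECOLI (P0A9P6) sequence.
FRAGMENT = "KRGTLDLSKLSGLVLDEADEMLRMGFIEDVETIMAQIPEGHQ"


def find_pattern(regex, sequence):
    """Return (1-based start, matched text) for every non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


print(find_pattern(DEAD_BOX, FRAGMENT))  # [(14, 'VLDEADEML')]
```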