Tutorial 4: Comparing Protein Sequences


Upload: dessa

Post on 06-Feb-2016


Page 1: Tutorial 4 Comparing Protein Sequences

Tutorial 4: Comparing Protein Sequences

Intro to Bioinformatics


Page 2: Tutorial 4 Comparing Protein Sequences

Amino acids were not born equal


Page 3: Tutorial 4 Comparing Protein Sequences

Comparing Protein Sequences

Substitution Matrices
• PAM - Point Accepted Mutations
• BLOSUM - Blocks Substitution Matrix

Advanced comparison tools
• PSI-BLAST
• PHI-BLAST


Page 4: Tutorial 4 Comparing Protein Sequences

Substitution Matrix

Scoring matrix S (20x20) for protein alignment, with one row and one column per amino acid (AA).

• S(i,j) is the gain or penalty for substituting AA j by AA i (i: row, j: column).
• Scores are based on how likely the substitution is to be found in nature.
• The scores are computed differently in PAM and BLOSUM.
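To make the role of S(i,j) concrete, the sketch below scores a few ungapped alignments by summing one matrix entry per aligned pair. The dictionary holds only a handful of entries, given here as they are commonly published for BLOSUM62; treat the values as an assumption and check them against an authoritative copy of the matrix. The function names are hypothetical.

```python
# A few BLOSUM62 entries as commonly published (assumption: verify before use).
PARTIAL_BLOSUM62 = {
    ("L", "L"): 4, ("I", "I"): 4, ("L", "I"): 2,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
    ("K", "K"): 5, ("R", "R"): 5, ("K", "R"): 2,
    ("L", "D"): -4, ("L", "K"): -2, ("D", "K"): -1,
}


def substitution_score(a, b, matrix=PARTIAL_BLOSUM62):
    """Look up S(a, b); the matrix is symmetric, so try both key orders."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def ungapped_alignment_score(seq1, seq2, matrix=PARTIAL_BLOSUM62):
    """Sum the substitution scores of the aligned residue pairs."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped scoring needs sequences of equal length")
    return sum(substitution_score(a, b, matrix) for a, b in zip(seq1, seq2))


print(ungapped_alignment_score("LDK", "LDK"))  # identical residues
print(ungapped_alignment_score("LDK", "IER"))  # chemically similar residues
print(ungapped_alignment_score("LDK", "DKL"))  # dissimilar residues
```

Identical and chemically similar pairs contribute positive scores, while dissimilar pairs pull the total down.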


Page 5: Tutorial 4 Comparing Protein Sequences

Computing the probability of mutation M(i,j)

PAM - Point Accepted Mutations
• Based on closely related proteins (X% divergence).
• Matrices for comparing divergent proteins are computed from these by extrapolation.

BLOSUM - Blocks Substitution Matrix
• Based on conserved blocks bounded in similarity (at least X% identical).
• Matrices for divergent proteins are derived using an appropriate X%.


Page 6: Tutorial 4 Comparing Protein Sequences

PAM-1

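• Captures mutation rates between closely related proteins (1% divergence).
• M(i,j) = #(A→B) / #A, read here as the number of observed substitutions of A by B divided by the number of occurrences of A.
• Problematic when comparing distant proteins: the 1% divergence does not capture rarer, more sporadic mutations.
• PAM250 is theoretical (based on extrapolation).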
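The two ideas on this slide, estimating M from counts and extrapolating to larger PAM distances, can be sketched on a toy example. Everything below is an assumption made for illustration: the three-letter alphabet, the counts, and the helper names are hypothetical, and real PAM construction also involves steps (such as normalising to exactly 1% change and converting to log-odds scores) that are left out.

```python
ALPHABET = ["A", "B", "C"]  # hypothetical three-letter alphabet

# Hypothetical counts taken from alignments of closely related sequences.
residue_counts = {"A": 1000, "B": 1000, "C": 1000}
substitution_counts = {
    ("A", "B"): 6, ("A", "C"): 4,
    ("B", "A"): 6, ("B", "C"): 4,
    ("C", "A"): 4, ("C", "B"): 6,
}


def mutation_matrix(sub_counts, res_counts, alphabet):
    """M[a][b] = #(a -> b) / #a; the diagonal takes the remaining probability."""
    matrix = []
    for a in alphabet:
        row = [0.0 if a == b else sub_counts.get((a, b), 0) / res_counts[a]
               for b in alphabet]
        row[alphabet.index(a)] = 1.0 - sum(row)
        matrix.append(row)
    return matrix


def mat_mul(x, y):
    """Plain matrix product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Multiply m by itself `power` times, starting from the identity."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


pam1_like = mutation_matrix(substitution_counts, residue_counts, ALPHABET)
pam250_like = mat_pow(pam1_like, 250)  # extrapolated, never observed directly

print([round(v, 3) for v in pam1_like[0]])
print([round(v, 3) for v in pam250_like[0]])
```

The extrapolated matrix is only as good as the small set of closely related alignments behind the one-step matrix, which is the weakness the slide points out.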


Page 7: Tutorial 4 Comparing Protein Sequences

PAM-1


Page 8: Tutorial 4 Comparing Protein Sequences

BLOSUM62

Captures mutation rates between divergent proteins.

Why is BLOSUM62 called BLOSUM62? Because, within each block, the sequences that shared at least 62% identity with ANY other member of that block were averaged and represented as a single sequence.
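The 62% rule can be sketched as single-linkage clustering inside one block: a sequence joins a cluster as soon as it is at least 62% identical to any member, and each cluster then counts as one sequence. The block below is hypothetical, the function names are made up for this example, and the equal per-cluster weights are a simplification of how the original matrices were derived.

```python
def percent_identity(seq1, seq2):
    """Percentage of identical positions in two equal-length, ungapped segments."""
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 100.0 * matches / len(seq1)


def cluster_block(block, threshold=62.0):
    """Single-linkage clustering: a sequence joins a cluster if it is at least
    `threshold` percent identical to ANY member of that cluster."""
    parent = list(range(len(block)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(block)):
        for j in range(i + 1, len(block)):
            if percent_identity(block[i], block[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(block):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


# Hypothetical block of four aligned 10-residue segments.
block = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGWLRV", "MNPQRSTVWY"]
clusters = cluster_block(block)

# Each cluster counts as one sequence, so its members share a weight of 1.
weights = {seq: 1.0 / len(cluster) for cluster in clusters for seq in cluster}
print(clusters)
print(weights)
```

In this toy block the first and third segments are only 60% identical to each other, yet they end up in the same cluster because both are linked through the second segment.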


Page 9: Tutorial 4 Comparing Protein Sequences

BLOSUM62

The idea of BLOSUM matrices is to get a better measure of the differences between two proteins, specifically for more distantly related proteins.

Similar amino acids receive high scores.


Page 10: Tutorial 4 Comparing Protein Sequences

PAM & BLOSUM

PAM: Based on global alignments of closely related proteins.
BLOSUM: Based on local alignments.

PAM: PAM1 is calculated from comparisons of sequences with no more than 1% divergence.
BLOSUM: BLOSUM62 is calculated from blocks in which sequences at least 62% identical are clustered together, so the counted substitutions come from sequences less than 62% identical.

PAM: Other PAM matrices are extrapolated from PAM1.
BLOSUM: All BLOSUM matrices are based on observed alignments. They are not extrapolated from comparisons of closely related proteins.


Page 11: Tutorial 4 Comparing Protein Sequences

Use Recommendations

PAM100 ~ BLOSUM90 (closely related)
PAM120 ~ BLOSUM80
PAM160 ~ BLOSUM60
PAM200 ~ BLOSUM52
PAM250 ~ BLOSUM45 (highly divergent)

Query length   Matrix     Gap costs
<35            PAM30      9,1
35-50          PAM70      10,1
50-85          BLOSUM80   10,1
>85            BLOSUM62   11,1
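The query-length table can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, the gap costs are read as (existence, extension), and because the table does not say which row owns the boundary lengths 50 and 85, the code assumes they belong to the shorter range.

```python
def recommend_scoring(query_length):
    """Return (matrix, gap_existence, gap_extension) for a query length.

    Assumption: the boundary values 50 and 85 fall in the shorter range.
    """
    if query_length < 35:
        return ("PAM30", 9, 1)
    if query_length <= 50:
        return ("PAM70", 10, 1)
    if query_length <= 85:
        return ("BLOSUM80", 10, 1)
    return ("BLOSUM62", 11, 1)


for length in (20, 40, 70, 300):
    print(length, recommend_scoring(length))
```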


Page 12: Tutorial 4 Comparing Protein Sequences

Example

Query: >ADRM1_HUMAN (Proteasomal ubiquitin receptor)
Database: nr, limited to human.
BLAST program: BLASTP
Matrices: PAM30, BLOSUM45
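The slides run this search on the BLAST web page. For readers using the command-line BLAST+ tools instead, the sketch below builds (but does not run) the corresponding `blastp` argument lists. The FASTA file name is hypothetical, the restriction to human sequences is left out, and the PAM30 gap costs are taken from the recommendation table while BLOSUM45 is left at the program's own defaults.

```python
import shlex


def blastp_command(query_fasta, database, matrix, gap_costs=None):
    """Build (but do not run) a BLAST+ blastp argument list."""
    cmd = ["blastp", "-query", query_fasta, "-db", database, "-matrix", matrix]
    if gap_costs is not None:
        gap_open, gap_extend = gap_costs
        cmd += ["-gapopen", str(gap_open), "-gapextend", str(gap_extend)]
    return cmd


# "adrm1_human.fasta" is a hypothetical file holding the ADRM1_HUMAN sequence.
for matrix, gaps in [("PAM30", (9, 1)), ("BLOSUM45", None)]:
    print(shlex.join(blastp_command("adrm1_human.fasta", "nr", matrix, gaps)))
```

Running the printed commands would need either a local copy of nr or the `-remote` option.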


Page 13: Tutorial 4 Comparing Protein Sequences

What difference do we observe?

PAM30 vs. BLOSUM45

• With BLOSUM45 we found both related and divergent sequences.
• With PAM30 we found only related sequences.


Page 14: Tutorial 4 Comparing Protein Sequences

PAM 30

BLOSUM45

With BLOSUM45 we can discover interesting relations between proteins

...

Mucin-13: a glycosylated membrane protein that protects the cell by binding to pathogens


Page 15: Tutorial 4 Comparing Protein Sequences

With PAM 30

With BLOSUM45

Using different scoring matrices can produce slightly different alignments:


Page 16: Tutorial 4 Comparing Protein Sequences

The same alignment can be resolved in many ways, especially when using a matrix for highly divergent sequences (BLOSUM45):
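Ties between equally good alignments can be counted directly with dynamic programming. The sketch below is a simplified stand-in for what BLAST does: it uses global alignment, a match/mismatch score instead of BLOSUM45, and a linear gap penalty, all of which are assumptions chosen to keep the example short. It returns the best score together with the number of distinct alignments that reach it.

```python
def count_optimal_alignments(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Return (best global score, number of alignments achieving it)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    ways = [[0] * cols for _ in range(rows)]
    ways[0][0] = 1
    # First column and first row: only gaps, so exactly one way each.
    for i in range(1, rows):
        score[i][0], ways[i][0] = i * gap, 1
    for j in range(1, cols):
        score[0][j], ways[0][j] = j * gap, 1

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + pair, ways[i - 1][j - 1]),  # align pair
                (score[i - 1][j] + gap, ways[i - 1][j]),           # gap in seq2
                (score[i][j - 1] + gap, ways[i][j - 1]),           # gap in seq1
            ]
            best = max(s for s, _ in candidates)
            score[i][j] = best
            # Add up the path counts of every move that ties for the best score.
            ways[i][j] = sum(w for s, w in candidates if s == best)
    return score[-1][-1], ways[-1][-1]


print(count_optimal_alignments("ACGT", "ACGT"))
print(count_optimal_alignments("AAA", "A"))
```

Identical sequences have a single best alignment, while a short sequence placed against a repetitive one can be positioned in several equally scoring ways.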


Page 17: Tutorial 4 Comparing Protein Sequences

PSI-BLAST

Position Specific Iterative BLAST
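"Position specific" means that scores depend on the alignment column rather than on a single 20x20 matrix: after each round, the included hits are turned into a per-position profile that drives the next search. The sketch below shows only the simplest ingredient, per-column residue frequencies, on a hypothetical set of aligned hits. The actual program also applies sequence weighting, pseudocounts and a log-odds conversion, none of which are shown here.

```python
from collections import Counter


def column_frequencies(aligned_sequences):
    """Per-column residue frequencies of equal-length aligned sequences."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for column in zip(*aligned_sequences):
        residues = [r for r in column if r != "-"]  # ignore gap characters
        counts = Counter(residues)
        total = len(residues)
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


# Hypothetical aligned hits from one search round.
hits = ["MKVLA", "MKILA", "MRVLG", "MKVLA"]
profile = column_frequencies(hits)
for position, freqs in enumerate(profile, start=1):
    print(position, freqs)
```

Fully conserved columns end up with a single residue at frequency 1, while variable columns spread their weight over several residues.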

We will analyze the following uncharacterized archaeal protein:

>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLGSVTENVIKKSNKPVLVVKRKNS


Page 18: Tutorial 4 Comparing Protein Sequences


Page 19: Tutorial 4 Comparing Protein Sequences

E-value threshold for the initial BLAST search (default: 10)

E-value threshold for inclusion in PSI-BLAST iterations (default: 0.005)
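The two thresholds play different roles: the first decides which hits are reported at all, the second decides which reported hits feed the next iteration. A minimal sketch, assuming hypothetical hits given as (identifier, E-value) pairs and assuming that a hit passes when its E-value is at or below the threshold:

```python
REPORT_THRESHOLD = 10        # initial BLAST search default, from the slide
INCLUSION_THRESHOLD = 0.005  # PSI-BLAST inclusion default, from the slide

# Hypothetical hits: (identifier, E-value).
hits = [
    ("hit_1", 1e-40),
    ("hit_2", 3e-4),
    ("hit_3", 0.02),
    ("hit_4", 4.0),
    ("hit_5", 25.0),
]


def split_hits(hits, report=REPORT_THRESHOLD, inclusion=INCLUSION_THRESHOLD):
    """Return (reported hits, hits included in the next iteration)."""
    reported = [hit for hit in hits if hit[1] <= report]
    included = [hit for hit in reported if hit[1] <= inclusion]
    return reported, included


reported, included = split_hits(hits)
print([name for name, _ in reported])
print([name for name, _ in included])
```

Loosening the inclusion threshold pulls in more distant sequences per round, at the risk of admitting unrelated ones into later iterations.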


Page 20: Tutorial 4 Comparing Protein Sequences

The query itself

Orthologous sequences in two other archaeal species

Other homologous sequences


Page 21: Tutorial 4 Comparing Protein Sequences


Page 22: Tutorial 4 Comparing Protein Sequences

...

...

...

Is MJ0577 a filament protein?

Is MJ0577 a cationic amino acid transporter?

Is MJ0577 a universal stress protein?