

©CMBI 2008

Aligning Sequences

The most powerful weapon in the bioinformaticist’s armory is sequence alignment.

Why?

Let’s think about an alignment.

It is a representation of a whole series of evolutionary events, which left traces in the sequences. Things that are more likely to happen during evolution should be most prominently observed in your alignment.


Why align sequences?

Lots of sequences with unknown structure and function. A few sequences with known structure and function.

If they align, they are likely to be similar.

If they are similar, then they very likely have the same structure and/or function.

If one of them has known structure/function, then alignment to the other yields insight into how the structure or function works.


Sequence Alignment

The purpose of a sequence alignment is to line up all residues in the sequences that were derived from the same residue position in the ancestral gene or protein.

gap = insertion or deletion



Alignment

To carry over information from a well studied protein sequence and its structure to a newly discovered protein sequence, we need a sequence alignment that represents the protein structures today: a structural alignment.


Alignment

The implicit meaning of placing amino acid residues below each other in the same column of a protein (multiple) sequence alignment is that they are at the equivalent position in the 3D structures of the corresponding proteins!!

Two very simple examples:

1) the 3 active site residues H, D, S of the serine protease we saw earlier
2) Cys-bridges:

STCTKGALKLPVCRK
TSCTEG--RLPGCKR
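The Cys-bridge example can be checked mechanically. The sketch below is only an illustration: the function name `shared_columns` is made up for this purpose and is not part of the slides. It lists the 1-based alignment columns in which every sequence carries a given residue.

```python
# Sketch: find alignment columns where every sequence shows the same residue.
# The two sequences are the Cys-bridge example from the slide; the function
# name is an assumption introduced for this illustration.

def shared_columns(alignment, residue):
    """Return 1-based columns in which all aligned sequences show `residue`."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [
        col + 1
        for col in range(length)
        if all(seq[col] == residue for seq in alignment)
    ]


pair = [
    "STCTKGALKLPVCRK",
    "TSCTEG--RLPGCKR",
]

cys_columns = shared_columns(pair, "C")
print(cys_columns)
```

With the gap placed as on the slide, both cysteines of one sequence sit in the same columns as the cysteines of the other, which is what the alignment is meant to express.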


Things one can do with a good alignment

Carry information from a well studied to a less well studied protein.

Such information can be:

Phosphorylation sites
Glycosylation sites
Stabilizing mutations
Membrane anchors
Ion binding sites
Ligand binding residues
Cellular localization

Typically what one finds in the FT records of Swiss-Prot!


Significance of alignment

One can only transfer information if the similarity is significantly high between the two sequences.

Schneider (group of Sander) determined the “threshold curve” for transferring structural information from one known protein structure to another protein sequence:

If the sequences are > 80 aa long, then >25% sequence identity is enough to reliably transfer structural information.

If the sequences are smaller in length, a higher percentage of identity is needed.

Structure is much more conserved than sequence!
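The rule of thumb above can be written down as a small check. This is a sketch under stated assumptions: percent identity is computed here over the columns where neither sequence has a gap (other definitions exist), and the function names are hypothetical. For alignments of 80 residues or fewer the slide only says that a higher identity is needed, so the sketch returns "undecided" rather than inventing a number.

```python
# Sketch of the ">80 aa and >25% identity" rule of thumb from the slide.
# Assumption: identity is counted over columns without a gap in either sequence.

def percent_identity(aligned_a, aligned_b):
    """Identical residues as a percentage of the gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def transfer_verdict(aligned_length, identity_percent):
    """Apply the rule of thumb; short alignments are left undecided."""
    if aligned_length > 80:
        return "transfer" if identity_percent > 25.0 else "do not transfer"
    # The slide gives no threshold value for shorter sequences.
    return "undecided"


identity = percent_identity("CPISRTWASIFRCW", "CPISRT---LFRCW")
print(round(identity, 1), transfer_verdict(14, identity))
```

The toy pair is far too short for the rule to apply, which is why the verdict stays undecided despite the high identity.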


Significance of alignment (2)


Aligning sequences by hand

Most information that enters the alignment procedure comes from the physico-chemical properties of the amino acids.

Examples: which is the better alignment (left or right)?

1) CPISRTWASIFRCW    CPISRTWASIFRCW
   CPISRT---LFRCW    CPISRTL---FRCW

2) CPISRTRASEFRCW    CPISRTRASEFRCW
   CPISRTK---FRCW    CPISRT---KFRCW
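One way to make the comparison concrete is to score each column by physico-chemical similarity. The sketch below is an assumption-laden toy: the residue groups and the scores (2 for identity, 1 for same group, 0 otherwise) are chosen for illustration only and are not the scoring scheme of any alignment program. Gaps are not penalised, which is harmless here because the competing alignments have gaps of equal length.

```python
# Toy column scoring by physico-chemical group (illustrative values only).
GROUPS = {
    "aliphatic": set("AVLIM"),
    "aromatic": set("FWY"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "polar": set("STNQ"),
}


def group_of(residue):
    """Return the group name, or None for residues treated as unique (C, G, P)."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    return None


def column_score(a, b):
    """2 for identical residues, 1 for the same group, 0 otherwise or for gaps."""
    if a == "-" or b == "-":
        return 0
    if a == b:
        return 2
    group = group_of(a)
    return 1 if group is not None and group == group_of(b) else 0


def alignment_score(top, bottom):
    """Sum the column scores of two aligned sequences of equal length."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return sum(column_score(a, b) for a, b in zip(top, bottom))


left_1 = alignment_score("CPISRTWASIFRCW", "CPISRT---LFRCW")
right_1 = alignment_score("CPISRTWASIFRCW", "CPISRTL---FRCW")
left_2 = alignment_score("CPISRTRASEFRCW", "CPISRTK---FRCW")
right_2 = alignment_score("CPISRTRASEFRCW", "CPISRT---KFRCW")
print(left_1, right_1, left_2, right_2)
```

Under this toy scheme the left alignment scores higher in both examples: it pairs L with I (both aliphatic) rather than with W, and K with R (both positively charged) rather than with E.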


Aligning sequences by hand (2)

Procedure of aligning depends on information available:

1) Use “only” identity of amino acid and its physico-chemical properties. This is more or less what alignment programs do.

2) Also use explicitly the secondary structure preference of the amino acids.

3) Use 3D information if one or more of the structures in the alignment are known.

In most cases you will start with an alignment program (e.g. CLUSTAL) and then use your knowledge of the amino acids to improve the alignment, for instance by correcting the position of gaps.


Helix


Helix preferences

Pos    -4   -3   -2   -1    1    2    3    4    5  total
        -    -    -    -    H    H    H    H    H
ALA   143  148   99   58  189  205  187  241  268   1538
CYS    24   31   29   22   14   17   18   33   17    205
ASP    98  110  121  260   98  197  167   49   86   1186
GLU    91  100   71   71  152  287  269   70  147   1258
PHE    53   70   90   29   68   46   49  107   65    577
GLY   207  246  166  192   96  127   99   65   60   1258
HIS    48   50   39   46   28   36   38   24   30    339
ILE    94   81  133   19   79   45   68  161   99    779
LYS    99   98   80   46   98  105   69   80  154    829
LEU   105  111  188   50  140   84  113  281  209   1281
MET    37   20   51   13   26   22   54   61   67    351
ASN   103   83   89  206   46   62   55   37   77    758
PRO   143  136  121   99  240   78   40    0    0    857
GLN    48   58   40   38   83   93  124   76  101    661
ARG    82   63   59   51   71   75   61  114  109    685
SER   112  128   98  292  105  126   99   48   76   1084
THR   106   99  119  253   91   80  115   72   67   1002
VAL   141  107  132   37  117   74  120  208  120   1056
TRP    29   25   29   14   30   26   28   30   29    240
TYR    66   65   75   33   58   44   56   72   48    517
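The table can be queried directly. The sketch below copies a few rows from it and looks up, per residue, the position with the highest count and the positions where the residue was never observed. The function names are hypothetical, and raw counts are used as they stand: no correction for how common each residue is overall, so this is a reading aid rather than a propensity scale.

```python
# Sketch: query a few rows of the helix preference table.
# Positions -4..-1 precede the helix, positions 1..5 are the first helix
# residues (per the "-"/"H" row of the table). Counts are copied from the table.
POSITIONS = [-4, -3, -2, -1, 1, 2, 3, 4, 5]

COUNTS = {
    "ALA": [143, 148, 99, 58, 189, 205, 187, 241, 268],
    "ASP": [98, 110, 121, 260, 98, 197, 167, 49, 86],
    "LEU": [105, 111, 188, 50, 140, 84, 113, 281, 209],
    "ASN": [103, 83, 89, 206, 46, 62, 55, 37, 77],
    "PRO": [143, 136, 121, 99, 240, 78, 40, 0, 0],
    "SER": [112, 128, 98, 292, 105, 126, 99, 48, 76],
    "THR": [106, 99, 119, 253, 91, 80, 115, 72, 67],
}


def preferred_position(residue):
    """Position with the highest raw count for this residue."""
    counts = COUNTS[residue]
    return POSITIONS[counts.index(max(counts))]


def never_seen(residue):
    """Positions at which this residue has a count of zero."""
    return [pos for pos, n in zip(POSITIONS, COUNTS[residue]) if n == 0]


for name in sorted(COUNTS):
    print(name, preferred_position(name), never_seen(name))
```

In these rows Ser, Thr, Asp and Asn peak at position -1, just before the helix starts, while Pro peaks at the first helix position and is absent from positions 4 and 5.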


Helix preferences and alignment

1) S G V S P D Q L A A L K L I L E L A L K
2) G T S L E T A L L M Q I A Q K L I A G

Position annotations taken from the helix preference table (original column layout not preserved):

S G V S P D Q L A A L K L I L E L A L K
-1-4-4-1-4-1 3-2 1 1-2 2 -3-2 -3 2 5 1 2 2 1 5 4 -2 3 4 3 3 4 1 5 4 4 5 5 5

G T S L E T A L L M Q I A Q K L I A G
-4-1-1-2 2-1 1-2 -3 3 1 3 3 2 1 4 3 4 5 4 5 5



Helix preferences and alignment

Final alignment:

S G V S P D Q L A A L K L I L E L A L K
- G T S L E T A L L M Q I A Q K L I A G


A ‘real’ example of threading

If you know that in structure 1 the Ala is pointing outside and the Ser is pointing inside:

Where does the Arg in structure 2 go?

(and what will CLUSTAL choose?)



An even more real example

     1   2   3   4   5   6   7   8   9   10  11
A    ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
B1   VAL CYS ARG THR PRO --- --- --- GLU ALA ILE
B2   VAL CYS ARG --- --- --- THR PRO GLU ALA ILE
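B1 and B2 are the same sequence with the gap placed differently. As a rough illustration of what a purely sequence-based method sees, the sketch below counts the columns in which each placement is identical to sequence A. The function name is hypothetical, and identity counting is a deliberately crude stand-in for a real scoring scheme.

```python
# Sketch: count identical columns for the two gap placements of sequence B.
A = "ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL".split()
B1 = "VAL CYS ARG THR PRO --- --- --- GLU ALA ILE".split()
B2 = "VAL CYS ARG --- --- --- THR PRO GLU ALA ILE".split()


def identical_columns(reference, other):
    """Return 1-based columns where both sequences show the same residue."""
    return [
        col
        for col, (ref, oth) in enumerate(zip(reference, other), start=1)
        if ref == oth and ref != "---"
    ]


same_b1 = identical_columns(A, B1)
same_b2 = identical_columns(A, B2)
print("B1:", same_b1)
print("B2:", same_b2)
```

Identity alone favours B1 because of the matching PRO in column 5. Whether that placement is also the structurally correct one cannot be decided from the sequences; that is where the 3D information comes in.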


An even more real example

Residues per alignment column (A, B1, B2), one-letter code:

IVV CCC RRR LT- PP- G-- S-T A-P EEE AAA VII


Multiple sequence alignment

Multiple sequence alignments can confirm or improve pair-wise sequence alignments:

CWPVAASYGR
CWPT---YGR
CWPTA-SYGR
CWPTLGLFGR


Multiple sequence alignment

Multiple sequence alignments can reveal structural information:

  1   2     3     4
ASCTRGCIKLPTCKKMGRCTGY
STCTKGALKLPVCRKMGKSSAY
ATSTHGCMKLPCSRRFGKCSSY
TSCTEGCLRLPGCKRFGRCTSY
TTCTKGLLKLPGCKRFGKSSAY
ASSTKGCMKLPVSRRFGRCTAY
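One kind of structural hint in such an alignment is cysteines that appear and disappear together. The sketch below tests, for two columns, whether every sequence has a Cys in both or in neither. The column numbers 3, 7, 13 and 19 are my reading of where the labels 1 to 4 sit, and the function name is hypothetical.

```python
# Sketch: do two alignment columns gain and lose their cysteine together?
ALIGNMENT = [
    "ASCTRGCIKLPTCKKMGRCTGY",
    "STCTKGALKLPVCRKMGKSSAY",
    "ATSTHGCMKLPCSRRFGKCSSY",
    "TSCTEGCLRLPGCKRFGRCTSY",
    "TTCTKGLLKLPGCKRFGKSSAY",
    "ASSTKGCMKLPVSRRFGRCTAY",
]


def cys_cooccur(alignment, col_a, col_b):
    """True if each sequence has Cys in both 1-based columns or in neither."""
    return all(
        (seq[col_a - 1] == "C") == (seq[col_b - 1] == "C")
        for seq in alignment
    )


# Assumed positions of the cysteines labelled 1, 2, 3 and 4.
for a, b in [(3, 13), (7, 19), (3, 7)]:
    print(a, b, cys_cooccur(ALIGNMENT, a, b))
```

Columns 3 and 13 behave as a pair, as do columns 7 and 19, while 3 and 7 do not. One reading is that cysteines 1 and 3 form one bridge and cysteines 2 and 4 another, although covariation in six sequences is a hint and not proof.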


Multiple sequence alignment

Multiple sequence alignments can validate PROSITE search results.
In N-{P}-[ST]-{P} the N is the glycosylation site.
The chance of finding N-{P}-[ST]-{P} is rather high.
So how can you be sure? Look at the multiple sequence alignment:

ASLRNASTVVTIGDTITGNLTLASYHW
GSIKNGSSVITLPGTMEGNLSTTTYHY
ATLRNASTVMEINGTITGDLTLASFHW
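The check can be sketched with a regular expression. The PROSITE pattern N-{P}-[ST]-{P} translates to `N[^P][ST][^P]`; a lookahead is used so that overlapping matches are not missed. The function names are hypothetical, and because these three sequences contain no gaps, sequence positions are used directly as alignment columns.

```python
# Sketch: locate N-{P}-[ST]-{P} matches and count how many sequences
# support each column. Assumes gap-free aligned sequences.
import re
from collections import Counter

SITE = re.compile(r"(?=N[^P][ST][^P])")  # lookahead keeps overlapping hits

ALIGNMENT = [
    "ASLRNASTVVTIGDTITGNLTLASYHW",
    "GSIKNGSSVITLPGTMEGNLSTTTYHY",
    "ATLRNASTVMEINGTITGDLTLASFHW",
]


def site_columns(sequence):
    """Return the 1-based positions of the N in every pattern match."""
    return [match.start() + 1 for match in SITE.finditer(sequence)]


def site_support(alignment):
    """Map each matching column to the number of sequences matching there."""
    counts = Counter()
    for sequence in alignment:
        counts.update(site_columns(sequence))
    return dict(counts)


support = site_support(ALIGNMENT)
for column in sorted(support):
    print(column, support[column], "of", len(ALIGNMENT))
```

The match at column 5 is present in all three sequences, the one at column 19 in two, and the one at column 13 in only one. A site conserved across the alignment is a more convincing candidate than one that shows up in a single sequence.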


Summary

Bioinformatics is all about obtaining information. Everything you can find in a database saves you doing experiments.

Sequence alignment is important for carrying over information between ‘similar proteins’.

To align sequences, you need to understand the amino acids.