1-month practical course genome analysis lecture 5: multiple sequence alignment
DESCRIPTION
C. E. N. T. E. R. F. O. R. I. N. T. E. G. R. A. T. I. V. E. B. I. O. I. N. F. O. R. M. A. T. I. C. S. V. U. 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment Centre for Integrative Bioinformatics VU (IBIVU) - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/1.jpg)
1-month Practical CourseGenome Analysis
Lecture 5: Multiple Sequence Alignment
Centre for Integrative Bioinformatics VU (IBIVU)Vrije Universiteit AmsterdamThe Netherlands
CENTR
FORINTEGRATIVE
BIOINFORMATICSVU
E
![Page 2: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/2.jpg)
Multiple sequence alignment
![Page 3: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/3.jpg)
Multiple sequence alignmentMultiple sequence alignment Sequences can be conserved across species and perform
similar or identical functions.> hold information about which regions have high mutation rates over evolutionary time and which are evolutionarily conserved;> identification of regions or domains that are critical to functionality.
Sequences can be mutated or rearranged to perform an altered function.> which changes in the sequences have caused a change in the functionality.
Multiple sequence alignment: the idea is to take three or moresequences and align them so that the greatest number of similarcharacters are aligned in the same column of the alignment.
![Page 4: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/4.jpg)
What to ask yourselfWhat to ask yourself How do we get a multiple alignment?
(three or more sequences)
What is our aim?– Do we go for max accuracy, least computational time or the best compromise?
What do we want to achieve each time
![Page 5: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/5.jpg)
Sequence-sequence alignmentSequence-sequence alignmentsequence
sequence
![Page 6: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/6.jpg)
Multiple alignment methodsMultiple alignment methods
Multi-dimensional dynamic programming> extension of pairwise sequence alignment.
Progressive alignment> incorporates phylogenetic information to guide the alignment process
Iterative alignment> correct for problems with progressive alignment by repeatedly realigning subgroups of sequence
![Page 7: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/7.jpg)
Simultaneous multiple alignmentSimultaneous multiple alignmentMulti-dimensional dynamic programmingMulti-dimensional dynamic programming
The combinatorial explosion
2 sequences of length n n2 comparisons
Comparison number increases exponentially i.e. nN where n is the length of the sequences, and N is the
number of sequences
Impractical for even a small number of short sequences
![Page 8: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/8.jpg)
Multi-dimensional dynamic Multi-dimensional dynamic programmingprogramming (Murata et al., 1985)(Murata et al., 1985)
Sequence 1
Seq
uenc
e 2
Sequence 3
![Page 9: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/9.jpg)
The MSA approachThe MSA approach MSA (Lipman et al., 1989, PNAS 86, 4412)
MSA restricts the amount of memory by computing bounds that approximate the centre of a multi-dimensional hypercube.
Calculate all pair-wise alignment scores. Use the scores to to predict a tree. Calculate pair weights based on the tree (lower bound). Produce a heuristic alignment based on the tree. Calculate the maximum weight for each sequence pair (upper bound). Determine the spatial positions
that must be calculated to obtain the optimal alignment.
Perform the optimal alignment. Report the weight found compared
to the maximum weight previouslyfound (measure of divergence).
Extremely slow and memory intensive. Max 8-9 sequences of ~250 residues.
![Page 10: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/10.jpg)
The DCA approachThe DCA approach DCA (Stoye et al., 1997, Appl. Math. Lett. 10(2), 67-73)
Each sequence is cut in two behinda suitable cut position somewhere close to its midpoint.
This way, the problem of aligningone family of (long) sequences is divided into the two problems of aligning two families of (shorter) sequences.
This procedure is re-iterated untilthe sequences are sufficiently short.
Optimal alignment by MSA.
Finally, the resulting short alignments are concatenated.
![Page 11: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/11.jpg)
So in effect …So in effect …Sequence 1
Seq
uenc
e 2
Sequence 3
![Page 12: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/12.jpg)
Multiple alignment methodsMultiple alignment methods
Multi-dimensional dynamic programmingMulti-dimensional dynamic programming> extension of pairwise sequence alignment.> extension of pairwise sequence alignment.
Progressive alignment> incorporates phylogenetic information to guide the alignment process
Iterative alignment> correct for problems with progressive alignment by repeatedly realigning subgroups of sequence
![Page 13: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/13.jpg)
The progressive alignment methodThe progressive alignment method Underlying idea: usually we are interested in aligning
families of sequences that are evolutionary related.
Principle: construct an approximate phylogenetic tree for the sequences to be aligned and than to build up the alignment by progressively adding sequences in the order specified by the tree.
But before going into details, some facts about multiple alignment profiles …
![Page 14: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/14.jpg)
How to represent a block of sequences?How to represent a block of sequences? Historically: consensus sequence – single sequence
that best represents the amino acids observed at each alignment position. When consensus sequences are used the pair-wise DP algorithm can be used without alterations
Modern methods: Alignment profile – representation that retains the information about frequencies of amino acids observed at each alignment position.
![Page 15: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/15.jpg)
Multiple alignment profiles Multiple alignment profiles (Gribskov et al. 1987)(Gribskov et al. 1987)
Gribskov created a probe: group of typical sequences of functionally related proteins that have been aligned by similarity in sequence or three-dimensional structure (in his case: globins & immunoglobulins).
Then he constructed a profile, which consists of a sequence position-specific scoring matrix M(p,a) composed of 21 columns and N rows (N = length of probe).
The first 20 columns of each row specify the score for finding, at that position in the target, each of the 20 amino acid residues. An additional column contains a penalty for insertions or deletions at that position (gap-opening and gap-extension).
![Page 16: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/16.jpg)
Multiple alignment profilesMultiple alignment profiles
ACDWY-
ifA..fC..fD..fW..fY..Gapo, gapxGapo, gapx
Position dependent gap penalties
Core region Core regionGapped region
Gapo, gapx
fA..fC..fD..fW..fY..
fA..fC..fD..fW..fY..
![Page 17: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/17.jpg)
Profile buildingProfile building Example: each aa is represented as a frequency; gap penalties as weights.
ACDWY
Gappenalties
i0.30.100.30.30.51.0
Position dependent gap penalties
0.50000.5
00.50.20.10.21.0
![Page 18: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/18.jpg)
Profile-sequence alignmentProfile-sequence alignment
ACD……VWY
sequence
![Page 19: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/19.jpg)
Sequence to profile alignmentSequence to profile alignment
AAVVL
0.4 A0.2 L0.4 V
Score of amino acid L in sequence that is aligned against this profile position:
Score = 0.4 * s(L, A) + 0.2 * s(L, L) + 0.4 * s(L, V)
![Page 20: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/20.jpg)
Profile-profile alignmentProfile-profile alignmentACD..Y
ACD……VWY
profile
profile
![Page 21: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/21.jpg)
Profile to profile alignmentProfile to profile alignment
0.4 A0.2 L0.4 V
Match score of these two alignment columns using the a.a frequencies at the corresponding profile positions:Score = 0.4*0.75*s(A,G) + 0.2*0.75*s(L,G) + 0.4*0.75*s(V,G) ++ 0.4*0.25*s(A,S) + 0.2*0.25*s(L,S) + 0.4*0.25*s(V,S)
s(x,y) is value in amino acid exchange matrix (e.g. PAM250, Blosum62) for amino acid pair (x,y)
0.75 G0.25 S
![Page 22: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/22.jpg)
So, for scoring profiles …So, for scoring profiles … Think of sequence-sequence alignment. Same principles but more information for each position.
Reminder: The sequence pair alignment score S comes from the
sum of the positional scores M(aai,aaj) (i.e. the substitution matrix values at each alignment position minus penalties if applicable)
Profile alignment scores are exactly the same, but the positional scores are more complex
![Page 23: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/23.jpg)
Scoring a profile positionScoring a profile position
At each position (column) we have different residue frequencies for each amino acid (rows)
SO: Instead of saying S=M(aa1, aa2) (one residue pair) For frequency f>0 (amino acid is actually there) we take:
ACD..Y
Profile 1ACD..Y
Profile 2
20
i
20
jjiji )aa,M(aafaafaaS
![Page 24: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/24.jpg)
Progressive alignmentProgressive alignment1. Perform pair-wise alignments of all of the sequences;2. Use the alignment scores to produces a dendrogram using
neighbour-joining methods (guide-tree);3. Align the sequences sequentially, guided by the
relationships indicated by the tree.
Biopat (first integrated method ever)MULTAL (Taylor 1987)DIALIGN (1&2, Morgenstern 1996)PRRP (Gotoh 1996)ClustalW (Thompson et al 1994)PRALINE (Heringa 1999)T Coffee (Notredame 2000)POA (Lee 2002)MAFFT (Katoh 2002)MUSCLE (Edgar 2004)PROBCONS (Do 2005)
![Page 25: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/25.jpg)
Progressive multiple alignmentProgressive multiple alignment1213
45
Guide tree Multiple alignment
Score 1-2
Score 1-3
Score 4-5
Scores Similaritymatrix5×5
Clustering (tree-building)method Iteration possibilities
Align
![Page 26: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/26.jpg)
General progressive multiple General progressive multiple alignment techniquealignment technique (follow generated tree)(follow generated tree)
13
25
13
13
13
25
25
d
root
![Page 27: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/27.jpg)
PRALINE progressive strategyPRALINE progressive strategy13
2
13
13
13
25
254
d
4
![Page 28: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/28.jpg)
There are problems …There are problems …Accuracy is very important !!!!
Alignment errors during the construction of the MSA cannot be repaired anymore: errors made at any alignment step are propagated through subsequent progressive steps.
The comparisons of sequences at early steps during progressive alignments cannot make use of information from other sequences.
It is only later during the alignment progression that more information from other sequences (e.g. through profile representation) becomes employed in further alignment steps.
“Once a gap, always a gap”Feng & Doolittle, 1987
![Page 29: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/29.jpg)
Clustal, ClustalW, ClustalX• CLUSTAL W/X (Thompson et al., 1994) uses Neighbour Joining (NJ)
algorithm (Saitou and Nei, 1984), widely used in phylogenetic analysis, to construct a guide tree.
• Sequence blocks are represented by a profile, in which the individual sequences are additionally weighted according to the branch lengths in the NJ tree.
• Further carefully crafted heuristics include: – (i) local gap penalties – (ii) automatic selection of the amino acid substitution matrix, (iii) automatic gap penalty
adjustment– (iv) mechanism to delay alignment of sequences that appear to be distant at the time they
are considered.• CLUSTAL (W/X) does not allow iteration (Hogeweg and Hesper, 1984;
Corpet, 1988, Gotoh, 1996; Heringa, 1999, 2002)
![Page 30: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/30.jpg)
Sequence weighting dilemmaPair-wise alignment quality versus sequence identity
(Vogt et al., JMB 249, 816-831,1995)
![Page 31: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/31.jpg)
• Profile pre-processing• Secondary structure-induced
alignment• Homology-extended alignment• Matrix extension
Objective: try to avoid (early) errors
Additional strategies for multiple sequence alignment
![Page 32: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/32.jpg)
Profile pre-processing121345
Score 1-2Score 1-3
Score 4-5
12345
1
ACD..Y
PiPx
1
Key Sequence
Pre-alignment
Pre-profile
Master-slave (N-to-1) alignment
![Page 33: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/33.jpg)
Pre-profile generation121345
Score 1-2
Score 1-3
Score 4-5
ACD..Y
12345
1ACD..Y
21345
2
Pre-profilesPre-alignments
512354
ACD..Y
Cut-off
![Page 34: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/34.jpg)
Pre-profile alignment
ACD..YACD..YACD..Y
ACD..Y
ACD..Y
12345
12345
Pre-profiles
Final alignment
![Page 35: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/35.jpg)
Pre-profile alignment
1234512134531245341235
4512354
212345
Final alignment
![Page 36: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/36.jpg)
Pre-profile alignmentAlignment consistency
1234512134531245341235
4512354
1
2 2
5
Ala131A131A131L133C126A131
![Page 37: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/37.jpg)
PRALINE pre-profile generation• Idea: use the information from all query sequences to
make a pre-profile for each query sequence that contains information from other sequences
• You can use all sequences in each pre-profile, or use only those sequences that will probably align ‘correctly’. Incorrectly aligned sequences in the pre-profiles will increase the noise level.
• Select using alignment score: only allow sequences in pre-profiles if their alignment with the score higher than a given threshold value. In PRALINE, this threshold is given as prepro=1500 (alignment score threshold value is 1500 – see next two slides)
![Page 38: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/38.jpg)
Flavodoxin-cheY consistency scores(PRALINE prepro=0)
1fx1 --7899999999999TEYTAETIARQL8776-6657777777777777553799VL999ST97775599989-435566677798998878AQGRKVACFFLAV_DESVH -46788999999999TEYTAETIAREL7777-7757777777777777553799VL999ST97775599989-435566677798998878AQGRKVACFFLAV_DESDE -47899999999999999999999988776695658888777777778763YDAVL999SAW9877789877753556666669777776789GRKVAAFFLAV_DESGI -46788999999999TEGVAEAIAKTL9997-76678888777777887539DVVL999ST987776--9889546667776697776557777888888FLAV_DESSA 93677799999999999999999999988759765777888888888876399999999STW77765--99995366666777979987799999999994fxn -878779999999999999999999776666967567788888888888777999999988777776--9889577788888897773237888888888FLAV_MEGEL 9776779999999999999999997777766-665666677788899976799999999987777669--8873623344666955554557788888882fcr --87899999999999TEVADFIGK996541900300000112233355679DLLF99999855312888111224555555407777777888888888FLAV_ANASP -47899LFYGTQTGKTESVAEIIR9777653922356677777777897779999999999988843--9998555778777899998879999999999FLAV_ECOLI 997789999GSDTGNTENIAKMIQ8774222922456678889999995569999999999755553----99262225555495777767778999999FLAV_AZOVI --79IGLFFGSNTGKTRKVAKSIK99887759657577888888999777899999999999877761112222222244555-5555555778999999FLAV_ENTAG 94789999999999999999999998755229223234555555555555688899999998875521111111133477777-7777777999999999FLAV_CLOAB -86999ILYSSKTGKTERVAK9997555555057678887888887777765778899998522223--98883422344555977777777777777773chy 0122222223333335666665555555222922222222222221112163335555755553222888877674533344493332222222222222
Avrg Consist 8667778888888889999999998776554844455566666666665557888888888766544887666334445566586666556778888888Conservation 0125538675848969746963946463343045244355446543473516658868567554455000000314365446505575435547747759
1fx1 G888799955555559888888888899777----7777797787787978---555555566776555677777778888799------FLAV_DESVH G888799955555559888888888899777----7777797787787978---555555566776555677777778888799------FLAV_DESDE A88878685555555999988888889998879--8777788-98777777--8555555554433245667777777777599------FLAV_DESGI 87775977755555677777777777777778---88888887667778777775555555555542424667888887777--------FLAV_DESSA 977768777555556777777777777777767887777777778888-978985555555556536556888888888877--------4fxn 867777555555552666666666555555577887767999877777977777665555555555444466666666555798------FLAV_MEGEL 8577775666666525556777778888888689977888988776558677885544333222222212233223355557--------2fcr 877773573333333777766667777765533333333333333322833333333332244444567777777888777633------FLAV_ANASP 977773775333344777888888777777733334444444444433833333344444444444455577777788777734------FLAV_ECOLI 977743786444444777788888888888833334444444444444244444555554555775667788888888877734110000FLAV_AZOVI 97776355333333466666667777777773333444444444444482333355555555555545558888888877772311----FLAV_ENTAG 977773886555555866666666677666633333333333333322123333344444444455555665566666555582------FLAV_CLOAB 766627222222212444444444455555587882222222222222111111122222222222344443333333233399------3chy 222227222222224111355431113324578-87778997666556877776322222222222322222323344444422------
Avrg Consist 866656564444444666666666666666656665555565555555655565444443444443344455666666666666889999Conservation 73663057433334163464534444*746710000011010011000000010434744645443225474454448434301000000
Iteration 0 SP= 135136.00 AvSP= 10.473 SId= 3838 AvSId= 0.297Consistency values are scored from 0 to 10; the value 10 is represented by the corresponding amino acid (red)
![Page 39: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/39.jpg)
1fx1 -42444IVYGSTTGNTEYTAETIARQL886666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACFFLAV_DESVH -34444IVYGSTTGNTEYTAETIAREL776666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACFFLAV_DESSA -33444IVYGSTTGNTET99999888777655777668888899666686YDIVLFGCSTW77777----996466666779-88SL98ADLKGKKVSVFFLAV_DESGI -34444IVYGSTTGNTEGVA9999999999765555677777886666678DVVLLGCSTW77777----995466666779-88887688888KKVGVFFLAV_DESDE -44777IVFGSSTGNTE988777666655566777778899999777777YDAVLFGCSAW88877----997587777779-8887766777GRKVAAF4fxn -32222IVYWSGTGNTE8888888876666778888888888NI8888586DILILGCSA888888------8-8888886--66665378ISGKKVALFFLAV_MEGEL -12222IVYWSGTGNTEAMA8888888888888888555555555555485DVILLGCPAMGSE77------572222288--8888755588GKKVGLF2fcr -41456IFFSTSTGNTTEVA999998865432222765554443244779YDLLFLGAPT944411999-111112454441-8DKLPEVDMKDLPVAIFFLAV_ANASP -00456LFYGTQTGKTESVAEII987755323322427776666623589YQYLIIGCPTW55532--999843678W988899998888888GKLVAYFFLAV_AZOVI -42445LFFGSNTGKTRKVAKSIK87777434333536666665467777YQFLILGTPTLGEG862222222222355558-45666666888KTVALFFLAV_ENTAG -266IGIFFGSDTGQTRKVAKLIHQKL6664664424DVRRATR88888SYPVLLLGTPT88888644444444446WQEF8-8NTLSEADLTGKTVALFFLAV_ECOLI -51114IFFGSDTGNTENIAKMI987743311111555555588355599YDILLLGIPT954431----88355225544--44666666779KLVALFFLAV_CLOAB -63666ILYSSKTGKTERVAKLIE63333333333333333333366LQESEGIIFGTPTY63--6--------66SWE33333333333333GKLGAAF3chy ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--LKTIRADGAMSALPVLM
Avrg Consist 9334459999999999999999988776655555555666667756667889999999999767658888775555566668967777677889999999Conservation 0236428675848969746963946463344354312564565414344366588685675544550000003144654460055575345547747759
1fx1 G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899FLAV_DESVH G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899FLAV_DESSA G98878-688688888-88--88999999999999979988888887788889-89-9787777666756645577776666654466899899FLAV_DESGI G98879-898688888987--788888999GATLV7698899-9998789888-8899787878776663122477788888333276899899FLAV_DESDE AS8888-68-888888899--9999999999988888-999888889887788978887766688542222122555555553332779999994fxn GS2228-228222222222--2388888888888888888888888888888888888887778866765535577555533221288888888FLAV_MEGEL G4888--28-8888882MD--AWKQRTEDTGATVI77---------------------77222--224444222222244222112--------2fcr GLGDA5-8Y5DNFC88-88--8877777777777765444555555555544385555777774465333357799999987555333899899FLAV_ANASP GTGDQ5-GY5899999-99--99EEKISQRGG99975555544444444433284444466665555555556666676666433333899899FLAV_AZOVI GLGDQ5-885777555-55--55555788888888555555555555555554855555555555666555555888855555544442--288FLAV_ENTAG GLGDQL-NYSKNFVSA-MR--ILYDLVIARGACVVG8888EGYKFSFSAA6664NEFVGLPLDQEN88888EERIDSWLE88842242688688FLAV_ECOLI GC99549784688888987997777777778888855444444444444444114444777774455775567788888887433322100100FLAV_CLOAB STANS6366663333333333336666666666666666663333363366336663333336EDENARIFGERIANKVKQI3333336666663chy VTAEA---KKENIIAA-----------AQAGAS-------------------------GYVVK-----PFTAATLEEKLNKIFEKLGM------
Avrg Consist 9988779787777777777997788888888888866777777777767766677777676667766655455577776666433355788788Conservation 746640037154545706300354534444*745753000001010010000000010683760144442335574454448434301000000
Iteration 0 SP= 136702.00 AvSP= 10.654 SId= 3955 AvSId= 0.308
Flavodoxin-cheY consistency scores(PRALINE prepro=1500)
Consistency values are scored from 0 to 10; the value 10 is represented by the corresponding amino acid (red)
![Page 40: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/40.jpg)
Multiple alignment methodsMultiple alignment methods
Multi-dimensional dynamic programming> extension of pairwise sequence alignment.
Progressive alignment> incorporate phylogenetic information to (create an order to) guide the alignment process
Iterative alignment> correct problems with progressive alignment by repeatedly realigning subgroups of sequences
![Page 41: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/41.jpg)
Iteration
Alignment iteration: •do an alignment•learn from it•do it better next time
Bootstrapping
![Page 42: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/42.jpg)
Consistency-based iteration
Pre-profilesPre-profiles
Multiple Multiple alignmentalignmentpositionalpositionalconsistencyconsistencyscoresscores
![Page 43: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/43.jpg)
Pre-profile update iteration
Pre-profilesPre-profiles
Multiple Multiple alignmentalignment
![Page 44: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/44.jpg)
Iteration
Convergence Limit
cycle
Divergence
![Page 45: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/45.jpg)
CLUSTAL X (1.64b) multiple sequence alignment Flavodoxin-cheY
1fx1 -PKALIVYGSTTGNTEYTAETIARQLANAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRKFLAV_DESVH MPKALIVYGSTTGNTEYTAETIARELADAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRKFLAV_DESGI MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-M-ETTVVNVADVTAPGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPLYE-DLDRAGLKDKKFLAV_DESSA MSKSLIVYGSTTGNTETAAEYVAEAFENKE-I-DVELKNVTDVSVADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPLYD-SLENADLKGKKFLAV_DESDE MSKVLIVFGSSTGNTESIAQKLEELIAAGG-H-EVTLLNAADASAENLADGYDAVLFGCSAWGMEDLE------MQDDFLSLFE-EFNRFGLAGRKFLAV_CLOAB -MKISILYSSKTGKTERVAKLIEEGVKRSGNI-EVKTMNLDAVDKKFLQE-SEGIIFGTPTYYAN---------ISWEMKKWID-ESSEFNLEGKLFLAV_MEGEL --MVEIVYWSGTGNTEAMANEIEAAVKAAG-A-DVESVRFEDTNVDDVAS-KDVILLGCPAMGSE--E------LEDSVVEPFF-TDLAPKLKGKK4fxn ---MKIVYWSGTGNTEKMAELIAKGIIESG-K-DVNTINVSDVNIDELLN-EDILILGCSAMGDE--V------LEESEFEPFI-EEISTKISGKKFLAV_ANASP SKKIGLFYGTQTGKTESVAEIIRDEFGNDVVT----LHDVSQAEVTDLND-YQYLIIGCPTWNIGELQ---SD-----WEGLYS-ELDDVDFNGKLFLAV_AZOVI -AKIGLFFGSNTGKTRKVAKSIKKRFDDETMSD---ALNVNRVSAEDFAQ-YQFLILGTPTLGEGELPGLSSDCENESWEEFLP-KIEGLDFSGKT2fcr --KIGIFFSTSTGNTTEVADFIGKTLGAKADAP---IDVDDVTDPQALKD-YDLLFLGAPTWNTGADTERSGT----SWDEFLYDKLPEVDMKDLPFLAV_ENTAG MATIGIFFGSDTGQTRKVAKLIHQKLDGIADAP---LDVRRATREQFLS--YPVLLLGTPTLGDGELPGVEAGSQYDSWQEFTN-TLSEADLTGKTFLAV_ECOLI -AITGIFFGSDTGNTENIAKMIQKQLGKDVAD----VHDIAKSSKEDLEA-YDILLLGIPTWYYGEAQ-CD-------WDDFFP-TLEEIDFNGKL3chy --ADKELKFLVVDDFSTMRRIVRNLLKELG----FNNVEEAEDGVDALN------KLQAGGYGFV--I------SDWNMPNMDG-LELLKTIR--- . ... : . . :
1fx1 VACFGCGDSSYEYF--CGAVDAIEEKLKNLGAEIVQDG----------------LRIDGDPRAARDDIVGWAHDVRGAI---------------FLAV_DESVH VACFGCGDSSYEYF--CGAVDAIEEKLKNLGAEIVQDG----------------LRIDGDPRAARDDIVGWAHDVRGAI---------------FLAV_DESGI VGVFGCGDSSYTYF--CGAVDVIEKKAEELGATLVASS----------------LKIDGEPDSAE--VLDWAREVLARV---------------FLAV_DESSA VSVFGCGDSDYTYF--CGAVDAIEEKLEKMGAVVIGDS----------------LKIDGDPERDE--IVSWGSGIADKI---------------FLAV_DESDE VAAFASGDQEYEHF--CGAVPAIEERAKELGATIIAEG----------------LKMEGDASNDPEAVASFAEDVLKQL---------------FLAV_CLOAB GAAFSTANSIAGGS--DIALLTILNHLMVKGMLVYSGGVA----FGKPKTHLGYVHINEIQENEDENARIFGERIANKVKQIF-----------FLAV_MEGEL VGLFGSYGWGSGE-----WMDAWKQRTEDTGATVIGTA----------------IVN-EMPDNAPECKE-LGEAAAKA----------------4fxn VALFGSYGWGDGK-----WMRDFEERMNGYGCVVVETP----------------LIVQNEPDEAEQDCIEFGKKIANI----------------FLAV_ANASP VAYFGTGDQIGYADNFQDAIGILEEKISQRGGKTVGYWSTDGYDFNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL------FLAV_AZOVI VALFGLGDQVGYPENYLDALGELYSFFKDRGAKIVGSWSTDGYEFESSEAVV-DGKFVGLALDLDNQSGKTDERVAAWLAQIAPEFGLSL----2fcr VAIFGLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVR-DGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------FLAV_ENTAG VALFGLGDQLNYSKNFVSAMRILYDLVIARGACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSWLEKLKPAVL-------FLAV_ECOLI VALFGCGDQEDYAEYFCDALGTIRDIIEPRGATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKWVKQISEELHLDEILNA3chy AD--GAMSALPVL-----MVTAEAKKENIIAAAQAGAS----------------GYV-VKPFTAATLEEKLNKIFEKLGM-------------- . . : . .
![Page 46: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/46.jpg)
Flavodoxin-cheY: Pre-processing (prepro1500)1fx1 -PKALIVYGSTTGNT-EYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACFFLAV_DESDE MSKVLIVFGSSTGNT-ESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLF-EEFNRFGLAGRKVAAfFLAV_DESVH MPKALIVYGSTTGNT-EYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACfFLAV_DESSA MSKSLIVYGSTTGNT-ETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLY-DSLENADLKGKKVSVfFLAV_DESGI MPKALIVYGSTTGNT-EGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLY-EDLDRAGLKDKKVGVf2fcr --KIGIFFSTSTGNT-TEVADFIGKTLGA---KADAPIDVDDVTDPQALKDYDLLFLGAPTWNTG----ADTERSGTSWDEFLYDKLPEVDMKDLPVAIFFLAV_AZOVI -AKIGLFFGSNTGKT-RKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFL-PKIEGLDFSGKTVALfFLAV_ENTAG MATIGIFFGSDTGQT-RKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFT-NTLSEADLTGKTVALfFLAV_ANASP SKKIGLFYGTQTGKT-ESVaEIIRDEFGN---DVVTLHDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLY-SELDDVDFNGKLVAYfFLAV_ECOLI -AITGIFFGSDTGNT-ENIaKMIQKQLGK---DVADVHDIAKSS-KEDLEAYDILLLgIPTWYYGE--------AQCDWDDFF-PTLEEIDFNGKLVALf4fxn -MK--IVYWSGTGNT-EKMAELIAKGIIESG-KDVNTINVSDVNIDELL-NEDILILGCSAMGDEVL-------EESEFEPFI-EEIS-TKISGKKVALFFLAV_MEGEL MVE--IVYWSGTGNT-EAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVA-SKDVILLgCPAMGSEEL-------EDSVVEPFF-TDLA-PKLKGKKVGLfFLAV_CLOAB -MKISILYSSKTGKT-ERVaKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQESEGIIFgTPTYYAN---------ISWEMKKWI-DESSEFNLEGKLGAAf3chy ADKELKFLVVDDFSTMRRIVRNLLKELGFN--NVEEAEDGVDALNKLQAGGYGFVI---SDWNMPNM----------DGLELL-KTIRADGAMSALPVLM
T1fx1 GCGDS-SY-EYFCGA-VDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI--------FLAV_DESDE ASGDQ-EY-EHFCGA-VPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL--------FLAV_DESVH GCGDS-SY-EYFCGA-VDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI--------FLAV_DESSA GCGDS-DY-TYFCGA-VDAIEEKLEKMgAVVIGD---------------------SLKIDGD--PE--RDEIVSwGSGIADKI--------FLAV_DESGI GCGDS-SY-TYFCGA-VDVIEKKAEELgATLVAS---------------------SLKIDGE--PD--SAEVLDwAREVLARV--------2fcr GLGDAEGYPDNFCDA-IEEIHDCFAKQGAKPVGFSNPDDYDYEESKS-VRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------FLAV_AZOVI GLGDQVGYPENYLDA-LGELYSFFKDRgAKIVGSWSTDGYEFESSEA-VVDGKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L--FLAV_ENTAG GLGDQLNYSKNFVSA-MRILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------FLAV_ANASP GTGDQIGYADNFQDA-IGILEEKISQRgGKTVGYWSTDGYDFNDSKA-LRNGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------FLAV_ECOLI GCGDQEDYAEYFCDA-LGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA4fxn G-----SY-GWGDGKWMRDFEERMNGYGCVVVET---------------------PLIVQNE--PDEAEQDCIEFGKKIANI---------FLAV_MEGEL G-----SY-GWGSGEWMDAWKQRTEDTgATVIGT----------------------AIVNEM--PDNA-PECKElGEAAAKA---------FLAV_CLOAB STANSIAGGSDIA---LLTILNHLMVKgMLVYSG----GVAFGKPKTHLGYVHINEIQENEDENARIfGERiANkVKQIF-----------3chy VTAEAKK--ENIIAA---------AQAGAS-------------------------GYVV-----KPFTAATLEEKLNKIFEKLGM------
GIteration 0 SP= 136944.00 AvSP= 10.675 SId= 4009 AvSId= 0.313
![Page 47: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/47.jpg)
Flavodoxin-cheY: Local Pre-processing(locprepro300)
1fx1 --PKALIVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPL--FDSLEETGAQGRKVACFFLAV_DESVH -MPKALIVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPL--FDSLEETGAQGRKVACfFLAV_DESSA -MSKSLIVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPL--YDSLENADLKGKKVSVfFLAV_DESGI -MPKALIVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPL--YEDLDRAGLKDKKVGVfFLAV_DESDE -MSKVLIVFGSSTGNTESIaQKLEELIAAGGHEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSL--FEEFNRFGLAGRKVAAf4fxn --MK--IVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLN-EDILILGCSAMGDEVL------E-ESEFEPF--IEEIS-TKISGKKVALFFLAV_MEGEL -MVE--IVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVAS-KDVILLgCPAMGSEEL------E-DSVVEPF--FTDLA-PKLKGKKVGLf2fcr ---KIGIFFSTSTGNTTEVADFIGKTLGAKADAPI--DVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFL-YDKLPEVDMKDLPVAIFFLAV_ANASP -SKKIGLFYGTQTGKTESVaEIIRDEFGNDVVTLH--DVSQAEV-TDLNDYQYLIIgCPTWNIGEL--------QSDWEGL--YSELDDVDFNGKLVAYfFLAV_AZOVI --AKIGLFFGSNTGKTRKVaKSIKKRFDDETMSDA-LNVNRVSA-EDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEF--LPKIEGLDFSGKTVALfFLAV_ENTAG -MATIGIFFGSDTGQTRKVaKLIHQKLDG--IADAPLDVRRATR-EQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEF--TNTLSEADLTGKTVALfFLAV_ECOLI --AITGIFFGSDTGNTENIaKMIQKQLGKDVADVH--DIAKSSK-EDLEAYDILLLgIPTWYYGEA--------QCDWDDF--FPTLEEIDFNGKLVALfFLAV_CLOAB --MKISILYSSKTGKTERVaKLIEEGVKRSGNIEVKTMNLDAVDKKFLQESEGIIFgTPTYYA-----------NISWEMKKWIDESSEFNLEGKLGAAf3chy ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--LKTIRADGAMSALPVLM 1fx1 GCGDS--SY-EYFCGA-VD--AIEEKLKNLGAEIVQD---------------------GLRID--GDPRAARDDIVGWAHDVRGAI--------FLAV_DESVH GCGDS--SY-EYFCGA-VD--AIEEKLKNLgAEIVQD---------------------GLRID--GDPRAARDDIVGwAHDVRGAI--------FLAV_DESSA GCGDS--DY-TYFCGA-VD--AIEEKLEKMgAVVIGD---------------------SLKID--GDPE--RDEIVSwGSGIADKI--------FLAV_DESGI GCGDS--SY-TYFCGA-VD--VIEKKAEELgATLVAS---------------------SLKID--GEPD--SAEVLDwAREVLARV--------FLAV_DESDE ASGDQ--EY-EHFCGA-VP--AIEERAKELgATIIAE---------------------GLKME--GDASNDPEAVASfAEDVLKQL--------4fxn GS------Y-GWGDGKWMR--DFEERMNGYGCVVVET---------------------PLIVQ--NEPDEAEQDCIEFGKKIANI---------FLAV_MEGEL GS------Y-GWGSGEWMD--AWKQRTEDTgATVIGT---------------------AI-VN--EMPDNA-PECKElGEAAAKA---------2fcr GLGDAE-GYPDNFCDA-IE--EIHDCFAKQGAKPVGFSNPDDYDYEESKSVRD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------FLAV_ANASP GTGDQI-GYADNFQDA-IG--ILEEKISQRgGKTVGYWSTDGYDFNDSKALRN-GKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------FLAV_AZOVI GLGDQV-GYPENYLDA-LG--ELYSFFKDRgAKIVGSWSTDGYEFESSEAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L--FLAV_ENTAG GLGDQL-NYSKNFVSA-MR--ILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------FLAV_ECOLI GCGDQE-DYAEYFCDA-LG--TIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNAFLAV_CLOAB STANSIAGGSDIALLTILNHLMVKgMLVYSGGVAFGKPKTHLGYVH----------INEIQENEDENARIfGERiANkVKQIF-----------3chy VTAEA---KKENIIAA-----------AQAGAS-------------------------GYVVK-----PFTAATLEEKLNKIFEKLGM------
G
![Page 48: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/48.jpg)
• Profile pre-processing• Secondary structure-induced alignment• Homology-extended alignment• Matrix extension
Objective: integrate secondary structure information to anchor alignments and avoid errors (“Structure more conserved than sequence”)
Strategies for multiple sequence alignment
![Page 49: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/49.jpg)
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
PRIMARY STRUCTURE (amino acid sequence)
QUATERNARY STRUCTURE (oligomers)
SECONDARY STRUCTURE (helices, strands)
TERTIARY STRUCTURE (fold)
Protein structure hierarchical levels
![Page 50: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/50.jpg)
Why use (predicted) structural information
• “Structure more conserved than sequence”– Many structural protein families (e.g. globins) have family members
with very low sequence similarities. For example, globin sequences identities can be as low as 10% while still having an identical fold.
• This means that you can still observe equivalent secondary structures in homologous proteins even if sequence similarities are extremely low.
• But you are dependent on the quality of prediction methods. For example, secondary structure prediction is currently at 76% correctness. So, 1 out of 4 predicted amino acids is still incorrect.
![Page 51: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/51.jpg)
C5 anaphylatoxin -- human (PDB code 1kjs) and pig (1c5a)) proteins are superposed
Two superposed protein structures with two well-superposed helices
Red: well superposed
Blue: low match quality
![Page 52: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/52.jpg)
How to combine ss and aa info
Dynamic programmingsearch matrix
Amino acid substitution
matricesMDAGSTVILCFVHHHCCCEEEEEE
MDAASTILCGS
HHHHCCEEECC
C
H
E
H C
E Default
![Page 53: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/53.jpg)
In terms of scoring…
• So how would you score a profile using this extra information?– Same formula as in lecture 6, but you can use sec.
struct. specific substitution scores in various combinations.
• Where does it fit in?– Very important: structure is more conserved than
sequence so if structures have forgotten how to match (I.e. they are too divergent), the secondary structure elements might help the alignment.
![Page 54: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/54.jpg)
Sequences to be aligned
Predict secondary structure
HHHHCCEEECCCEEECCHH
HHHCCCCEECCCEEHHH
HHHHHHHHHHHHHCCCEEEE
CCCCCCEECCCEEEECCHH
HHHHHCCEEEECCCEECCC
Align sequences using secondary structure
Secondary structure
Multiple alignment
![Page 55: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/55.jpg)
Using predicted secondary structure1fx1 -PK-ALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACF e eeee b ssshhhhhhhhhhhhhhttt eeeee stt tttttt seeee b ee sss ee ttthhhhtt ttss tt eeeeeFLAV_DESVH MPK-ALIVYGSTTGNTEYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACf e eeeeee hhhhhhhhhhhhhhh eeeeee eeeeee hhhhhh eeeeeFLAV_DESGI MPK-ALIVYGSTTGNTEGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLYED-LDRAGLKDKKVGVf e eeeeee hhhhhhhhhhhhhh eeeeee hhhhhh eeeeeee hhhhhh eeeeeeFLAV_DESSA MSK-SLIVYGSTTGNTETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLYDS-LENADLKGKKVSVf eeeeee hhhhhhhhhhhhhh eeeee eeeee hhhhhhh h eeeeeFLAV_DESDE MSK-VLIVFGSSTGNTESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLFEE-FNRFGLAGRKVAAf eeee hhhhhhhhhhhhhh eeeee hhhhhhhhhhheeeee hhhhhhh hh eeeee2fcr --K-IGIFFSTSTGNTTEVADFIGKTLGAK---ADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFLYDKLPEVDMKDLPVAIF eeeee ssshhhhhhhhhhhhhggg b eeggg s gggggg seeeeeee stt s s s sthhhhhhhtggg tt eeeeeFLAV_ANASP SKK-IGLFYGTQTGKTESVaEIIRDEFGND--VVTL-HDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLYSE-LDDVDFNGKLVAYf eeeee hhhhhhhhhhhh eee hhh hhhhhhheeeeee hhhhhhhhh eeeeeeFLAV_ECOLI -AI-TGIFFGSDTGNTENIaKMIQKQLGKD--VADV-HDIAKSS-KEDLEAYDILLLgIPTWYYGEA--------QCDWDDFFPT-LEEIDFNGKLVALf eee hhhhhhhhhhhh eee hhh hhhhhhheeeee hhhhh eeeeeeFLAV_AZOVI -AK-IGLFFGSNTGKTRKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFLPK-IEGLDFSGKTVALf eee hhhhhhhhhhhhh hhh hhhhhhheeeee hhhhhhhhh eeeeeeFLAV_ENTAG MAT-IGIFFGSDTGQTRKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFTNT-LSEADLTGKTVALf eeee hhhhhhhhhhhh hhh hhhhhhheeeee hhhhh eeeee4fxn ----MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVNIDELLNE-DILILGCSAMGDEVL------E-ESEFEPFIEE-IST-KISGKKVALF eeeee ssshhhhhhhhhhhhhhhtt eeeettt sttttt seeeeee btttb ttthhhhhhh hst t tt eeeeeFLAV_MEGEL M---VEIVYWSGTGNTEAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVASK-DVILLgCPAMGSEEL------E-DSVVEPFFTD-LAP-KLKGKKVGLf hhhhhhhhhhhhhh eeeee hhhhhhhh eeeee eeeeeFLAV_CLOAB M-K-ISILYSSKTGKTERVaKLIEEGVKRSGNIEVKTMNL-DAVDKKFLQESEGIIFgTPTY-YANI--------SWEMKKWIDE-SSEFNLEGKLGAAf eee hhhhhhhhhhhhhh eeeeee hhhhhhhhhh eeee hhhhhhhhh eeeee3chy ADKELKFLVVDDFSTMRRIVRNLLKELGFNN-VEEAEDGV-DALNKLQAGGYGFVISD---WNMPNM----------DGLELLKTIRADGAMSALPVLMV tt eeee s hhhhhhhhhhhhhht eeeesshh hhhhhhhh eeeee s sss hhhhhhhhhh ttttt eeee 1fx1 GCGDS-SY-EYFCGAVDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI-------- eee s ss sstthhhhhhhhhhhttt ee s eeees gggghhhhhhhhhhhhhhFLAV_DESVH GCGDS-SY-EYFCGAVDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI-------- eee hhhhhhhhhhhh eeeee eeeee hhhhhhhhhhhhhhFLAV_DESGI GCGDS-SY-TYFCGAVDVIEKKAEELgATLVAS---------------------SLKIDGE--P--DSAEVLDwAREVLARV-------- eee hhhhhhhhhhhh eeeee hhhhhhhhhhhFLAV_DESSA GCGDS-DY-TYFCGAVDAIEEKLEKMgAVVIGD---------------------SLKIDGD--P--ERDEIVSwGSGIADKI-------- hhhhhhhhhhhh eeeee e eeeFLAV_DESDE ASGDQ-EY-EHFCGAVPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL-------- e hhhhhhhhhhhhhh eeeee ee hhhhhhhhhhh2fcr GLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------ eee ttt ttsttthhhhhhhhhhhtt eee b gggs s tteet teesseeeettt ss hhhhhhhhhhhhhhhhtFLAV_ANASP GTGDQIGYADNFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------ hhhhhhhhhhhhhh eeee hhhhhhhhhhhhhhhhFLAV_ECOLI GCGDQEDYAEYFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA hhhhhhhhhhhhhh eeee hhhhhhhhhhhhhhhhhhFLAV_AZOVI GLGDQVGYPENYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L-- e hhhhhhhhhhhhhh eeeee hhhhhhhhhhhFLAV_ENTAG GLGDQLNYSKNFVSAMRILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------ hhhhhhhhhhhhhhh eeee hhhhhhh hhhhhhhhhhhh4fxn G-----SYGWGDGKWMRDFEERMNGYGCVVVET---------------------PLIVQNE--PDEAEQDCIEFGKKIANI--------- e eesss shhhhhhhhhhhhtt ee s eeees ggghhhhhhhhhhhhtFLAV_MEGEL G-----SYGWGSGEWMDAWKQRTEDTgATVIGT----------------------AIVNEM--PDNAPE-CKElGEAAAKA--------- hhhhhhhhhhh eeeee eeee h hhhhhhhhFLAV_CLOAB STANSIA-GGSDIALLTILNHLMVK-gMLVYSG----GVAFGKPKTHLG-----YVHINEI--QENEDENARIfGERiANkV--KQIF-- hhhhhhhhhhhhhh eeeee hhhh hhh hhhhhhhhhhhh h3chy -----------TAEAKKENIIAAAQAGASGY-------------------------VVK----P-FTAATLEEKLNKIFEKLGM------ ess hhhhhhhhhtt see ees s hhhhhhhhhhhhhhht
G
![Page 56: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/56.jpg)
• Profile pre-processing• Secondary structure-induced alignment• Homology-extended alignment • Matrix extension
Strategies for multiple sequence alignment
![Page 57: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/57.jpg)
PSI-PRALINE
Multiple alignment of distant sequences using PSI-BLAST
![Page 58: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/58.jpg)
Distant sequences
• Methyltransferase (16.7% sequence identity)• Same / fold
1GZ0A SEIYGIHAVQALLERAPERFQEVFILKGREDKRL LPLIHALESQGVVIQLANRQYLDEKSDGAVHQG IIARVKPGRQ
1IPAA MRITSTANPRIKELARLLERKHRDSQRRFLIEGAREIERALQAGIELEQALVWEGGLNPEEQQVYAALLALLEVSEAVLKKLSVRDNPAGLIALARMPER
![Page 59: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/59.jpg)
How well do alignment methods perform
• Normal pair-wise alignment • 10% correctly aligned positions
• T-COFFEE (Notredame et al., 2000)• 15% correctly aligned positions
• MUSCLE (Edgar, 2004)• 15% correctly aligned positions
![Page 60: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/60.jpg)
Profile-profile alignment
• Homology detection has provided a solution
• BLAST (sequence-sequence)• PSI-BLAST (profile-sequence)• New generation methods (profile-profile)New generation methods (profile-profile)
![Page 61: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/61.jpg)
State-of-the-art homology detection
• Sequence used to scan database and collect homologous information (PSI-BLAST)– Local pair-wise alignment
• PSI-BLAST profile used to scan profiles of other sequences– Local pair-wise alignment
• The difference between methods is the profile-profile scoring scheme used for the alignment
![Page 62: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/62.jpg)
The PRALINE way • Sequence used to scan database and collect homologous
information (PSI-BLAST)– Local alignment
• PSI-BLAST profiles are used for the alignment instead of the original sequences– Global, pair-wise OR multiple alignmentGlobal, pair-wise OR multiple alignment
• PRALINE scores 92%92% correctly aligned positions on the previous example (methyltransferase)
![Page 63: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/63.jpg)
-5
0
5
10
15
20
25
30
35
0 10 20 30 40 50 60 70 80 90 100sequence identity (%)
DQ a
ccur
acy
(%)
PRALINET-COFFEE v2.03MUSCLE v3.51ALICAO
PSI
Pair-wise alignment
![Page 64: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/64.jpg)
Multiple alignment
-15
-10
-5
0
5
10
15
20
0 10 20 30 40 50 60 70 80 90 100
sequence identity (%)
D CS
accu
racy
(%)
PRALINEPRALINET-COFFEE v2.03MUSCLE v3.51
PSI
PREPRO
![Page 65: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/65.jpg)
Example: methyltransferases
![Page 66: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/66.jpg)
A
B
0%
20%
40%
60%
80%
100%
0 10 5 1 e-01 e-02 e-03 e-06 inc max
e-value threshold
Improved Equal Worse
-0.05
0
0.05
0.1
0.15
0.2
0 10 5 1 e-01 e-02 e-03 e-06 inc maxe-value threshold
D Q a
ccur
acy
0-100% 0-30% 30-60% 60-100%
The effects of using E-value thresholds of increasing stringency in PRALINEPSI on the 624 HOMSTRAD pairwise alignments.
(A) The difference between the average Q scores of PRALINEPSI and the basic PRALINE method
(B) The distributions of improved, equal and worsened cases compared with the basic PRALINE method for each E-value threshold.
The ‘inc’ column is thePRALINEPSI incremental strategy starting from a threshold of 10-6, and the ‘max’ column is PRALINEPSI’s theoretical upper limit for the tested threshold range.
![Page 67: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/67.jpg)
• Profile pre-processing• Secondary structure-induced
alignment• Homology-extended alignment• Matrix extension
Objective: try to avoid (early) errors
Strategies for multiple sequence alignment
![Page 68: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/68.jpg)
Integrating alignment methods and alignment information with
T-Coffee• Integrating different pair-wise alignment
techniques (NW, SW, ..)• Combining different multiple alignment
methods (consensus multiple alignment)• Combining sequence alignment methods
with structural alignment techniques• Plug in user knowledge
![Page 69: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/69.jpg)
Matrix extension
T-CoffeeTree-based Consistency Objective Function
For alignmEnt Evaluation
Cedric NotredameDes HigginsJaap Heringa J. Mol. Biol., J. Mol. Biol., 302, 205-217302, 205-217;2000;2000
![Page 70: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/70.jpg)
Using different sources of alignment information
Clustal
Dialign
Clustal
Lalign
Structure alignments
Manual
T-Coffee
![Page 71: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/71.jpg)
T-Coffee library system
Seq1 AA1 Seq2 AA2 Weight
3 V31 5 L33 103 V31 6 L34 14
5 L33 6 R35 215 l33 6 I36 35
![Page 72: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/72.jpg)
Matrix extensionMatrix extension1
2
1 3
1 4
2 3
2 4
3 4
![Page 73: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/73.jpg)
Search matrix extension – alignment transitivity
![Page 74: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/74.jpg)
T-Coffee
Direct alignment
Other sequences
![Page 75: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/75.jpg)
Search matrix extension
![Page 76: 1-month Practical Course Genome Analysis Lecture 5: Multiple Sequence Alignment](https://reader035.vdocument.in/reader035/viewer/2022070502/56814c61550346895db985c5/html5/thumbnails/76.jpg)
but.....T-COFFEE (V1.23) multiple sequence alignmentFlavodoxin-cheY1fx1 ----PKALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-----FLAV_DESVH ---MPKALIVYGSTTGNTEYTAETIARELADAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-----FLAV_DESGI ---MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-METTVVNVADVT-APGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPL-YEDLDRAGLKDKK-----FLAV_DESSA ---MSKSLIVYGSTTGNTETAAEYVAEAFENKE-IDVELKNVTDVS-VADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPL-YDSLENADLKGKK-----FLAV_DESDE ---MSKVLIVFGSSTGNTESIAQKLEELIAAGG-HEVTLLNAADAS-AENLADGYDAVLFGCSAWGMEDLE------MQDDFLSL-FEEFNRFGLAGRK-----4fxn ------MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVN-IDELL-NEDILILGCSAMGDEVLE-------ESEFEPF-IEEIS-TKISGKK-----FLAV_MEGEL -----MVEIVYWSGTGNTEAMANEIEAAVKAAG-ADVESVRFEDTN-VDDVA-SKDVILLGCPAMGSEELE-------DSVVEPF-FTDLA-PKLKGKK-----FLAV_CLOAB ----MKISILYSSKTGKTERVAKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQ-ESEGIIFGTPTYYAN---------ISWEMKKW-IDESSEFNLEGKL-----2fcr -----KIGIFFSTSTGNTTEVADFIGKTLGAKA---DAPIDVDDVTDPQAL-KDYDLLFLGAPTWNTGA----DTERSGTSWDEFLYDKLPEVDMKDLP-----FLAV_ENTAG ---MATIGIFFGSDTGQTRKVAKLIHQKLDGIA---DAPLDVRRAT-REQF-LSYPVLLLGTPTLGDGELPGVEAGSQYDSWQEF-TNTLSEADLTGKT-----FLAV_ANASP ---SKKIGLFYGTQTGKTESVAEIIRDEFGNDV---VTLHDVSQAE-VTDL-NDYQYLIIGCPTWNIGEL--------QSDWEGL-YSELDDVDFNGKL-----FLAV_AZOVI ----AKIGLFFGSNTGKTRKVAKSIKKRFDDET-M-SDALNVNRVS-AEDF-AQYQFLILGTPTLGEGELPGLSSDCENESWEEF-LPKIEGLDFSGKT-----FLAV_ECOLI ----AITGIFFGSDTGNTENIAKMIQKQLGKDV---ADVHDIAKSS-KEDL-EAYDILLLGIPTWYYGEA--------QCDWDDF-FPTLEEIDFNGKL-----3chy ADKELKFLVVD--DFSTMRRIVRNLLKELGFN-NVE-EAEDGVDALNKLQ-AGGYGFVISDWNMPNMDGLE--------------LLKTIRADGAMSALPVLMV :. . . : . :: 1fx1 ---------VACFGCGDSS--YEYFCGA-VDAIEEKLKNLGAEIVQDG---------------------LRIDGDPRAA--RDDIVGWAHDVRGAI--------FLAV_DESVH ---------VACFGCGDSS--YEYFCGA-VDAIEEKLKNLGAEIVQDG---------------------LRIDGDPRAA--RDDIVGWAHDVRGAI--------FLAV_DESGI ---------VGVFGCGDSS--YTYFCGA-VDVIEKKAEELGATLVASS---------------------LKIDGEPDSA----EVLDWAREVLARV--------FLAV_DESSA ---------VSVFGCGDSD--YTYFCGA-VDAIEEKLEKMGAVVIGDS---------------------LKIDGDPE----RDEIVSWGSGIADKI--------FLAV_DESDE ---------VAAFASGDQE--YEHFCGA-VPAIEERAKELGATIIAEG---------------------LKMEGDASND--PEAVASFAEDVLKQL--------4fxn ---------VALFGS------YGWGDGKWMRDFEERMNGYGCVVVETP---------------------LIVQNEPD--EAEQDCIEFGKKIANI---------FLAV_MEGEL ---------VGLFGS------YGWGSGEWMDAWKQRTEDTGATVIGTA---------------------IV--NEMP--DNAPECKELGEAAAKA---------FLAV_CLOAB ---------GAAFSTANSI--AGGSDIA-LLTILNHLMVKGMLVY----SGGVAFGKPKTHLGYVHINEIQENEDENARIFGERIANKVKQIF-----------2fcr ---------VAIFGLGDAEGYPDNFCDA-IEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDG-KFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------FLAV_ENTAG ---------VALFGLGDQLNYSKNFVSA-MRILYDLVIARGACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSWLEKLKPAVL-------FLAV_ANASP ---------VAYFGTGDQIGYADNFQDA-IGILEEKISQRGGKTVGYWSTDGYDFNDSKALRNG-KFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL------FLAV_AZOVI ---------VALFGLGDQVGYPENYLDA-LGELYSFFKDRGAKIVGSWSTDGYEFESSEAVVDG-KFVGLALDLDNQSGKTDERVAAWLAQIAPEFGLSL----FLAV_ECOLI ---------VALFGCGDQEDYAEYFCDA-LGTIRDIIEPRGATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKWVKQISEELHLDEILNA3chy TAEAKKENIIAAAQAGASGYVVKPFT---AATLEEKLNKIFEKLGM---------------------------------------------------------- .