![Page 1: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/1.jpg)
Bioinformatics Methods Course
Multiple Sequence Alignment
Burkhard Morgenstern
University of Göttingen
Institute of Microbiology and Genetics
Department of Bioinformatics
Göttingen, October/November 2006
![Page 2: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/2.jpg)
Tools for multiple sequence alignment
T Y I M R E A Q Y E
T C I V M R E A Y E
![Page 3: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/3.jpg)
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
![Page 4: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/4.jpg)
Tools for multiple sequence alignment
T Y I M R E A Q Y E
T C I V M R E A Y E
Y I M Q E V Q Q E
Y I A M R E Q Y E
![Page 5: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/5.jpg)
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
Y - I - M Q E V Q Q E
Y - I A M R E - Q Y E
![Page 6: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/6.jpg)
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
- Y I - M Q E V Q Q E
Y - I A M R E - Q Y E
Astronomical Number of possible alignments!
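To put a number on "astronomical", here is a minimal sketch that counts the gapped alignments of just two sequences. It assumes an alignment is any series of columns that pair two residues or place one residue opposite a gap; the function name `count_alignments` is illustrative and does not come from the slides.

```python
from functools import lru_cache


def count_alignments(n: int, m: int) -> int:
    """Number of distinct pairwise alignments of sequences of length n and m."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return 1  # only one side has residues left: a single way to finish
        # The last column is residue/residue, residue/gap, or gap/residue.
        return count(i - 1, j - 1) + count(i - 1, j) + count(i, j - 1)

    return count(n, m)


# The two example sequences on the first slides have 10 residues each.
print(count_alignments(10, 10))  # 8097453
```

Already for two sequences of ten residues there are more than eight million candidates, and the count grows much faster once further sequences are added.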
![Page 7: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/7.jpg)
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V - M R E A Y E
- Y I - M Q E V Q Q E
Y - I A M R E - Q Y E
Astronomical Number of possible alignments!
![Page 8: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/8.jpg)
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
- Y I - M Q E V Q Q E
Y - I A M R E - Q Y E
Which one is the best ???
![Page 9: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/9.jpg)
Tools for multiple sequence alignment
Questions in development of alignment programs:
(1) What is a good alignment?
→ objective function ("score")
(2) How to find a good alignment?
→ optimization algorithm
The first question is far more important!
![Page 10: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/10.jpg)
Tools for multiple sequence alignment
Before defining an objective function (scoring scheme):
What is a biologically good alignment?
![Page 11: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/11.jpg)
Tools for multiple sequence alignment
Criteria for alignment quality:
1. 3D-Structure: align residues at corresponding positions in 3D structure of protein!
2. Evolution: align residues with common ancestors!
![Page 12: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/12.jpg)
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V - M R E A Y E
- Y I - M Q E V Q Q E
- Y I A M R E - Q Y E
An alignment is a hypothesis about sequence evolution.
Search for the most plausible hypothesis!
![Page 13: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/13.jpg)
Tools for multiple sequence alignment
Compute for amino acids a and b:
the probability $p_{a,b}$ of the substitution a → b (or b → a), and the frequency $q_a$ of a.
Define
$$s(a,b) = \log \frac{p_{a,b}}{q_a \, q_b}$$
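A small sketch of how this log-odds definition behaves. The probabilities below are made-up numbers for illustration, not values from any real substitution matrix, and the natural logarithm is an arbitrary choice of base.

```python
import math


def log_odds_score(p_ab: float, q_a: float, q_b: float) -> float:
    """s(a,b) = log(p_ab / (q_a * q_b)), using the natural logarithm."""
    return math.log(p_ab / (q_a * q_b))


# Hypothetical values: both residues have background frequency 0.10,
# so a chance pairing would be expected with probability 0.01.
print(log_odds_score(0.020, 0.10, 0.10))  # seen more often than chance: positive
print(log_odds_score(0.005, 0.10, 0.10))  # seen less often than chance: negative
```

The score is positive when a and b are paired more often than their frequencies alone would predict, and negative when they are paired less often.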
![Page 14: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/14.jpg)
Tools for multiple sequence alignment
![Page 15: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/15.jpg)
![Page 16: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/16.jpg)
Tools for multiple sequence alignment
Traditional objective functions:
Define the score of an alignment as:
the sum of the individual similarity scores s(a,b), minus a gap penalty g for each gap in the alignment
Needleman-Wunsch scoring system (1970) for pairwise alignment (= alignment of two sequences)
![Page 17: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/17.jpg)
Example:
T Y W I V
T - - L V
Score = s(T,T) + s(I,L) + s(V,V) - 2g
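The example score can be reproduced with a few lines of code. This is a sketch under stated assumptions: the similarity values in `TOY_SCORES` and the penalty g = 3 are chosen for illustration only, and every gap position is charged the same penalty.

```python
def alignment_score(row1, row2, s, g):
    """Sum of s(a, b) over residue pairs minus g for every gap position."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            score -= g
        else:
            score += s(a, b)
    return score


# Illustrative similarity values (assumption, not taken from the slides).
TOY_SCORES = {("T", "T"): 5, ("I", "L"): 2, ("V", "V"): 4}


def toy_s(a, b):
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0))


print(alignment_score("TYWIV", "T--LV", toy_s, g=3))  # 5 + 2 + 4 - 2*3 = 5
```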
![Page 18: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/18.jpg)
T Y W I V
T - - L V
Idea: the alignment with the optimal (maximal) score is probably biologically meaningful.
Dynamic programming algorithm finds optimal alignment for two sequences efficiently (Needleman and Wunsch, 1970).
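A minimal sketch of the dynamic programming idea for two sequences, assuming a linear gap penalty g per gap position and integer scores. The match/mismatch function `simple_s` is a placeholder for a real similarity matrix, and when several alignments are optimal the traceback returns just one of them.

```python
def needleman_wunsch(x, y, s, g):
    """Return (score, aligned_x, aligned_y) for one optimal global alignment."""
    n, m = len(x), len(y)
    # F[i][j] holds the best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * g
    for j in range(1, m + 1):
        F[0][j] = -j * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # pair two residues
                F[i - 1][j] - g,                          # x[i-1] opposite a gap
                F[i][j - 1] - g,                          # y[j-1] opposite a gap
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - g:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


def simple_s(a, b):
    """Placeholder similarity: +1 for identical residues, -1 otherwise."""
    return 1 if a == b else -1


score, row_x, row_y = needleman_wunsch("TYWIV", "TLV", simple_s, g=1)
print(score)
print(row_x)
print(row_y)
```

The table has (n+1)(m+1) cells and each cell is filled in constant time, which is why the pairwise case is efficient.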
![Page 19: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/19.jpg)
Tools for multiple sequence alignment
Traditional objective functions can be generalized to multiple alignment (e.g. sum-of-pairs score, tree alignment).
The Needleman-Wunsch algorithm can also be generalized to find an optimal multiple alignment, but:
Very time and memory consuming!
→ A heuristic algorithm is needed, i.e. a fast but sub-optimal solution
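As an illustration of the sum-of-pairs generalization, the sketch below scores a multiple alignment by adding up the pairwise scores of all row pairs. It assumes a linear gap penalty and ignores columns where both rows of a pair have a gap; these conventions and the function names are assumptions, not taken from the slides.

```python
from itertools import combinations


def sum_of_pairs_score(rows, s, g):
    """Sum the pairwise alignment scores over all pairs of rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for r1, r2 in combinations(rows, 2):
        for a, b in zip(r1, r2):
            if a == "-" and b == "-":
                continue  # this column vanishes in the pairwise projection
            if a == "-" or b == "-":
                total -= g
            else:
                total += s(a, b)
    return total


def match_mismatch(a, b):
    """Placeholder similarity: +1 for identical residues, -1 otherwise."""
    return 1 if a == b else -1


rows = ["TYI-MREAQYE", "TCIVMREA-YE", "Y-I-MQEVQQE", "Y-IAMRE-QYE"]
print(sum_of_pairs_score(rows, match_mismatch, g=1))
```

Scoring a given alignment this way is cheap. The expensive part is the search: an exact dynamic program over N sequences of length L needs a table with roughly L^N cells.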
![Page 20: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/20.jpg)
Tools for multiple sequence alignment
Most commonly used heuristic for multiple alignment:
Progressive alignment
(mid 1980s)
![Page 21: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/21.jpg)
“Progressive” Alignment
WCEAQTKNGQGWVPSNYITPVN
WWRLNDKEGYVPRNLLGLYP
AVVIQDNSDIKVVPKAKIIRD
YAVESEAHPGSFQPVAALERIN
WLNYNETTGERGDFPGTYVEYIGRKKISP
![Page 22: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/22.jpg)
“Progressive” Alignment
WCEAQTKNGQGWVPSNYITPVN
WWRLNDKEGYVPRNLLGLYP
AVVIQDNSDIKVVPKAKIIRD
YAVESEAHPGSFQPVAALERIN
WLNYNETTGERGDFPGTYVEYIGRKKISP
Guide tree
![Page 23: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/23.jpg)
“Progressive” Alignment
WCEAQTKNGQGWVPSNYITPVN
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASFQPVAALERIN
WLNYNEERGDFPGTYVEYIGRKKISP
Profile alignment, “once a gap - always a gap”
![Page 24: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/24.jpg)
“Progressive” Alignment
WCEAQTKNGQGWVPSNYITPVN
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASVQ--PVAALERIN------
WLN-YNEERGDFPGTYVEYIGRKKISP
Profile alignment, “once a gap - always a gap”
![Page 25: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/25.jpg)
“Progressive” Alignment
WCEAQTKNGQGWVPSNYITPVN-
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASVQ--PVAALERIN------
WLN-YNEERGDFPGTYVEYIGRKKISP
Profile alignment, “once a gap - always a gap”
![Page 26: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/26.jpg)
“Progressive” Alignment
WCEAQTKNGQGWVPSNYITPVN--------
WW--RLNDKEGYVPRNLLGLYP--------
AVVIQDNSDIKVVP--KAKIIRD-------
YAVESEA---SVQ--PVAALERIN------
WLN-YNE---ERGDFPGTYVEYIGRKKISP
Profile alignment, “once a gap - always a gap”
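The "once a gap - always a gap" rule can be sketched as follows: when two profiles are aligned, a new gap is inserted as a whole column into every row of a profile, and gaps that are already there are never removed. The helper name `insert_gap_columns` is illustrative; the rows are the two-sequence group from the slide before the final merge.

```python
def insert_gap_columns(profile, position, count):
    """Insert `count` gap columns before column `position` in every row."""
    return [row[:position] + "-" * count + row[position:] for row in profile]


group = [
    "YAVESEASVQ--PVAALERIN------",
    "WLN-YNEERGDFPGTYVEYIGRKKISP",
]
for row in insert_gap_columns(group, 7, 3):
    print(row)
# YAVESEA---SVQ--PVAALERIN------
# WLN-YNE---ERGDFPGTYVEYIGRKKISP
```

Both rows receive the same three-column gap, which reproduces the last two rows of the final alignment shown above.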
![Page 27: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/27.jpg)
CLUSTAL W
Most important software program:
CLUSTAL W:
J. Thompson, T. Gibson, D. Higgins (1994), CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment … Nucleic Acids Res. 22, 4673-4680
(~20,000 citations in the literature)
![Page 28: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/28.jpg)
Tools for multiple sequence alignment
Problems with traditional approach:
Results depend on gap penalty
The heuristic guide tree determines the alignment,
but the alignment is in turn used for phylogeny reconstruction
Algorithm produces global alignments.
![Page 29: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/29.jpg)
Tools for multiple sequence alignment
Problems with traditional approach:
But:
Many sequence families share only local similarity
E.g. sequences share one conserved motif
![Page 30: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/30.jpg)
Local sequence alignment
Find common motif in sequences; ignore the rest
EYENS
ERYENS
ERYAS
![Page 31: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/31.jpg)
Local sequence alignment
Find common motif in sequences; ignore the rest
E-YENS
ERYENS
ERYA-S
![Page 32: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/32.jpg)
Local sequence alignment
Find common motif in sequences; ignore the rest – Local alignment
E-YENS
ERYENS
ERYA-S
![Page 33: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/33.jpg)
Gibbs Motif Sampler
Local multiple alignment without gaps:
C.E. Lawrence et al. (1993), Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment, Science 262, 208-214
![Page 34: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/34.jpg)
Traditional alignment approaches:
Either global or local methods!
![Page 35: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/35.jpg)
New question: sequence families with multiple local similarities
Neither local nor global methods are applicable
![Page 36: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/36.jpg)
New question: sequence families with multiple local similarities
Alignment possible if order conserved
![Page 37: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/37.jpg)
The DIALIGN approach
Morgenstern, Dress, Werner (1996), PNAS 93, 12098-12103
Combination of global and local methods
Assemble multiple alignment from gap-free local pair-wise alignments ("fragments")
![Page 38: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/38.jpg)
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
![Page 39: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/39.jpg)
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
![Page 40: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/40.jpg)
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
![Page 41: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/41.jpg)
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
![Page 42: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/42.jpg)
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
![Page 43: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/43.jpg)
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
![Page 44: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/44.jpg)
The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
![Page 45: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/45.jpg)
The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaa--gagtatcacccctgaattgaataa
![Page 46: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/46.jpg)
The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaa--gagtatcacc----------cctgaattgaataa
![Page 47: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/47.jpg)
The DIALIGN approach
atc------taatagttaaactcccccgtgc-ttag
cagtgcgtgtattactaac----------gg-ttcaatcgcg
caaa--gagtatcacc----------cctgaattgaataa
![Page 48: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/48.jpg)
The DIALIGN approach
atc------taatagttaaactcccccgtgc-ttag
cagtgcgtgtattactaac----------gg-ttcaatcgcg
caaa--gagtatcacc----------cctgaattgaataa
Consistency!
![Page 49: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/49.jpg)
The DIALIGN approach
atc------TAATAGTTAaactccccCGTGC-TTag
cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
caaa--GAGTATCAcc----------CCTGaaTTGAATaa
![Page 50: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/50.jpg)
The DIALIGN approach
Multiple alignment:
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
![Page 51: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/51.jpg)
The DIALIGN approach
Multiple alignment:
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaccctgaattgaagagtatcacataa
(1) Calculate all optimal pair-wise alignments
![Page 52: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/52.jpg)
The DIALIGN approach
Multiple alignment:
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
(1) Calculate all optimal pair-wise alignments
![Page 53: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/53.jpg)
The DIALIGN approach
Multiple alignment:
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
(1) Calculate all optimal pair-wise alignments
![Page 54: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/54.jpg)
The DIALIGN approach
Fragments from optimal pair-wise alignments might be inconsistent
![Page 55: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/55.jpg)
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
![Page 56: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/56.jpg)
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
![Page 57: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/57.jpg)
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
![Page 58: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/58.jpg)
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaa--gagtatcacccctgaattgaataa
![Page 59: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/59.jpg)
The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaa--gagtatcacccctgaattgaataa
![Page 60: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/60.jpg)
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
![Page 61: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/61.jpg)
The DIALIGN approach
Score of alignment:
Define weight score for fragments based on probability of random occurrence
Score of alignment = sum of weight scores of fragments
Goal: find consistent set of fragments with maximum total weight
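For two sequences, the goal stated above reduces to finding the heaviest chain of non-crossing, non-overlapping fragments. The following is a simplified sketch of that pairwise case only. The `Fragment` type, the quadratic chaining loop and the example weights are assumptions for illustration and are not the DIALIGN implementation; the multiple-sequence case needs a more involved consistency test.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    start1: int    # start position in sequence 1
    start2: int    # start position in sequence 2
    length: int    # gap-free, so the same length in both sequences (>= 1)
    weight: float  # weight score of the fragment


def precedes(f: Fragment, g: Fragment) -> bool:
    """True if f ends before g begins in both sequences."""
    return f.start1 + f.length <= g.start1 and f.start2 + f.length <= g.start2


def best_chain(fragments):
    """Return (total_weight, chain) for the heaviest consistent chain."""
    frags = sorted(fragments, key=lambda f: (f.start1, f.start2))
    if not frags:
        return 0.0, []
    best = [f.weight for f in frags]  # best chain weight ending at each fragment
    prev = [-1] * len(frags)          # predecessor index within that chain
    for j, fj in enumerate(frags):
        for i in range(j):
            if precedes(frags[i], fj) and best[i] + fj.weight > best[j]:
                best[j] = best[i] + fj.weight
                prev[j] = i
    k = max(range(len(frags)), key=best.__getitem__)
    total = best[k]
    chain = []
    while k != -1:
        chain.append(frags[k])
        k = prev[k]
    chain.reverse()
    return total, chain


a = Fragment(0, 0, 3, 2.0)
b = Fragment(5, 5, 4, 3.0)
c = Fragment(2, 8, 3, 4.0)  # crosses or overlaps both a and b
print(best_chain([a, b, c]))
```

Fragment `c` has the highest single weight, but it is inconsistent with both other fragments, so the chain of `a` and `b` with total weight 5.0 is selected.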
![Page 62: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/62.jpg)
The DIALIGN approach
Advantages of segment-based approach:
Program can produce global and local alignments!
Sequence families can be aligned that cannot be aligned with standard methods
![Page 63: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/63.jpg)
T-COFFEE
C. Notredame, D. Higgins, J. Heringa (2000), T-Coffee: A novel algorithm for multiple sequence alignment, J. Mol. Biol.
![Page 64: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/64.jpg)
![Page 65: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/65.jpg)
![Page 66: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/66.jpg)
![Page 67: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/67.jpg)
T-COFFEE
T-COFFEE:
Less sensitive to spurious pairwise similarities
Can handle local homologies better than CLUSTAL
![Page 68: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/68.jpg)
T-COFFEE
Idea:
1. Build library of pairwise alignments
2. Alignments of sequences i, j and of sequences j, k support the alignment of sequences i, k.
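The two steps above can be sketched on a toy library of weighted residue pairs. This is a simplified illustration of the triplet idea, not the T-Coffee code: residues are written as hypothetical `(sequence, position)` tuples, and the rule that an intermediate residue contributes the minimum of its two weights is an assumption modelled on the published library extension rather than something stated on the slide.

```python
from collections import defaultdict
from itertools import combinations


def pair_key(a, b):
    """Order-independent key for a residue pair."""
    return tuple(sorted((a, b)))


def extend_library(library):
    """Add to each residue pair the support it receives via third residues."""
    neighbours = defaultdict(dict)
    extended = defaultdict(float)
    for (a, b), w in library.items():
        neighbours[a][b] = w
        neighbours[b][a] = w
        extended[pair_key(a, b)] += w  # direct evidence from the library
    for via, linked in neighbours.items():
        for a, c in combinations(sorted(linked), 2):
            if a[0] != c[0]:  # only residues from different sequences can align
                extended[pair_key(a, c)] += min(linked[a], linked[c])
    return dict(extended)


# Residue 0 of sequences i, j and k, with made-up pairwise weights.
i0, j0, k0 = ("i", 0), ("j", 0), ("k", 0)
library = {(i0, j0): 3.0, (j0, k0): 2.0, (i0, k0): 1.0}
for pair, weight in sorted(extend_library(library).items()):
    print(pair, weight)
```

The weak direct pair (i, k) rises from 1.0 to 3.0 because the alignments of i with j and of j with k both point to it.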
![Page 69: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/69.jpg)
Evaluation of multi-alignment methods
Alignment evaluation by comparison to trusted benchmark alignments.
The "true" alignment is known from information about structure or evolution.
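One common way to compare a program's output with a benchmark alignment is to ask which fraction of the residue pairs aligned in the reference is also aligned in the test alignment. The sketch below assumes both alignments contain the same sequences in the same row order and that `-` and `.` denote gaps; the function names are illustrative.

```python
from itertools import combinations

GAP_CHARS = "-."


def aligned_pairs(rows):
    """Set of residue pairs ((row, index), (row, index)) sharing a column."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows must have the same length")
    next_index = [0] * len(rows)  # running residue index per sequence
    pairs = set()
    for column in zip(*rows):
        residues = []
        for r, ch in enumerate(column):
            if ch not in GAP_CHARS:
                residues.append((r, next_index[r]))
                next_index[r] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def sp_accuracy(test_rows, reference_rows):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    reference = aligned_pairs(reference_rows)
    if not reference:
        raise ValueError("reference alignment contains no aligned pairs")
    return len(reference & aligned_pairs(test_rows)) / len(reference)


reference = ["AC-T", "ACGT"]
print(sp_accuracy(["AC-T", "ACGT"], reference))  # identical alignment: 1.0
print(sp_accuracy(["ACT-", "ACGT"], reference))  # one of three pairs is lost
```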
![Page 70: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/70.jpg)
Evaluation of multi-alignment methods
For protein alignment:
M. McClure et al. (1994):
4 protein families, known functional sites
J. Thompson et al. (1999):
Benchmark database, 130 known 3D structures (BAliBASE)
T. Lassmann & E. Sonnhammer (2002): BAliBASE + simulated evolution (ROSE)
![Page 71: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/71.jpg)
Evaluation of multi-alignment methods
![Page 72: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/72.jpg)
Evaluation of multi-alignment methods
BAliBASE reference alignments
1aboA 1 .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
1ycsB 1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
1pht 1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
1ihvA 1 .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
1vie 1 .drvrkksga.........awqGQIVGWYctnlt.............peG
1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie 28 YAVESeahpgsvQIYPVAALERIN......
Key: alpha helix RED, beta strand GREEN, core blocks UNDERSCORE
![Page 73: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/73.jpg)
![Page 74: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/74.jpg)
Result: DIALIGN best method for distantly related sequences, T-Coffee best for globally related proteins
![Page 75: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/75.jpg)
Evaluation of multi-alignment methods
BAliBASE: 5 categories of benchmark sequences
(globally related, internal gaps, end gaps)
CLUSTAL W, T-COFFEE, MAFFT, PROBCONS perform well on globally related sequences, DIALIGN superior for local similarities
![Page 76: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/76.jpg)
Evaluation of multi-alignment methods
Conclusion: no single best multi alignment program!
Advice: try different methods!
![Page 77: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/77.jpg)
Anchored sequence alignment
Idea: semi-automatic alignment
use expert knowledge to define constraints instead of fully automated alignment
Define those parts of the sequences where the biologically correct alignment is known as anchor points; align the rest of the sequences automatically.
![Page 78: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/78.jpg)
Anchored sequence alignment
NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN
IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT
GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS
![Page 79: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/79.jpg)
Anchored sequence alignment
NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN
IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT
GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS
Anchor points in multiple alignment
![Page 80: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/80.jpg)
Anchored sequence alignment
NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN
IIHREDKGVIYALWDYEPQND DELPMKEGDCMT
GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS
Anchor points in multiple alignment
![Page 81: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/81.jpg)
Anchored sequence alignment
-------NLF V-ALYDFVAS GD-------- NTLSITKGEk lrvLGYNhn
iihredkGVI Y-ALWDYEPQ ND-------- DELPMKEGDC MT-------
-------GYQ YrALYDYKKE REedidlhlg DILTVNKGSL VA-LGFS--
Anchored multiple alignment
![Page 82: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/82.jpg)
Algorithmic questions
Goal:
Find the optimal alignment (= consistent set of fragments) under the constraints given by user-specified anchor points!
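To make "consistent" concrete for the simplest case, the sketch below checks whether two gap-free anchors on the same pair of sequences can appear in one alignment: they must either lie on the same diagonal or come in the same order in both sequences without overlapping. This is an illustrative simplification with hypothetical names; consistency across three or more sequences also involves transitive constraints, which are not handled here.

```python
def compatible(a, b):
    """Check two anchors on the same sequence pair for consistency.

    Each anchor is a tuple (seq1, seq2, start1, start2, length) describing
    a gap-free segment pair. Hypothetical representation for illustration.
    """
    if a[:2] != b[:2]:
        raise ValueError("this sketch only compares anchors on the same sequence pair")
    _, _, a1, a2, a_len = a
    _, _, b1, b2, b_len = b
    # Same diagonal: both anchors pair position x with x + offset,
    # so they can never contradict each other, even if they overlap.
    if a2 - a1 == b2 - b1:
        return True
    # Otherwise one anchor must end before the other starts in BOTH sequences.
    a_first = a1 + a_len <= b1 and a2 + a_len <= b2
    b_first = b1 + b_len <= a1 and b2 + b_len <= a2
    return a_first or b_first

in_order = compatible((1, 2, 10, 12, 5), (1, 2, 20, 30, 4))   # same order in both
crossing = compatible((1, 2, 10, 30, 5), (1, 2, 20, 12, 4))   # order is reversed
overlap  = compatible((1, 2, 10, 12, 5), (1, 2, 12, 20, 5))   # overlap in sequence 1
```

Here `in_order` is true, while `crossing` and `overlap` are false: such anchor pairs could not both be respected by any single alignment.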
![Page 83: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/83.jpg)
Algorithmic questions
Additional input file with anchor points:
1 3 215 231 5 4.5
2 3 34 78 23 1.23
1 4 317 402 8 8.5
![Page 84: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/84.jpg)
Algorithmic questions
NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN
IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT
GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS
![Page 89: Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department](https://reader034.vdocument.in/reader034/viewer/2022051819/55162b21550346a2308b5dcd/html5/thumbnails/89.jpg)
Algorithmic questions
Additional input file with anchor points:
1 3 215 231 5 4.5
2 3 34 78 23 1.23
1 4 317 402 8 8.5
Columns, left to right: the two sequences, the start positions in these two sequences, the length of the anchor, and its score.
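Reading the column labels as two sequence numbers, two start positions, a length and a score, an anchor file like the one shown can be parsed with a few lines of code. The sketch below is a hypothetical helper, not part of any alignment program; it assumes whitespace-separated fields, one anchor per line, and integer sequence numbers and positions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Anchor:
    seq1: int      # number of the first sequence
    seq2: int      # number of the second sequence
    start1: int    # start position of the anchor in seq1
    start2: int    # start position of the anchor in seq2
    length: int    # number of residues covered by the anchor
    score: float   # user-assigned priority of the anchor

def parse_anchors(text):
    """Parse anchor lines of the form 'seq1 seq2 start1 start2 length score'."""
    anchors = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"line {line_no}: expected 6 fields, got {len(fields)}")
        s1, s2, b1, b2, length = map(int, fields[:5])
        anchors.append(Anchor(s1, s2, b1, b2, length, float(fields[5])))
    return anchors

EXAMPLE = """\
1 3 215 231 5 4.5
2 3 34 78 23 1.23
1 4 317 402 8 8.5
"""
anchors = parse_anchors(EXAMPLE)
for anchor in anchors:
    print(anchor)
```

The first line of the example then reads: a segment of length 5 starting at position 215 in sequence 1 is anchored to a segment starting at position 231 in sequence 3, with score 4.5.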