tm-coffee : accurate multiple sequence alignment of transmembrane proteins with psi-coffee
Chang, J-M, P Di Tommaso, JF Taly, C Notredame. 2012. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13.
![Page 1: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/1.jpg)
“Homology-enhanced probabilistic consistency” multiple sequence alignment:
a case study on transmembrane proteins
Jia-Ming Chang
2013-July-09
Chang, J-M, P Di Tommaso, J-F Taly, C Notredame. 2012. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13.
![Page 2: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/2.jpg)
Transmembrane protein
Membrane proteins are likely to constitute 20-30% of all ORFs contained in genomes.
Odorant receptors
Richard Benton, “Eppendorf winner. Evolution and revolution in odor detection,” Science (New York, N.Y.) 326, no. 5951 (October 16, 2009): 382-383.
![Page 3: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/3.jpg)
Transmembrane protein multiple sequence alignment
• 1994: alignment of transmembrane proteins is first addressed
– Cserzo M, Bernassau JM, Simon I, Maigret B: New alignment strategy for transmembrane proteins. J Mol Biol 1994, 243(3):388-396.
• Few multiple sequence alignment methods so far: only 3
– Shafrir Y, Guy HR: STAM: simple transmembrane alignment method. Bioinformatics 2004, 20(5):758-769.
– Forrest LR, Tang CL, Honig B: On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins. Biophys J 2006, 91(2):508-517.
– Pirovano W, Feenstra KA, Heringa J: PRALINETM: a strategy for improved multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):492-497.
![Page 4: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/4.jpg)
BAliBASE 2.0 reference 7
Pirovano W, Feenstra KA, Heringa J: PRALINETM: a strategy for improved multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):492-497.
![Page 5: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/5.jpg)
We need an accurate Transmembrane MSA!
![Page 6: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/6.jpg)
Homology-extended
Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.
![Page 7: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/7.jpg)
Homology-extended
Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.
![Page 8: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/8.jpg)
Pair-hidden Markov Model
Do CB, Mahabhashyam MS, Brudno M, Batzoglou S: ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res 2005, 15(2):330-340.
Emission probabilities, which correspond to traditional substitution scores, are based on the BLOSUM62 matrix.
![Page 9: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/9.jpg)
Probabilistic consistency transformation
![Page 10: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/10.jpg)
Homology-extended probabilistic consistency
The new emission probabilities are defined as follows:
$$p'(x_i, y_j) = \sum_{m=1}^{20} \sum_{n=1}^{20} \alpha_m \, \beta_n \, p(\mathrm{A.A.}_m, \mathrm{A.A.}_n)$$
where α_m is the frequency with which residue m appears at position i, β_n is the frequency with which residue n appears at position j, and p(A.A._m, A.A._n) is the original emission probability in ProbCons.
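The homology-extended emission probability is a frequency-weighted double sum over the 20 amino acids, which can be sketched in a few lines of Python. This is only an illustration: the function name `profile_emission` and the toy table `toy_pair_prob` are assumptions made here, and the toy values are not the BLOSUM62-derived probabilities that ProbCons uses.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_emission(alpha, beta, pair_prob):
    """Sum alpha[m] * beta[n] * pair_prob[(m, n)] over all residue pairs (m, n).

    alpha, beta: dicts mapping residue -> frequency at position i (resp. j).
    pair_prob:   dict mapping (residue, residue) -> original emission probability.
    """
    return sum(
        alpha.get(m, 0.0) * beta.get(n, 0.0) * pair_prob[(m, n)]
        for m, n in product(AMINO_ACIDS, repeat=2)
    )


# Toy joint probabilities (an assumption, NOT derived from BLOSUM62):
# the 20 identical pairs share 0.6 of the mass, the 380 mismatches share 0.4.
MATCH, MISMATCH = 0.03, 0.4 / 380
toy_pair_prob = {
    (m, n): MATCH if m == n else MISMATCH
    for m, n in product(AMINO_ACIDS, repeat=2)
}

alpha = {"L": 0.7, "I": 0.3}  # profile column at position i
beta = {"L": 1.0}             # profile column at position j
score = profile_emission(alpha, beta, toy_pair_prob)
```

When each column holds a single residue with frequency 1, the sum collapses to the original single-sequence emission probability, so the extension is consistent with the unextended model.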
![Page 11: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/11.jpg)
Homology-extended probabilistic consistency
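$$P(x_i \sim y_j \in a^* \mid x, y) \leftarrow \frac{1}{|S|} \sum_{z \in S} \sum_{z_k} \alpha_i \gamma_k \, P(x_i \sim z_k \in a^* \mid x, z) \cdot \beta_j \gamma_k \, P(z_k \sim y_j \in a^* \mid z, y)$$
where α_i, β_j, and γ_k are the profile frequencies.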
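A minimal sketch of this consistency update in Python may make the indexing easier to follow. It reads the profile frequencies of x, y, and each intermediate sequence z as per-position scalar weights, which is an assumption about the slide's notation; the function name `extended_consistency` and the data layout are hypothetical and not taken from PSI-Coffee or ProbCons.

```python
def extended_consistency(alpha, beta, intermediates):
    """Return the updated match-probability matrix for sequences x and y.

    alpha[i], beta[j]: per-position weights for x and y (assumed scalars).
    intermediates: one (P_xz, P_zy, gamma) tuple per sequence z in S, where
        P_xz[i][k] and P_zy[k][j] are pairwise posterior match probabilities
        and gamma[k] is the per-position weight of z.
    """
    n_x, n_y = len(alpha), len(beta)
    updated = [[0.0] * n_y for _ in range(n_x)]
    for P_xz, P_zy, gamma in intermediates:
        for i in range(n_x):
            for j in range(n_y):
                # Inner sum over every position k of the intermediate sequence z.
                updated[i][j] += sum(
                    alpha[i] * gamma[k] * P_xz[i][k]
                    * beta[j] * gamma[k] * P_zy[k][j]
                    for k in range(len(gamma))
                )
    size = len(intermediates)  # the 1/|S| normalisation
    return [[value / size for value in row] for row in updated]


# With unit weights and an identity P_xz, the update returns P_zy unchanged.
identity = [[1.0, 0.0], [0.0, 1.0]]
P_zy = [[0.9, 0.1], [0.2, 0.8]]
result = extended_consistency([1.0, 1.0], [1.0, 1.0], [(identity, P_zy, [1.0, 1.0])])
```

With all weights set to 1 the loop reduces to the plain ProbCons transformation, which is a convenient sanity check for any weighted variant.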
![Page 12: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/12.jpg)
Homology-extended
Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.
Que1: how to build a profile?
Que2: how to score profiles?
![Page 13: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/13.jpg)
Que1: how to build a profile?
• Database size
• Searching parameters
– E-value: the most commonly used; anything else?
1. Matrix file: -M
2. Filter the query sequence for low-complexity subsequences: -F
3. Neighborhood word threshold: -f
4. Truncate the report to a given number of alignments: -b
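Whichever database and search parameters are chosen, the hits are eventually summarised as per-column residue frequencies. The sketch below shows one simple way to do that in Python, assuming the hits have already been stacked onto the query as equal-length gapped rows; it uses no sequence weighting or pseudocounts, and the helper name `column_frequencies` is hypothetical rather than part of PSI-Coffee.

```python
from collections import Counter


def column_frequencies(aligned_rows, gap="-"):
    """Per-column residue frequencies over equal-length aligned rows.

    Gaps are ignored, so each non-empty column sums to 1.
    """
    if len({len(row) for row in aligned_rows}) != 1:
        raise ValueError("all rows must have the same length")
    profile = []
    for column in zip(*aligned_rows):
        counts = Counter(res for res in column if res != gap)
        total = sum(counts.values())
        profile.append(
            {res: n / total for res, n in counts.items()} if total else {}
        )
    return profile


# Query in the first row, three hypothetical homologs below it.
rows = ["MLKV", "MIKV", "ML-V", "MLRV"]
profile = column_frequencies(rows)
```

A sensitive search adds more, and more divergent, rows to this stack, which flattens the column frequencies; a strict search keeps the profile close to the query itself.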
![Page 14: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/14.jpg)
Word hit & Neighborhood
![Page 15: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/15.jpg)
Searching parameters
• Fast, insensitive search
– High percent identity
– blastp -F "m S" -f 999 -M BLOSUM80 -G 9 -E 2 -e 1e-5
• Slow, sensitive search
– Increases sensitivity, decreases specificity
– blastp -F "m S" -f 9 -M BLOSUM45 -e 100 -b 10000 -v 10000
• Source: the book “BLAST”, pages 146-147
![Page 16: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/16.jpg)
Different databases
• UniProt (release 15.15, 2010)
• NCBI non-redundant (NR)
• UniRef50, UniRef90, UniRef100
• Transmembrane subsets, selected with keyword:"Transmembrane [KW-0812]": UniRef50-TM, UniRef90-TM, UniRef100-TM, UniProt-TM
![Page 17: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/17.jpg)
Database Size
| Data set | No. of sequences |
|---|---|
| UniRef50-TM | 87,989 |
| UniRef90-TM | 263,306 |
| UniRef100-TM | 613,015 |
| UniProt-TM | 818,635 |
| UniRef50 | 3,077,464 |
| UniRef90 | 6,544,144 |
| UniRef100 | 9,865,668 |
| UniProt | 11,009,767 |
| NCBI NR | 10,565,004 |
![Page 18: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/18.jpg)
Performance comparison of different database sizes for the BAliBASE2-ref7.
UniRef50-TM contains about 100 times fewer sequences than the full UniProt.
The level of accuracy is comparable, and even superior, to that achieved with the default PSI-Coffee, while CPU time requirements decrease by a factor of 10.
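The "about 100 times fewer" figure can be checked directly against the counts given on the Database Size slide. The short Python calculation below uses only those counts; the variable names are illustrative.

```python
# Sequence counts copied from the Database Size slide.
sizes = {
    "UniRef50-TM": 87_989,
    "UniProt-TM": 818_635,
    "UniProt": 11_009_767,
}

# How many times smaller UniRef50-TM is than the full UniProt.
reduction = sizes["UniProt"] / sizes["UniRef50-TM"]

# Share of UniProt entries carrying the transmembrane keyword.
tm_share = sizes["UniProt-TM"] / sizes["UniProt"]

print(f"UniProt / UniRef50-TM = {reduction:.1f}x")
print(f"UniProt-TM share of UniProt = {tm_share:.1%}")
```

The ratio comes out slightly above 100, so "about 100 times fewer" is a conservative rounding of the slide's own numbers.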
![Page 19: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/19.jpg)
![Page 20: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/20.jpg)
10% more columns are correctly aligned than with PRALINETM.
The rows Pairs and Cols denote the total number of correctly aligned pairs and columns, respectively. The reference alignments contain 3,294,102 pairs and 1,781 columns.
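Counting correctly aligned pairs and columns can be sketched as follows in Python. This is a simplified illustration that compares whole alignments given as equal-length gapped strings; the actual BAliBASE evaluation may restrict scoring to annotated regions, and the function names here are hypothetical.

```python
from itertools import combinations


def columns(alignment, gap="-"):
    """Turn gapped rows into columns of residue indices (None marks a gap)."""
    counters = [0] * len(alignment)
    result = []
    for chars in zip(*alignment):
        col = []
        for s, ch in enumerate(chars):
            if ch == gap:
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        result.append(tuple(col))
    return result


def aligned_pairs(alignment):
    """Set of (seq s, residue i, seq t, residue j) placed in the same column."""
    pairs = set()
    for col in columns(alignment):
        for s, t in combinations(range(len(col)), 2):
            if col[s] is not None and col[t] is not None:
                pairs.add((s, col[s], t, col[t]))
    return pairs


def correct_counts(test, reference):
    """Return (correct pairs, correct columns) of test against reference."""
    correct_pairs = len(aligned_pairs(test) & aligned_pairs(reference))
    test_cols = set(columns(test))
    correct_cols = sum(col in test_cols for col in columns(reference))
    return correct_pairs, correct_cols


reference = ["ACGT", "A-GT", "ACG-"]
test = ["ACGT", "AG-T", "ACG-"]
pairs_ok, cols_ok = correct_counts(test, reference)
```

Summing these two counts over every alignment in a benchmark gives totals of the kind reported in the Pairs and Cols rows.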
![Page 21: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/21.jpg)
BAliBASE 3.0
The performance figures for the other methods are from Rausch et al. The SP and TC scores of the full-length sequences are evaluated on the core blocks (as defined in the XML files).
![Page 22: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/22.jpg)
Que2: how to score profiles?
Edgar RC, Sjolander K: A comparison of scoring functions for protein sequence profile alignment. Bioinformatics 2004, 20(8):1301-1308.
![Page 23: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/23.jpg)
• Prediction mode: -template_file PSITM
• Output: -output tm_html
This output was obtained on Or94b of D. melanogaster and its orthologs in other Drosophila species. Notably, the predicted topology of the Or94b set is consistent with the conclusion of Benton et al.
![Page 24: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee](https://reader033.vdocument.in/reader033/viewer/2022052311/559510d71a28ab1c108b4741/html5/thumbnails/24.jpg)
Paolo Di Tommaso
http://tcoffee.crg.cat/tmcoffee