hierarchical clustering of homolog groups based on overlap of secondary structure

1
Alignment of primary structure is the basis of detection of putative homologous proteins. The software BLAST is the most popular and efficient tool for calculation of these alignments. This software prints a value (called by E-value) that represents the number of alignments which scores equal or better than those obtained without homology relationship. However, when score is under a certain limit (< 100), the E-value will not achieve a trustable value. In this work we used the pair-wise overlapping of secondary structure to build clusters of homolog groups, using as a model the COG of Superoxide dismutase (COG0605) bearing 74 sequences, enriched with UniRef50 members. HIERARCHICAL CLUSTERING OF HOMOLOG GROUPS BASED HIERARCHICAL CLUSTERING OF HOMOLOG GROUPS BASED ON OVERLAP OF SECONDARY STRUCTURE ON OVERLAP OF SECONDARY STRUCTURE Oto Coelho Jr., Lucas Santos, Adriano Silva, Izabella Pena, J Miguel Ortega¹ [email protected], [email protected], [email protected], [email protected], [email protected] Supported by UFMG INTRODUCTION INTRODUCTION Metodology Metodology Results Results A total of 551 sequences were submitted to the prediction of secondary structure with the software SSPro4, resulting in output files using the alphabet of H: alpha-helix, E: beta-strand and C: unstructured coil. Predictions were pair-wised aligned (Clustal/W) and used to calculate the Structural overlap (SOV). The SOV result is a percentage of secondary structure overlap. After this, a local implementation of hierarchical clustering (Average and Simple Linkage politic) was used to create clusters (named CSO after clusters of structural overlap) over the filtered similarity values in a cutoff of 70%. In a Simple Linkage politic (considering the maximum SOV value between sequences) was able to create 15 clusters containing from two up to 224 members. In a Average Linkage politic (considering the average SOV between sequences) Using Sov values, a local implementation of hierarchical clustering was able to create 6 clusters containing from one up to 510 members We also calculated the hierarchical clustering in simple and average politics for four other groups of proteins: Ferredoxin, Peroxiredoxin, Hydrogenase Maturation Factor and Transcription Elongation Factor. 1. Laboratório de Biodados. Departamento de Bioquímica e Imunologia, ICB-UFMG Figure 3. Peroxiredoxin clusters using a simple and average linkage politic of hierarchical clustering. Figure 2. Ferredoxin clusters using a simple and average linkage politic of hierarchical clustering. Figure 4. Simple and Average politics of hierarchical clustering for Hydrogenase maturation factor group. Figure 5. Simple and Average politics of hierarchical clustering for Transcription elongation factor. We evaluated the clustering procedure by analyzing the CSO in respect of their UniRef_50 clusters composition. We found that CSO0605_1 (arrow) gathered members of 22 different UniRef50 clusters. Conversely, Seed Linkage software and PSI-BLAST grouped, respectively, 541 and 531 sequences. Conclusions Conclusions Thus, hierarchical clustering of secondary structure represents a novel and more stringent clustering procedure than COG, Seed Linkage and PSI-BLAST although not so stringent as UniRef50 procedure. These pictures below show two representants of the superoxide dismutases group that have resolved structures in the PDB database. These proteins are aparently similar, but they are in a different clusters. Figure 6. Seed Linkage in Ferredoxin goup. Figure 7. Two proteins of Superoxide dismutases group that are in different clusters.. Figure 1. Superoxide dismutase clusters using a simple and average linkage politic of hierarchical clustering.

Upload: lindsay

Post on 06-Jan-2016

20 views

Category:

Documents


0 download

DESCRIPTION

Figure 4. Simple and Average politics of hierarchical clustering for Hydrogenase maturation factor group. Figure 5. Simple and Average politics of hierarchical clustering for Transcription elongation factor. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: HIERARCHICAL CLUSTERING OF HOMOLOG GROUPS BASED ON OVERLAP OF SECONDARY STRUCTURE

Alignment of primary structure is the basis of detection of putative homologous proteins. The software BLAST is the most popular and efficient tool for calculation of these alignments. This software prints a value (called by E-value) that represents the number of alignments which scores equal or better than those obtained without homology relationship. However, when score is under a certain limit (< 100), the E-value will not achieve a trustable value. In this work we used the pair-wise overlapping of secondary structure to build clusters of homolog groups, using as a model the COG of Superoxide dismutase (COG0605) bearing 74 sequences, enriched with UniRef50 members.

HIERARCHICAL CLUSTERING OF HOMOLOG GROUPS HIERARCHICAL CLUSTERING OF HOMOLOG GROUPS BASED ON OVERLAP OF SECONDARY STRUCTUREBASED ON OVERLAP OF SECONDARY STRUCTURE

Oto Coelho Jr., Lucas Santos, Adriano Silva, Izabella Pena, J Miguel Ortega¹[email protected], [email protected], [email protected], [email protected], [email protected]

Supported by

UFMG

INTRODUCTIONINTRODUCTION

MetodologyMetodology

ResultsResults

A total of 551 sequences were submitted to the prediction of secondary structure with the software SSPro4, resulting in output files using the alphabet of H: alpha-helix, E: beta-strand and C: unstructured coil. Predictions were pair-wised aligned (Clustal/W) and used to calculate the Structural overlap (SOV). The SOV result is a percentage of secondary structure overlap. After this, a local implementation of hierarchical clustering (Average and Simple Linkage politic) was used to create clusters (named CSO after clusters of structural overlap) over the filtered similarity values in a cutoff of 70%.

In a Simple Linkage politic (considering the maximum SOV value between sequences) was able to create 15 clusters containing from two up to 224 members. In a Average Linkage politic (considering the average SOV between sequences) Using Sov values, a local implementation of hierarchical clustering was able to create 6 clusters containing from one up to 510 members

We also calculated the hierarchical clustering in simple and average politics for four other groups of proteins: Ferredoxin, Peroxiredoxin, Hydrogenase Maturation Factor and Transcription Elongation Factor.

1. Laboratório de Biodados. Departamento de Bioquímica e Imunologia, ICB-UFMG

Figure 3. Peroxiredoxin clusters using a simple and average linkage politic of hierarchical clustering.

Figure 2. Ferredoxin clusters using a simple and average linkage politic of hierarchical clustering.

Figure 4. Simple and Average politics of hierarchical clustering for Hydrogenase maturation factor group.

Figure 5. Simple and Average politics of hierarchical clustering for Transcription elongation factor.

We evaluated the clustering procedure by analyzing the CSO in respect of their UniRef_50 clusters composition. We found that CSO0605_1 (arrow) gathered members of 22 different UniRef50 clusters. Conversely, Seed Linkage software and PSI-BLAST grouped, respectively, 541 and 531 sequences.

ConclusionsConclusionsThus, hierarchical clustering of secondary structure represents a novel and more stringent clustering procedure than COG, Seed Linkage and PSI-BLAST although not so stringent as UniRef50 procedure.

These pictures below show two representants of the superoxide dismutases group that have resolved structures in the PDB database. These proteins are aparently similar, but they are in a different clusters.

Figure 6. Seed Linkage in Ferredoxin goup.

Figure 7. Two proteins of Superoxide dismutases group that are in different clusters..

Figure 1. Superoxide dismutase clusters using a simple and average linkage politic of hierarchical clustering.