STRUCTURAL COMPARISON
By: Z. S. Rezaei
Structural comparison Structural alignment spectrum
of structural alignment methods The properties of output Types of comparison Algorithmic complexity Representation of structures Distance matrix Methods Alignment of large RNA molecules The classes of scoring
Structural alignment homology between two or more polymer (2) a window into the distant past of protein evolution(1) identification homologous(1) imply evolutionary relationships between proteins that share
very little common sequence(2) prediction of the functions and the family of the query
protein(2)
STRUCTURAL ALIGNMENT
rely on information about conformations.( from X-ray crystallography or NMR spectroscopy or structure prediction methods)for evaluating prediction methods
spectrumof structural alignment methods(1)
Rigid aligner
Flexible aligner
Elastic aligner
The properties of out put a superposition of the atomic coordinate sets and a
minimal RMSD. existence of multiple protein domains complicates the
Structural alignment a set of superposed three-dimensional coordinates for
each input structure(2)
DEFINITION OF RMS
the root mean square (RMS)
(3)
DEFINITION OF COORDINATE SYSTEM(5)
Determination uniquely a spatial element
Coordinate system Spatial coordinate Planar coordinate
Types of comparisons
Structural superposition
used to compare multiple conformations of the same protein
uses a simple least-squares fitting algorithm(2)
Alignment Algorithms based on multidimensional rotations and modified quaternions (2)
Definition of quaternion
a number system In mathematics a quaternion as the quotient of two directed lines in a three-dimensional represented as the sum of a scalar and a vector(6)
Algorithmic complexity
Optimal solution Approximate solution(2)
Optimal solution The optimal "threading" shown to be NP-complete Strictly speaking, an optimal solution is only known
for certain protein structure similarity measures the algorithm for optimal solution is not practical
(2)
Approximate solution
Approximate polynomial-time algorithms for structural
alignment
theoretically classify the approximate protein structure
alignment(2)
Representation of structures
Protein structurrepresented in some coordinate-independent space
by constructing series of matrices (2)
distance matrix
a two-dimensional matrix
distances between some subset of the atoms (such as the alpha carbons)
Methods(2) DALI Combinatorial extension(CE) GANGSTA+ MAMMOTH ProBiS RAPIDO SABERTOOTH SSAP Spalign TOPOFIT SSM
DALI distance alignment matrix method breaks the input structures into hexapeptide fragments and
calculates a distance matrix Distance matrix has two diagonals conducted via a series of overlapping submatrices of size
6x6 Submatrix matches are reassembled into a final alignment
DALI The original version used a Monte Carlo simulation The DALI method has also been used to construct a
database known as FSSP (Fold classification based on Structure-Structure alignment of Proteins, or Families of Structurally Similar Proteins)
There is an searchable database based on DALI as well as a downloadable program and web search based on a standalone version known as DaliLite.
Montecarlo method
a class of computational algorithms
relies on repeated random sampling to compute their results
especially useful for simulating systems with many coupled degrees of freedom, such as fluids, disordered materials, strongly coupled solids, and cellular structures (4)
http://ehkinda.biocenter.helsinki.fi/dali_server/
http://ebi.ac.uk/Tools/structure/dalilite
http://ebi.ac.uk/Tools/structure/dalilite
http://ebi.ac.uk/Tools/structure/dalilite
http://ebi.ac.uk/Tools/structure/dalilite
Combinatorial extension(CE)
is similar to DALI
uses AFPs to define a similarity matrix
A number of similarity
metrics are possible
Combinatorial extension(CE)
initial AFP pair that nucleates the alignment proceed with the next AFP The RCSB PDB has recently released an updated
version of CE and FATCAT as part of the RCSB PDB Protein Comparison Tool
provides a new variation of CE that can detect circular permutations in protein structures
Circular permutations A circular permutation
is a relationship between proteins whereby the proteins have a changed order of amino acids in their peptide sequence. The result is a protein structure with different connectivity, but overall similar three-dimensional (3D) shape(7)
GANGSTA+ A combinatorial algorithm for non-sequential
structural alignment of proteins searching for similarity in databases
(http://agknapp.chemie.fu-berlin.de/gplus/) evaluates based on contact maps and secondry
structure
MAMMOTH MAtching Molecular Models Obtained from Theory For comparing models coming from structure prediction decompose the protein structure into heptapeptides The similarity score between two heptapeptides is calculated
using a unit-vector RMS (URMS) method These scores are stored in a similarity matrix Derived from the likelihood of obtaining a given structural
alignment by chance
MAMMOTH-mult extension of the MAMMOTH algorithm is very fast produces consistent and high quality structural
alignments produces structurally implied sequence alignments
that can be further used for multiple-template homology modeling
ProBiS Protein Binding Sites. ProBiS detects structurally similar sites on protein surfaces compares the query protein to members of a
database of protein 3D structures Using an efficient maximum clique algorithm Structural similarity scores are calculated for the
query protein’s surface residues, and are expressed as different colors
used successfully for the detection of protein–protein, protein–small ligand and protein–DNA binding sites
RAPIDO Rapid Alignment of Proteins In terms of Domains a web server for the 3D alignment of crystal using an approach based on difference distance
matrices The Matching Fragment Pairs (MFPs) are then
represented as nodes in a graph nodes in graph are chained together to form an
alignment by means of an algorithm for the identification of the longest path on a DAG (Directed Acyclic Graph).
The final step: improve the quality of the alignment
SABERTOOTH structural profiles to perform structural alignments has favourable scaling of computation time with
chain length SABERTOOTH can be used online at
http://www.fkp.tu-darmstadt.de/sabertooth/
SSAP Sequential Structure Alignment Program uses double dynamic programming constructs its vectors from the beta carbons for all residues
except glycine A series of matrices are constructed Dynamic programming applied to each resulting matrix matrices are then summed into a "summary" matrix to Final dynamic programming is applied again to determine
the overall structural alignment
SSAP originally produced only pairwise alignments but has since been extended to multiple
alignments as well applied in an all-to-all fashion to produce a
hierarchical fold classification scheme known as CATH (Class, Architecture, Topology, Homology)
construct the CATH Protein Structure Classification database
SPalign Based on a new size-independent score
called SPscore for The source code for SPalign and the
server are available at http://sparks.informatics.iupui.edu/yueyang/server/SPalign/
TOPOFIT Based on Delaunay tessellation (DT) identifies a feature point on the RMSD/Ne curve topomax point to detect conformational changes, topological
differences in variable parts
SSM
Secondary Structure Matching (SSM), or PDBeFold at the Protein Data Bank in Europe
uses graph matching followed by c-alpha alignment to compute alignments
Recent DevelopmentsTmalign
uses a novel method for weighting its distance matrix
correct for effects arising from alignment lengths
RNA structural alignment large RNA molecules also form
characteristic tertiary structures A recent method for pairwise structural
alignment of RNA sequences implemented in the program FOLDALIGN
In low sequence identity cases
3D
2D
1D
0D
score
(1)
References1. Hitomi Hasegawa and Liisa Holm: Advances and pitfalls of protein
structural alignment, Current Opinion in Structural Biology 2009, 19:341–348
2. en.wikipedia.org/wiki/structural_alignment software
3. Cartwright, Kenneth V (Fall 2007). "Determining the Effective or RMS Voltage of Various Waveforms without Calculus". Technology Interface 8 (1): 20 pages
4. Anderson, H.L. (1986). "Metropolis, Monte Carlo and the MANIAC". Los Alamos Science 14: 96–108
5. Weisstein, Eric W., "Coordinate System" from MathWorld
6. Boris Abramovich Rozenfelʹd (1988). The history of non-euclidean geometry: evolution of the concept of a geometric space. Springer. p. 385
7. Cunningham, B. A.; Hemperly, J. J.; Hopp, T. P.; Edelman, G. M. (1979). "Favin versus concanavalin A: Circularly permuted amino acid sequences". Proceedings of the National Academy of Sciences of the United States of America 76 (7): 3218–3222
I am ready to answer your questions