![Page 1: Self-organizing Map (SOM) in Protein Folding Based on HP Model Xiang-Sun ZHANG ZHANGroup@bioinfoamss.org 2003.12.2 2 Dec. 2003](https://reader036.vdocument.in/reader036/viewer/2022062620/551addce55034606048b57da/html5/thumbnails/1.jpg)
Self-organizing Map (SOM) in Protein Folding Based on HP Model
Xiang-Sun ZHANG [email protected]
http://zhangroup.aporc.org
2003.12.2
2 Dec. 2003 at NCSU
Motivation
What can we (OR researchers and algorithm designers) do in bioinformatics?
Where do operations research and bioinformatics meet?
Abstract
Many problems in bioinformatics can be formulated as large linear/nonlinear integer programming or combinatorial problems, which are NP-hard and cannot be solved efficiently by existing exact algorithms. Efficient approximate methods are therefore needed.
As examples, a heuristic algorithm for SBH and a new SOM algorithm for solving the protein HP model are presented.
Other related research in our group is also introduced.
Problem areas in Bioinformatics
• Human Genome Project
• Large-molecule data in biology, such as DNA and protein
• Genomics: DNA sequencing, gene prediction, sequence alignment
• Proteomics (50,000 entries on Google) / Protenomics (hundreds of entries on Google): structure prediction, protein alignment
“Operations Research”
Over 8 million entries on Google
DNA Sequencing
ACGTGATCGATCGAGTACGAGAGTCTA
_______________________________
ACGTGATCGATCGAGTACGAGAGTCTA
ACGTGATCGATCGAGTACGAGAGTCTA
ACGTGATCGATCGAGTACGAGAGTCTA
ACGTGATCGATCGAGTACGAGAGTCTA
Two pieces of a target sequence with a longer overlap are preferably joined together. This requires that
• the average size of the pieces be as long as possible, and
• the number of copies of the target sequence be as large as possible.
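The overlap rule above can be illustrated with a minimal sketch. The function names `overlap` and `merge` and the two example pieces are assumptions made for illustration (the pieces are cut from the target sequence shown on the DNA Sequencing slide); this is not a full assembly procedure.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def merge(a: str, b: str) -> str:
    """Join two pieces, writing the overlapping part only once."""
    return a + b[overlap(a, b):]


# Two hypothetical pieces of the target ACGTGATCGATCGAGTACGAGAGTCTA.
left = "ACGTGATCGATCG"
right = "GATCGAGTACGAG"

print(overlap(left, right))  # 5
print(merge(left, right))    # ACGTGATCGATCGAGTACGAG
```

Longer pieces and more copies of the target make long overlaps more likely, which is why both conditions help.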
A novel DNA sequencing technique, called Sequencing By Hybridization (SBH), was proposed as an alternative to the traditional sequencing by gel electrophoresis.
SBH is based on the DNA chip (or DNA array). A DNA chip contains all $4^k$ probes of length $k$ (a probe is a short k-nucleotide fragment of DNA, also called a k-tuple).
Given a probe and a target DNA, the target will bind (hybridize) to the probe if there is a substring of the target which “fits” the probe.
DNA Sequencing
DNA array (DNA chip)
AAATGCG (5 3-tuples; a chip with 3-tuples has $4^3 = 64$ probes)
SBH uses the classical probing scheme: by hybridizing an (unknown) DNA fragment with the chip, the unknown target DNA can be tested and all of its k-tuple compositions (called its spectrum) determined.
SBH provides information about the k-tuples present in the target DNA, but not about their positions. This raises a problem: how to reconstruct the target DNA from this data.
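As a small illustration of what a spectrum is, the sketch below computes the set of k-tuples of a known sequence, using the AAATGCG example with 3-tuples from the DNA array slide. The function name `spectrum` is an assumption; in a real experiment the spectrum comes from the chip, not from the sequence.

```python
def spectrum(target: str, k: int) -> set[str]:
    """All distinct k-tuples occurring in target; positions are discarded."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


spec = spectrum("AAATGCG", 3)
print(sorted(spec))  # ['AAA', 'AAT', 'ATG', 'GCG', 'TGC']
print(4 ** 3)        # number of probes on a chip of all 3-tuples: 64
```

Because a set is returned, the order of the k-tuples is lost, which is exactly the information the reconstruction step has to recover.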
Because of technological limitations, k cannot yet be made as large as desired (generally less than 30, which is already a big chip). This can lead to branching during sequence reconstruction, and hence to multiple possible reconstructions.
In addition, two kinds of errors may occur: negative errors (k-tuples in the sequence that are not hybridized) and positive errors (hybridized probes that are not k-tuples in the sequence). Therefore, for larger DNA fragments, the problem of sequence reconstruction becomes rather complicated and hard to analyze.
In the case of error-free SBH and an ideal spectrum (i.e., one consisting of n-k+1 different k-tuples, where n is the length of the DNA fragment), it is known that the SBH reconstruction problem is equivalent to finding an Eulerian path in a corresponding graph, and the algorithm can be implemented in linear time.
The occurrence of positive and negative errors, and of repeated k-tuples in the DNA fragment, makes the problem computationally difficult: it becomes strongly NP-hard.
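For the error-free, ideal-spectrum case, the Eulerian-path view can be sketched as follows. This is a minimal illustration, assuming the standard construction in which every k-tuple is an edge from its first k-1 letters to its last k-1 letters; the function name `reconstruct` is hypothetical, and the sketch sorts the input for determinism, so it is not tuned to be strictly linear-time.

```python
from collections import defaultdict


def reconstruct(spectrum):
    """Rebuild a sequence from an ideal, error-free spectrum via an Eulerian path."""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for kmer in sorted(spectrum):
        u, v = kmer[:-1], kmer[1:]  # edge from prefix to suffix
        graph[u].append(v)
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(indeg) | set(outdeg)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next((n for n in sorted(nodes) if outdeg[n] - indeg[n] == 1),
                 min(nodes))
    stack, path = [start], []
    while stack:  # Hierholzer's algorithm
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(spectrum) + 1:
        raise ValueError("spectrum has no Eulerian path")
    return path[0] + "".join(node[-1] for node in path[1:])


ideal = {"ATA", "TAC", "ACG", "CGA", "GAA", "AAG", "AGA"}
print(reconstruct(ideal))  # ATACGAAGA
```

With errors or repeated k-tuples the graph no longer has this clean structure, which is where the hardness described above comes from.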
Sequencing by Hybridization
DNA fragment: ……ATACGAAGA……
Spectrum, ideal case: ATA TAC ACG CGA GAA AAG AGA
Spectrum, with errors: ATA TAC AGG CGA GAA AAG AGA
Error: positive (misread) / negative (missing, repetition)
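The two error types in this example can be read off with set differences. The sketch below assumes the ideal spectrum is known, which is only true in an illustration like this one; the variable names are hypothetical.

```python
fragment = "ATACGAAGA"
k = 3

# Ideal spectrum: every 3-tuple that actually occurs in the fragment.
ideal = {fragment[i:i + k] for i in range(len(fragment) - k + 1)}

# Spectrum read from the chip in the "with errors" case of the slide.
observed = {"ATA", "TAC", "AGG", "CGA", "GAA", "AAG", "AGA"}

negative_errors = ideal - observed  # present in the fragment, not hybridized
positive_errors = observed - ideal  # hybridized, not present in the fragment

print(negative_errors)  # {'ACG'}
print(positive_errors)  # {'AGG'}
```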
1989, Pevzner: the SBH reconstruction problem is equivalent to finding an Eulerian path in a related graph.
1990, Fleischner: the algorithm can be implemented in linear time.
1991, Drmanac et al.: an algorithm for SBH with errors, under the assumption that only the first or last nucleotide in the data can be erroneous.
1993, Lipshutz: uses empirically derived rates of positive and negative errors and other assumptions; no convergence analysis.
1999, Blazewicz et al.: a branch and bound method for the case of only positive errors.
2000, Blazewicz et al.: a heuristic algorithm producing near-optimal solutions.
SBH Reconstruction Problem
Design efficient heuristic algorithms
Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang. A new approach to the reconstruction of DNA sequencing by hybridization. Bioinformatics, vol 19(1), pages 14-21, 2003.
Xiang-Sun Zhang, Ji-Hong Zhang and Ling-Yun Wu. Combinatorial optimization problems in the positional DNA sequencing by hybridization and its algorithms. System Sciences and Mathematics, vol 3, 2002. (in Chinese)
Ling-Yun Wu, Ji-Hong Zhang and Xiang-Sun Zhang. Application of neural networks in the reconstruction of DNA sequencing by hybridization. In Proceedings of the 4th ISORA, 2002.
Basic Observation
The spectrum corresponds to a graph: each k-tuple is a vertex, and two connected k-tuples form an edge. The structure of the graph is represented by the adjacency matrix.
A reconstruction of the spectrum is a path in the graph. Information about all paths is implied in the powers of the adjacency matrix.
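A small sketch of this observation: entry (u, v) of the m-th power of the adjacency matrix counts the walks of length m from u to v. The sketch assumes two k-tuples are connected when the last k-1 letters of one equal the first k-1 letters of the other; the helper names are hypothetical, and walks may revisit vertices, so this illustrates the idea rather than the criteria used in the cited papers.

```python
def adjacency(kmers):
    """A[u][v] = 1 when the last k-1 letters of u equal the first k-1 letters of v."""
    return [[1 if u[1:] == v[:-1] else 0 for v in kmers] for u in kmers]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(a, m):
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(m):
        result = matmul(result, a)
    return result


kmers = ["ATA", "TAC", "ACG", "CGA", "GAA", "AAG", "AGA"]
walks = matrix_power(adjacency(kmers), len(kmers) - 1)

# Number of walks with 6 edges from ATA to AGA in the ideal spectrum.
print(walks[0][6])  # 1
```

Here the single walk of length 6 from ATA to AGA is the reconstruction ATACGAAGA.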
Some criteria are given that use the information in the powers of the adjacency matrix to determine, in polynomial time $O((n-k)^4)$, the most likely k-tuples at both ends and in the middle of all possible reconstructions of the target DNA.
A novel means of transforming negative errors into positive errors is proposed. It enables us to handle both types of errors easily.
Protein Structure Prediction
Predict protein 3D structure from (amino acid) sequence
Sequence → secondary structure → 3D structure → function
Proteins Secondary Structure
• α-helix (30-35%)
• β-sheet / β-strand (20-25%)
• Coil (random coil) (40-50%)
• Loop
• β-turn
3D Structure of Protein
Alpha-helix Beta-sheet
Loop and Turn
Turn or coil
Protein 3D Structure Detection
X-ray diffraction
• Expensive
• Slow
Protein Structure Prediction
Prediction is possible because
• Sequence information uniquely determines 3D structure
• Sequence similarity (>50%) tends to imply structural similarity
Prediction is necessary because
• DNA sequence data » protein sequence data » structure data

| Database | 1994 | 1997 | 2002.10 |
|---|---|---|---|
| Sequence (Swiss-Prot) | 40,000 | 68,000 | 114,033 |
| Structure (PDB) | 4,045 | 7,000 | 18,838 |
Three Methods of Protein Structure Prediction
Goal: find the best fit of a sequence to a 3D structure
• Comparative (homology) modeling: construct a 3D model from alignment to protein sequences with known structure
• Threading (fold recognition): pick the best fit to sequences of known 2D / 3D structures (folds)
• Ab initio / de novo methods: attempt to calculate the 3D structure “from scratch”
  • Molecular dynamics
  • Energy minimization
  • Lattice models
Lattice Models
• Suppose that each amino acid occupies one point in a space lattice
• It is called an Exact Model
HP Model (Simple Model)
• The twenty amino acids can be divided into two classes: hydrophobic/non-polar (H) and hydrophilic/polar (P)
• Contacts between H points are favorable
• Goal: maximize the number of H-H contacts
Figure legend: hydrophobic amino acid, hydrophilic amino acid, covalent bond, H-H contact
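The quantity being maximized can be made concrete with a minimal sketch that counts H-H contacts of a given fold on the 2D square lattice. The function name `hh_contacts` and the coordinate representation are assumptions; a contact is taken here to be two H residues on adjacent lattice points that are not consecutive in the chain.

```python
def hh_contacts(sequence, coords):
    """Count H-H contacts of a fold on the 2D square lattice.

    sequence: string over {"H", "P"}; coords: one (x, y) lattice point per residue.
    """
    if len(sequence) != len(coords):
        raise ValueError("one lattice point is needed per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("fold is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    position = {point: i for i, point in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts every neighboring pair once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = position.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


folded = [(0, 0), (1, 0), (1, 1), (0, 1)]    # chain bent into a unit square
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]  # fully extended chain

print(hh_contacts("HPPH", folded))    # 1
print(hh_contacts("HPPH", straight))  # 0
```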
Basic Ideas
• Each acid (neuron) in the primary sequence occupies one lattice point (city).
• The distance between two cities mapped by two neighboring neurons is forced to be 1, the covalent bond length between the amino acids in a protein molecule.
• Move the neurons to obtain more H-H contacts, i.e., emphasize forming a hydrophobic core.
Main Observation
The problem can be viewed as a Traveling Salesman Problem whose energy function, based on the H-H contacts, is to be maximized.
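Mathematical Model (in square lattice)
Let both the sequence length and the lattice size be $n$, and let $x_{ij} = 0/1$ indicate whether or not the $i$-th acid takes the $j$-th lattice point. Let $N(j)$ be the neighboring set of point $j$, with $|N(j)| = 3/2/1$. Let $f_i = 0/1$ if the $i$-th acid is P/H, and let the coordinates of point $j$ be $Y_j$.

$$
\begin{aligned}
\max\ & \sum_{j=1}^{n} \sum_{s \in N(j)} \Big[ \Big( \sum_{i=1}^{n} f_i x_{ij} \Big) \Big( \sum_{i=1}^{n} f_i x_{is} \Big) \Big] \\
\text{subject to}\ & \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, \dots, n \\
& \sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, \dots, n \\
& \Big\| \sum_{j=1}^{n} x_{ij} Y_j - \sum_{j=1}^{n} x_{(i-1)j} Y_j \Big\| = 1, \quad i = 2, \dots, n
\end{aligned}
$$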
Complexity
• NP-hard problem, even in the case of the two-dimensional HP model
  P. Crescenzi, et al. On the complexity of protein folding, Journal of Computational Biology, 5(3): 423-, 1998
• Many local solutions
• GA (genetic algorithm), MC (Monte Carlo), SA (simulated annealing): time consuming
SOM Approach
Existing algorithm
• Motivated by the Self-Organizing Map for TSP
• Incorporation of HP information
• Compact lattice (the sequence exactly fills the lattice)
A sequence of length 36 in a 6x6 lattice
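To give a feel for the Self-Organizing Map idea behind TSP solvers, here is a minimal sketch of one generic Kohonen-style update on a chain of neurons. It is not the algorithm of the slides or of the cited papers: the function name `som_step`, the parameters `learning_rate` and `radius`, and the Gaussian neighborhood are all assumptions, and no HP information or lattice constraint is included.

```python
import math


def som_step(neurons, city, learning_rate=0.5, radius=1.0):
    """One generic SOM update: pull the winner and its chain neighbors toward city."""
    # The winner is the neuron closest to the presented city.
    winner = min(range(len(neurons)), key=lambda i: math.dist(neurons[i], city))
    updated = []
    for i, (x, y) in enumerate(neurons):
        # Influence decays with the distance along the chain, not in space.
        influence = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
        step = learning_rate * influence
        updated.append((x + step * (city[0] - x), y + step * (city[1] - y)))
    return winner, updated


neurons = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
winner, moved = som_step(neurons, city=(1.0, 1.0))
print(winner)    # 1
print(moved[1])  # (1.0, 0.5)
```

Repeating such steps over all cities, while shrinking the learning rate and radius, gradually lays the chain of neurons over the cities in tour order.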
New SOM Approach
Motivation
• Consider a lattice bigger than the sequence, to allow more flexible shapes than a rectangle alone
• Equivalent to a PCTSP (Prize Collecting Traveling Salesman Problem): the salesman travels through only part of the city set, subject to some expectation.
Difficulties caused:
• Number of cities > number of neurons
PCTSP
A traveling salesman who gets a prize $f_k$ in every city $k$ that he visits and pays a penalty $p_l$ for every city $l$ that he fails to visit, and who travels between cities $i$ and $j$ at cost $c_{ij}$, wants to minimize the sum of his travel cost and net penalties, while including in his tour enough cities to collect a prescribed amount $f_0$ of prize money.
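The PCTSP objective can be made concrete by evaluating one candidate tour. This is a minimal sketch: the function name `pctsp_value` and the small 4-city instance are made up for illustration, and the code only scores a given tour, it does not search for the best one.

```python
def pctsp_value(tour, cost, prize, penalty, required_prize):
    """Travel cost plus penalties of a closed tour, or None if too little prize."""
    visited = set(tour)
    collected = sum(prize[k] for k in visited)
    if collected < required_prize:
        return None  # the tour does not collect the prescribed amount
    travel = sum(cost[tour[i]][tour[(i + 1) % len(tour)]]
                 for i in range(len(tour)))
    missed = sum(penalty[l] for l in range(len(prize)) if l not in visited)
    return travel + missed


# Hypothetical instance with 4 cities; city 3 is far away from the others.
cost = [[0, 1, 2, 9],
        [1, 0, 1, 9],
        [2, 1, 0, 9],
        [9, 9, 9, 0]]
prize = [0, 5, 5, 5]
penalty = [0, 2, 2, 2]

print(pctsp_value([0, 1, 2], cost, prize, penalty, 10))     # 6
print(pctsp_value([0, 1, 2, 3], cost, prize, penalty, 10))  # 20
print(pctsp_value([0, 1], cost, prize, penalty, 10))        # None
```

Skipping the distant city and paying its penalty is cheaper here than visiting it, which is the trade-off that lets a chain cover only part of a larger lattice.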
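The new SOM model corresponds to the integer program

$$
\begin{aligned}
\max\ & \sum_{j=1}^{m} \sum_{s \in N(j)} \Big[ \Big( \sum_{i=1}^{n} f_i x_{ij} \Big) \Big( \sum_{i=1}^{n} f_i x_{is} \Big) \Big] \\
\text{subject to}\ & \sum_{i=1}^{n} x_{ij} + y_j = 1, \quad j = 1, \dots, m \\
& \sum_{j=1}^{m} x_{ij} = 1, \quad i = 1, \dots, n \\
& \Big\| \sum_{j=1}^{m} x_{ij} Y_j - \sum_{j=1}^{m} x_{(i-1)j} Y_j \Big\| = 1, \quad i = 2, \dots, n
\end{aligned}
$$

where $m > n$ and the total number of variables is $(n+1)m$.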
New SOM Approach
Innovations
• Heuristic initialization to imitate a protein
• Learning sample set partition strategy
• Learning sample set reduction strategy
• Local search procedure to overcome the multi-mapping phenomenon
Numerical Results
1. Constructed HP sequences
(Length of 17)
2. HP benchmark (up to 36 amino acids)
SOM Approach for 2D HP-Model
Xiang-Sun Zhang, Yong Wang, Zhong-Wei Zhan, Ling-Yun Wu, Luonan Chen. A New SOM Approach for 2D HP-Model of Proteins' Structure Prediction. Submitted to RECOMB04.
Yong Wang, Zhong-Wei Zhan, Ling-Yun Wu, Xiang-Sun Zhang. Improved Self-Organizing Map Algorithm for Protein Folding and its Realization. Submitted to J. of Systems Science and Mathematical Sciences. (in Chinese)
Main Improvements
• Finds configurations with the global maximum number of H-H contacts in all the tests
• Finds more optimal conformations
• Fast: running time is linear in the sequence length
Unique Optimal Folding Problem
Which proteins in the two-dimensional HP model have a unique optimal (minimum-energy) folding? (Brian Hayes, 1998)
Oswin Aichholzer proved that, on the square lattice:
• There are closed chains of monomers with this property for all even lengths.
• There are open monomer chains with this property for all lengths divisible by four.
Square Lattice and Triangular Lattice
Our Results
For any n = 18k (k is a positive integer), there exists an n-node (open or closed) chain with at least optimal foldings all with isomorphic contact graphs of size n/2.
On the 2D triangular lattice, for any integer n > 19, there exist both closed and open chains of n nodes with a unique optimal folding.
)(3 nO
Proteins With Unique Optimal Foldings
Zhen-Ping Li, Xiang-Sun Zhang, Luo-Nan Chen, Protein with Unique Optimal Foldings on a Triangular Lattice in the HP Model, Submitted to Journal of Computational Biology.
Examples of Optimal Foldings
3D Protein Structure Alignment
Motivation
• Group proteins by structural similarity
• Determine impact of individual residues on protein structure
• Identify distant homologues of protein families
• Predict function of proteins with low sequence similarity
• Identify new folds / targets for X-ray crystallography
3D Protein Structure Alignment
• Correspondence between atoms
  • Pairwise sequence alignment
• Locations of atoms
  • Protein Data Bank (in PDB file)
  • Bond angles / lengths
  • X, Y, Z atom coordinates
• Evaluation metric
  • 6 degrees of freedom: 3 degrees of translation (A), 3 degrees of rotation (R)
  • Root Mean Square Deviation (RMSD), where $n$ = number of atoms and $d_i$ = distance between corresponding atoms $i$:

$$\mathrm{RMSD} = \sqrt{\frac{\sum_{i} d_i^2}{n}}$$
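The RMSD metric can be computed directly from two lists of corresponding atom coordinates. A minimal sketch, assuming the correspondence is already known and the two structures are already superimposed; the function name `rmsd` and the toy coordinates are made up for illustration.

```python
import math


def rmsd(atoms_a, atoms_b):
    """Root mean square deviation between corresponding (x, y, z) atoms."""
    if len(atoms_a) != len(atoms_b) or not atoms_a:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(atoms_a, atoms_b))
    return math.sqrt(total / len(atoms_a))


structure_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # same shape, shifted by 1 in z

print(rmsd(structure_1, structure_2))  # 1.0
```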
Structure Alignment Problem
Atom $i$ of structure 1: $X_i^1 = (x_{i1}^1, x_{i2}^1, x_{i3}^1)$
Atom $j$ of structure 2: $X_j^2 = (x_{j1}^2, x_{j2}^2, x_{j3}^2)$
j
Match two rigid bodies by rotating and moving them in 3D space
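A rigid-body motion is a rotation followed by a translation applied to every atom. The sketch below is only an illustration: the helper names `rotation_z` and `transform` are hypothetical, and the rotation is restricted to the z-axis for brevity, whereas a general rotation has three degrees of freedom.

```python
import math


def rotation_z(theta):
    """3x3 matrix for a rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def transform(points, rotation, translation):
    """Apply x -> translation + rotation * x to every 3D point."""
    moved = []
    for p in points:
        rotated = [sum(rotation[r][c] * p[c] for c in range(3)) for r in range(3)]
        moved.append(tuple(rotated[r] + translation[r] for r in range(3)))
    return moved


atoms = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved = transform(atoms, rotation_z(math.pi / 2), (0.0, 0.0, 2.0))
print([tuple(round(v, 6) for v in atom) for atom in moved])
```

Distances between atoms are unchanged by such a motion, so matching two structures reduces to choosing the rotation, the translation and the atom correspondence.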
Structure Alignment Problem
A nonlinear integer programming problem:

$$
\begin{aligned}
\min\ & E(S, A, R) = \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} s_{ij}\,(A + R X_i^1 - X_j^2)^2 + \big(\text{penalty terms in } s_{i0},\ s_{i0}s_{(i+1)0},\ s_{0j},\ s_{0j}s_{0(j+1)}\big) \\
\text{s.t.}\ & \sum_{i=0}^{N_1} s_{ij} = 1, \quad j = 1, 2, \dots, N_2; \\
& \sum_{j=0}^{N_2} s_{ij} = 1, \quad i = 1, 2, \dots, N_1.
\end{aligned}
$$
Structure Alignment Problem
Luo-Nan Chen, Tian-Shou Zhou, Yun Tang, Xiang-Sun Zhang. Structure of Alignment of Protein by Mean Field Annealing. Submitted to ICSB2003.
On-going Research
• Protein structure prediction
  • Algorithms for HP model
  • Threading methods
• Protein structure alignment
  • Novel model for structure alignment
• SBH reconstruction
  • Algorithms for new pattern SBH methods
• SNP (Single Nucleotide Polymorphism) and haplotype analysis
Summary
• Problems in bioinformatics are simple to describe but complicated to solve
• Many problems in proteomics are deterministic in nature (combinatorial or continuous models), while many problems in genomics are stochastic in nature
• Model a problem accurately, but solve it approximately