
A Mathematical Model for Peptide Inhibitor Design

*XIAODONG PANG,1,2 *LINXIANG ZHOU,1 MINGJUN ZHANG,3

FANG XIE,3 LONG YU,3 LILI ZHANG,4 LINA XU,4 and XINYI ZHANG1,2

ABSTRACT

This article presents a mathematical model for the design of peptide inhibitors for proteins. The model combines the two rules on protein-ligand interaction, the Miyazawa-Jernigan (M-J) matrix, and a hidden Markov model (HMM). The model is applied to predict peptide inhibitors for the proteins cyclophilin A (CypA) and FKBP12. The predictions are then validated by highest occupied molecular orbital (HOMO) calculations, docking between protein and inhibitor, and biological experiments. The results are encouraging and suggest that we have taken a step forward towards building a mathematical theory for the design of peptide inhibitors for proteins. The mathematical model is rough at present. However, if it points in the correct direction for the theoretical trends of biology, as we believe, then the theory can be developed further and become increasingly precise.

Key words: hidden Markov model, mathematical model, Miyazawa-Jernigan matrix, peptide inhibitor design, protein-ligand interaction.

1. INTRODUCTION

To date, finding a real ligand for a given target protein has been limited to two approaches: experimental screening of large numbers of small molecules from drug databases, or computational scans that assess each ligand through free energy calculations. Here, we build a mathematical model to help find peptide inhibitors of a protein. As the Nobel laureate Walter Gilbert said (Gilbert, 1991), “The new paradigm, now emerging, is that all the ‘genes’ will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical. An individual scientist will begin with a theoretical conjecture, only then turning to experiment to follow or test that hypothesis. The biology will not be a science based on observation and experiment only, it would have theoretical trends.”

Under specific circumstances, peptides can play an important role in the discovery of lead compounds. De novo peptide design began in 1995 and has since attracted intense interest in drug discovery because peptides are easy to synthesize. Current methods for structure-based drug design can be roughly divided into two categories. The first directly screens a large number of candidates from an existing database. The second is structure generation (Lewis and Leach, 1994), also referred to as de novo design.

1State Key Laboratory of Surface Physics and Department of Physics, 2Synchrotron Radiation Research Center, and 3State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China.

4Department of Electrical and Computer Engineering, Rice University, Houston, Texas.

*These two authors contributed equally to this work.

JOURNAL OF COMPUTATIONAL BIOLOGY

Volume 17, Number 8, 2010

© Mary Ann Liebert, Inc.

Pp. 1081–1093

DOI: 10.1089/cmb.2009.0272


The main purpose of this article is to present a new method of peptide design. It is also our purpose to test our whole theoretical approach to peptide inhibitor design and to demonstrate that it works. Our aim is to build a mathematical model and then to carry out peptide inhibitor design as one step along the path of our theory. Our mathematical model comprises four knowledge blocks: (1) the two rules on protein-ligand interaction, (2) the Miyazawa-Jernigan (M-J) matrix, (3) the hidden Markov model (HMM), and (4) residue-residue contact preferences.

The outline of this article is as follows. First, we explain the four knowledge blocks and our particular mathematical model. In Results and Discussion, we present the design of the tripeptide inhibitor Ala-Gly-Pro (AGP) for the protein cyclophilin A (CypA) and of the dipeptide Gly-Gln for the protein FKBP12, together with our selection criteria and experimental results. Finally, we provide a conclusion.

2. METHODS

2.1. Two rules on protein-ligand interaction

Combining the full electronic structure calculation and the surface pocket calculation of proteins, we propose two rules on protein-ligand interaction (for details, see Pang et al., 2008). The first rule is that interactions occur only between the lowest unoccupied molecular orbitals (LUMOs) of a protein and the highest occupied molecular orbital (HOMO) of its ligand, not between the HOMOs of a protein and the LUMO of its ligand. This provides a rough criterion for ligand selection. The second rule is that only residues or atoms located both on the LUMOs of a protein and in a surface pocket of that protein are active residues or active atoms, and the corresponding pocket is the ligand binding site. This enables us to identify not only the ligand binding site but also the active residues, and even the active atoms, of a protein.
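The second rule amounts to a set intersection. The sketch below illustrates only that logic. It assumes the LUMO residues and the pocket residues have already been obtained, from a full electronic structure calculation and a pocket calculation respectively. The function name `active_residues` and the residue labels are hypothetical placeholders, not data for CypA.

```python
def active_residues(lumo_residues: set, pockets: dict) -> dict:
    """Rule 2 (sketch): a residue is active only if it lies on the protein's
    LUMOs *and* inside a surface pocket; pockets that contain active residues
    are candidate ligand binding sites.

    lumo_residues: set of residue labels found on the LUMOs
    pockets: mapping pocket id -> set of residue labels lining that pocket
    """
    sites = {}
    for pocket_id, residues in pockets.items():
        common = lumo_residues & residues   # on the LUMOs and in this pocket
        if common:
            sites[pocket_id] = common
    return sites


# Hypothetical labels, for illustration only.
lumo = {"res_a", "res_b", "res_c"}
pockets = {"pocket_1": {"res_a", "res_b", "res_d"}, "pocket_2": {"res_e"}}
print(active_residues(lumo, pockets))
```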

2.2. Miyazawa-Jernigan matrix

Miyazawa and Jernigan (1985) used existing protein databases to show that the stronger the interaction between two residues, the more likely they are to be in contact with each other. They analyzed thousands of protein crystal structures in the existing databases and obtained the statistical contact energies between all 20 kinds of residues, forming a 20 × 20 symmetric matrix. (We call it the “M-J matrix”; its unit is RT = 0.60 kcal/mol = 4.2 × 10⁻²¹ J = 0.026016 eV.)
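The three values quoted for the energy unit RT can be cross-checked numerically. The sketch below uses the exact SI values of the Avogadro constant, the elementary charge, and the gas constant, together with the thermochemical calorie (1 kcal = 4184 J). It also back-calculates the temperature at which RT equals 0.60 kcal/mol, which comes out at roughly 302 K.

```python
# Cross-check of the M-J energy unit RT = 0.60 kcal/mol.
N_A = 6.02214076e23         # Avogadro constant, 1/mol (exact SI value)
E_CHARGE = 1.602176634e-19  # elementary charge, C, i.e. J per eV (exact SI value)
R_GAS = 8.314462618         # molar gas constant, J/(mol K)
J_PER_KCAL = 4184.0         # thermochemical kilocalorie


def kcal_per_mol_to_joule(e_kcal_mol: float) -> float:
    """Energy per molecule (J) corresponding to a molar energy in kcal/mol."""
    return e_kcal_mol * J_PER_KCAL / N_A


def joule_to_ev(e_joule: float) -> float:
    return e_joule / E_CHARGE


RT_KCAL = 0.60                             # unit of the M-J matrix
rt_joule = kcal_per_mol_to_joule(RT_KCAL)  # about 4.17e-21 J
rt_ev = joule_to_ev(rt_joule)              # about 0.0260 eV
implied_T = RT_KCAL * J_PER_KCAL / R_GAS   # temperature at which RT = 0.60 kcal/mol

print(f"{rt_joule:.3e} J, {rt_ev:.6f} eV, T = {implied_T:.1f} K")
```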

The M-J matrix has 20 × 20 = 400 elements. Because it is symmetric, it has 210 independent elements: 20 diagonal and 190 off-diagonal. However, when the average value is subtracted from each element and the eigenvalue equation is solved, the matrix is described by 22 independent quantities (Li et al., 1997). Twenty of them express the relative average potentials of the 20 residues in the folded protein. Of the other two, one expresses the potential strength and the other the interaction coupling strength between residues. The potential strength is two orders of magnitude larger than the interaction coupling strength. This means that the structure of a protein is determined not by the interactions between residues but by the average potential in the fold.
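The procedure described above has three steps: subtract the average element, diagonalize the symmetric matrix, and keep the dominant eigenpairs. It can be sketched with NumPy's `eigh`. The real M-J matrix is not reproduced in this article, so a synthetic 20 × 20 matrix stands in for it. The stand-in is built (an assumption made purely for illustration) so that its mean-subtracted part has exactly the two eigenvalues −22.49 and 18.62 quoted in Eq. (2) below. Keeping two modes then reconstructs it exactly.

```python
import numpy as np


def dominant_decomposition(M: np.ndarray, keep: int = 2):
    """Return (mean, eigenvalues, eigenvectors) of the `keep` modes of largest |lambda|."""
    mean = M.mean()
    lam, vecs = np.linalg.eigh(M - mean)     # symmetric matrix -> real spectrum
    order = np.argsort(-np.abs(lam))[:keep]  # largest |lambda| first
    return mean, lam[order], vecs[:, order]


def reconstruct(mean: float, lam: np.ndarray, vecs: np.ndarray) -> np.ndarray:
    """M_ij ~ <M> + sum_a lambda_a V_a,i V_a,j over the kept modes."""
    return mean + (vecs * lam) @ vecs.T


# Synthetic stand-in: a constant offset plus two modes orthogonal to the all-ones vector
# (so subtracting the mean removes exactly the offset).
rng = np.random.default_rng(0)
n = 20
basis, _ = np.linalg.qr(np.column_stack([np.ones(n), rng.normal(size=(n, 2))]))
v1, v2 = basis[:, 1], basis[:, 2]
M = -3.0 + (-22.49) * np.outer(v1, v1) + 18.62 * np.outer(v2, v2)

mean, lam, vecs = dominant_decomposition(M)
print("dominant eigenvalues:", np.round(lam, 2))
print("two-mode reconstruction exact:", np.allclose(reconstruct(mean, lam, vecs), M))
```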

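If degeneracy is neglected, the M-J matrix has 20 eigenvectors $V_a$ with eigenvalues $\lambda_a$, and it can be expressed by formula (1):

$$M_{ij} = \sum_{a=1}^{20} \lambda_a V_{a,i} V_{a,j} \tag{1}$$

where $i$ and $j$ are residue indices. Two of the 20 eigenvalues have by far the largest absolute values:

$$\lambda_1 = -22.49, \qquad \lambda_2 = 18.62, \qquad \lambda_{\text{other}} = 0.013 \text{ to } 2.17 \tag{2}$$

Therefore, keeping only these two terms together with the average $\langle M_{ij} \rangle$, formula (1) can be written as

$$M_{ij} \approx \langle M_{ij} \rangle + \lambda_1 V_{1,i} V_{1,j} + \lambda_2 V_{2,i} V_{2,j} \tag{3}$$

Moreover, the eigenvectors $V_1$ and $V_2$ are approximately linearly related: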


$$V_{2,i} = b + c\,V_{1,i} = -0.30 - 0.90\,V_{1,i} \tag{4}$$

Let $q_i = V_{1,i}$; then $M_{ij}$ can be simplified to $M'_{ij}$:

$$M'_{ij} = C_0 + C_1 (q_i + q_j) + C_3\, q_i q_j = -1.492 + 5.030\,(q_i + q_j) - 7.400\, q_i q_j \tag{5}$$
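Substituting Eq. (4) into Eq. (3) shows where the coefficients of Eq. (5) come from: $C_1 = \lambda_2 b c$ and $C_3 = \lambda_1 + \lambda_2 c^2$. $C_0$ also involves $\langle M_{ij} \rangle$, which the text does not report. The sketch below checks the two derivable coefficients against the published ones. It then evaluates $M'_{ij}$ from the Table 1 q values. The helper name `m_prime` is illustrative only.

```python
# Check Eq. (5) against Eqs. (2)-(4) and evaluate M'_ij from Table 1.
#
# Substituting V2,i = b + c*q_i (Eq. 4) into Eq. (3) gives
#   M_ij ~ (<M> + lam2*b**2) + lam2*b*c*(q_i + q_j) + (lam1 + lam2*c**2)*q_i*q_j
# so C1 = lam2*b*c and C3 = lam1 + lam2*c**2. C0 also needs <M>, which is not
# reported, so the published C0 is used as given.

LAM1, LAM2 = -22.49, 18.62     # dominant eigenvalues, Eq. (2)
B_COEF, C_COEF = -0.30, -0.90  # linear relation between V2 and V1, Eq. (4)

C1_DERIVED = LAM2 * B_COEF * C_COEF     # about 5.03
C3_DERIVED = LAM1 + LAM2 * C_COEF ** 2  # about -7.41

C0, C1, C3 = -1.492, 5.030, -7.400      # published coefficients, Eq. (5)

# q values of Table 1, keyed by one-letter residue code.
Q = {
    "L": -0.443, "F": -0.438, "I": -0.390, "M": -0.327, "V": -0.315,
    "W": -0.298, "C": -0.265, "Y": -0.226, "A": -0.125, "H": -0.107,
    "T": -0.058, "P": -0.054, "G": -0.048, "Q": -0.023, "R": -0.020,
    "S": -0.011, "N": -0.011, "E": 0.028, "D": 0.048, "K": 0.065,
}


def m_prime(x: str, y: str) -> float:
    """Simplified M-J contact energy M'_xy of Eq. (5), in units of RT."""
    qx, qy = Q[x], Q[y]
    return C0 + C1 * (qx + qy) + C3 * qx * qy


print(f"C1: derived {C1_DERIVED:.3f} vs published {C1}")
print(f"C3: derived {C3_DERIVED:.3f} vs published {C3}")
print(f"M'(Leu,Leu) = {m_prime('L', 'L'):.3f} RT, M'(Lys,Lys) = {m_prime('K', 'K'):.3f} RT")
```

The derived values (about 5.027 and −7.408) agree with the published 5.030 and −7.400 to within rounding. Under Eq. (5), hydrophobic pairs such as Leu-Leu come out strongly negative (attractive), while Lys-Lys is much weaker.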

The q value of each residue is shown in Table 1.

The M-J matrix has two large eigenvalues, which indicates that the 20 kinds of residues can be roughly divided into two groups: hydrophobic residues (H) and polar residues (P). Accordingly, there are three kinds of interaction between residues: H-H, P-P, and H-P. We also observe an interesting phenomenon in the q values of the residues: they fall into two groups separated by a gap (Fig. 1).
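The gap can be located mechanically. The sketch below sorts the Table 1 q values and cuts at the widest gap between neighboring residues. With these values, the widest gap lies between Tyr (−0.226) and Ala (−0.125), which matches the interval from −0.20 to −0.15 given in the caption of Fig. 1. The function name is illustrative.

```python
# Find the widest gap in the Table 1 q values and split the residues there.
Q = {
    "L": -0.443, "F": -0.438, "I": -0.390, "M": -0.327, "V": -0.315,
    "W": -0.298, "C": -0.265, "Y": -0.226, "A": -0.125, "H": -0.107,
    "T": -0.058, "P": -0.054, "G": -0.048, "Q": -0.023, "R": -0.020,
    "S": -0.011, "N": -0.011, "E": 0.028, "D": 0.048, "K": 0.065,
}


def split_at_widest_gap(q: dict):
    """Sort residues by q and cut between the two neighbours that are furthest apart.

    Returns (residues left of the gap, residues right of the gap, (q_left, q_right)).
    """
    ranked = sorted(q, key=q.get)
    gaps = [(q[ranked[k + 1]] - q[ranked[k]], k) for k in range(len(ranked) - 1)]
    _, k = max(gaps)
    return ranked[: k + 1], ranked[k + 1 :], (q[ranked[k]], q[ranked[k + 1]])


left, right, bounds = split_at_widest_gap(Q)
print("left of gap :", "".join(left))
print("right of gap:", "".join(right))
print("gap between q =", bounds)
```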

2.3. HMM

A Markov chain is a stochastic process. The next state of a one-step Markov chain depends only on the present state, not on earlier states. In this article, we simply call a one-step Markov chain a “Markov chain.”

Suppose the previous states are $X_0, X_1, \ldots, X_t$. The probability $P(X_{t+1} \mid X_t)$ of the next state $X_{t+1}$ then depends only on the state $X_t$. The probabilities of moving from state $i$ to state $j$ form the “transition matrix”:

$$P_{ij} = P\{X_{t+1} = a_j \mid X_t = a_i\} \tag{6}$$

An HMM contains two sequences of stochastic variables. One is the non-observable Markov chain, which is described by the transition matrix. The other is an observable stochastic sequence. It is described by an emission matrix, which gives the probability of each observable value for a given state of the Markov chain.

A Markov chain also has an initial distribution $\pi = \{\pi_i\}$, so an HMM has five elements:

1. The number of states M of the Markov chain.

2. The number N of observable values for each state.

3. A transition matrix T of the Markov chain with M × M dimensions; each row sums to one.

4. An emission matrix E with M × N dimensions; each row sums to one.

5. An initial distribution π of the Markov chain.

These five elements form the HMM λ = (T, E, π).
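The five elements can be held in a small container that enforces the row-sum conditions and evaluates path probabilities. A Markov chain assigns a path the probability $P(X) = P(X_1)\prod_t P(X_{t+1} \mid X_t)$, the same form used for peptides in Section 3.1. The class below is a minimal illustration under that reading, not the authors' PEPTIDE.m code. Its two-state numbers are arbitrary toy values.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class HMM:
    T: np.ndarray   # M x M transition matrix, rows sum to one
    E: np.ndarray   # M x N emission matrix, rows sum to one
    pi: np.ndarray  # length-M initial distribution

    def __post_init__(self):
        M = self.T.shape[0]
        if self.T.shape != (M, M) or self.E.ndim != 2 or self.E.shape[0] != M or self.pi.shape != (M,):
            raise ValueError("inconsistent shapes for T, E and pi")
        for name, mat in (("T", self.T), ("E", self.E), ("pi", self.pi[None, :])):
            if np.any(mat < 0) or not np.allclose(mat.sum(axis=1), 1.0):
                raise ValueError(f"{name} must be non-negative with rows summing to one")

    def path_probability(self, states: list) -> float:
        """P(X) = P(x_1) * prod_t P(x_{t+1} | x_t) for a path of state indices."""
        p = self.pi[states[0]]
        for a, b in zip(states, states[1:]):
            p *= self.T[a, b]
        return float(p)

    def joint_probability(self, states: list, obs: list) -> float:
        """P(X, O) = P(X) * prod_t P(o_t | x_t)."""
        p = self.path_probability(states)
        for s, o in zip(states, obs):
            p *= self.E[s, o]
        return float(p)


# Toy two-state example (arbitrary numbers, for illustration only).
hmm = HMM(T=np.array([[0.9, 0.1], [0.2, 0.8]]),
          E=np.array([[0.7, 0.3], [0.4, 0.6]]),
          pi=np.array([0.5, 0.5]))
print(hmm.path_probability([0, 0, 1]))  # equals 0.5 * 0.9 * 0.1
```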

Table 1. The q Value of Each Residue

Residue q (RT)
Leu (L) −0.443
Phe (F) −0.438
Ile (I) −0.390
Met (M) −0.327
Val (V) −0.315
Trp (W) −0.298
Cys (C) −0.265
Tyr (Y) −0.226
Ala (A) −0.125
His (H) −0.107
Thr (T) −0.058
Pro (P) −0.054
Gly (G) −0.048
Gln (Q) −0.023
Arg (R) −0.020
Ser (S) −0.011
Asn (N) −0.011
Glu (E) +0.028
Asp (D) +0.048
Lys (K) +0.065


2.4. Residue-residue contact preferences

Glaser et al. (2001) used a non-redundant set of 621 protein-protein interfaces of known high-resolution structures to derive residue composition and residue-residue contact preferences. They estimated the likelihood $G_{ij}(v)$ of contacts between a pair of residues $i$ and $j$ as a criterion for the propensity of residue-residue contact (Table 2):

$$G_{ij}(v) = A \log\left(\frac{Q_{ij}(v)}{W_i W_j}\right) \tag{7}$$

where $Q_{ij}(v)$ is the fraction of contacts between residue types $i$ and $j$, weighted by the residue volumes $V$:

$$Q_{ij}(v) = \frac{C_{ij} V_i V_j}{\sum_{k,l} C_{kl} V_k V_l} \tag{8}$$

Here $C_{ij}$ is the total number of contacts observed between residue types $i$ and $j$, and $W_i$ is defined as

$$W_i = \frac{F_i}{\sum_{k} F_k} \tag{9}$$

where $F_i$ is the number of residues of type $i$ that have at least one contact with any residue across the interface.
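Eqs. (7)–(9) can be computed in a few vectorized lines. The text does not specify the logarithm base or the value of $A$, so the sketch below assumes a natural logarithm and takes $A$ as a parameter defaulting to 1. Its three-residue-type contact counts, volumes, and frequencies are invented toy inputs. Note that a pair with zero observed contacts would give $\log 0 = -\infty$.

```python
import numpy as np


def contact_preferences(C: np.ndarray, V: np.ndarray, F: np.ndarray, A: float = 1.0) -> np.ndarray:
    """Return G with G[i, j] = A * log(Q_ij / (W_i * W_j)), Eqs. (7)-(9).

    C : (n, n) symmetric matrix of observed contacts between residue types
    V : (n,) residue volumes
    F : (n,) number of residues of each type with at least one interface contact
    Assumes natural log and nonzero contact counts.
    """
    weighted = C * np.outer(V, V)           # C_ij V_i V_j
    Q = weighted / weighted.sum()           # Eq. (8): normalise over all pairs k, l
    W = F / F.sum()                         # Eq. (9)
    return A * np.log(Q / np.outer(W, W))   # Eq. (7)


# Toy three-residue-type example (invented numbers).
C = np.array([[10.0, 4.0, 2.0], [4.0, 6.0, 3.0], [2.0, 3.0, 8.0]])
V = np.array([1.0, 1.5, 2.0])
F = np.array([20.0, 15.0, 25.0])
G = contact_preferences(C, V, F)
print(np.round(G, 3))
```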

2.5. Mathematical model for peptide inhibitor design

As mentioned above, we build a mathematical model for peptide inhibitor design based on the four knowledge blocks above. Its principles are as follows:

1. Treat a protein sequence as a Markov chain; that is, the probability that a residue appears at a given position in the sequence is determined only by the preceding residue.

2. The Markov chain has 20 states, one for each kind of residue. We construct the 20 × 20 transition matrix from the (simplified) M-J matrix elements $M'_{ij}$, normalized so that each row sums to one:

$$P_{ij} = \frac{\exp(-M'_{ij}/kT)}{\sum_{l=1}^{20} \exp(-M'_{il}/kT)} \tag{10}$$

where $k$ is the Boltzmann constant and the temperature is taken as $T = 300$ K.

3. The emission matrix must express the probability that a residue appears in a peptide inhibitor. It is also a 20 × 20 matrix, and it should incorporate as much biological knowledge as possible.

FIG. 1. q value of each residue. The values are divided into two groups by a gap between −0.20 and −0.15. Hydrophobic residues are on the left side of the gap, and polar residues are on the right side.


The biggest challenge is determining the probability of a residue appearing in a peptide inhibitor. So far, only limited experimental data are available for writing this emission matrix. Our rule of thumb for constructing the emission matrix is as follows:

a. Because the peptide is treated as a Markov chain, we first have to determine which residue serves as its initial residue.

b. We take the residue-residue contact preferences $G_{ij}(v)$ of Glaser et al. (2001) as the emission matrix (Table 2), normalized so that each row sums to one:

$$g_{ij} = \frac{G_{ij}(v)}{\sum_{l=1}^{20} G_{il}(v)} \tag{11}$$

c. According to existing experimental data, six residues (Ile, Trp, Tyr, Pro, Arg, and Asp) are often found in the active pocket, and four small residues (Phe, Val, Ala, and Gly) often accompany them. We therefore pay particular attention to these 10 residues.

This is a key step in determining the emission matrix for our peptide inhibitor design. At present we have only constructed the mathematical framework. The emission matrix needs to be revised step by step as new, accurate experimental data become available, which will give the theory higher precision.

4. According to the two rules on protein-ligand interaction, the LUMO energy of a given protein is fixed. Hence, from the energy viewpoint, a ligand with a higher HOMO energy has a better chance of interacting with the protein. In other words, the higher the HOMO energy of a peptide, the higher its probability of interacting with the protein. It could be argued that we should select a peptide inhibitor that has a higher M-J energy.

5. With the transition matrix and emission matrix of the HMM and the two rules on protein-ligand interaction in hand, we have enough material to write a script, PEPTIDE.m. It uses MATLAB to generate potential peptide sequences for a given protein.

6. However, selecting an inhibitor is a very complex engineering task, and HMM theory alone is not enough. We still need other criteria for re-selection, such as HOMO calculation and docking. Finally, biological experiments are carried out to assist and validate the identification.

Table 2. Residue-Residue Contact Preferences

I V L F C M A G T S W Y P H E Q D N K R
I 3.89 4.91 4.59 5.33 1.76 5.25 2.84 0.77 3.05 1.00 6.24 5.61 3.27 3.38 3.20 3.60 2.30 1.59 3.23 3.80
V 4.91 3.74 4.20 4.69 2.89 4.37 2.57 −0.41 2.83 1.42 2.92 3.95 2.90 3.21 3.22 3.22 1.93 1.36 4.45 4.18
L 4.59 4.20 4.03 4.86 2.93 5.32 2.77 −0.37 2.07 1.41 5.77 4.19 2.50 4.88 3.12 3.46 1.40 2.31 3.15 4.99
F 5.33 4.69 4.86 5.34 3.68 5.28 3.00 0.14 3.34 1.75 5.83 5.83 4.25 3.47 2.87 4.25 0.99 3.11 3.57 4.49
C 1.76 2.89 2.93 3.68 7.65 1.84 1.46 −0.25 1.03 2.48 2.14 2.47 2.74 4.12 2.51 1.33 0.24 −0.42 2.05 2.81
M 5.25 4.37 5.32 5.28 1.84 6.02 2.30 0.91 2.09 1.61 4.89 4.81 3.38 4.65 3.88 4.18 0.36 2.30 3.93 3.62
A 2.84 2.57 2.77 3.00 1.46 2.30 −0.52 −1.77 1.21 0.39 3.37 2.47 1.22 2.59 1.71 1.72 1.13 1.69 2.13 1.90
G 0.77 −0.4 −0.4 0.14 −0.3 0.91 −1.8 4.40 0.21 −1.5 1.42 1.25 −0.5 1.08 −0.9 0.70 −0.1 −0.5 1.33 1.59
T 3.05 2.83 2.07 3.34 1.03 2.09 1.21 0.21 1.27 1.91 5.12 3.14 2.65 2.71 2.88 1.82 3.88 2.52 3.67 3.77
S 1.00 1.42 1.41 1.75 2.48 1.61 0.39 −1.5 1.91 −0.1 2.87 2.30 1.33 0.80 2.60 2.00 2.94 1.77 2.74 2.82
W 6.24 2.92 5.77 5.83 2.14 4.89 3.37 1.42 5.12 2.87 5.85 6.19 7.87 6.46 1.20 1.37 2.62 3.54 5.76 8.57
Y 5.61 3.95 4.19 5.83 2.47 4.81 2.47 1.25 3.14 2.30 6.19 5.93 4.22 6.05 4.54 2.05 1.76 3.66 5.26 5.28
P 3.27 2.90 2.50 4.25 2.74 3.38 1.22 −0.51 2.65 1.33 7.87 4.22 0.60 2.89 3.17 3.50 1.46 3.09 3.75 3.99
H 3.38 3.21 4.88 3.47 4.12 4.65 2.59 1.08 2.71 0.80 6.46 6.05 2.89 5.37 2.30 4.00 5.20 2.38 2.72 4.90
E 3.20 3.22 3.12 2.87 2.51 3.88 1.71 −0.9 2.88 2.60 1.20 4.54 3.17 2.30 1.65 1.95 0.08 2.68 5.32 5.75
Q 3.60 3.22 3.46 4.25 1.33 4.18 1.72 0.70 1.82 2.00 1.37 2.05 3.50 4.00 1.95 2.83 3.26 3.45 3.50 4.50
D 2.30 1.93 1.40 0.99 0.24 0.36 1.13 −0.08 3.88 2.94 2.62 1.76 1.46 5.20 0.08 3.26 0.13 3.85 3.90 4.94
N 1.59 1.36 2.31 3.11 −0.4 2.30 1.69 −0.54 2.52 1.77 3.54 3.66 3.09 2.38 2.68 3.45 3.85 2.92 3.17 3.85
K 3.23 4.45 3.15 3.57 2.05 3.93 2.13 1.33 3.67 2.74 5.76 5.26 3.75 2.72 5.32 3.50 3.90 3.17 3.24 2.29
R 3.80 4.18 4.99 4.49 2.81 3.62 1.90 1.59 3.77 2.82 8.57 5.28 3.99 4.90 5.75 4.50 4.94 3.85 2.29 2.87


7. Peptide inhibitors have a drawback: they hardly enter cells. For a complete drug design, one can, if needed, design a chemical molecule with the same active spot as the peptide inhibitor. Chemical modifications such as methylation should also be made to give the inhibitor better properties.
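Principles 2 and 3 reduce to two row normalizations: a Boltzmann-weighted one for Eq. (10) and a plain one for Eq. (11). The sketch below makes several assumptions. It evaluates $M'_{ij}$ from Eq. (5) with the Table 1 q values. Because $M'_{ij}$ is already expressed in units of RT (RT = 0.60 kcal/mol, i.e., about 300 K), it takes kT = 1 in those units. It applies Eq. (11) only to the Gly row of Table 2, which contains several negative entries. This is not the authors' PEPTIDE.m.

```python
import numpy as np

RESIDUES = "LFIMVWCYAHTPGQRSNEDK"   # Table 1 order
Q = np.array([-0.443, -0.438, -0.390, -0.327, -0.315, -0.298, -0.265, -0.226,
              -0.125, -0.107, -0.058, -0.054, -0.048, -0.023, -0.020, -0.011,
              -0.011, 0.028, 0.048, 0.065])


def m_prime_matrix(q: np.ndarray) -> np.ndarray:
    """Eq. (5) for all residue pairs at once (units of RT)."""
    return -1.492 + 5.030 * (q[:, None] + q[None, :]) - 7.400 * np.outer(q, q)


def transition_matrix(m: np.ndarray, kT: float = 1.0) -> np.ndarray:
    """Eq. (10): P_ij = exp(-M'_ij/kT) / sum_l exp(-M'_il/kT)."""
    x = -m / kT
    x -= x.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    w = np.exp(x)
    return w / w.sum(axis=1, keepdims=True)


def normalise_rows(G: np.ndarray) -> np.ndarray:
    """Eq. (11) applied literally: g_ij = G_ij / sum_l G_il."""
    return G / G.sum(axis=-1, keepdims=True)


P = transition_matrix(m_prime_matrix(Q))

# Gly row of Table 2, in Table 2's column order I V L F C M A G T S W Y P H E Q D N K R.
G_GLY = np.array([0.77, -0.4, -0.4, 0.14, -0.3, 0.91, -1.8, 4.40, 0.21, -1.5,
                  1.42, 1.25, -0.5, 1.08, -0.9, 0.70, -0.1, -0.5, 1.33, 1.59])
g_gly = normalise_rows(G_GLY)

print("rows of P sum to one:", np.allclose(P.sum(axis=1), 1.0))
print("most probable successor of Pro:", RESIDUES[P[RESIDUES.index("P")].argmax()])
print("negative entries in normalised Gly row:", int((g_gly < 0).sum()))
```

Two observations follow from the sketch. First, under Eq. (5) with kT = 1, every row of P peaks at Leu, the residue with the most negative q. The transition matrix alone therefore does not single out the candidates reported in Section 3, so the emission matrix and initial-residue choice must carry much of the selectivity. Second, because some Table 2 entries are negative, Eq. (11) applied literally yields negative "probabilities" for such rows. Some further transformation would be needed before the result can serve as a proper emission distribution.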

The working flowchart is as follows:

1. Preparing work for the active pocket and active residues: (a) molecular dynamics; (b) full electronic structure calculation; (c) protein pocket calculation.

2. Select an initial residue: (a) compare the active-atom structures of existing ligands; (b) compare the active atoms (active spot) of the protein.

3. Run our PEPTIDE.m script program.

4. Use the HOMO calculation as the first round of selection.

5. Use docking as the second round of selection.

6. Final evaluation by biological assays.

7. Chemical modification for further improvement.


3. RESULTS AND DISCUSSION

We now apply our mathematical model to design peptide inhibitors for the proteins CypA and FKBP12. This work builds on our previous article (Pang et al., 2008).

3.1. The peptide inhibitor design of CypA

Previously (Pang et al., 2008), we obtained the ligand binding pocket and active atoms of CypA. Now, suppose we want to design a tripeptide inhibitor for CypA.

1. Selecting an initial residue for the tripeptide inhibitor. We select Pro as the initial residue of the tripeptide inhibitor for CypA, for the following reasons. Pro is one of three residues (Pro, Gly, and Cys) with special characteristics. Its side chain forms a ring with its own backbone amino group (Pro is an imino acid), and this ring is often the position of active atoms. We had also identified the active residues of the receptor CypA, Phe113 and Phe60 (Pang et al., 2008), and from the M-J matrix we found that Pro has the strongest contact energy with Phe. In addition, Pro is a hydrophobic, non-polar residue, and the ligand binding site of CypA consists primarily of hydrophobic, non-polar residues. We therefore selected Pro as the initial residue of the tripeptide inhibitor.

2. Running the PEPTIDE.m program. Running our PEPTIDE.m program suggested five candidate tripeptide inhibitors for CypA: AGP, Ala-Val-Pro, Val-Ile-Pro, Ala-Trp-Pro, and Ile-Ala-Pro.

3. HOMO calculation (first criterion). Suppose a peptide is composed of $n$ residues, $X = X_1 X_2 X_3 \cdots X_n$. If we treat the peptide as a Markov chain, its probability is

$$P(X) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_2) \cdots P(X_n \mid X_{n-1})$$

By the Bayes formula, $P(X_i \mid X_{i-1}) = P(X_{i-1} X_i)/P(X_{i-1})$, so the peptide can be treated pair by pair, and we calculate the HOMO of the peptide one dipeptide at a time. The HOMOs of the X-Pro and X-Gly dipeptides versus their q values (Table 1) are shown in Figure 2a and 2b, respectively.

Figure 2 shows that Gly-Pro has the highest HOMO among the X-Pro pairs and Ala-Gly has the highest HOMO among the X-Gly pairs. Based on the two rules on protein-ligand interaction that we proposed previously and on the character of the Markov chain, we conclude that AGP may be the most promising peptide inhibitor. It combines the X-Gly and X-Pro dipeptides Ala-Gly and Gly-Pro. This explains well the previous experimental observation that CypA recognition of hexapeptides involves contacts with peptide residues Ala, Gly, and Pro, and is independent of the context of longer sequences (Vajdos et al., 1997).

FIG. 2. HOMO of X-Pro pairs and X-Gly pairs against q value. (a) The dipeptide Gly-Pro has the highest HOMO energy among X-Pro. (b) The dipeptide Ala-Gly has the highest HOMO energy among X-Gly.

4. Docking (second criterion). We docked the CypA-tripeptide AGP system using the program AutoDock 4.0 (Morris et al., 1998; Sousa et al., 2006) with the Lamarckian genetic algorithm (GA) and default parameters. Two hundred conformations were generated for each ligand. The maximum number of energy evaluations in each GA run was 3,000,000, which is large enough to test whether the complex system has converged. At the end of docking, a cluster analysis was performed on the docked conformations. The docking results are as follows:

• The first cluster, which has the lowest free energy, contains 168 of the 200 conformations.
• The estimated free energy of the first cluster is ΔG = −7.25 kcal/mol.
• The inhibition constant of the first cluster is 4.88 μM.
• The convergence of all conformations is excellent (Fig. 3).
• The lowest-free-energy conformation fits well into the active pocket of CypA (Fig. 4a).

FIG. 3. Number of conformations of peptide AGP in each cluster. The first cluster contains 168 conformations, with the lowest binding energy of −7.25 kcal/mol. The convergence of all conformations is excellent.

FIG. 4. Spatial configuration of the peptide AGP and the active pocket of CypA. (a) Peptide AGP (blue) lies in the ligand binding pocket of CypA (white); the AGP/CypA binding mode was generated by AutoDock. (b) The active spot of AGP covers the active spot of CypA. The small colored circles are active atoms of AGP, located directly above the active atoms of CypA (large gray circles).

5. Checking the relation between the active spots of CypA and peptide AGP. The active atoms of the tripeptide AGP were obtained with the method described in our previous work and are listed in Table 3. For protein CypA, the binding pocket and the active atoms had been obtained previously (Pang et al., 2008); the active spot of CypA consists of seven atoms in a quincunx-type arrangement. The active atoms of both AGP and CypA are depicted in Figure 4b. There, the active atoms of AGP cover exactly the active region of CypA; that is, the active spot of peptide AGP covers the active spot of CypA.

Table 3. The Active Spot of the Peptide AGP

Atom Residue X Y Z
C GLY6 51.721 27.873 −11.020
O GLY6 51.972 27.654 −12.218
N PRO4 51.859 29.096 −10.481
CA PRO4 52.326 30.269 −11.241
CB PRO4 51.974 31.423 −10.289
C PRO4 53.858 30.191 −11.443
OT1 PRO4 54.305 30.196 −10.549
OT2 PRO4 54.314 30.135 −12.584

This close coverage in the docked complex supports the two rules on protein-ligand interaction (the higher the HOMO of the ligand, the stronger the protein-ligand interaction). It also indicates that this model, based on HMM theory, is feasible.

FIG. 5. Sensorgram for AGP and CsA binding to the CypA surface on the CM5 sensor chip. Binding responses are shown for AGP and CsA injected at concentrations of 0.625, 1.25, 2.5, 5, and 10 mM (bottom to top). The biosensor responses (RU) are concentration-dependent. The equilibrium constants ($K_D$ values) evaluating the protein-ligand binding affinities are indicated.

6. Performing biological assays: binding affinity determination and inhibition of the PPIase activity of CypA. The binding affinity of the designed peptide to CypA was measured by surface plasmon resonance (SPR) with a Biacore 3000 instrument (Biacore AB, Uppsala, Sweden), as described elsewhere (Chen et al., 2007; Thurmond et al., 2001). AGP bound to CypA in a concentration-dependent manner with a $K_D$ of 1.95 × 10⁻⁶ M (Fig. 5), whereas cyclosporin A (CsA), the positive control, showed a $K_D$ of 6.42 × 10⁻⁶ M.

Since AGP binds to CypA, we next measured its inhibition of the PPIase activity of CypA. The standard spectrophotometric method was used to determine the inhibitory activity of the compounds on PPIase. In the assay, the rate constants for the cis–trans conversion were evaluated by fitting the data to the integrated first-order rate equation through nonlinear least-squares analysis. The inhibitory rate is

$$\text{Inhibitory rate (\%)} = \frac{\text{CypA}\,(d\text{Abs}/\text{time}) - \text{compounds}\,(d\text{Abs}/\text{time})}{\text{CypA}\,(d\text{Abs}/\text{time}) - \text{control}\,(d\text{Abs}/\text{time})} \times 100$$

The inhibition results are shown in Figure 6. As a positive control, CsA inhibited the PPIase activity of CypA by 59.95% at 1 mM, while AGP inhibited it by 37.47% at 1 mM.

Our designed peptide AGP has binding affinity and PPIase inhibition of the same order as CsA. In addition, AGP has no impact on cell proliferation or the cell cycle (data not shown). Therefore, AGP may be a new inhibitor of CypA.
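The inhibitory-rate formula above is simple arithmetic on three fitted rate constants. The sketch below assumes one interpretation that the text does not spell out. Each argument is a fitted dAbs/time rate constant. "cypa" is CypA alone, "compound" is CypA pre-incubated with the test compound, and "control" is the uncatalyzed background rate without CypA. The function name and the example numbers are hypothetical, not measured data.

```python
def inhibitory_rate(cypa: float, compound: float, control: float) -> float:
    """Percent inhibition = 100 * (CypA - compound) / (CypA - control).

    All arguments are fitted rate constants (dAbs/time) for the cis-trans conversion.
    """
    catalysed = cypa - control   # rate enhancement contributed by CypA itself
    if catalysed <= 0:
        raise ValueError("CypA rate must exceed the background control rate")
    return 100.0 * (cypa - compound) / catalysed


# Hypothetical rate constants, for illustration only (not measured data).
print(inhibitory_rate(cypa=0.08, compound=0.05, control=0.02))
```

With this reading, 0% means the compound leaves the CypA-catalyzed rate unchanged, and 100% means it reduces the rate to the uncatalyzed background.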

It should be noted that, before obtaining the inhibitor AGP for CypA with our model, we were unaware of two hypotheses proposed by Vajdos et al. (1997). These came from their experiments on the hexapeptides His-Ala-Gly-Pro-Ile-Ala. The first is that “CypA recognition of these hexapeptides involves contacts with peptide residues Ala(Val) 88, Gly 89, and Pro 90, and is independent of the context of longer sequences.” The second is that “the CypA active site is complementary to sequences containing the dipeptide Gly-trans-Pro.” Our calculation results (Table 3 and Fig. 4b) happen to explain both hypotheses well from a theoretical viewpoint: the active spot of AGP is located only in Gly and Pro (not in Ala), and it covers the active spot of CypA. Our findings are independent. They derive from the calculation of the full electronic structure of CypA, the two rules on protein-ligand interaction, and a statistical mathematical model. The prediction of AGP is not accidental.

FIG. 6. CypA PPIase inhibitory activities of AGP and CsA at 1 mM. CypA was pre-incubated with 1 mM of the tested compounds, and the PPIase activity was evaluated by fitting the data to the integrated first-order rate equation through nonlinear least-squares analysis. (a) The value of dAbs/time represents the rate constant for the cis–trans conversion. (b) The percent inhibition of the PPIase activity by AGP and CsA at 1 mM.


3.2. The peptide inhibitor design of FKBP12

For FKBP12, we explain why we selected Gln as the initial residue of the peptide inhibitor and why we chose a dipeptide rather than a tripeptide as its inhibitor. This also illustrates another way to select the initial residue.

From our previous article (Pang et al., 2008), we knew that the inhibitor FK506 binds well to FKBP12, and we knew the structure of its active spot. By comparison, we found that the side chain of Gln has a structure similar to that of the active spot of FK506, so Gln was selected as the initial residue. In addition, the active pocket of FKBP12 is smaller than that of CypA; thus, dipeptides were chosen as the potential inhibitors to test on FKBP12.

Running the PEPTIDE.m program suggested three dipeptides: Gly-Gln, Ile-Gly, and Val-Gly. In the next round of selection, the HOMO calculation showed that Gly-Gln has the highest HOMO (Fig. 7). We then docked Gly-Gln to FKBP12 with AutoDock to generate their binding model, using the same docking parameters as for AGP above.

The AutoDock results and the active spot of Gly-Gln are as follows:

• The first cluster, which has the lowest free energy, contains 62 of the 200 conformations.
• The estimated free energy of the lowest-free-energy conformation is ΔG = −6.02 kcal/mol.
• The inhibition constant is 38.79 μM.
• The convergence of the conformations is good (Fig. 8).
• The active spot of Gly-Gln comprises six atoms, as shown in Table 4. It covers the active spot of FKBP12 in the active pocket, as shown in Figure 9.

FIG. 7. HOMO of X-Gln pairs against q value. The dipeptide Gly-Gln has the highest HOMO energy among X-Gln.

FIG. 8. Number of conformations of peptide Gly-Gln in each cluster. The convergence of all conformations is excellent.
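The reported inhibition constants follow from the estimated free energies through $K_i = \exp(\Delta G / RT)$. This sketch assumes that relation, evaluated at T = 298.15 K, which we take to be the convention AutoDock 4 uses when it reports an estimated $K_i$. With ΔG rounded to two decimals, it gives about 4.85 μM for AGP/CypA and about 38.7 μM for Gly-Gln/FKBP12, consistent with the micromolar values quoted above. The function name is illustrative.

```python
import math

R_KCAL = 8.314462618 / 4184.0   # gas constant in kcal/(mol K)


def inhibition_constant(delta_g_kcal: float, temperature: float = 298.15) -> float:
    """Return Ki in mol/L from a binding free energy dG in kcal/mol (dG < 0 for binding)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature))


for name, dg in (("AGP / CypA", -7.25), ("Gly-Gln / FKBP12", -6.02)):
    print(f"{name}: Ki = {inhibition_constant(dg) * 1e6:.2f} uM")
```

Small differences from the quoted 4.88 μM and 38.79 μM are within what rounding ΔG to two decimals allows.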

All these docking results suggest that Gly-Gln may be a peptide inhibitor of FKBP12. A biological assay is needed to evaluate the interaction between Gly-Gln and FKBP12; unfortunately, we have not yet performed such an assay.

4. CONCLUSION

We have shown that our mathematical model can be applied to peptide inhibitor design for a target protein. The model is based on the two rules on protein-ligand interaction, the M-J matrix, and the HMM. Our results on CypA and FKBP12 show that the approach is promising for this type of problem, which is typical of the de novo drug design problems currently being tackled by other workers in the field. Our method does not require an exhaustive search, and the properties of the suggested peptides can at least guide the design of a novel compound. We have taken a step forward towards building a mathematical theory for selecting peptide inhibitors for proteins. Our mathematical model is rough at present, especially its emission matrix. How to perfect the emission matrix remains a challenge for us and needs more investigation. If the model represents a correct direction for theoretical trends in biology, it can be developed further.

Table 4. The Active Atoms of Gly-Gln

Atom Residue X Y Z
O GLY2 27.403 18.511 40.115
C GLN3 25.621 15.257 41.099
OT1 GLN3 24.665 15.519 41.229
OT2 GLN3 26.436 14.305 41.766
N GLN3 27.644 16.309 40.226
CA GLN3 26.210 16.081 39.968

FIG. 9. Spatial configuration of the dipeptide Gly-Gln and the binding pocket of FKBP12. (a) The active spot of Gly-Gln (blue) covers the active spot of FKBP12 (white) in the active pocket. (b) The colored circles are active atoms of Gly-Gln.


ACKNOWLEDGMENTS

We thank Ye Yuanjie, Wang Xun, and Ye Ling for their kind help. We also thank the Modern Applied Mathematical Key Laboratory in Shanghai (Department of Mathematics at Fudan University) and the Shanghai Supercomputer Center (SSC) for providing the parallel computer. This work was supported by the National Basic Research Program of China (grant 2006CB504509) and the Project of the State Key Program of National Natural Science Foundation of China (grant 10635060).

DISCLOSURE STATEMENT

No competing financial interests exist.

REFERENCES

Chen, S.A., Zhao, X.M., Tan, J.Z., et al. 2007. Structure-based identification of small molecule compounds targeting cell cyclophilin A with anti-HIV-1 activity. Eur. J. Pharmacol. 565, 54–59.

Gilbert, W. 1991. Towards a paradigm shift in biology. Nature 349, 99.

Glaser, F., Steinberg, D.M., Vakser, I.A., et al. 2001. Residue frequencies and pairing preferences at protein-protein interfaces. Proteins 43, 89–102.

Lewis, R.A., and Leach, A.R. 1994. Current methods for site-directed structure generation. J. Comput. Aided Mol. Design 8, 467–475.

Li, H., Tang, C., and Wingreen, N.S. 1997. Nature of driving force for protein folding: a result from analyzing the statistical potential. Phys. Rev. Lett. 79, 765–768.

Miyazawa, S., and Jernigan, R.L. 1985. Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18, 534–552.

Pang, X., Zhou, L., Zhang, L., et al. 2008. Two rules on the protein-ligand interaction. Nat. Proc. http://precedings.nature.com/documents/2728/version/1.

Thurmond, R.L., Wadsworth, S.A., Schafer, P.H., et al. 2001. Kinetics of small molecule inhibitor binding to p38 kinase. Eur. J. Biochem. 268, 5747–5754.

Vajdos, F.E., Yoo, S.H., Houseweart, M., et al. 1997. Crystal structure of cyclophilin A complexed with a binding site peptide from the HIV-1 capsid protein. Protein Sci. 6, 2297–2307.

Address correspondence to:

Dr. Xinyi Zhang

Department of Physics

Fudan University

Shanghai 200433, China

E-mail: [email protected]
