evidence of selection for low cognate amino acid bias in amino acid biosynthetic enzymes
Molecular Microbiology (2005) 56(4), 1017–1034 doi:10.1111/j.1365-2958.2005.04566.x
© 2005 Blackwell Publishing Ltd
Original Article
Accepted 5 January, 2005. *For correspondence. [email protected]; Tel. (+1) 530 754 8375; Fax (+1) 530 754 5739.
Evidence of selection for low cognate amino acid bias in amino acid biosynthetic enzymes
Rui Alves 1,2 and Michael A. Savageau 1*
1 Biomedical Engineering Department, University of California – Davis, Davis, CA, USA.
2 Biomathematics and Biostatistics Group, Departament Ciencies Mediques Basiques, Universidad de Lleida, Spain.
Summary
If the enzymes responsible for biosynthesis of a given amino acid are repressed and the cognate amino acid pool suddenly depleted, then derepression of these enzymes and replenishment of the pool would be problematic if the enzymes were largely composed of the cognate amino acid. In the proverbial ‘Catch 22’, cells would lack the necessary enzymes to make the amino acid, and they would lack the necessary amino acid to make the needed enzymes. Based on this scenario, we hypothesize that evolution would lead to the selection of amino acid biosynthetic enzymes that have a relatively low content of their cognate amino acid. We call this the ‘cognate bias hypothesis’. Here we test several implications of this hypothesis directly using data from the proteome of Escherichia coli. Several lines of evidence show that low cognate bias is evident in 15 of the 20 amino acid biosynthetic pathways. Comparison with closely related Salmonella typhimurium shows similar results. Comparison with more distantly related Bacillus subtilis shows general similarities as well as significant differences in the detailed profiles of cognate bias. Thus, selection for low cognate bias plays a significant role in shaping the amino acid composition for a large class of cellular proteins.
Introduction
Proteins are versatile effectors and mediators of cellular response. They catalyse reactions, serve as structural components of the cell and mediate cellular adaptation through sensing and signal transduction. Structure and function of proteins are subject to natural selection (King and Jukes, 1969; Richmond, 1970; Li, 1997) and, because structure and function are ultimately determined by the sequence of amino acids in proteins, it stands to reason that the amino acid composition of proteins is also subject to selection.
Previous studies have identified different types of selective pressure that are important in determining the relative amino acid composition for the proteins of an organism. Differences in the mutational bias of the different codons for each amino acid can partly account for the differences in relative amino acid composition (see Lobry, 1997; Singer and Hickey, 2000; Akashi and Gojobori, 2002; Seligmann, 2003 and references therein). The metabolic cost of synthesizing an amino acid, in terms of ATP and reducing equivalents, is also important in determining which amino acids are more prevalent in a proteome. The cheaper an amino acid is to synthesize, the more it is used (see Karlin and Bucher, 1992; Lobry and Gautier, 1994; Dufton, 1997; Jansen and Gerstein, 2000; Akashi and Gojobori, 2002; Seligmann, 2003 and references therein). Functional reasons that justify differential usage of amino acids in a given group of proteins have also been identified (Trifonov, 1987; Mazel and Marliere, 1989; Karlin and Bucher, 1992). For example, membrane proteins are biased towards high relative composition of hydrophobic amino acids (Karlin and Bucher, 1992).
Genes coding for amino acid biosynthetic enzymes are repressed in a medium where the cognate amino acid is present. Escherichia coli and many other bacteria typically derepress the expression of a small set of enzymes whenever there is a need to synthesize any given amino acid or set of amino acids. These are usually encoded in an operon or regulon, and their expression tends to be co-ordinated (see Herrmann and Somerville, 1983; Neidhardt, 1999 for reviews). When growing in a medium with low amino acid content, a significant fraction of cellular protein consists of enzymes involved in amino acid biosynthesis (Maaløe and Kjeldgaard, 1966; Neidhardt et al., 1990).

Although cells have general mechanisms, such as induction of proteolysis and activation of the stringent response, for remodelling the amino acid content of proteins when the organism is stressed by amino acid limitation (Reeve et al., 1984; Matin, 1991; Foster and Spector, 1995; Magnusson et al., 2003; Weichart et al., 2003; Nystrom, 2004), they are also likely to have more specific mechanisms. For example, if cells were growing in a rich medium and suddenly one of the exogenously supplied amino acids became depleted, then derepression of the corresponding biosynthetic enzymes and replenishment of the intracellular pool of that amino acid would be delayed and limited if the enzymes were largely composed of the cognate amino acid. Based on such a scenario, we hypothesize that evolution would lead to the selection of amino acid biosynthetic enzymes that have a relatively low content of their cognate amino acid, thus avoiding the ‘Catch 22’ situation in which the biosynthetic enzymes cannot be synthesized for lack of the amino acid and the amino acid cannot be synthesized for lack of the biosynthetic enzymes.
To explore the dynamics of this situation, we first adapt an existing computer model for an amino acid biosynthetic pathway (Xiu et al., 2002) and show that low cognate bias correlates with a greater extent of derepression of the pathway and with faster response times for this derepression. This suggests that our cognate-bias hypothesis is reasonable and that response time may well be the selective pressure for the low bias. We test several implications of this hypothesis directly using data from the well-characterized organism Escherichia coli, and we compare these results with the results from similar tests for a closely related Gram-negative organism, Salmonella typhimurium, and for a more distantly related Gram-positive organism, Bacillus subtilis.
For each organism, we calculate the amino acid composition of proteins that are involved in the amino acid biosynthetic pathways and compare their composition with that of larger groups of proteins from the same organism, including the entire proteome. We find that most amino acid biosynthetic pathways in each of the organisms do have a low cognate bias. There are a few exceptions and in some cases there are functional reasons that can account for this. The closely related organisms have very similar profiles of cognate bias, whereas the more distantly related organisms have profiles with significant differences that may reflect their different evolutionary history and ecological niche.
Results
Amino acid composition
To determine whether a protein or a group of proteins has a relative amino acid composition that is significantly different from that of a larger group to which it belongs, one must first determine the composition of the larger group. Choosing an appropriate group of proteins to serve as a control for the calculation of average amino acid composition requires careful consideration. The relative composition of the control group, which is then used to estimate the probability of amino acid occurrence in a protein, should be a weighted average of the proteins being synthesized by the cell. This is so because the synthesis of an amino acid biosynthetic enzyme during derepression and the synthesis of all other proteins being expressed at the same time compete for the limiting amino acid. The relative amino acid composition of the protein complement in growing cells provides an estimate of the average composition of these proteins. Proteins expressed at low levels make a small contribution to the overall amino acid composition of cellular protein, whereas proteins with a high level of expression make a large contribution.
Thus, we have searched the literature for experimental determinations of the amino acid composition in the protein complement of the bacteria studied in this work. Having found such studies, we then compared the experimentally determined composition with the composition of other groups of cellular proteins. We found that the relative amino acid composition of the protein complement in growing cells is almost identical to that of the cellular proteome determined from the DNA sequence (Table 1). For each genome, we also have calculated the relative composition for the entire group of non-enzymatic proteins, for the entire group of enzymatic proteins, and for each more specialized group of enzymes within an EC classification (i.e. classes 1 through 6). Class 1 includes all oxidoreductases, class 2 all transferases, class 3 all hydrolases, class 4 all lyases, class 5 all isomerases and class 6 all ligases.
The results of the analysis presented in this article do not differ significantly when the different groups of proteins are used to calculate the probability of amino acid occurrence. Therefore, we present only the results based on the relative amino acid composition of the proteome to calculate the probability of amino acid occurrence.
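The weighted average described above can be sketched in a few lines of Python. This is only an illustration: the function name `relative_composition`, the toy sequences and the expression weights are assumptions introduced here, not data or code from the study.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def relative_composition(sequences, weights=None):
    """Return {amino acid: fraction} for a group of protein sequences.

    `weights` are hypothetical expression levels; each protein's residue
    counts are scaled by its weight. Equal weights give the plain
    composition derived from the coding sequences alone.
    """
    if weights is None:
        weights = [1.0] * len(sequences)
    totals = Counter()
    for seq, w in zip(sequences, weights):
        for aa, n in Counter(seq.upper()).items():
            if aa in AMINO_ACIDS:  # skip non-standard symbols
                totals[aa] += w * n
    grand_total = sum(totals.values())
    return {aa: totals[aa] / grand_total for aa in AMINO_ACIDS}

# Toy sequences, not real proteins.
proteome = ["MKTAYIAKQR", "MWWLLGGA"]
unweighted = relative_composition(proteome)
# The first protein is assumed to be expressed ten times more strongly.
weighted = relative_composition(proteome, weights=[10.0, 1.0])
print(round(unweighted["W"], 3), round(weighted["W"], 3))
```

In this toy case the Trp fraction drops when the Trp-free protein is weighted more heavily, which is the sense in which highly expressed proteins dominate the composition of the control group.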
Verifying two basic assumptions
There are basic assumptions involving each of the two types of Monte Carlo (MC) simulations that we have used to determine the statistical significance of our comparisons. The first MC approach assumes that, with respect to the relative amino acid composition, there is no strong correlation between any two different types of amino acids. To test the validity of this assumption for the E. coli proteome, we have calculated the Spearman correlation coefficient between the relative amounts of any two amino acids in the proteins. These correlation coefficients are small, which supports, to a first approximation, our first assumption (Table S1). (This has no implications regarding finer detail correlations between neighbouring amino acids or other factors that have been shown to influence the selection of amino acids at any given location in a protein; Cootes et al., 1998.) The second MC approach assumes that the relative composition of a protein is independent of the protein length. To test the validity of this assumption, we have calculated the Spearman correlation coefficient between the relative amounts of each amino acid and the length of the E. coli proteins. These correlations are also small, which supports the second assumption (Table S1).
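A Spearman coefficient of the kind used for these two checks can be computed as the Pearson correlation of the ranks. The sketch below uses only the standard library; the function names and the toy numbers are illustrative assumptions, not values from Table S1.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Made-up example: Trp fraction of five proteins against their lengths.
trp_fraction = [0.010, 0.020, 0.015, 0.030, 0.012]
length = [300, 150, 420, 220, 510]
rho = spearman(trp_fraction, length)
print(round(rho, 3))
```

The same function applies to the first assumption by passing the per-protein fractions of two different amino acids instead of a fraction and a length.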
Having verified that the two assumptions above are correct to a first approximation, we can calculate, in closed form, the significance of the amino acid bias for any given protein (see Experimental procedures). Although we have used these three different approaches to calculate how significantly biased our proteins are and have used 10 different control groups, for each protein of interest and for each method of calculation and control group, the differences are at most a few per cent (data not shown). Therefore, we shall only present and discuss the data for the analytical approach using the entire proteome as the control group.
By using the analytical approach, we estimate the cognate amino acid bias of a protein by the probability (P-value) that the relative cognate amino acid composition of a protein is below (low bias) or above (high bias) that of the control group (see Experimental procedures for details).
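The closed-form expression itself is given in Experimental procedures and is not reproduced here. As a hedged illustration only, one model consistent with the two assumptions above treats each residue of a protein as an independent draw, so that the number of cognate residues follows a binomial distribution. The binomial choice, the function names and the example enzyme below are assumptions made for this sketch, not necessarily the authors' formula.

```python
from math import comb

def low_bias_p_value(count, length, p):
    """P(X <= count) for X ~ Binomial(length, p).

    `count` is the number of cognate residues observed in a protein of
    `length` residues; `p` is the cognate amino acid frequency in the
    control group.
    """
    return sum(comb(length, k) * p ** k * (1 - p) ** (length - k)
               for k in range(count + 1))

def high_bias_p_value(count, length, p):
    """P(X >= count) for X ~ Binomial(length, p)."""
    return sum(comb(length, k) * p ** k * (1 - p) ** (length - k)
               for k in range(count, length + 1))

# Hypothetical enzyme of 400 residues containing a single Trp, compared
# with the E. coli proteome Trp frequency of 0.015 listed in Table 1.
p_low = low_bias_p_value(1, 400, 0.015)
print(p_low)
```

A small `p_low` means that so few cognate residues would rarely arise by chance under this model, which is how a low cognate bias would show up.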
Effect of cognate bias on time for amino acid recovery
Amino acid biosynthetic pathways in bacteria are repressible (Herrmann and Somerville, 1983; Neidhardt, 1999). For example, when growing in a medium in which an amino acid is available, E. coli cells typically repress expression of the genes that code for enzymes of the cognate pathway. This situation is represented in Fig. 1. When the proteins of a pathway that synthesize a given amino acid have a composition that is enriched for that amino acid (high cognate bias), it is likely that this high cognate bias will tend to prevent or delay the recovery of amino acid levels when cells are shifted from a medium that is rich in the amino acid to one that is poor.

To analyse this hypothesis in a specific case we use a previously developed model (Xiu et al., 2002) of Trp biosynthesis in E. coli. The original normalized equations are given as Eqs 1 to 3.
Table 1. Average relative amino acid composition of bacterial proteins and of two different environments.

| Amino acid | E. coli (a) | S. typhimurium (a) | B. subtilis (a) | E. coli (b) | S. typhimurium (b) | B. subtilis (b) | Soil (c) | Intestine (c) |
|---|---|---|---|---|---|---|---|---|
| Ala | 0.093 | 0.098 | 0.077 | 0.112 | – | 0.045 | Intermediate | Low |
| Arg | 0.054 | 0.012 | 0.008 | 0.050 | – | 0.064 | Low | High |
| Asn | 0.040 | 0.052 | 0.052 | 0.050 | – | 0.037 | Low | Low |
| Asp | 0.051 | 0.056 | 0.072 | 0.050 | – | 0.037 | Intermediate | Low |
| Cys | 0.012 | 0.039 | 0.045 | 0.017 | – | 0.013 | Low | Low |
| Gln | 0.045 | 0.074 | 0.069 | 0.056 | – | 0.072 | Low | Low |
| Glu | 0.057 | 0.023 | 0.023 | 0.056 | – | 0.072 | Low | High |
| Gly | 0.072 | 0.059 | 0.074 | 0.086 | – | 0.058 | High | Low |
| His | 0.021 | 0.043 | 0.071 | 0.017 | – | 0.024 | Low | Low |
| Ile | 0.061 | 0.110 | 0.096 | 0.046 | – | 0.067 | Low | Low |
| Leu | 0.114 | 0.028 | 0.028 | 0.091 | – | 0.086 | Low | Low |
| Lys | 0.045 | 0.038 | 0.039 | 0.056 | – | 0.090 | Low | High |
| Met | 0.026 | 0.045 | 0.037 | 0.024 | – | 0.032 | Low | Low |
| Phe | 0.039 | 0.044 | 0.038 | 0.034 | – | 0.055 | Low | Low |
| Pro | 0.045 | 0.056 | 0.041 | 0.042 | – | 0.035 | Low | Low |
| Ser | 0.056 | 0.058 | 0.063 | 0.049 | – | 0.043 | High | Low |
| Thr | 0.055 | 0.055 | 0.054 | 0.053 | – | 0.042 | Intermediate | Low |
| Trp | 0.015 | 0.070 | 0.068 | 0.011 | – | 0.021 | Low | High |
| Tyr | 0.030 | 0.015 | 0.010 | 0.028 | – | 0.038 | Low | High |
| Val | 0.069 | 0.029 | 0.035 | 0.072 | – | 0.068 | Low | Low |

a. Calculated from the translated version of coding sequences in the genomes.
b. Experimental determination of the amino acid fraction in the cell after total protein purification and hydrolysis. The values for B. subtilis have been calculated from Sauer et al. (1996). The values for E. coli have been calculated from Pramanik and Keasling (1998).
c. Qualitative amino acid make-up of two different environments (Savageau, 1983) that are relevant for these organisms.
Fig. 1. Schematic model of a specific amino acid biosynthetic pathway in its cellular context. X1: mRNA coding for the enzymes of the pathway that synthesizes the kth amino acid; X2: enzymes of the pathway that synthesizes the kth amino acid; X3: cognate amino acid (kth) of the biosynthetic pathway. See text for further discussion. (Diagram labels: Pre k, AA k, NA, mRNA k, Enz k, AA 1 ... AA 20.)
$$\frac{dX_1}{dt} = \frac{1 + X_3}{1 + \left[1 + \dfrac{k_1 X_3}{k_2 + X_3}\right] X_3}\,\frac{k_3}{k_3 + X_3} - k_4 X_1 \tag{1}$$

$$\frac{dX_2}{dt} = k_{10} X_1 - k_1 X_2 \tag{2}$$

$$\frac{dX_3}{dt} = k_{11} + \frac{k_5 X_2}{k_5 + X_3^2} - k_6 X_3 - \frac{X_3}{1 + X_3}\,\frac{k_6 X_3}{k_7 + X_3} - \frac{k_8 X_3}{k_9 + X_3} \tag{3}$$

$X_1$ represents the concentration of the mRNA that codes for the enzymes of the biosynthetic pathway. The synthesis of mRNA is repressed by an increase in the concentration of the amino acid ($X_3$) that is synthesized by the pathway. The term $k_3/(k_3 + X_3)$ in Eq. 1 represents the attenuation by the leader peptide of the operon in the presence of Trp. The term $(1 + X_3)/\{1 + [1 + k_1 X_3/(k_2 + X_3)] X_3\}$ is a normalized function of the effect that Trp has on the repressor protein and on the repressor binding to the operator, assuming rapid equilibrium of both reactions. The decay of the mRNA molecules is a first-order process. The mRNA molecules are templates for the synthesis of the enzymes ($X_2$) in the biosynthetic pathway. Trp ($X_3$) synthesis is an enzymatic process whose rate is described by $k_5 X_2/(k_5 + X_3^2)$. The free Trp pool can be depleted by dilution ($k_6 X_3$), binding to the repressor $\{[X_3/(1 + X_3)][k_6 X_3/(k_7 + X_3)]\}$ or usage in protein synthesis $[k_8 X_3/(k_9 + X_3)]$ (for further details on the form of Eqs 1 to 3, see Xiu et al., 2002).

We have modified this model to include the possibility of an exogenous supply for Trp ($k_{11}$). The equation that accounts for the time-dependent behaviour of the enzyme concentration ($X_2$) is not explicitly dependent on the Trp concentration ($X_3$) in the original model. This is a justified approximation because the number of Trp residues is very low in the enzymes that catalyse Trp biosynthesis. Now imagine an organism that is identical to the first, except that the number of Trp residues in the Trp-biosynthetic enzymes is large. In this situation, the influence of Trp concentration on the rate of synthesis of the Trp-biosynthetic enzymes needs to be made explicit. To do this, we modify the rate of enzyme production in Eq. 2 to include a Henri-Michaelis-Menten dependence on the concentration of the cognate amino acid. The new equation that describes the time-dependent behaviour of the enzyme level ($X_2$) is now

$$\frac{dX_2}{dt} = \frac{k_{10} X_1 X_3}{K_M + X_3} - k_1 X_2 \tag{4}$$

When $K_M = 0$, Eq. 4 is the same as Eq. 2. The larger the $K_M$, the larger the relative amount of cognate amino acid in the biosynthetic enzymes.
We use Eqs 1, 3 and 4 and parameter values from Xiu et al. (2002) to simulate the following experiment involving biosynthetic enzymes with increasing levels of cognate amino acid in their composition. Let bacteria grow exponentially in a Trp-rich medium until a steady state has been achieved and then, at time zero, switch them to various media with different, lower amounts of Trp. This will lead to the derepression of the Trp-biosynthetic enzymes. The results of such an experiment are shown for three different shifts (Fig. 2). The lower the Trp levels in the poor medium, the higher the derepressed enzyme levels. For increasing relative composition of Trp in the biosynthetic enzymes, the organism will take longer to produce a similar amount of enzyme and thus to set up an appropriate response to the challenge of amino acid depletion. Furthermore, as the relative amount of amino acid in the biosynthetic enzymes increases, the amino acid levels in the new steady state decrease. For a large depletion of amino acid in the medium, biosynthetic enzymes with high relative amino acid composition exhibit an initial bout of synthesis, but then fail to be synthesized at a steady-state rate (compare among panels A, C and E of Fig. 2) sufficient to produce acceptable amino acid levels (compare among panels B, D and F of Fig. 2).
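The shift experiment can be sketched numerically as follows. The right-hand sides follow Eqs 1, 3 and 4 as read from this text, but everything else is an assumption made for illustration: the parameter values in `K` are placeholders (the values from Xiu et al., 2002 are not reproduced here), the fixed-step fourth-order Runge-Kutta integrator is a convenience choice, and the function names are hypothetical.

```python
def rhs(state, k, k11, K_M):
    """Right-hand sides of Eqs 1, 4 and 3 for the state (X1, X2, X3)."""
    x1, x2, x3 = state
    repression = (1 + x3) / (1 + (1 + k[1] * x3 / (k[2] + x3)) * x3)
    attenuation = k[3] / (k[3] + x3)
    dx1 = repression * attenuation - k[4] * x1
    # K_M = 0 removes the Trp dependence, so Eq. 4 reduces to Eq. 2.
    trp_factor = 1.0 if K_M == 0 else x3 / (K_M + x3)
    dx2 = k[10] * x1 * trp_factor - k[1] * x2
    dx3 = (k11 + k[5] * x2 / (k[5] + x3 ** 2) - k[6] * x3
           - (x3 / (1 + x3)) * (k[6] * x3 / (k[7] + x3))
           - k[8] * x3 / (k[9] + x3))
    return (dx1, dx2, dx3)

def rk4_step(state, dt, k, k11, K_M):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    s1 = rhs(state, k, k11, K_M)
    s2 = rhs(shifted(state, s1, dt / 2), k, k11, K_M)
    s3 = rhs(shifted(state, s2, dt / 2), k, k11, K_M)
    s4 = rhs(shifted(state, s3, dt), k, k11, K_M)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, s1, s2, s3, s4))

def simulate(state, t_end, dt, k, k11, K_M):
    """Integrate from `state` for a duration `t_end`; return the final state."""
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, k, k11, K_M)
    return state

# Placeholder parameters, NOT the values of Xiu et al. (2002).
K = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0,
     6: 0.1, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0}

final_states = {}
for K_M in (0.0, 100.0):
    # Grow in Trp-rich medium (k11 = 1), then shift to no Trp (k11 = 0).
    rich = simulate((0.0, 0.0, 1.0), 200.0, 0.05, K, k11=1.0, K_M=K_M)
    final_states[K_M] = simulate(rich, 200.0, 0.05, K, k11=0.0, K_M=K_M)

for K_M, (x1, x2, x3) in final_states.items():
    print(K_M, x2, x3)
```

With these placeholder values the enzyme level after the shift ends up higher for `K_M = 0` than for `K_M = 100`, in line with the qualitative trend described for Fig. 2; quantitative agreement should not be expected without the published parameters.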
Thus, in nature, with highly variable environmentally supplied Trp levels, the cells with a higher Trp content in their Trp-biosynthetic enzymes would be out-competed by those with a lower Trp content. The regulatory loops in the biosynthesis of other amino acids are similar to the one we have analysed, which suggests that composition effects on temporal responses are common phenomena.
Correlation between cognate bias and molecular activity
The specific activity of an enzyme is determined by the product of its molecular activity and the number of enzyme molecules. For a given specific activity, those enzymes with the lowest molecular activity require the largest number of molecules and their synthesis consumes the largest amount of the cognate amino acid. During the critical phase of derepression, the most rate-determining enzymes in amino acid biosynthetic pathways will be under strong selection to minimize the content of their cognate amino acid. Hence, we expect the cognate bias of these enzymes to be directly correlated with their molecular activity.
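The argument can be written as a short worked relation. The symbols are introduced here only for illustration and are not the paper's notation: $A$ is the total activity the pathway step must deliver, $a$ the molecular activity of the enzyme, $N$ the number of enzyme molecules, and $n_k$ the number of cognate residues per enzyme molecule.

```latex
\begin{align}
  A &= a\,N
    && \text{(activity = molecular activity $\times$ number of molecules)} \\
  N &= \frac{A}{a}
    && \text{(molecules needed for a fixed activity)} \\
  C &= n_k\,N = \frac{n_k\,A}{a}
    && \text{(cognate residues consumed in making them)}
\end{align}
```

For a fixed $A$, the cognate amino acid cost $C$ grows as $a$ falls, so under this sketch keeping $C$ small requires a smaller $n_k$ precisely for the enzymes with low molecular activity.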
Estimates of the specific activity for most enzymes involved in amino acid biosynthetic pathways can be found either in the primary literature or in the BRENDA database. Specific activity is determined by the amount of reaction catalysed during each time unit by a fixed weight of enzyme (usually having units of mM min⁻¹ mg⁻¹). We estimate the molecular weight of the enzymes by adding the weight of their amino acid residues and subtracting the weight of one water molecule per peptide bond. Using this information, together with the specific activity, we can calculate a molecular activity for each of the enzymes. However, one needs to keep in mind that the purification methods and conditions under which the specific activities have been determined for the different enzymes are not the same. Thus, it is likely that some errors are introduced in the calculations. The numbers for both the measured specific activity and the calculated molecular activity of the enzymes graphically represented in Fig. 3 are shown in Table 2.
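The two-step calculation described above can be sketched as follows. The residue masses are standard average values; the function names are hypothetical, and the unit conversion assumes that specific activity is expressed in µmol of substrate per minute per mg of enzyme, which is an assumption of this sketch (adjust the factor for other units).

```python
# Average residue masses in Da, i.e. free amino acid minus one water.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

def molecular_weight(sequence):
    """Mass of one polypeptide chain in Da (g/mol).

    Summing residue masses and adding one water is equivalent to adding
    the free amino acid masses and subtracting one water per peptide bond.
    """
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER

def molecular_activity(specific_activity, mol_weight):
    """Convert umol min^-1 mg^-1 into mol substrate s^-1 (mol enzyme)^-1.

    1 umol = 1e-6 mol, 1 min = 60 s, and 1 mg of enzyme contains
    1e-3 / mol_weight mol, so the factor is mol_weight / 60000.
    """
    return specific_activity * mol_weight / 60000.0

# Toy peptide and a made-up specific activity, for illustration only.
mw = molecular_weight("MKTAYIAKQR")
print(round(mw, 2), molecular_activity(12.0, mw))
```

For a multi-subunit enzyme, the chain masses would have to be combined according to the subunit stoichiometry, which this sketch does not attempt.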
As indicated above, we expect the most rate-determining enzymes in the amino acid biosynthetic pathways to have a cognate bias, given by a P-value, that is positively correlated with molecular activity. That is, the lower the molecular activity, the lower the cognate bias (P-value). The data in Table 3 show a statistically significant positive correlation between cognate bias and molecular activity for the amino acid biosynthetic enzymes from each of the three organisms. This supports the cognate bias hypothesis and suggests that fast recovery of amino acid pools is a significant pressure in determining the cognate amino acid composition of biosynthetic enzymes. The strength and significance of this pressure are greater for the enteric bacteria than for B. subtilis.
As controls, we have determined correlations between cognate amino acid bias and three other factors that might influence amino acid composition of amino acid biosynthetic enzymes. As shown in Table 3, these correlations are much less significant. We also have determined the correlation between bias of each amino acid (non-cognate as well as cognate) and molecular activity for each of the proteins involved in amino acid biosynthesis (Table 4). The correlations are in general low and non-significant, which shows that the significant correlations are specific for the cognate amino acid.
Methionine provides a notable exception to this pattern by exhibiting a statistically significant positive correlation between amino acid bias and molecular activity for all the amino acid biosynthetic enzymes. As the first amino acid of a protein is always Met, if enzymes contain fewer internal Met residues, then the additional free Met residues can be used to synthesize additional peptide chains and thus boost the velocity of the process in which the enzymes are involved. The selection for this effect should be stronger when the molecular activity of an enzyme is lower and the required rate of enzyme synthesis is correspondingly higher. This correlation should extend to all cellular enzymes, not just amino acid biosynthetic enzymes. However, testing this hypothesis is currently not feasible because there is not enough information regarding ...
Fig. 2. Time-course of derepression and amino acid recovery for the computer model of the tryptophan biosynthetic pathway from E. coli. Cells growing in a medium with excess Trp (k11 = 1) are switched to media containing various lower amounts of Trp (k11 < 1): A and B (k11 = 0.5), C and D (k11 = 0.41), E and F (k11 = 0, which corresponds to no Trp). Expression of the trp operon undergoes derepression and is allowed to reach a new steady state. The upper curve in each panel corresponds to the case in which the rate of enzyme synthesis is independent of Trp concentration (KM = 0), and the curves then decrease in the order of increasing dependence on Trp concentration (KM = 0, 0.1, 1, 10, 20, 50, 100, 500). The steady-state concentrations of Trp and of the Trp-biosynthetic enzymes in a Trp-reduced medium decrease with increasing KM.
A, C and E. Dimensionless time-course for protein levels (y-axis: relative protein levels; x-axis: time), which are normalized with respect to the maximum derepressed steady-state value in (E).
B, D and F. Dimensionless time-course for intracellular amino acid levels (y-axis: relative amino acid levels, logarithmic scale; x-axis: time), which are normalized with respect to the same initial value.
Note that the y-axis changes scale progressively from panel to panel in order to show differences while accommodating the increasing degrees of derepression. See text for further discussion.
Table 2. Enzymes of Escherichia coli, Salmonella typhimurium and Bacillus subtilis that are involved in the biosynthetic pathway for each amino acid.

Columns: amino acid; EC number; gene (E. coli, S. typhimurium, B. subtilis); specific activity (mM min⁻¹ mg⁻¹) for E. coli, S. typhimurium and B. subtilis; molecular activity (mol reactant s⁻¹ mol enzyme⁻¹) for E. coli, S. typhimurium and B. subtilis. Rows are grouped by amino acid: alanine, arginine, asparagine, aspartate, cysteine, glutamine, glutamate, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine and valine.

For each amino acid, italicized type indicates the gene for the enzyme in its classical biosynthetic pathway. Bold italicized type indicates an auxiliary enzyme that has been observed to replace an enzyme in the classical pathway, at least in vitro, but that is not usually considered part of the biosynthetic pathway. We were unable to find appropriate estimates for the activity of enzyme 2.3.1.46 in methionine biosynthesis.
ing the specific activity for all the enzymes in a given organism.
Correlation between cognate bias and pathway length
The longer an amino acid biosynthetic pathway (larger number of protein chains), the stronger the selection for a low cognate bias, because a larger number of proteins needs to be synthesized to obtain one full pathway set. According to this expectation, there should be an inverse correlation between cognate bias and the number of peptide chains that make up the pathway complement. This correlation indeed exists, as shown in Table 5.
As a control we have also examined the correlation between the minimum non-cognate bias and the number of protein chains of the pathway. Table 5 shows that the correlation between minimum non-cognate bias and pathway length is negative, but that it is markedly weaker and less significant than in the case of the cognate bias.
Fig. 3. Schematic representation of amino acid biosynthetic pathways. Names for each enzyme, represented here by their EC number, and the corresponding gene are given in Table 2.
Table 3. Spearman rank correlation coefficient (r) between cognate amino acid bias of the amino acid biosynthetic enzymes and their molecular activity. Also shown for comparison are correlations between cognate amino acid bias of the amino acid biosynthetic enzymes and three other factors that could potentially influence the amino acid composition of the same biosynthetic enzymes. These three factors have been previously identified as influencing amino acid composition of the total proteome.

| Factor | E. coli r (n = 79) [d,e] | S. typhimurium r (n = 79) [d,e] | B. subtilis r (n = 89) [d,e] | Overall r (n = 247) [d,e] |
|---|---|---|---|---|
| Molecular activity [a] | 0.284 (P < 0.0033) | 0.301 (P < 0.0030) | 0.186 (P < 0.010) | 0.230 (P < 5 × 10⁻⁷) |
| Protein cost [b] | -0.173 (P < 0.05) | -0.104 (P < 0.15) | 0.147 (P < 0.04) | -0.0819 (P < 0.05) |
| GC content [c] | 0.216 (P < 0.02) | 0.154 (P < 0.07) | 0.157 (P < 0.04) | 0.164 (P < 0.01) |
| Codon bias [c] | -0.201 (P < 0.03) | 0.0975 (P < 0.18) | 0.0580 (P < 0.27) | -0.0322 (P < 0.24) |

a. Calculated as described in Experimental procedures.
b. Calculated by adding the costs for synthesizing each of the amino acid residues that constitute the enzymes of a given amino acid biosynthetic pathway.
c. Calculated for the genes that encode the enzymes of a given amino acid biosynthetic pathway.
d. n represents the number of enzyme activities used to calculate the correlations. It can be determined from Table 2.
e. P represents the probability that the correlation is non-significant (see Experimental procedures).
Profiles of cognate bias in amino acid biosynthetic pathways
Figure 4B shows the profile of cognate bias for the group of E. coli proteins involved in the synthesis of a given amino acid. With the exception of Glu, Gly and Tyr, the group of biosynthetic proteins responsible for the synthesis of a given amino acid has a cognate bias that is below the 50th percentile of the control group.
As expected both from the earlier analysis and from the high degree of homology between many of the E. coli and S. typhimurium proteins, the profile of cognate bias for S. typhimurium is similar (Fig. 4A). However, in S. typhimurium the cognate bias of the Asp, Asn and Val biosynthetic pathways is also above the 50th percentile. Of the biosynthetic pathways that have a cognate bias below the 50th percentile, the average bias is somewhat smaller for S. typhimurium than for E. coli, again as one would have predicted from the fact that there is a stronger correlation between cognate bias and molecular activity for the enzymes of S. typhimurium (Table 3).
Although the general conclusions are similar, the detailed profile of cognate bias for B. subtilis differs considerably from that of the two enteric bacteria. More than half of the amino acid biosynthetic pathways have a cognate bias above the 50th percentile (Fig. 4C). These include the pathways for Ala, His, Phe, Ser and Thr, which are in addition to the pathways with cognate bias above the 50th percentile in E. coli and S. typhimurium.
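The pathway counts implied by these three paragraphs can be tallied explicitly. The following minimal Python sketch uses only the amino acids named above; it assumes that the exceptions listed for E. coli also apply to S. typhimurium, and that the B. subtilis list extends the combined enteric list, as the wording suggests. The variable and function names are illustrative and do not come from the paper.

```python
# Pathways whose cognate bias lies above the 50th percentile, as named in the text.
ALL_PATHWAYS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

above_ecoli = {"Glu", "Gly", "Tyr"}
# S. typhimurium: the E. coli exceptions plus Asp, Asn and Val (assumed cumulative).
above_styphimurium = above_ecoli | {"Asp", "Asn", "Val"}
# B. subtilis: the enteric exceptions plus Ala, His, Phe, Ser and Thr.
above_bsubtilis = above_styphimurium | {"Ala", "His", "Phe", "Ser", "Thr"}


def count_below(above):
    """Number of pathways that are not above the 50th percentile."""
    return len(ALL_PATHWAYS - above)


summary = {
    "E. coli": count_below(above_ecoli),
    "S. typhimurium": count_below(above_styphimurium),
    "B. subtilis": count_below(above_bsubtilis),
}
for organism, n_below in summary.items():
    print(f"{organism}: {n_below} of {len(ALL_PATHWAYS)} pathways below the 50th percentile")
```

Under these assumptions, 11 of the 20 B. subtilis pathways lie above the 50th percentile, which is consistent with the statement that more than half do.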
Table 4. Significance of the Spearman rank correlation coefficient (r) between amino acid bias and molecular activity of enzymes involved in amino acid biosynthesis. [a]

| Amino acid | E. coli r (n = 79) [b,c] | S. typhimurium r (n = 79) [b,c] | B. subtilis r (n = 89) [b,c] | Overall r (n = 247) [b,c] |
|---|---|---|---|---|
| Ala | -0.0299 (P < 0.38) | -0.0390 (P < 0.35) | 0.118 (P < 0.084) | -0.0489 (P < 0.16) |
| Arg | -0.00708 (P < 0.47) | -0.0599 (P < 0.28) | 0.00803 (P < 0.47) | -0.0883 (P < 0.038) |
| Asn | -0.0869 (P < 0.19) | -0.0708 (P < 0.24) | 0.160 (P < 0.032) | -0.0121 (P < 0.40) |
| Asp | -0.127 (P < 0.10) | -0.157 (P < 0.058) | -0.332 (P < 4.8 × 10⁻⁵) | -0.155 (P < 0.00045) |
| Cys | 0.0288 (P < 0.41) | -0.0334 (P < 0.37) | 0.0420 (P < 0.36) | -0.00400 (P < 0.46) |
| Gln | 0.171 (P < 0.054) | 0.0300 (P < 0.39) | -0.0316 (P < 0.36) | 0.103 (P < 0.020) |
| Glu | -0.175 (P < 0.043) | 0.0159 (P < 0.43) | -0.0641 (P < 0.22) | -0.0392 (P < 0.21) |
| Gly | -0.0731 (P < 0.23) | 0.156 (P < 0.061) | 0.0145 (P < 0.44) | 0.0372 (P < 0.23) |
| His | -0.0178 (P < 0.41) | -0.176 (P < 0.041) | -0.0218 (P < 0.39) | -0.129 (P < 0.0046) |
| Ile | 0.103 (P < 0.17) | 0.108 (P < 0.14) | 0.241 (P < 0.0024) | 0.0827 (P < 0.049) |
| Leu | -0.0469 (P < 0.31) | -0.107 (P < 0.14) | -0.106 (P < 0.11) | -0.0806 (P < 0.052) |
| Lys | -0.0802 (P < 0.21) | 0.0549 (P < 0.30) | -0.0912 (P < 0.14) | 0.0337 (P < 0.25) |
| Met | 0.259 (P < 0.0067) | 0.289 (P < 0.0022) | -0.0801 (P < 0.17) | 0.234 (P < 1.3 × 10⁻⁶) |
| Phe | -0.00113 (P < 0.13) | -0.168 (P < 0.048) | -0.231 (P < 0.0033) | 0.102 (P < 0.020) |
| Pro | -0.134 (P < 0.091) | -0.0370 (P < 0.35) | 0.154 (P < 0.036) | -0.0730 (P < 0.069) |
| Ser | 0.130 (P < 0.1) | 0.197 (P < 0.025) | 0.0592 (P < 0.25) | 0.128 (P < 0.0050) |
| Thr | 0.183 (P < 0.043) | 0.131 (P < 0.10) | -0.0357 (P < 0.34) | 0.0667 (P < 0.093) |
| Trp | -0.107 (P < 0.11) | -0.176 (P < 0.028) | -0.174 (P < 0.0033) | -0.111 (P < 0.0066) |
| Tyr | -0.176 (P < 0.042) | -0.131 (P < 0.094) | 0.130 (P < 0.064) | -0.0598 (P < 0.10) |
| Val | 0.179 (P < 0.045) | 0.0552 (P < 0.29) | -0.0177 (P < 0.42) | 0.0755 (P < 0.065) |
| CB [d] | 0.284 (P < 0.0033) | 0.301 (P < 0.0030) | 0.186 (P < 0.01) | 0.230 (P < 5 × 10⁻⁷) |

a. Values in bold are those that are significant to a level higher than 99% and r > 0.20.
b. n represents the number of enzyme activities used to calculate the correlations. It can be determined from Table 2.
c. P represents the probability that the correlation is non-significant (see Experimental procedures).
d. CB indicates cognate amino acid bias.
Table 5. Correlation between number of proteins that make up the enzymes of each amino acid biosynthetic pathway and their minimum amino acid bias.

| Minimum bias | E. coli r (n = 20) [a,b] | S. typhimurium r (n = 20) [a,b] | B. subtilis r (n = 20) [a,b] |
|---|---|---|---|
| Cognate | -0.49 (P < 0.003) | -0.72 (P < 0.004) | -0.23 (P < 0.21) |
| Non-cognate | -0.12 (P < 0.18) | -0.22 (P < 0.14) | -0.06 (P < 0.39) |
| Overall | -0.13 (P < 0.16) | -0.22 (P < 0.14) | -0.04 (P < 0.44) |

a. n represents the number of enzyme activities used to calculate the correlations. It can be determined from Table 2.
b. P represents the probability that the correlation is non-significant (see Experimental procedures).
Profiles of cognate bias in individual amino acid biosynthetic enzymes
Although the cognate bias of individual biosynthetic pathways is above the 50th percentile for some of the amino acids, this does not necessarily mean that selection for a low bias is too weak to be observed for these amino acids. It may be an indication that a subset of the enzymes in the pathway has a dominant effect on the rate of synthesis for the cognate amino acid and is critical for recovery of amino acid levels during derepression. Only this subset of enzymes would then be sufficiently sensitive to selection for a low cognate bias to be observed. To further analyse this implication of the cognate bias hypothesis, we examine the cognate bias of individual biosynthetic enzymes within each pathway for each of the three organisms.
Figure 5 shows the cognate bias for the individual enzyme that has the lowest value within each amino acid biosynthetic pathway. One sees immediately for each of the organisms that approximately 75% of the amino acid pathways exhibit at least one enzyme that has a low cognate bias. Of the 75%, the enzyme with the lowest cognate bias has either the lowest or the second lowest molecular activity in the pathway. Whenever genetic and biochemical data are available, one finds this enzyme to be regulated both by end-product inhibition at the level of enzyme activity and by repression at the level of gene expression (shown in Table 6 for the E. coli pathways). This type of regulation is an additional indication that the enzyme has a dominant influence over the flux through the pathway, particularly during the critical early phase of recovery.
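For E. coli, the figure of approximately 75% can be checked against the second column of Table 6, which lists the enzymes whose cognate bias is below the 10% quantile. The short Python sketch below encodes only whether that column is non-empty for each pathway; treating PheA, which the table describes as slightly above the 10% quantile, as not low is an interpretation made here, and the variable names are illustrative.

```python
# Column 2 of Table 6 (E. coli): does the pathway contain at least one enzyme
# whose cognate bias is below the 10% quantile?
HAS_LOW_BIAS_ENZYME = {
    "Ala": True, "Arg": True, "Asn": True, "Asp": False, "Cys": True,
    "Gln": True, "Glu": False, "Gly": False, "His": True, "Ile": True,
    "Leu": True, "Lys": True, "Met": True,
    "Phe": False,  # PheA is listed as slightly above the 10% quantile
    "Pro": True, "Ser": True, "Thr": True, "Trp": True, "Tyr": False,
    "Val": True,
}

n_low = sum(HAS_LOW_BIAS_ENZYME.values())
fraction_low = n_low / len(HAS_LOW_BIAS_ENZYME)
without_low = sorted(aa for aa, low in HAS_LOW_BIAS_ENZYME.items() if not low)

print(f"{n_low} of {len(HAS_LOW_BIAS_ENZYME)} pathways ({fraction_low:.0%})")
print("No enzyme below the 10% quantile:", ", ".join(without_low))
```

The five pathways without a low-bias enzyme in this tally (Asp, Glu, Gly, Phe and Tyr) are the ones examined individually in the sections that follow.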
This more detailed analysis further supports the notion that selection for low cognate bias in enzymes within a given amino acid biosynthesis pathway is strong enough to influence the relative composition of biosynthetic enzymes. Nevertheless, for each organism there are some pathways in which the lowest cognate bias is greater than 0.1. These give rise to profile features that are fairly similar for the enteric bacteria E. coli and S. typhimurium, but different from that for the more distantly related bacterium B. subtilis.
Fig. 4. The compositional bias of biosynthetic pathways with respect to their cognate amino acid. [Three bar-chart panels: (A) S. typhimurium, (B) E. coli, (C) B. subtilis. x-axis: Pathway (Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val); y-axis: Pathway bias, 0 to 1.]
Fig. 5. Enzyme with the lowest value for cognate bias in the biosynthetic pathway of each amino acid. The order of the pathways is the same as that in Fig. 4. The x-axis shows the name of the gene coding for the enzyme. The numbers indicate the probability that this lowest bias occurs in a set of proteins, containing the same number of proteins as the pathway, and drawn randomly from the proteome of each organism. [Three bar-chart panels; x-axis: Enzyme; y-axis: Protein bias, 0 to 1. Genes shown, in pathway order. S. typhimurium: avtA, argC, asnA, aspA, cysK, gdhP, gdhA, glyA, hisC, ilvA, leuA, lysA, metE, pheA, proB, serB, thrB, trpE, tyrA, avtA. E. coli: avtA, argC, asnA, aspA, cysK, glnA, gdhA, glyA, hisC, ilvA, leuC, lysA, metE, pheA, proA, serB, thrB, trpA, tyrA, avtA. B. subtilis: ywaA, argD, asnO, aspB, cysK, ypcA, rocG, glyA, hisA, acoA, leuC, dapF, metC, aroH, proJ, serA, yclM, trpG, aroA, pdhA.]
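The probabilities reported in Fig. 5 are described as the chance that a bias as low as the observed minimum occurs in a random set of proteins of the same size as the pathway. A minimal resampling sketch of that idea is given below. It is not the authors' procedure (which is defined in Experimental procedures): the function name, the sampling without replacement, the number of trials and the synthetic, uniformly distributed "proteome" of bias values are all assumptions made for illustration.

```python
import random


def prob_min_bias_at_most(proteome_bias, pathway_size, observed_min,
                          n_trials=20000, seed=0):
    """Estimate P(minimum bias of a random protein set <= observed_min)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Draw a random set with as many proteins as the pathway (no replacement).
        sample = rng.sample(proteome_bias, pathway_size)
        if min(sample) <= observed_min:
            hits += 1
    return hits / n_trials


# Synthetic stand-in for a proteome: 4000 bias values spread uniformly on [0, 1].
_rng = random.Random(1)
synthetic_proteome = [_rng.random() for _ in range(4000)]

estimate = prob_min_bias_at_most(synthetic_proteome, pathway_size=5,
                                 observed_min=0.05)
# For uniformly distributed bias values the result should lie close to
# 1 - (1 - b)^n, with b the observed minimum and n the pathway size.
analytic = 1 - (1 - 0.05) ** 5
print(f"resampled: {estimate:.3f}, uniform approximation: {analytic:.3f}")
```

The comparison with 1 - (1 - b)^n makes the dependence on pathway size explicit: for the same minimum bias, a longer pathway is more likely to contain a low-bias protein by chance, so the probability attached to it is larger.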
For E. coli and S. typhimurium there are three amino acid biosynthetic pathways (Asp, Phe and Tyr) in which selection for low cognate bias appears to be masked by functional requirements.
Asp biosynthesis
In E. coli, AspC is the homodimeric protein that is traditionally thought to catalyse Asp biosynthesis from Glu and oxaloacetate, while AspA is the homotetrameric enzyme thought by many to produce fumarate and NH3 from the catabolism of Asp. However, genetic and biochemical evidence suggests that AspA is likely to catalyse the reverse reaction (i.e. fumarate + NH3 → Asp) under normal conditions in vivo. In industrial processes, purified AspA, or E. coli cells overexpressing the enzyme, are used to produce Asp (for a discussion, see Herrmann and Somerville, 1983; Neidhardt, 1999 and references therein). The cognate bias of both enzymes is higher than the 10th percentile.
The protein AspC has a cognate bias that is around the 50th percentile. The analysis of the dimeric structure of AspC shows that three Asp residues are involved in the interface contact surface of the monomers. These three residues are also conserved (data not shown) in all AspC bacterial homologues from the SWISSPROT database (Boeckmann et al., 2003), suggesting an important functional role. If these three residues are discounted, the cognate bias of AspC drops to the 27th percentile. This is still above the 0.1 significance level, but is nevertheless a lower cognate bias.¹
Surprisingly, the protein AspA has a lower cognate bias (approximately 12th percentile) than AspC, although it is still above the 10-percentile threshold. This protein is active as a tetramer in which several of the Asp residues are involved in forming potential salt bridges between the monomers (Fig. S1). If one discounts these residues, which are selected for functional reasons, then the cognate bias of the enzyme is well below the first percentile. Thus, in this case it appears that functional considerations are responsible for masking the selection for low cognate bias.
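The discounting procedure set out in footnote 1 can be sketched in a few lines of Python. The sketch covers only steps (i) and (ii), the adjustment of residue counts and frequencies; step (iii) depends on the definition of cognate bias given in Experimental procedures, which is not reproduced here. The function name, its parameters and the numbers in the usage example are hypothetical and are not data from the paper.

```python
def discount_functional_residues(length, cognate_count, cognate_freq, n_functional):
    """Apply steps (i) and (ii) of the footnote 1 recalculation to one protein.

    length        -- number of residues in the protein
    cognate_count -- cognate residues observed in the protein
    cognate_freq  -- average proteome frequency of the cognate amino acid
    n_functional  -- cognate residues with an established functional role
    """
    if n_functional > cognate_count:
        raise ValueError("cannot discount more residues than the protein contains")
    # (i) Cognate residues expected in an average protein of the same length.
    expected_count = cognate_freq * length
    # (ii) Remove the functional residues from both the protein and the
    #      average (control) protein, then recompute the cognate frequency.
    new_length = length - n_functional
    protein_freq = (cognate_count - n_functional) / new_length
    control_freq = max(expected_count - n_functional, 0.0) / new_length
    return protein_freq, control_freq


# Hypothetical numbers, for illustration only (not taken from the paper).
protein_freq, control_freq = discount_functional_residues(
    length=400, cognate_count=20, cognate_freq=0.055, n_functional=3)
print(f"protein: {protein_freq:.4f}, control: {control_freq:.4f}")
```

Because the same number of residues is removed from the protein and from the control, the comparison in step (iii) is made between two sequences of equal, shortened length.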
¹ The cognate bias of a protein after removing a given residue needs to be recalculated. Using AspC as an example, the recalculation is performed in the following way. (i) Calculate the number of Asp residues in an average protein with the same length as AspC. (ii) Discount the number of residues with established functional roles (three residues in this case) from both AspC and the average proteins and recalculate the average frequency of the different amino acids. (iii) Use these new probabilities to recalculate the cognate bias of the protein. In other words, we remove the same number of Asp residues from the control proteins and recalculate the probabilities.

Table 6. Correlation between low cognate amino acid bias and regulation of gene expression and enzyme activity. [a]

| Biosynthetic pathway | Enzymes with low cognate amino acid bias (below 10% quantile) | Enzymes repressed by cognate amino acid addition | Enzymes derepressed upon cognate amino acid depletion | End-product-inhibited enzymes |
|---|---|---|---|---|
| Ala | AvtA | AvtA | – | – |
| Arg | ArgBCFI | ArgABCDEFGHI | ArgECBH | ArgA, ArgB |
| Asn | AsnA, AsnB | AsnA | – | AsnA, AsnB |
| Asp | – | – | – | – |
| Cys | CysK-Z, CysM | CysK-Z, CysM, CysE | CysK-Z, CysM, CysE | CysK-Z, CysM, CysE |
| Gln | GltB, GltD | GltB, GltD | GltD, GltB | GltD, GltB |
| Glu | – | GdhA | GdhA | GdhA |
| Gly | – | GlyA | GlyA | – |
| His | HisC, HisD | All | All | HisC, HisG |
| Ile | IlvGA | IlvGEDA | IlvGEDA | IlvA |
| Leu | IlvE, LeuABCD | LeuABCD | LeuABCD | LeuA |
| Lys | LysC, DapE, DapF, LysA | LysC, DapE, LysA | LysC, DapA, LysA | LysC, DapA |
| Met | MetA, MetE, MetB | MetA, MetH, MetE, MetB, MetL | MetA, MetH, MetE | MetA |
| Phe | PheA slightly above 10% quantile | PheA | PheA | PheA |
| Pro | ProA, ProB | ProB | ProC | ProA |
| Ser | SerB | – | – | SerA |
| Thr | ThrAB | ThrAB | ThrABC | ThrAB |
| Trp | TrpABCDE | TrpABCDE | TrpABD | TrpED |
| Tyr | – | TyrA | TyrA | TyrA |
| Val | AvtA | AvtA, IlvG | – | IlvIHGM |

a. This information has been compiled from Herrmann and Somerville (1983), Neidhardt (1999), Khodursky et al. (2000) and references therein.

Phe and Tyr biosynthesis
Although there is some variation regarding the lowest cognate bias in the Tyr and Phe biosynthetic pathways of E. coli and S. typhimurium, they have in common two qualitative features of interest. First, it is the initial enzyme in each pathway that has the lowest cognate bias. These enzymes are repressed at the level of gene expression and end-product inhibited at the level of enzyme activity (Neidhardt, 1999). As noted above, this is an indication that they catalyse a rate-determining step in the corresponding biosynthetic pathway. The correlation between the properties of the regulatory enzyme in the relevant pathway and its low cognate bias suggests that selection for low cognate bias might still be present but masked for functional reasons.
The second feature shared by both pathways is that a single gene, tyrB, encodes both the second and last enzymes in each pathway. Although the relevance of this second feature for our argument is less obvious, as shown below there is reason to believe that selection for low cognate bias is operating on these enzymes.
Table 1 shows that Phe and Tyr are among the least abundant amino acids in proteins, at approximately 4% and 3% respectively. However, their particular chemical properties, due to their side-chain aromatic ring, make them especially important in active centres and in substrate interaction sites (Pedersen and Finazzi-Agro, 1993; Frey, 2001; Rogers and Dooley, 2003). The number of residues that comprise an active centre is usually small compared with the total number of residues in the protein. A conservative estimate would suggest that fewer than 10 residues would account for most active sites. As the smaller active enzymes have between 100 and 150 residues, this would predict that at most approximately 10% of the residues in a protein are involved in the formation of an active centre. If proteins include a low percentage of Phe or Tyr residues, and if Phe or Tyr residues are necessary in the active centre, then these features will provide a strong selection for a high compositional bias for these amino acids in any protein.
A more detailed analysis of the enzymes in the Phe and Tyr biosynthetic pathways indicates selection for low cognate bias when the functional role of the amino acid in active centres is factored out. The gene that encodes the enzyme catalysing the second step in each pathway is tyrB. The protein product has 397 amino acids, of which 17 are Phe residues (4.3%) and 15 are Tyr residues (3.8%). A crystal structure for this protein has been deposited by Ko et al. in the Protein Data Bank (PDB code 3TAT), although a paper analysing the structure has not been published yet. This protein is a dimer and our own analysis shows that at least five Phe residues and six Tyr residues are involved in the active centre and in the interaction between monomers respectively (Fig. S1). Furthermore, these residues are conserved in homologous proteins from other organisms (data not shown). If we discount these residues, then the cognate bias of tyrB is below the 10th percentile for both the Phe pathway and the Tyr pathway.²
The functional analysis of the first enzyme in each pathway cannot be accomplished as easily. PheA, the first enzyme in the Phe biosynthetic pathway, is a 386-amino-acid-residue, bifunctional protein. Eleven of the 386 residues are Phe (2.9%). This is below the average cognate bias, but nevertheless above our 10-percentile threshold. The Protein Data Bank entry for this protein, file 1ECM, shows that there are no Phe residues involved in the active centre of the chorismate mutase activity of PheA. This is unlike the case of other chorismate mutases, such as that of B. subtilis. Bacterial homologues of PheA from the SWISSPROT database (Boeckmann et al., 2003) show perfect conservation for two of the 11 Phe residues (Fig. S2). This suggests an important functional role for these residues. If these residues are discounted, then the cognate amino acid bias falls below the sixth percentile. Furthermore, three of the other Phe residues are perfectly conserved in all but two of the proteins. This may be taken as an indication that these residues are important for the function or structure of the protein. Again, we find that the higher-than-expected cognate bias may result from the functional requirements of the protein for this specific type of amino acid residue.
We now consider the TyrA protein. This protein is composed of 373 amino acid residues, 10 of which are Tyr residues (2.7%). There is no known structure for this enzyme or for any of its homologues. Bacterial homologues of TyrA from the SWISSPROT database (Boeckmann et al., 2003) show perfect conservation for five out of the 10 Tyr residues. An additional Tyr residue is conserved in all but one of the homologues, where it is replaced by a Phe residue (Fig. S2). If these residues are discounted, the cognate amino acid bias of TyrA drops well below the sixth percentile. Furthermore, two of the additional Tyr residues are conserved in all but one of the homologous proteins.
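The residue percentages quoted for the tyrB product, PheA and TyrA follow directly from the counts and protein lengths given in the text. The short Python check below recomputes them; the protein label "TyrB" for the tyrB product and the variable names are chosen here for illustration.

```python
# (protein, residue, residue count, protein length, percentage quoted in the text)
QUOTED = [
    ("TyrB", "Phe", 17, 397, 4.3),
    ("TyrB", "Tyr", 15, 397, 3.8),
    ("PheA", "Phe", 11, 386, 2.9),
    ("TyrA", "Tyr", 10, 373, 2.7),
]


def percent(count, length):
    """Residue content as a percentage of protein length."""
    return 100.0 * count / length


computed = {(p, r): percent(c, n) for p, r, c, n, _ in QUOTED}
for protein, residue, count, length, quoted in QUOTED:
    value = computed[(protein, residue)]
    print(f"{protein} {residue}: {count}/{length} = {value:.2f}% (text: {quoted}%)")
```

All four values agree with the quoted percentages to within 0.1 percentage point; the PheA value evaluates to about 2.85%.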
As a control for these cases in which cognate amino acid residues are discounted, one can discount other conserved (non-cognate amino acid) residues and recalculate the compositional bias. When this is done, the compositional bias with respect to these amino acids does not drop as low as that for the cognate amino acid (data not shown).
² A procedure similar to that described in the previous footnote for Asp is used to recalculate the bias of proteins containing Tyr and Phe residues that are functionally important. In the case of these two amino acids, the functional residues are, in many cases, involved in catalysis. Therefore, one could also discount all conserved Tyr or Phe residues from the active centre of enzymes to recalculate the probabilities of amino acid occurrence. However, the structure of most enzymes is still unknown, and so is the actual composition of their active centres, which precludes such an approximation at this time.

For B. subtilis there are four amino acid biosynthetic pathways (Ala, Ser, Thr and Val) in which selection for low cognate bias appears either to be weaker or to be masked by other requirements.
Ala and Val biosynthesis
Of the 20 amino acids, the biosynthesis of Ala is probably the least well studied. It is likely that more enzymes yet to be identified could contribute to the synthesis of this amino acid. There are no available data that hint at possible explanations for the high cognate bias of Ala biosynthesis in B. subtilis. Regarding the biosynthesis of Val, although the cognate bias is not low, one finds that, for B. subtilis, the enzyme with the lowest cognate bias catalyses the first step committed to Val biosynthesis. Additionally, the enzyme with the second lowest bias is the one that catalyses the first step in the pathway common to the biosynthesis of Leu and Val. This suggests that other factors may be masking the selection for low cognate bias in these biosynthetic pathways.
Ser and Thr biosynthesis
In B. subtilis, the enzyme of the Ser biosynthetic pathway with the lowest cognate bias is encoded by the gene serA, and it is the first enzyme in the pathway. After careful comparative sequence and structure analysis with the homologue from E. coli, we could find no functional justification for the excess of Ser residues in the B. subtilis enzyme. Additional comparative sequence analysis with homologues from other Gram-positive bacteria shows that only two Ser residues are perfectly conserved. If we discount these residues and recalculate the cognate bias, this bias is still around the 40th percentile.
The enzyme of the Thr biosynthetic pathway with the lowest cognate bias is also the first enzyme in the pathway. A sequence alignment of the relevant Thr enzymes from different Gram-positive bacteria shows that there are four fully conserved Thr residues (data not shown), which implies an important functional role for these residues. When they are discounted, the cognate bias for Thr falls below the fifth percentile.
Finally, for all three organisms there are two biosynthetic pathways in which selection for low cognate bias appears to be completely overridden by some unknown mechanism that actually yields a higher-than-average cognate bias.
Gly and Glu biosynthesis
The pathways for Gly and Glu biosynthesis each involve a small number of enzymes (as low as one per pathway, depending on the organism). In the E. coli case, a careful analysis of the three-dimensional crystal structure and of the fully conserved residues between homologous proteins involved in Gly or Glu biosynthesis does not reveal any special function for the relevant amino acid. Therefore, other reasons must account for the higher cognate bias of Gly and Glu biosynthetic pathways. The biosynthetic pathways for these amino acids, which are composed of only one enzyme each, fall into the category of short pathways for which there is less intense selection for low cognate bias. However, this factor alone would not account for their higher-than-average cognate bias.
Correlation between bias in biosynthetic enzymes and environmental abundance of the cognate amino acid
Selection for low cognate bias is expected to be more intense for those amino acid biosynthetic pathways that must undergo the greatest range and frequency of derepression. This is likely to be associated with low and infrequent abundance of the cognate amino acid in the organism’s environment. Although the environments of E. coli, S. typhimurium and B. subtilis are complex, heterogeneous and difficult to characterize, there are data (Table 1) that suggest at least a relative ranking for the abundance of the amino acids in the human colon (a principal habitat of E. coli and S. typhimurium) and in soil (a principal habitat of B. subtilis). In attempting to compare the abundance of a given amino acid in these two environments, it is problematic if its relative abundance is the same in the two environments, either high or low, because these qualitative assessments do not deal with the absolute concentrations. Either environment could have an abundance that is either higher or lower than the other. This is less of a problem if qualitative comparisons are made in cases where the abundance is qualitatively different between environments, for example, high in one environment and low in the other. The qualitative result of such a comparison is likely to be valid, even if there is uncertainty in the absolute concentrations.
To perform such a qualitative comparison of environmental abundance with cognate bias, we apply a qualitative rank correlation test. The amino acids are given a score of one if the abundance is low, two if the abundance is intermediate and three if the abundance is high. Then, for each of the 10 amino acids that have a qualitatively different abundance in the two environments, and for each organism, we identify the enzyme in their biosynthetic pathway that has the lowest cognate bias. For each amino acid we ranked the cognate bias in the three organisms in the following way: when comparing the cognate bias of B. subtilis and E. coli, the organism with the lower cognate bias was given the rank 1, the other the rank 2. A similar comparison was made between B. subtilis and S. typhimurium (Table 7). We then built a table of pairs of values, where the first element of the pair is the rank of the amino acid in the environment for the relevant organism(s) and the second element is the rank of the cognate bias. Calculating the Spearman rank correlation between the two sets of values for B. subtilis and E. coli suggests a positive correlation between high cognate bias and the amount of amino acid in the environment (r = 0.61, P < 0.004). The rank correlation calculated for B. subtilis and S. typhimurium is even stronger (r = 0.83, P < 0.00025).
Table 7 shows that four of the amino acids with cognate bias that is higher in E. coli than in B. subtilis also have a relative abundance that is higher in the colon than in soil. Similarly, three of the amino acids with cognate bias that is higher in B. subtilis than in E. coli also have a relative abundance that is higher in soil than in the colon. Asp is a clear outlier; its cognate bias is higher in E. coli but its relative abundance is higher in soil. The two remaining cases, Gly and Trp, have essentially the same cognate bias in E. coli and B. subtilis, even though the relative abundance of Trp is greater in the colon and that of Gly is greater in soil.
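The two rank correlations can be recomputed from the ranks listed in Table 7. The Python sketch below pools, for each comparison, the 10 (abundance rank, bias rank) pairs for B. subtilis in soil with the 10 pairs for the enteric organism in the colon, and computes the Spearman coefficient as the Pearson correlation of tie-averaged ranks. Handling ties by average ranks is an assumption made here, since the paper's exact treatment is described in Experimental procedures; the helper names are illustrative.

```python
from math import sqrt

# Table 7 ranks: soil, colon, (B. subtilis, E. coli), (B. subtilis, S. typhimurium)
TABLE7 = {
    "Ala": (2, 1, (2, 1), (2, 1)),
    "Arg": (1, 3, (1, 2), (1, 2)),
    "Asp": (2, 1, (1, 2), (1, 2)),
    "Glu": (1, 3, (1, 2), (1, 2)),
    "Gly": (3, 1, (1, 2), (2, 1)),
    "Lys": (1, 3, (1, 2), (1, 2)),
    "Ser": (3, 1, (2, 1), (2, 1)),
    "Thr": (2, 1, (2, 1), (2, 1)),
    "Trp": (1, 3, (1, 2), (1, 2)),
    "Tyr": (1, 3, (1, 2), (1, 2)),
}


def midranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the tie-averaged ranks."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def pooled_pairs(pair_index):
    """Return the 20 abundance ranks and the 20 matching bias ranks."""
    abundance, bias = [], []
    for soil, colon, *pairs in TABLE7.values():
        b_subtilis, enteric = pairs[pair_index]
        abundance += [soil, colon]     # B. subtilis lives in soil, enterics in the colon
        bias += [b_subtilis, enteric]
    return abundance, bias


r_ecoli = spearman(*pooled_pairs(0))   # B. subtilis and E. coli
r_styph = spearman(*pooled_pairs(1))   # B. subtilis and S. typhimurium
print(f"{r_ecoli:.2f} {r_styph:.2f}")
```

With the ranks as listed, the coefficients round to 0.61 and 0.83, matching the values reported for the two comparisons. The only row that differs between the two comparisons is Gly, which accounts for the stronger correlation with S. typhimurium.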
Discussion
Prototrophic microorganisms like E. coli, S. typhimurium and B. subtilis are capable of synthesizing all of the amino acids. A considerable fraction of the bacterial genome is devoted to the encoding of enzymes involved in the biosynthesis of the amino acids (Neidhardt, 1999). However, these organisms exist in changing environments and when they encounter an exogenous source of a particular amino acid they typically repress the enzymes for its endogenous biosynthesis. This creates a particular dilemma when attempting to derepress a pathway for which the cognate amino acid has been depleted.
Cells have evolved various strategies for dealing with asudden amino acid depletion. Proteases are able toreconfigure the complement of proteins (Reeve et al.,1984; Matin, 1991; Weichart et al., 2003; Nystrom, 2004)and liberate a supply of the limiting amino acid. The strin-gent response (Foster and Spector, 1995; Magnussonet al., 2003), by shutting down the synthesis of other pro-teins, and stimulating the synthesis of amino acid biosyn-thetic enzymes, can contribute to the replenishment of thelimiting amino acid. These are likely to be rather generalsolutions to the problem of protein synthesis and not spe-cific to a particular subset of amino acid biosyntheticenzymes.
Another strategy that addresses the problem of specificity is the following. With lowered amino acid concentrations, there is a shift in charging from isoacceptor tRNAs with lower affinity for the amino acid to ones with higher affinity, thereby allowing those proteins whose mRNA is enriched for the high-affinity isoaccepting species to be synthesized at a faster rate than would be possible without the enrichment. At least 10 of the amino acid biosynthetic pathways in E. coli show this enrichment specifically for the cognate amino acid (Elf et al., 2003). Although differences in the charging of isoacceptor tRNAs can account for the relative usage of the different synonymous codons, these differences cannot fully account for the total relative amount of a given amino acid in the proteins that synthesize that amino acid. Recovery from the repressed state when the exogenous supply of a given amino acid is no longer available would be difficult if the enzymes of the biosynthetic pathway had a composition that was high in the cognate amino acid, independently of the relative codon usage for that amino acid.

Table 7. Correlation between high bias of the cognate amino acid in the biosynthetic enzymes and high relative concentration of the cognate amino acid in the environment.

| Amino acid | Rank of amino acid concentration (a): soil | Rank of amino acid concentration (a): colon | Rank of minimum cognate bias (b): B. subtilis | Rank of minimum cognate bias (b): E. coli | Rank of minimum cognate bias (c): B. subtilis | Rank of minimum cognate bias (c): S. typhimurium |
|---|---|---|---|---|---|---|
| Ala | 2 | 1 | 2 | 1 | 2 | 1 |
| Arg | 1 | 3 | 1 | 2 | 1 | 2 |
| Asp | 2 | 1 | 1 | 2 | 1 | 2 |
| Glu | 1 | 3 | 1 | 2 | 1 | 2 |
| Gly | 3 | 1 | 1 | 2 | 2 | 1 |
| Lys | 1 | 3 | 1 | 2 | 1 | 2 |
| Ser | 3 | 1 | 2 | 1 | 2 | 1 |
| Thr | 2 | 1 | 2 | 1 | 2 | 1 |
| Trp | 1 | 3 | 1 | 2 | 1 | 2 |
| Tyr | 1 | 3 | 1 | 2 | 1 | 2 |

Rank correlation (n = 20) (d): r = 0.61 (P < 0.004)
Rank correlation (n = 20) (e): r = 0.83 (P < 0.00025)

a. If the concentration of amino acid in the environment is low, then the rank is 1; if it is intermediate, then the rank is 2; if it is high, then the rank is 3.
b. If cognate bias is lower in B. subtilis, then the rank is 1 for B. subtilis and 2 for E. coli; if cognate bias is higher in B. subtilis, then the rank is 2 for B. subtilis and 1 for E. coli.
c. If cognate bias is lower in B. subtilis, then the rank is 1 for B. subtilis and 2 for S. typhimurium; if cognate bias is higher in B. subtilis, then the rank is 2 for B. subtilis and 1 for S. typhimurium.
d. Rank correlation calculated using the pair of values (rank of amino acid concentration, rank of minimum cognate bias) for B. subtilis and E. coli.
e. Rank correlation calculated using the pair of values (rank of amino acid concentration, rank of minimum cognate bias) for B. subtilis and S. typhimurium.
In addressing this issue we have hypothesized that the enzymes of specific amino acid biosynthetic pathways, or at least those with the greatest influence on the rate of the pathway, should be biased towards low values of the cognate amino acid when compared with the entire proteome of the organism. In this article, we have presented several lines of evidence that support this cognate bias hypothesis.
First, a computer model of the tryptophan biosynthetic pathway in E. coli showed that derepression of the tryptophan biosynthetic enzymes would be more compromised if Trp residues were more abundant in these enzymes. The results of this simulation (Fig. 2) suggest that the extent and rapidity of response may well be selective pressures responsible for low cognate bias.
Second, the prediction of a direct correlation between molecular activity of amino acid biosynthetic enzymes and their cognate bias was tested by direct calculation using information from databases for enzyme activities and genome sequences. A statistically significant direct correlation between molecular activity and bias is found for cognate (Table 3) but not for non-cognate amino acids (Table 4).
Third, the prediction of an inverse correlation between number of enzymes in the amino acid biosynthetic pathways and their cognate bias was tested by direct calculation using information from databases for metabolic pathways and genome sequences. As expected, there was a statistically significant inverse correlation between pathway length and bias for cognate but not for non-cognate amino acids (Table 5).
Fourth, a more detailed enzyme-by-enzyme, pathway-by-pathway and organism-by-organism analysis found strong evidence for low cognate bias in approximately 75% of the amino acid biosynthetic pathways (Fig. 5). For four of the remaining pathways the selection for low bias appears to be masked by other factors, and becomes evident when the influence of these factors is removed. For example, certain biosynthetic enzymes have their cognate amino acid located at highly conserved positions that are key in determining protein structure and function. When these residues are discounted in the calculation of the cognate bias, the residual composition of the enzyme is distinctly biased towards low values of the cognate amino acid. This is the case for Asp, Phe and Tyr biosynthetic enzymes in E. coli (Fig. S2) and for the first enzyme of the Thr biosynthetic pathway in B. subtilis.
For three cases in B. subtilis the evidence for cognate bias is less clear. As expected, the first enzyme has the lowest cognate bias in the biosynthetic pathway for Ser, Thr and Val. However, only in the case of Thr are there highly conserved cognate residues, which when discounted result in a significantly low cognate bias for the enzyme. In the case of Ala, the enzymes are still poorly characterized and there is no evidence for low cognate bias. In two pathways, Glu and Gly, additional factors appear to completely override the selection for low cognate bias and yield higher-than-average cognate bias.
Clearly, this type of bias is a general principle that applies with varying degrees to any system that exhibits this form of positive feedback. For example, an earlier study has shown that the atomic composition of some biosynthetic enzymes is biased against atoms that are fixed in metabolism by those enzymes (Baudouin-Cornu et al., 2001). Also, preliminary results from the analysis of the E. coli and S. cerevisiae proteomes (R. Alves and A. Salvador, preliminary unpublished results) suggest that proteins involved in detoxification of reactive oxygen species are biased towards low relative content of highly oxidizable amino acid residues, thus allowing these proteins to remain active for longer periods in an oxidizing environment. The same preliminary results clearly indicate that the relative amount of highly oxidizable amino acid residues in proteins expressed under anaerobic conditions is significantly greater than that in proteins expressed exclusively under aerobic conditions.
Finally, in comparisons of E. coli with S. typhimurium, another closely related enteric Gram-negative organism, and B. subtilis, a more distantly related Gram-positive organism, we have observed differences in the detailed profile of cognate bias (Fig. 5) that might reflect differences in the intensity of selection for low cognate bias. We have argued that selection for low cognate bias is expected to be more intense for those amino acid biosynthetic pathways that must undergo the greatest range and frequency of derepression. This is likely to be associated with low and infrequent abundance of the cognate amino acid in the organism’s environment.
Although there is great heterogeneity in the amino acid measurements, the profile of their relative abundance in the colon appears to exhibit a number of differences from that in soil (Table 1). If one accepts the 10 cases in which there appears to be a qualitative difference, and the argument that a higher relative concentration implies weaker selection for low cognate bias, then one can examine whether these data are consistent with those for cognate bias in Fig. 5. The results of our comparisons show a positive qualitative correlation (Table 7) that further supports the selectionist explanation for low cognate amino acid bias in amino acid biosynthetic enzymes.
In summary, we have presented several lines of evidence showing that cognate bias plays a highly significant role in shaping the amino acid composition for a large class of cellular proteins. The profiles of cognate amino acid bias are similar for two closely related organisms, E. coli and S. typhimurium; they differ for two more distantly related organisms E. coli or S. typhimurium and B. subtilis in ways that show a qualitative relationship to the environments of these organisms. Such differences, if substantiated with a broader group of organisms, may serve as a ‘finger print’ that reflects their different evolutionary history and ecological niche.
Experimental procedures
Model organisms
We use E. coli K12, S. typhimurium and B. subtilis as our model organisms. The proteome and genome information for these organisms was downloaded from the KEGG database release 28.0 (Kanehisa et al., 2002).
Proteins involved in amino acid biosynthesis
We use pathway information available on Ecocyc (Karp et al., 2002), KEGG (Kanehisa et al., 2002), WIT (Overbeek et al., 2000), Herrmann and Somerville (1983) and Neidhardt (1999), and cross-correlate this information to determine the biosynthetic pathway for each amino acid in each of the organisms. Table 2 summarizes this information in terms of gene name, enzyme activity and EC number. Figure 3 shows how the network of amino acid biosynthetic reactions is connected.
Calculation of molecular activity
We have used the database BRENDA and the references therein to obtain estimates for the specific activity of the enzymes involved in amino acid biosynthesis. If this activity was not available for the specific organism of interest, we used the available value for the organism whose protein had the strongest homology to the target enzyme. Molar weight of the enzymes has been estimated by adding the individual weight of all residues of a protein and subtracting the weight of one water molecule per peptide bond in the protein. Using this molar weight we converted specific activity into molecular activity (Table 2). Whenever an enzyme is known to be formed by multiple subunits, the molecular weight of the enzyme was calculated by adding the weight of each constituent subunit together.
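The weight calculation and unit conversion described above can be sketched in Python as follows. This is a minimal illustration, not the original scripts. It assumes that the "individual weight" of a residue is the average mass of the free amino acid (standard textbook values), that specific activity is expressed in µmol per minute per mg of protein, and that molecular activity is wanted per minute; the article states none of these units explicitly. All function names are hypothetical.

```python
# Average masses (g/mol) of the free amino acids, standard textbook values.
FREE_AA_MASS = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}
WATER_MASS = 18.015  # g/mol, lost once per peptide bond


def molar_weight(sequence):
    """Sum the amino acid masses and subtract one water per peptide bond."""
    total = sum(FREE_AA_MASS[aa] for aa in sequence)
    return total - WATER_MASS * (len(sequence) - 1)


def enzyme_molar_weight(subunit_sequences):
    """Multi-subunit enzyme: add the weights of the constituent subunits."""
    return sum(molar_weight(seq) for seq in subunit_sequences)


def molecular_activity(specific_activity, weight):
    """Convert specific activity to turnovers per minute per enzyme molecule.

    Assumed units: specific_activity in umol min^-1 mg^-1, weight in g/mol.
    One mg of enzyme contains 1000 / weight umol of enzyme, hence the factor.
    """
    return specific_activity * weight / 1000.0


print(round(molar_weight("GA"), 3))       # dipeptide Gly-Ala
print(molecular_activity(10.0, 50000.0))  # hypothetical 50 kDa enzyme
```

For the hypothetical 50 kDa enzyme with a specific activity of 10 µmol min⁻¹ mg⁻¹, the conversion gives 500 turnovers per minute.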
Analysis of proteome and genome data
The analysis of relative amino acid composition is performed from cDNAs and peptide strings using locally developed PERL scripts.
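The original PERL scripts are not reproduced in the article. As a hedged illustration of what a relative composition calculation on peptide strings might look like, here is a short Python sketch; the function names and the choice to ignore non-standard symbols are assumptions of this sketch.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def relative_composition(sequence):
    """Fraction of each of the 20 standard amino acids in one peptide string."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)  # ignores non-standard symbols
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def pooled_composition(sequences):
    """Composition of a whole set of proteins, pooled over all residues."""
    return relative_composition("".join(sequences))


print(relative_composition("MKWW")["W"])
```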
Statistical analysis of the data
Monte Carlo simulations and statistical analysis of the data are performed using locally developed PERL scripts and Mathematica (Wolfram, 1999) notebooks.
The Spearman rank correlation coefficient determines the existence of non-linear correlations between sets of data (Cohen and Holliday, 1998). This correlation coefficient is given by

$$
r = \frac{\sum_{i=1}^{n} R(x_i) R(y_i) - \frac{1}{n} \left[ \sum_{i=1}^{n} R(x_i) \right] \left[ \sum_{i=1}^{n} R(y_i) \right]}{\sqrt{\left\{ \sum_{i=1}^{n} R(x_i)^2 - \frac{1}{n} \left[ \sum_{i=1}^{n} R(x_i) \right]^2 \right\} \left\{ \sum_{i=1}^{n} R(y_i)^2 - \frac{1}{n} \left[ \sum_{i=1}^{n} R(y_i) \right]^2 \right\}}} \tag{5}
$$

where $R(x_i)$ and $R(y_i)$ represent the rank of $x_i$ and $y_i$ in the sample, respectively, and n is the number of pairs in the sample.
To test the significance of r we use the Fisher z-statistic with the null hypothesis that the correlation coefficient is zero (and thus that there is no correlation in the population for the tested variables). The z-test makes no assumptions about the specific distribution of the data being analysed. It is well known that the variable z, defined as

$$
z = \frac{\sum_{i=1}^{n} \left[ R(x_i) - R(y_i) \right]^2 - \frac{n (n - 1)(n + 1)}{6}}{\sqrt{\frac{n^2 (n - 1)(n + 1)^2}{36}}} \sim N(0, 1) \tag{6}
$$

has a normal distribution with mean 0 and standard deviation 1. The P-value is calculated by determining the quantile for the absolute value of z in the normal distribution. If P < α (0 < α < 1) there is a likelihood α that the correlation coefficient is in fact 0, i.e. that there is no correlation between the y-values and the x-values in the sample. We have also calculated the t-statistics for the different coefficients. However, here we present only P-values as determined from the z-statistics, because the significance is lower for our samples, thus providing a more conservative estimate of significance.
Kinetic modelling
The kinetic modelling is performed using the program PLAS (Voit and Ferreira, 2000).
Calculating statistical significance of amino acid bias
Determining whether a protein is significantly biased towards low relative composition for any given amino acid is a two-step process. First, we compare the composition of the protein with that of a reference group. Second, we calculate the significance of the difference between the protein and the reference group. We use three different approaches to calculate this significance. Two involve MC simulations to determine the significance of the bias in the relative composition of a protein with respect to a given amino acid in the context of the E. coli proteome, and the third involves an analytical calculation.
In the first MC approach, we randomly generate 1000 protein sequences having the same length as our protein of interest, assuming a relative amino acid composition that is, on average, the same as that of the reference group. The relative composition of the individual protein sequences in this set of 1000 random sequences is then ordered with respect to the amino acid of interest. Finally, our protein is considered to be significantly biased towards low levels of a given amino acid if its composition with respect to that amino acid is lower than that of 90% of the random protein sequences in our set, i.e. if it is below the 10th percentile of bias.
In the second MC approach, we draw from the reference group of proteins, randomly, a set of 1000 proteins, allowing for repetition. We then order the relative composition of this set of random proteins with respect to the amino acid of interest. Our protein is considered to be significantly biased towards low levels of a given amino acid if its composition with respect to that amino acid is lower than that of 90% of the random proteins in our set, i.e. if it is below the 10th percentile of bias.
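The two MC approaches can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the original PERL implementation: the reference composition is passed in as a dictionary of fractions, the reference group as a list of peptide strings, and all function and parameter names are hypothetical. Instead of explicitly ordering the set, the sketch counts how many random values exceed the observed one, which gives the same decision.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fraction(sequence, aa):
    """Relative content of amino acid `aa` in a peptide string."""
    return sequence.count(aa) / len(sequence)


def below_tenth_percentile(value, null_values):
    """True if `value` is lower than at least 90% of the null values."""
    lower_count = sum(1 for v in null_values if value < v)
    return lower_count >= 0.9 * len(null_values)


def mc_random_sequences(protein, aa, reference_composition,
                        n_draws=1000, rng=random):
    """First approach: random sequences with the length of the protein,
    drawn residue by residue from the reference composition."""
    weights = [reference_composition[a] for a in AMINO_ACIDS]
    null = []
    for _ in range(n_draws):
        seq = "".join(rng.choices(AMINO_ACIDS, weights=weights, k=len(protein)))
        null.append(fraction(seq, aa))
    return below_tenth_percentile(fraction(protein, aa), null)


def mc_resample_reference(protein, aa, reference_proteins,
                          n_draws=1000, rng=random):
    """Second approach: draw reference proteins at random, with repetition."""
    null = [fraction(rng.choice(reference_proteins), aa) for _ in range(n_draws)]
    return below_tenth_percentile(fraction(protein, aa), null)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, for example `mc_random_sequences(seq, "W", composition, rng=random.Random(0))`.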
The third approach to estimating significance involves an analytical calculation. Consider a protein P of length L with a relative composition of $c_1, \ldots, c_{20}$ for each of the 20 amino acid types. The average composition for the set of the control proteins is given by $p_1, \ldots, p_{20}$, with $\sum_{i=1}^{20} p_i = 1$. If the relative composition of a protein with respect to a given amino acid is independent of all other amino acids (which must be verified), then the probability that a protein of length L (belonging to a set of proteins that, on average, has a relative amount $p_i$ of amino acid i) has N residues of amino acid i is given by

$$
p_i(N) = \frac{L!}{N! \, (L - N)!} \, p_i^{N} (1 - p_i)^{L - N} \tag{7}
$$

The cumulative probability that a protein of L residues has no more than N residues of amino acid i is then given by

$$
P_i(N) = \sum_{r=0}^{N} \frac{L!}{r! \, (L - r)!} \, p_i^{r} (1 - p_i)^{L - r} \tag{8}
$$

Thus if $P_i(N) < 0.1$ for our protein of interest, there is a 90% chance that the protein is significantly biased towards low values of amino acid i with respect to the control group of proteins.
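The analytical calculation is a cumulative binomial probability and can be sketched in Python as below. The sketch sums from r = 0, so that proteins lacking the amino acid altogether are counted, and evaluates each term in log space to avoid overflow for long proteins. It assumes 0 < p < 1; the function names and the `threshold` parameter are illustrative.

```python
from math import exp, lgamma, log


def prob_exactly(n_res, length, p):
    """Binomial probability of exactly n_res residues in a protein of `length`.

    Evaluated in log space; assumes 0 < p < 1.
    """
    log_coeff = lgamma(length + 1) - lgamma(n_res + 1) - lgamma(length - n_res + 1)
    return exp(log_coeff + n_res * log(p) + (length - n_res) * log(1.0 - p))


def prob_at_most(n_res, length, p):
    """Cumulative probability of no more than n_res residues."""
    return sum(prob_exactly(r, length, p) for r in range(n_res + 1))


def is_low_biased(sequence, aa, reference_fraction, threshold=0.1):
    """Apply the P < 0.1 criterion to one protein and one amino acid."""
    n_res = sequence.count(aa)
    return prob_at_most(n_res, len(sequence), reference_fraction) < threshold


# A 100-residue protein without Trp, against a reference Trp fraction of 5%.
print(is_low_biased("A" * 100, "W", 0.05))
```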
Homology comparisons of bacterial enzymes
When structural information was unavailable, we have performed sequence homology studies to investigate the possibility that a sufficient number of cognate residues might be involved in important functional roles. To evaluate this possibility, we used PSI-BLAST (Altschul et al., 1997) to search for all the bacterial homologues of the relevant protein in the SWISSPROT database (Boeckmann et al., 2003) that are both classified as having the same function and have an E-value smaller than $10^{-4}$. These sequences were then aligned using CLUSTALW (Chenna et al., 2003) and conservation of cognate residues was studied.
Acknowledgements
We thank Dr Armindo Salvador for a critical review of an earlier version of this manuscript and for fruitful discussions. We thank three anonymous reviewers for suggestions that improved the clarity of this article. This work was supported in part by a grant to M.A.S. from the US Public Health Service (RO1-GM30054) and fellowships to R.A. from the Spanish Ministerio de Educacion, Cultura y Deporte (SB2000-031) and the Portuguese FCT (BPD 11533/2002).
Supplementary material
The following material is available from http://www.blackwellpublishing.com/products/journals/suppmat/mmi.mmi4566/mmi4566sm.htm
Fig. S1. Important functional residues in different biosynthetic enzymes of E. coli.
Fig. S2. Alignment of E. coli proteins with the lowest cognate bias for Phe and Tyr to homologous bacterial proteins from the SWISSPROT database.
Table S1. Spearman rank correlation coefficients.
References
Akashi, H., and Gojobori, T. (2002) Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc Natl Acad Sci USA 99: 3695–3700.
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Baudouin-Cornu, P., Surdin-Kerjan, Y., Marliere, P., and Thomas, D. (2001) Molecular evolution of protein atomic composition. Science 293: 297–300.
Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.-C., Estreicher, A., Gasteiger, E., et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31: 365–370.
Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T.J., Higgins, D.G., and Thompson, J.D. (2003) Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 31: 3497–3500.
Cohen, L., and Holliday, M. (1998) Statistics for the Social Scientists. New York: Addison-Wesley.
Cootes, A.P., Curmi, P.M., Cunningham, R., Donnelly, C., and Torda, A.E. (1998) The dependence of amino acid pair correlation on structural environment. Proteins: Struct Function Genet 32: 175–189.
Dufton, M.J. (1997) Genetic code synonym quotas and amino acid complexity: cutting the cost of proteins? J Theor Biol 187: 165–173.
Elf, J., Nilsson, D., Tenson, T., and Ehrenberg, M. (2003) Selective charging of tRNA isoacceptors explains patterns of codon usage. Science 300: 1718–1722.
Foster, J.W., and Spector, M.P. (1995) How Salmonella survive against the odds. Ann Rev Microbiol 49: 145–174.
Frey, P.A. (2001) Radical mechanisms of enzymatic catalysis. Ann Rev Biochem 70: 121–148.
Herrmann, K.M., and Somerville, R.L. (1983) Amino Acids: Biosynthesis and Genetic Regulation. Reading, MA: Addison-Wesley Publishing.
Jansen, R., and Gerstein, M. (2000) Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins. Nucleic Acids Res 28: 1481–1488.
Kanehisa, M., Goto, S., Sato, K., Fijibuchi, W., and Nakaya, A. (2002) The KEGG database at Genomenet. Nucleic Acids Res 30: 42–46.
Karlin, S., and Bucher, P. (1992) Correlation analysis of amino acid usage in protein classes. Proc Natl Acad Sci USA 89: 12165–12169.
Karp, P.D., Riley, M., Saier, M., Paulsen, I.T., Paley, S., and Pellegrini-Toole, A. (2002) The Ecocyc Database. Nucleic Acids Res 30: 56–58.
Khodursky, A.B., Peter, B.J., Cozzarelli, N.R., Botstein, D., Brown, P.O., and Yanofsky, C. (2000) DNA microarray analysis of gene expression in response to physiological and genetic changes that affect tryptophan metabolism in Escherichia coli. Proc Natl Acad Sci USA 97: 12170–12175.
King, J.L., and Jukes, T.H. (1969) Non darwinian evolution. Science 164: 788–798.
Li, W.-H. (1997) Molecular Evolution. New York: Sinauer Associates.
Lobry, J.R. (1997) Influence of genomic G+C content on average amino acid composition of proteins from 59 bacterial species. Gene 205: 309–316.
Lobry, J.R., and Gautier, C. (1994) Hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in 999 Escherichia coli chromosome-encoded genes. Nucleic Acids Res 22: 3174–3180.
Maaløe, O., and Kjeldgaard, N.O. (1966) Control of Macromolecular Synthesis; a Study of DNA, RNA, and Protein Synthesis in Bacteria. New York: W.A. Benjamin.
Magnusson, L.U., Nystrom, T., and Farewell, A. (2003) Underproduction of sigma 70 mimics a stringent response. A proteome approach. J Biol Chem 278: 968–973.
Matin, A. (1991) The molecular basis of carbon-starvation-induced general resistance in Escherichia coli. Mol Microbiol 5: 3–10.
Mazel, D., and Marliere, P. (1989) Adaptive eradication of methionine and cysteine from cyanobacterial light-harvesting proteins. Nature 341: 245–248.
Neidhardt, F.C. (1999) Escherichia coli and Salmonella: Cellular and Molecular Biology. Washington, DC: American Society for Microbiology.
Neidhardt, F.C., Ingraham, J.L., and Schaechter, M. (1990) Physiology of the Bacterial Cell: A Molecular Approach. Sunderland, MA: Sinauer Associates.
Nystrom, T. (2004) Stationary-phase physiology. Ann Rev Microbiol 58: 161–181.
Overbeek, R., Larsen, N., Pusch, G., D’Souza, M., Selkov, E., Jr, Kyrpides, N., et al. (2000) WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res 28: 123–125.
Pedersen, J.Z., and Finazzi-Agro, A. (1993) Protein-radical enzymes. FEBS Lett 325: 53–58.
Pramanik, J., and Keasling, J.D. (1998) Effect of Escherichia coli biomass composition on central metabolic fluxes predicted by a stoichiometric model. Biotech Bioeng 60: 230–238.
Reeve, C.A., Bockman, A.T., and Matin, A. (1984) Role of protein degradation in the survival of carbon-starved Escherichia coli and Salmonella typhimurium. J Bacteriol 157: 758–763.
Richmond, R.C. (1970) Non darwinian evolution: a critique. Nature 225: 1025–1028.
Rogers, M.S., and Dooley, D.M. (2003) Copper-tyrosyl radical enzymes. Curr Opin Chem Biol 7: 189–196.
Sauer, U., Hatzimanikatis, V., Hohmann, H.P., Manneberg, M., van Loon, A.P., and Bailey, J.E. (1996) Physiology and metabolic fluxes of wild-type and riboflavin-producing Bacillus subtilis. Appl Environ Microbiol 62: 3687–3696.
Savageau, M.A. (1983) Escherichia coli habitats, cell types, and molecular mechanisms of gene control. Am Nat 122: 732–744.
Seligmann, H. (2003) Cost-minimization of amino acid usage. J Mol Evol 56: 151–161.
Singer, G.A., and Hickey, D.A. (2000) Nucleotide bias causes a genomewide bias in the amino acid composition of proteins. Mol Biol Evol 17: 1581–1588.
Trifonov, E.N. (1987) Translation framing code and frame-monitoring mechanism as suggested by the analysis of mRNA and 16 S rRNA nucleotide sequences. J Mol Biol 194: 643–652.
Voit, E.O., and Ferreira, A.E.N. (2000) Computational Analysis of Biochemical Systems, a Practical Guide for Biochemists and Molecular Biologists. Cambridge, UK: Cambridge University Press.
Weichart, D., Querfurth, N., Dreger, M., and Hengge-Aronis, R. (2003) Global role for ClpP-containing proteases in stationary phase adaptation of Escherichia coli. J Bacteriol 185: 115–125.
Wolfram, S. (1999) The Mathematica Book. New York: Cambridge University Press.
Xiu, Z.-L., Chang, Z.-Y., and Zeng, A.-P. (2002) Nonlinear dynamics of regulation of bacterial trp operon: model analysis of integrated effects of repression, feedback inhibition, and attenuation. Biotechnol Prog 18: 686–693.