Evidence of selection for low cognate amino acid bias in amino acid biosynthetic enzymes

Rui Alves 1,2 and Michael A. Savageau 1*

1 Biomedical Engineering Department, University of California – Davis, Davis, CA, USA.
2 Biomathematics and Biostatistics Group, Departament Ciencies Mediques Basiques, Universidad de Lleida, Spain.

Molecular Microbiology (2005) 56(4), 1017–1034. doi:10.1111/j.1365-2958.2005.04566.x
© 2005 Blackwell Publishing Ltd

Original Article. Accepted 5 January, 2005. *For correspondence. [email protected]; Tel. (+1) 530 754 8375; Fax (+1) 530 754 5739.

Summary

If the enzymes responsible for biosynthesis of a given amino acid are repressed and the cognate amino acid pool suddenly depleted, then derepression of these enzymes and replenishment of the pool would be problematic, if the enzymes were largely composed of the cognate amino acid. In the proverbial ‘Catch 22’, cells would lack the necessary enzymes to make the amino acid, and they would lack the necessary amino acid to make the needed enzymes. Based on this scenario, we hypothesize that evolution would lead to the selection of amino acid biosynthetic enzymes that have a relatively low content of their cognate amino acid. We call this the ‘cognate bias hypothesis’. Here we test several implications of this hypothesis directly using data from the proteome of Escherichia coli. Several lines of evidence show that low cognate bias is evident in 15 of the 20 amino acid biosynthetic pathways. Comparison with closely related Salmonella typhimurium shows similar results. Comparison with more distantly related Bacillus subtilis shows general similarities as well as significant differences in the detailed profiles of cognate bias. Thus, selection for low cognate bias plays a significant role in shaping the amino acid composition for a large class of cellular proteins.

Introduction

Proteins are versatile effectors and mediators of cellular response. They catalyse reactions, serve as structural components of the cell and mediate cellular adaptation through sensing and signal transduction. Structure and function of proteins are subject to natural selection (King and Jukes, 1969; Richmond, 1970; Li, 1997) and, because structure and function are ultimately determined by the sequence of amino acids in proteins, it stands to reason that the amino acid composition of proteins is also subject to selection.

Previous studies have identified different types of selective pressure that are important in determining the relative amino acid composition for the proteins of an organism. Differences in the mutational bias of the different codons for each amino acid can partly account for the differences in relative amino acid composition (see Lobry, 1997; Singer and Hickey, 2000; Akashi and Gojobori, 2002; Seligmann, 2003 and references therein). The metabolic cost of synthesizing an amino acid, in terms of ATP and reducing equivalents, is also important in determining which amino acids are more prevalent in a proteome. The cheaper an amino acid is to synthesize, the more it is used (see Karlin and Bucher, 1992; Lobry and Gautier, 1994; Dufton, 1997; Jansen and Gerstein, 2000; Akashi and Gojobori, 2002; Seligmann, 2003 and references therein). Functional reasons that justify differential usage of amino acids in a given group of proteins have also been identified (Trifonov, 1987; Mazel and Marliere, 1989; Karlin and Bucher, 1992). For example, membrane proteins are biased towards high relative composition of hydrophobic amino acids (Karlin and Bucher, 1992).

Genes coding for amino acid biosynthetic enzymes are repressed in a medium where the cognate amino acid is present. Escherichia coli and many other bacteria typically derepress the expression of a small set of enzymes whenever there is a need to synthesize any given amino acid or set of amino acids. These are usually encoded in an operon or regulon, and their expression tends to be co-ordinated (see Herrmann and Somerville, 1983; Neidhardt, 1999 for reviews). When cells grow in a medium with low amino acid content, a significant fraction of cellular protein consists of enzymes involved in amino acid biosynthesis (Maaløe and Kjeldgaard, 1966; Neidhardt et al., 1990).

Although cells have general mechanisms, such as induction of proteolysis and activation of the stringent response, for remodelling the amino acid content of proteins when the organism is stressed by amino acid limitation (Reeve et al., 1984; Matin, 1991; Foster and Spector, 1995; Magnusson et al., 2003; Weichart et al., 2003; Nystrom, 2004), they also are likely to have more specific mechanisms. For example, if cells were growing in a rich medium and suddenly one of the exogenously supplied amino acids became depleted, then derepression of the corresponding biosynthetic enzymes and replenishment of the intracellular pool of that amino acid would be delayed and limited if the enzymes were largely composed of the cognate amino acid. Based on such a scenario, we hypothesize that evolution would lead to the selection of amino acid biosynthetic enzymes that have a relatively low content of their cognate amino acid, thus avoiding the ‘Catch 22’ situation in which the biosynthetic enzymes cannot be synthesized for lack of the amino acid and the amino acid cannot be synthesized for lack of the biosynthetic enzymes.

To explore the dynamics of this situation, we first adapt an existing computer model for an amino acid biosynthetic pathway (Xiu et al., 2002) and show that low cognate bias correlates with a greater extent of derepression of the pathway and with faster response times for this derepression. This suggests that our cognate-bias hypothesis is reasonable and that response time may well be the selective pressure for the low bias. We test several implications of this hypothesis directly using data from the well-characterized organism Escherichia coli, and we compare these results with the results from similar tests for a closely related Gram-negative organism Salmonella typhimurium and for a more distantly related Gram-positive organism Bacillus subtilis.

For each organism, we calculate the amino acid composition of proteins that are involved in the amino acid biosynthetic pathways and compare their composition with that of larger groups of proteins from the same organism, including the entire proteome. We find that most amino acid biosynthetic pathways in each of the organisms do have a low cognate bias. There are a few exceptions and in some cases there are functional reasons that can account for this. The closely related organisms have very similar profiles of cognate bias, whereas the more distantly related organisms have profiles with significant differences that may reflect their different evolutionary history and ecological niche.

Results

Amino acid composition

To determine whether a protein or a group of proteins has a relative amino acid composition that is significantly different from that of a larger group to which it belongs, one must first determine the composition of the larger group. Choosing an appropriate group of proteins to serve as a control for the calculation of average amino acid composition requires careful consideration. The relative composition of the control group, which is then used to estimate the probability of amino acid occurrence in a protein, should be a weighted average of the proteins being synthesized by the cell. This is so because the synthesis of an amino acid biosynthetic enzyme during derepression and the synthesis of all other proteins being expressed at the same time compete for the limiting amino acid. The relative amino acid composition of the protein complement in growing cells provides an estimate of the average composition of these proteins. Proteins expressed at low levels have a small contribution to the overall amino acid composition of cellular protein, whereas proteins with a high level of expression have a large contribution.
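The weighted average described above can be illustrated with a minimal Python sketch. The function name `relative_composition` and the per-protein `weights` (standing in for expression levels) are assumptions introduced here for illustration; the source does not specify how such weights would be obtained.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 amino acids


def relative_composition(sequences, weights=None):
    """Return {amino acid: fraction} pooled over a group of proteins.

    weights[i] is a hypothetical expression level for sequences[i];
    if omitted, every protein counts once (an unweighted proteome).
    """
    if weights is None:
        weights = [1.0] * len(sequences)
    totals = Counter()
    for seq, w in zip(sequences, weights):
        for aa, n in Counter(seq).items():
            if aa in AMINO_ACIDS:      # ignore ambiguous residue codes
                totals[aa] += w * n
    grand_total = sum(totals.values())
    return {aa: totals[aa] / grand_total for aa in AMINO_ACIDS}


# Toy example: the second protein is assumed to be expressed twice as much.
toy = relative_composition(["AAG", "GG"], weights=[1, 2])
```

With equal weights this reduces to the composition of the proteome as computed from translated coding sequences; with weights it approximates the composition of the protein complement of growing cells.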

Thus, we have searched the literature for experimental determinations of the amino acid composition in the protein complement of the bacteria studied in this work. Having found such studies we then compared the experimentally determined composition with the composition of other groups of cellular proteins. We found that the relative amino acid composition of the protein complement in growing cells is almost identical to that of the cellular proteome determined from the DNA sequence (Table 1). For each genome, we also have calculated the relative composition for the entire group of non-enzymatic proteins, for the entire group of enzymatic proteins, and for each more specialized group of enzymes within an EC classification (i.e. classes 1 through 6). Class 1 includes all oxidoreductases, class 2 all transferases, class 3 all hydrolases, class 4 all lyases, class 5 all isomerases and class 6 all ligases.

The results of the analysis presented in this article do not differ significantly when the different groups of proteins are used to calculate the probability of amino acid occurrence. Therefore, we present only the results based on the relative amino acid composition of the proteome to calculate the probability of amino acid occurrence.

Verifying two basic assumptions

There are basic assumptions involving each of the two types of Monte Carlo (MC) simulations that we have used to determine the statistical significance of our comparisons. The first MC approach assumes that, with respect to the relative amino acid composition, there is no strong correlation between any two different types of amino acids. To test the validity of this assumption for the E. coli proteome, we have calculated the Spearman correlation coefficient between the relative amounts of any two amino acids in the proteins. These correlation coefficients are small, which supports, to a first approximation, our first assumption (Table S1). (This has no implications regarding finer detail correlations between neighbouring amino acids or other factors that have been shown to influence the selection of amino acids at any given location in a protein; Cootes et al., 1998.) The second MC approach assumes that the relative composition of a protein is independent of the protein length. To test the validity of this assumption, we have calculated the Spearman correlation coefficient between the relative amounts of each amino acid and the length of the E. coli proteins. These correlations are also small, which supports the second assumption (Table S1).
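The two checks above both rest on the Spearman rank correlation. A minimal, dependency-free Python sketch is given below; the helper names (`ranks`, `spearman`) and the toy sequences are illustrative assumptions, not the authors' code or data.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman coefficient = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Second assumption, on toy sequences: Trp fraction versus protein length.
toy_proteins = ["MKWLA", "MSTAAGLW", "MKKLLPTAVE", "MWWA"]
trp_fraction = [p.count("W") / len(p) for p in toy_proteins]
lengths = [len(p) for p in toy_proteins]
rho = spearman(trp_fraction, lengths)
```

For the first assumption the same function would be applied to the per-protein fractions of two different amino acids.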

Having verified that the two assumptions above are, to a first approximation, correct allows us to calculate, in closed form, the significance of the amino acid bias for any given protein (see Experimental procedures). We have used all three approaches (the two MC simulations and the analytical calculation) and 10 different control groups to calculate how significantly biased our proteins are. For each protein of interest, the results obtained with the different methods of calculation and control groups differ by at most a few per cent (data not shown). Therefore, we shall only present and discuss the data for the analytical approach using the entire proteome as the control group.

By using the analytical approach, we estimate the cognate amino acid bias of a protein by the probability (P-value) that the relative cognate amino acid composition of a protein is below (low bias) or above (high bias) that of the control group (see Experimental procedures for details).
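The closed-form expression itself is given in Experimental procedures and is not reproduced in this section. As a hedged illustration only, the sketch below assumes a binomial model, which is what the two independence assumptions suggest: each of the n residues of a protein is the cognate amino acid with probability p, the fraction of that amino acid in the control group. The function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), 0 < p < 1, computed in log space
    so that long proteins do not overflow."""
    log_coeff = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coeff + i * log(p) + (n - i) * log(1 - p))


def low_bias_p(k, n, p):
    """Probability of observing k or fewer cognate residues (low bias)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


def high_bias_p(k, n, p):
    """Probability of observing k or more cognate residues (high bias)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


# Illustration: Trp fraction 0.015 (E. coli proteome, Table 1) and a
# hypothetical 400-residue enzyme that contains 2 Trp residues.
p_low = low_bias_p(2, 400, 0.015)
```

A small `low_bias_p` value would indicate that the protein contains fewer cognate residues than expected from the control composition.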

Effect of cognate bias on time for amino acid recovery

Amino acid biosynthetic pathways in bacteria are repressible (Herrmann and Somerville, 1983; Neidhardt, 1999). For example, when growing in a medium in which an amino acid is available, E. coli cells typically repress expression of the genes that code for enzymes of the cognate pathway. This situation is represented in Fig. 1. When the proteins of a pathway that synthesize a given amino acid have a composition that is enriched for that amino acid (high cognate bias), it is likely that this high cognate bias will tend to prevent or delay the recovery of amino acid levels when cells are shifted from a medium that is rich in the amino acid to one that is poor.

To analyse this hypothesis in a specific case we use a previously developed model (Xiu et al., 2002) of Trp biosynthesis in E. coli. The original normalized equations are the following:

Table 1. Average relative amino acid composition of bacterial proteins and of two different environments.

| Amino acid | E. coli (a) | S. typhimurium (a) | B. subtilis (a) | E. coli (b) | S. typhimurium (b) | B. subtilis (b) | Soil (c) | Intestine (c) |
|---|---|---|---|---|---|---|---|---|
| Ala | 0.093 | 0.098 | 0.077 | 0.112 | – | 0.045 | Intermediate | Low |
| Arg | 0.054 | 0.012 | 0.008 | 0.050 | – | 0.064 | Low | High |
| Asn | 0.040 | 0.052 | 0.052 | 0.050 | – | 0.037 | Low | Low |
| Asp | 0.051 | 0.056 | 0.072 | 0.050 | – | 0.037 | Intermediate | Low |
| Cys | 0.012 | 0.039 | 0.045 | 0.017 | – | 0.013 | Low | Low |
| Gln | 0.045 | 0.074 | 0.069 | 0.056 | – | 0.072 | Low | Low |
| Glu | 0.057 | 0.023 | 0.023 | 0.056 | – | 0.072 | Low | High |
| Gly | 0.072 | 0.059 | 0.074 | 0.086 | – | 0.058 | High | Low |
| His | 0.021 | 0.043 | 0.071 | 0.017 | – | 0.024 | Low | Low |
| Ile | 0.061 | 0.110 | 0.096 | 0.046 | – | 0.067 | Low | Low |
| Leu | 0.114 | 0.028 | 0.028 | 0.091 | – | 0.086 | Low | Low |
| Lys | 0.045 | 0.038 | 0.039 | 0.056 | – | 0.090 | Low | High |
| Met | 0.026 | 0.045 | 0.037 | 0.024 | – | 0.032 | Low | Low |
| Phe | 0.039 | 0.044 | 0.038 | 0.034 | – | 0.055 | Low | Low |
| Pro | 0.045 | 0.056 | 0.041 | 0.042 | – | 0.035 | Low | Low |
| Ser | 0.056 | 0.058 | 0.063 | 0.049 | – | 0.043 | High | Low |
| Thr | 0.055 | 0.055 | 0.054 | 0.053 | – | 0.042 | Intermediate | Low |
| Trp | 0.015 | 0.070 | 0.068 | 0.011 | – | 0.021 | Low | High |
| Tyr | 0.030 | 0.015 | 0.010 | 0.028 | – | 0.038 | Low | High |
| Val | 0.069 | 0.029 | 0.035 | 0.072 | – | 0.068 | Low | Low |

a. Calculated from the translated version of coding sequences in the genomes.
b. Experimental determination of the amino acid fraction in the cell after total protein purification and hydrolysis. The values for B. subtilis have been calculated from Sauer et al. (1996). The values for E. coli have been calculated from Pramanik and Keasling (1998).
c. Qualitative amino acid make-up of two different environments (Savageau, 1983) that are relevant for these organisms.

Fig. 1. Schematic model of a specific amino acid biosynthetic pathway in its cellular context. X1 – mRNA coding for the enzymes of the pathway that synthesizes the kth amino acid; X2 – enzymes of the pathway that synthesizes the kth amino acid; X3 – cognate amino acid (kth) of the biosynthetic pathway. See text for further discussion.

$$\frac{dX_1}{dt} = \frac{1 + X_3}{1 + \left(1 + \dfrac{k_1 X_3}{k_2 + X_3}\right) X_3}\,\frac{k_3}{k_3 + X_3} - k_4 X_1 \tag{1}$$

$$\frac{dX_2}{dt} = k_{10} X_1 - k_1 X_2 \tag{2}$$

$$\frac{dX_3}{dt} = k_{11} + \frac{k_5 X_2}{k_5 + X_3^2} - k_6 X_3 - \frac{X_3}{1 + X_3}\,\frac{k_6 X_3}{k_7 + X_3} - \frac{k_8 X_3}{k_9 + X_3} \tag{3}$$

$X_1$ represents the concentration of the mRNA that codes for the enzymes of the biosynthetic pathway. The synthesis of mRNA is repressed by an increase in the concentration of the amino acid ($X_3$) that is synthesized by the pathway. The term $k_3/(k_3 + X_3)$ in Eq. 1 represents the attenuation by the leader peptide of the operon in the presence of Trp. The term $(1 + X_3)/\{1 + [1 + k_1 X_3/(k_2 + X_3)]X_3\}$ is a normalized function of the effect that Trp has on the repressor protein and on the repressor binding to the operator, assuming rapid equilibrium of both reactions. The decay of the mRNA molecules is a first-order process. The mRNA molecules are templates for the synthesis of the enzymes ($X_2$) in the biosynthetic pathway. Trp ($X_3$) synthesis is an enzymatic process whose rate is described by $k_5 X_2/(k_5 + X_3^2)$. The free Trp pool can be depleted by dilution ($k_6 X_3$), binding to the repressor $\{[X_3/(1 + X_3)][k_6 X_3/(k_7 + X_3)]\}$ or usage in protein synthesis $[k_8 X_3/(k_9 + X_3)]$ (for further details on the form of Eqs 1 to 3, see Xiu et al., 2002).

We have modified this model to include the possibility of an exogenous supply for Trp ($k_{11}$). The equation that accounts for the time-dependent behaviour of the enzyme concentration ($X_2$) is not explicitly dependent on the Trp concentration ($X_3$) in the original model. This is a justified approximation because the number of Trp residues is very low in the enzymes that catalyse Trp biosynthesis. Now imagine an organism that is identical to the first, except that the number of Trp residues in the Trp-biosynthetic enzymes is large. In this situation, the influence of Trp concentration on the rate of synthesis of the Trp-biosynthetic enzymes needs to be made explicit. To do this, we modify the rate of enzyme production in Eq. 2 to include a Henri-Michaelis-Menten dependence on the concentration of the cognate amino acid. The new equation that describes the time-dependent behaviour of the enzyme level ($X_2$) is now

$$\frac{dX_2}{dt} = \frac{k_{10} X_1 X_3}{k_M + X_3} - k_1 X_2 \tag{4}$$

When $k_M = 0$, Eq. 4 is the same as Eq. 2. The larger the $k_M$, the larger the relative amount of cognate amino acid in the biosynthetic enzymes.
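The modified model (Eqs 1, 3 and 4) can be sketched as a right-hand-side function plus a simple integrator. This is a minimal illustration under stated assumptions: all rate-constant values below are placeholders and are not the parameter values of Xiu et al. (2002); the forward-Euler scheme and the names `Params`, `derivatives` and `simulate` are choices made here, not part of the source.

```python
from dataclasses import dataclass


@dataclass
class Params:
    # Placeholder constants for illustration only (NOT from Xiu et al., 2002).
    k1: float = 1.0
    k2: float = 1.0
    k3: float = 1.0
    k4: float = 1.0
    k5: float = 1.0
    k6: float = 1.0
    k7: float = 1.0
    k8: float = 1.0
    k9: float = 1.0
    k10: float = 1.0
    k11: float = 0.5   # exogenous Trp supply
    kM: float = 0.0    # dependence of enzyme synthesis on Trp (Eq. 4)


def derivatives(x, p):
    """Right-hand sides of Eqs 1, 4 and 3 for x = (X1, X2, X3)."""
    x1, x2, x3 = x
    repression = (1 + x3) / (1 + (1 + p.k1 * x3 / (p.k2 + x3)) * x3)
    attenuation = p.k3 / (p.k3 + x3)
    dx1 = repression * attenuation - p.k4 * x1                      # Eq. 1
    # Eq. 4; with kM == 0 the Trp factor is 1 and Eq. 2 is recovered.
    trp_factor = 1.0 if p.kM == 0 else x3 / (p.kM + x3)
    dx2 = p.k10 * x1 * trp_factor - p.k1 * x2
    dx3 = (p.k11 + p.k5 * x2 / (p.k5 + x3 ** 2) - p.k6 * x3        # Eq. 3
           - (x3 / (1 + x3)) * (p.k6 * x3 / (p.k7 + x3))
           - p.k8 * x3 / (p.k9 + x3))
    return (dx1, dx2, dx3)


def simulate(x0, p, dt=0.01, steps=1000):
    """Forward-Euler integration; returns the state after `steps` steps."""
    x = tuple(x0)
    for _ in range(steps):
        d = derivatives(x, p)
        x = tuple(xi + dt * di for xi, di in zip(x, d))
    return x


final_state = simulate((0.1, 0.1, 1.0), Params(kM=1.0))
```

Repeating `simulate` for increasing `kM` at a fixed `k11` mimics, qualitatively, a comparison between enzymes with increasing cognate amino acid content; quantitative agreement with the published curves would require the original parameter values and a more careful integrator.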

We use Eqs 1, 3 and 4 and parameter values from Xiu et al. (2002) to simulate the following experiment involving biosynthetic enzymes with increasing levels of cognate amino acid in their composition. Let bacteria grow exponentially in a Trp-rich medium until a steady state has been achieved and then, at time zero, switch them to various media, with different lower amounts of Trp. This will lead to the derepression of the Trp-biosynthetic enzymes. The results of such an experiment are shown for three different shifts (Fig. 2). The lower the Trp levels in the poor medium, the higher the derepressed enzyme levels. For increasing relative composition of Trp in the biosynthetic enzymes, the organism will take longer to produce a similar amount of enzyme and thus to set up an appropriate response to the challenge of amino acid depletion. Furthermore, as the relative amount of amino acid in the biosynthetic enzymes increases, the amino acid levels in the new steady state decrease. For a large depletion of amino acid in the medium, biosynthetic enzymes with high relative amino acid composition exhibit an initial bout of synthesis, but then fail to be synthesized at a steady state rate (compare among panels A, C and E of Fig. 2) sufficient to produce acceptable amino acid levels (compare among panels B, D and F of Fig. 2).

Thus, in nature, with highly variable environmentally supplied Trp levels, the cells with a higher Trp content in their Trp-biosynthetic enzymes would be out-competed by those with a lower Trp content. The regulatory loops in the biosynthesis of other amino acids are similar to the one we have analysed, which suggests that composition effects on temporal responses are common phenomena.

Correlation between cognate bias and molecular activity

The specific activity of an enzyme is determined by the product of its molecular activity and the number of enzyme molecules. For a given specific activity, those enzymes with the lowest molecular activity require the largest number of molecules and their synthesis consumes the largest amount of the cognate amino acid. During the critical phase of derepression, the most rate-determining enzymes in amino acid biosynthetic pathways will be under strong selection to minimize the content of their cognate amino acid. Hence, we expect the cognate bias of these enzymes to be directly correlated with their molecular activity.

Estimates of the specific activity for most enzymes involved in amino acid biosynthetic pathways can be found either in the primary literature or in the BRENDA database. Specific activity is determined by the amount of reaction catalysed during each time unit by a fixed weight of enzyme (usually having units of mM min⁻¹ mg⁻¹). We estimate the molecular weight of the enzymes by adding the weight of their amino acid residues and subtracting the weight of one water molecule per peptide bond. Using this information, together with the specific activity, we can calculate a molecular activity for each of the enzymes. However, one needs to keep in mind that the purification methods and conditions under which the specific activities have been determined for the different enzymes are not the same. Thus, it is likely that some errors are introduced in the calculations. The numbers for both the measured specific activity and the calculated molecular activity of the enzymes graphically represented in Fig. 3 are shown in Table 2.
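The calculation described above can be sketched in Python as follows. Two assumptions are made here and are not stated in the source: the amino acid weights are standard average molecular weights of the free amino acids, and the specific activity is taken to be in µmol of substrate converted per minute per mg of enzyme. The function names are hypothetical.

```python
# Average molecular weights of the free amino acids (g/mol).
FREE_AA_WEIGHT = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}
WATER = 18.015  # g/mol, released for each peptide bond formed


def molecular_weight(sequence):
    """Sum of free amino acid weights minus one water per peptide bond."""
    total = sum(FREE_AA_WEIGHT[aa] for aa in sequence)
    return total - WATER * (len(sequence) - 1)


def molecular_activity(specific_activity, mol_weight):
    """Convert a specific activity (assumed µmol min^-1 mg^-1) into a
    molecular activity (mol substrate s^-1 per mol enzyme).

    1 mg of enzyme contains 1000 / mol_weight µmol of enzyme, and
    1 min = 60 s, hence the factor 60000.
    """
    return specific_activity * mol_weight / 60000.0


# Illustration with a specific activity of 133 and a hypothetical
# molecular weight of 49 200 g/mol.
turnover = molecular_activity(133, 49200)
```

Because the conversion is linear in the molecular weight, errors in the sequence-derived weight are small compared with the variation in measured specific activities noted in the text.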

As indicated above, we expect the most rate-determining enzymes in the amino acid biosynthetic pathways to have a cognate bias, given by a P-value, that is positively correlated with molecular activity. That is, the lower the molecular activity, the lower the cognate bias (P-value). The data in Table 3 show a statistically significant positive correlation between cognate bias and molecular activity for the amino acid biosynthetic enzymes from each of the three organisms. This supports the cognate bias hypothesis and suggests that fast recovery of amino acid pools is a significant pressure in determining the cognate amino acid composition of biosynthetic enzymes. The strength and significance of this pressure is greater for the enteric bacteria than for B. subtilis.

As controls, we have determined correlations between cognate amino acid bias and three other factors that might influence amino acid composition of amino acid biosynthetic enzymes. As shown in Table 3, these correlations are much less significant. We also have determined the correlation between bias of each amino acid (non-cognate as well as cognate) and molecular activity for each of the proteins involved in amino acid biosynthesis (Table 4). The correlations are in general low and non-significant, which shows that the significant correlations are specific for the cognate amino acid.

Methionine provides a notable exception to this pattern by exhibiting a statistically significant positive correlation between amino acid bias and molecular activity for all the amino acid biosynthetic enzymes. As the first amino acid of a protein is always Met, if enzymes contain fewer internal Met residues, then the additional free Met residues can be used to synthesize additional peptide chains and thus boost the velocity of the process in which the enzymes are involved. The selection for this effect should be stronger when the molecular activity of an enzyme is lower and the required rate of enzyme synthesis is correspondingly higher. This correlation should extend to all cellular enzymes, not just amino acid biosynthetic enzymes. However, testing this hypothesis is currently not feasible because there is not enough information regard-

Fig. 2. Time-course of derepression and amino acid recovery for the computer model of the tryptophan biosynthetic pathway from E. coli. Cells growing in a medium with excess Trp (k11 = 1) are switched to media containing various lower amounts of Trp (k11 < 1): A and B (k11 = 0.5), C and D (k11 = 0.41), E and F (k11 = 0, which corresponds to no Trp). Expression of the trp operon undergoes derepression and is allowed to reach a new steady state. The upper curve in each panel corresponds to the case in which the rate of enzyme synthesis is independent of Trp concentration (kM = 0), and the curves then decrease in the order of increasing dependence on Trp concentration (kM = 0, 0.1, 1, 10, 20, 50, 100, 500). The steady state concentrations of Trp and of the Trp-biosynthetic enzymes in a Trp-reduced medium decrease with increasing kM.
A, C and E. Dimensionless time-course for protein levels (y-axis: relative protein levels; x-axis: time), which are normalized with respect to the maximum derepressed steady-state value in (E).
B, D and F. Dimensionless time-course for intracellular amino acid levels (y-axis: relative amino acid levels, logarithmic scale; x-axis: time), which are normalized with respect to the same initial value.
Note that the y-axis changes scale progressively from panel to panel in order to show differences while accommodating the increasing degrees of derepression. See text for further discussion.

Table 2. Enzymes of Escherichia coli, Salmonella typhimurium and Bacillus subtilis that are involved in the biosynthetic pathway for each amino acid.

Columns: amino acid; EC number; gene in E. coli, S. typhimurium and B. subtilis; specific activity (mM min⁻¹ mg⁻¹) in E. coli, S. typhimurium and B. subtilis; molecular activity (mol reactant s⁻¹ mol enzyme⁻¹) in E. coli, S. typhimurium and B. subtilis.

| Amino acid | EC number | Gene, E. coli | Gene, S. typhimurium | Gene, B. subtilis | Spec. act., E. coli | Spec. act., S. typhimurium | Spec. act., B. subtilis | Mol. act., E. coli | Mol. act., S. typhimurium | Mol. act., B. subtilis |
|---|---|---|---|---|---|---|---|---|---|---|
| Alanine | 2.6.1.66 | avtA | avtA | – | 0.0196 | 0.0196 | – | 0.0151 | 0.0152 | – |
| | 2.6.1.42 | ilvE | ilvE | ywaA | 15.9 | 15.9 | 15.9 | 9.05 | 9.02 | 10.7 |
| | 5.1.1.1 | alr | alr | alr, yncD | 133 | 910 | 143 | – | 0.130 | 103 |
| Arginine | 2.3.1.1 | argA | argA | argA | 133 | 133 | 133 | 109 | 109 | 96.1 |
| | 2.7.2.8 | argB | argB | argB | 0.540 | 0.540 | 0.540 | 0.244 | 0.243 | 0.249 |
| | 1.2.1.38 | argC | argC | argC | 0.950 | 0.950 | 0.950 | 0.569 | 0.569 | 0.603 |
| | 2.6.1.11 | argD | argD | argD | 11.2 | 11.2 | 11.2 | 8.17 | 8.15 | 7.63 |
| | 3.5.1.16 | argE | argE | amhX | 800 | 800 | 800 | 564 | 562 | 567 |
| | 2.1.3.3 | argF | – | argF | 2900 | – | 265 | 1780 | – | 153 |
| | 2.1.3.3 | argI | argI | – | 2900 | 1900 | – | 1780 | 1160 | – |
| | 6.3.4.5 | argG | argG | argG | 12.8 | 12.8 | 4.54 | 10.6 | 11.1 | 3.39 |
| | 4.3.2.1 | argH | argH | argH | 0.380 | 0.380 | 0.380 | 0.319 | 0.320 | 0.329 |
| Asparagine | 6.3.5.4 | asnB | asnB | asnB, asnH, asnO | 0.300 | 0.300 | 0.300 | 0.313 | 0.313 | 0.353 |
| | 6.3.1.1 | asnA | asnA | – | 57.3 | 57.3 | – | 35.0 | 35.1 | – |
| | 3.5.1.1 | iaaA, ybik | iaaA, ybik | ansA | – | – | 32.0 | – | – | 19.4 |
| Aspartate | 2.6.1.1 | aspC | aspC | aspB | 232 | 232 | 220 | 168 | 168 | 158 |
| | 4.3.1.1 | aspA | aspA | ansB | 167 | 167 | 320 | 150 | 145 | 19.4 |
| | 6.3.5.4 | – | – | asnO, asnH, asnB | – | – | 0.300 | – | – | 0.353 |
| | 3.5.1.1 | – | – | ansA | – | – | 32.0 | – | – | 19.4 |
| Cysteine | 2.5.1.47 | cysK; cysM | cysK; cysM | cysK, yrhA, ytkP | 6.30 | 1100 | 25.5 | 3.62 | 398 | 13.9 |
| | 2.3.1.30 | cysE | cysE | cysE | 71.6 | 397 | 71.6 | 35.0 | 194 | 28.8 |
| Glutamine | 6.3.1.2 | glnA | glnA | glnA | 153 | 231 | 153 | 132 | 187 | 128 |
| | 6.3.5.5 | carAB | carAB | – | 4.16 | 4.16 | – | 1.20 | 0.890 | – |
| | 3.5.1.12 | – | – | ylaM, ybgJ | – | – | 716 | – | – | 406 |
| Glutamate | 1.4.1.13 | gltBD | gltBD | gltAB | 18.6 | 18.6 | 23.0 | 72.8 | 8.32 | 12.4 |
| | 1.4.1.4 | gdhA | gdhA | – | 250 | 231 | – | 202 | 187 | – |
| | 1.4.1.2 | – | – | rocG | – | – | 80.0 | – | – | 62.2 |
| Glycine | 2.1.2.1 | glyA | glyA | glyA | 13.6 | 13.6 | 13.6 | 10.3 | 10.3 | 10.3 |
| Histidine | 2.4.2.17 | hisG | hisG | hisG | 544 | 544 | 544 | 302 | 301 | 214 |
| | 3.6.1.31-3.5.4.19 | hisI | hisI | hisI | 0.00300 | 0.00300 | 332 | 0.00114 | 0.00113 | 132 |
| | 5.3.1.16 | hisA | hisA | hisA | 7.80 | 7.80 | 7.80 | 3.40 | 3.39 | 3.45 |
| | 2.4.2.- | hisHF | hisHF | hisFH | 0.900 | 0.900 | 0.900 | 0.325 | 0.0523 | 0.0506 |
| | 4.2.1.19-3.1.3.15 | hisB | hisB | hisB | 5.70 | 202000 | 0.310 | 3.84 | 135000 | 0.111 |
| | 2.6.1.9 | hisC | hisC | hisC | 1890 | 1890 | 325 | 1240 | 1250 | 217 |
| | 3.1.3.15 | – | – | hisJ | – | – | 427 | – | – | 217 |
| | 1.1.1.23 | hisD | hisD | hisD | 15.3 | 14.3 | 3.60 | 11.8 | 10.9 | 2.77 |
| Isoleucine | 1.2.4.1 | – | – | acoAB, pdhAB | – | – | 0.120 | – | – | 0.00567 |
| | 4.3.1.19 | ilvA | ilvA | – | 230 | 683 | – | 215 | 640 | – |
| | 2.2.1.6 | ilvGM; ilvBN; ilvIH | ilvGM; ilvBN; ilvIH | ilvBN | 4000 | 4000 | 4000 | 500 | 456 | 812 |
| | 1.1.1.86 | ilvC | ilvC | ilvC | 1.90 | 1.91 | 1.91 | 1.71 | 1.72 | 1.19 |
| | 4.2.1.9 | ilvD | ilvD | ilvD | 63.0 | 63.0 | 63.0 | 67.6 | 35.7 | 62.5 |
| | 2.6.1.42 | ilvE | ilvE | ilvE | 27.3 | 27.3 | 27.3 | 15.5 | 4.34 | 18.3 |
| Leucine | 2.3.3.13 | leuA | leuA | leuA | 14.5 | 14.7 | 7.10 | 13.8 | 14.1 | 6.73 |
| | 4.2.1.33 | leuCD | leuCD | leuCD | 0.072 | 0.0720 | 6.18 | 0.00800 | 0.00527 | 0.985 |
| | 1.1.1.85 | leuB | leuB | leuB | 0.085 | 35.5 | 10.5 | 0.0561 | 35.5 | 6.99 |
| | 2.6.1.42 | ilvE | ilvE | ybgE | 57.0 | 53.4 | 27.3 | 32.5 | 53.4 | 18.0 |
| | 1.4.1.9 | – | – | bcd | – | – | 110 | – | – | 73.3 |
| Lysine | 2.7.2.4 | lysC | lysC | lysC | 5.69 | 5.69 | 30.0 | 4.60 | 4.61 | 21.8 |
| | 1.2.1.11 | asd | asd | asd | 145 | 145 | 73.0 | 96.7 | 97.0 | 46.0 |
| | 4.2.1.52 | dapA | dapA | dapA | 100 | 100 | 458 | 52.1 | 52.1 | 237 |
| | 1.3.1.26 | dapB | dapB | dapB | 398 | 398 | 56.4 | 191 | 191 | 27.7 |
| | 2.3.1.117 | dapD | dapD | dapD | 36.0 | 36.0 | 36.0 | 17.9 | 17.9 | 15.0 |
| | 2.6.1.17 | dapC; yfdZ | dapC; yfdZ | yugH | 5.33 | 5.33 | 5.33 | 4.10 | 4.10 | 3.76 |
| | 3.5.1.18 | dapE | dapE | ytjP | 3.33 | 3.33 | 3.33 | 2.31 | 2.31 | 1.99 |
| | 5.1.1.7 | dapF | dapF | dapF | 18.7 | 18.7 | 18.7 | 9.45 | 9.49 | 9.62 |
| | 4.1.1.20 | lysA | lysA | lysA | 7.50 | 7.50 | 28.3 | 5.77 | 5.75 | 22.9 |
| Methionine | 2.3.1.46 | metA | metA | metB | – | – | – | – | – | – |
| | 2.5.1.48 | metB | metB | yjcI | 10.0 | 18.2 | 10.0 | 6.92 | 12.6 | 6.95 |
| | 4.4.1.8 | metC | metC | yjcJ | 248 | 7.90 | 7.90 | 179 | 5.64 | 5.59 |
| | 2.1.1.13 | metH | metH | – | 9.30 | 9.30 | – | 21.1 | 21.6 | – |
| | 2.1.1.14 | metE | metE | metC | 2.50 | 2.50 | 0.240 | 3.53 | 3.53 | 0.347 |
| | 2.1.1.10 | – | – | ybgG | – | – | 1.37 | – | – | 0.793 |
| Phenylalanine | 5.4.99.5-4.2.1.51 | pheA | pheA | pheA, aroA, aroH | 52.0 | 32.0 | 17.5 | 37.3 | 22.8 | 4.23 |
| | 2.6.1.57 | tyrB | tyrB | – | 170 | 170 | – | 123 | 123 | – |
| | 2.6.1.9 | – | – | aroJ | – | – | 323 | – | – | 216 |
| Proline | 2.7.2.11 | proB | proB | proJ, proB | 12.7 | 12.7 | 12.7 | 8.25 | 8.27 | 8.52 |
| | 1.2.1.41 | proA | proA | proA | 28.2 | 28.2 | 28.2 | 21.0 | 21.0 | 21.3 |
| | 1.5.1.2 | proC | proC | proGH | 2550 | 2550 | 280 | 1200 | 1190 | 141 |
| Serine | 1.1.1.95 | serA | serA | serA | 9.70 | 9.70 | 15.6 | 7.14 | 7.11 | 14.8 |
| | 2.6.1.52 | serC | serC | serC | 15.0 | 15.0 | 15.0 | 9.94 | 9.95 | 10.0 |
| | 3.1.3.3 | serB | serB | rsbP | 3.00 | 3.00 | 3.00 | 1.75 | 1.75 | 2.300 |
| Threonine | 2.7.2.4-1.1.1.3 | thrA | thrA | lysC, dapG, yclM | 5.69 | 5.69 | 30.0 | 8.45 | 8.41 | 21.5 |
| | 1.2.1.11 | asd | asd | asd | 145 | 145 | 0.0400 | 97.0 | 97.0 | 0.0252 |
| | 1.1.1.3 | – | – | hom | – | – | 51.0 | – | – | 40.3 |
| | 2.7.1.39 | thrB | thrB | thrB | 3.10 | 3.10 | 3.10 | 1.74 | 1.72 | 1.72 |
| | 4.2.3.1 | thrC | thrC | thrC | 7.70 | 7.70 | 8.80 | 6.04 | 6.03 | 5.49 |
| Tryptophan | 4.1.3.27 | trpDE | trpDE | trpE | 2.60 | 0.380 | 0.380 | 1.30 | 0.0457 | 0.368 |
| | 2.4.2.18 | trpD | trpD | trpD | 177 | – | 1.58 | 2.46 | – | 0.948 |
| | 5.3.1.24-4.1.1.48 | trpC | trpC | trpF | 8.78 | 2.30 | 8.78 | 7.24 | 1.89 | 3.52 |
| | 4.1.1.48 | – | – | trpC | – | – | 3.70 | – | – | 1.72 |
| | 4.2.1.20 | trpAB | trpAB | trpAB | 125 | 125 | 2.80 | 20.0 | 39.7 | 1.07 |
| Tyrosine | 5.4.99.5-1.3.1.12 | tyrA | tyrA | aroA, aroH, pheA | 52.0 | 52.0 | 17.5 | 36.4 | 36.3 | 4.23 |
| | 2.6.1.57 | tyrB | tyrB | aroJ | 70.2 | 70.2 | 323 | 50.9 | 50.9 | 216 |
| Valine | 1.2.4.1 | – | – | acoAB, pdhAB | – | – | 0.120 | – | – | 0.00567 |
| | 4.3.1.19 | ilvA | ilvA | – | 230 | 683 | – | 215 | 640 | – |
| | 2.2.1.6 | ilvGM; ilvBN; ilvIH | ilvGM; ilvBN; ilvIH | ilvBN | 4000 | 4000 | 4000 | 500 | 456 | 812 |
| | 1.1.1.86 | ilvC | ilvC | ilvC | 1.90 | 1.91 | 1.91 | 1.71 | 1.72 | 1.19 |
| | 4.2.1.9 | ilvD | ilvD | ilvD | 63.0 | 63.0 | 63.0 | 67.6 | 35.7 | 62.5 |
| | 2.6.1.42 | ilvE | ilvE | ilvE | 27.3 | 27.3 | 27.3 | 15.5 | 4.34 | 18.3 |
| | 2.6.1.66 | avtA | avtA | – | 0.0196 | 0.0196 | – | 0.0151 | 0.0152 | – |

For each amino acid, italicized type indicates the gene for the enzyme in its classical biosynthetic pathway. Bold italicized type indicates an auxiliary enzyme that has been observed to replace an enzyme in the classical pathway, at least in vitro, but that is not usually considered part of the biosynthetic pathway. We were unable to find appropriate estimates for the activity of enzyme 2.3.1.46 in methionine biosynthesis.



ing the specific activity for all the enzymes in a given organism.

Correlation between cognate bias and pathway length

The longer an amino acid biosynthetic pathway (larger number of protein chains), the stronger the selection for a low cognate bias because a larger number of proteins needs to be synthesized to obtain one full pathway set. According to this expectation, there should be an inverse correlation between cognate bias and number of peptide chains that make up the pathway complement. This correlation indeed exists, as shown in Table 5.

As a control we have also examined the correlation between the minimum non-cognate bias and the number of protein chains of the pathway. Table 5 shows that the correlation between minimum non-cognate bias and pathway length is negative, but that it is much weaker and less significant than in the case of the cognate bias.

Fig. 3. Schematic representation of amino acid biosynthetic pathways. Names for each enzyme, represented here by their EC number, and the corresponding gene are given in Table 2.

Table 3. Spearman rank correlation coefficient (r) between cognate amino acid bias of the amino acid biosynthetic enzymes and their molecular activity.

| Factor | E. coli r (n = 79) [d,e] | S. typhimurium r (n = 79) [d,e] | B. subtilis r (n = 89) [d,e] | Overall r (n = 247) [d,e] |
|---|---|---|---|---|
| Molecular activity [a] | 0.284 (P < 0.0033) | 0.301 (P < 0.0030) | 0.186 (P < 0.010) | 0.230 (P < 5 × 10⁻⁷) |
| Protein cost [b] | -0.173 (P < 0.05) | -0.104 (P < 0.15) | 0.147 (P < 0.04) | -0.0819 (P < 0.05) |
| GC content [c] | 0.216 (P < 0.02) | 0.154 (P < 0.07) | 0.157 (P < 0.04) | 0.164 (P < 0.01) |
| Codon bias [c] | -0.201 (P < 0.03) | 0.0975 (P < 0.18) | 0.0580 (P < 0.27) | -0.0322 (P < 0.24) |

a. Calculated as described in Experimental procedures.
b. Calculated by adding the costs for synthesizing each of the amino acid residues that constitute the enzymes of a given amino acid biosynthetic pathway.
c. Calculated for the genes that encode the enzymes of a given amino acid biosynthetic pathway.
d. n represents the number of enzyme activities used to calculate the correlations. It can be determined from Table 2.
e. P represents the probability that the correlation is non-significant (see Experimental procedures).
Also shown for comparison are correlations between cognate amino acid bias of the amino acid biosynthetic enzymes and three other factors that could potentially influence the amino acid composition of the same biosynthetic enzymes. These three factors have been previously identified as influencing amino acid composition of the total proteome.



Profiles of cognate bias in amino acid biosynthetic pathways

Figure 4B shows the profile of cognate bias for the group of E. coli proteins involved in the synthesis of a given amino acid. With the exception of Glu, Gly and Tyr, the group of biosynthetic proteins responsible for the synthesis of a given amino acid has a cognate bias that is below the 50th percentile of the control group.

As expected both from the earlier analysis and from the high degree of homology between many of the E. coli and S. typhimurium proteins, the profile of cognate bias for S. typhimurium is similar (Fig. 4A). However, in S. typhimurium the cognate bias of the Asp, Asn and Val biosynthetic pathways is also above the 50th percentile. Of the biosynthetic pathways that have a cognate bias below the 50th percentile, the average bias is somewhat smaller for S. typhimurium than for E. coli, again as one would have predicted from the fact that there is a stronger correlation between cognate bias and molecular activity for the enzymes of S. typhimurium (Table 3).

Although the general conclusions are similar, the detailed profile of cognate bias for B. subtilis differs considerably from that of the two enteric bacteria. More than half of the amino acid biosynthetic pathways have a cognate bias above the 50th percentile (Fig. 4C). These include the pathways for Ala, His, Phe, Ser and Thr, which are in addition to the pathways with cognate bias above the 50th percentile in E. coli and S. typhimurium.
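The comparison against the 50th percentile of the control group can be illustrated with a minimal sketch. It assumes that a bias value has already been computed for the pathway and for every protein in the control group; how those values are obtained is described in Experimental procedures and is not reproduced here. The function names and the numbers are hypothetical and serve only to show the percentile test itself.

```python
from bisect import bisect_right


def percentile_in_control(bias, control_biases):
    """Return the fraction (0.0 to 1.0) of control values that are <= bias."""
    if not control_biases:
        raise ValueError("control group must not be empty")
    ordered = sorted(control_biases)
    return bisect_right(ordered, bias) / len(ordered)


def is_below_median(bias, control_biases):
    """True when the bias lies below the 50th percentile of the control group."""
    return percentile_in_control(bias, control_biases) < 0.5


# Hypothetical values for illustration only (not taken from the paper).
control = [0.12, 0.35, 0.41, 0.58, 0.63, 0.77, 0.81, 0.94]
pathway_bias = 0.38

print(percentile_in_control(pathway_bias, control))
print(is_below_median(pathway_bias, control))
```

With a real proteome the control list would contain thousands of values, but the test is the same.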

Table 4. Significance of the Spearman rank correlation coefficient (r) between amino acid bias and molecular activity of enzymes involved in amino acid biosynthesis. [a]

| Amino acid | E. coli r (n = 79) [b,c] | S. typhimurium r (n = 79) [b,c] | B. subtilis r (n = 89) [b,c] | Overall r (n = 247) [b,c] |
|---|---|---|---|---|
| Ala | -0.0299 (P < 0.38) | -0.0390 (P < 0.35) | 0.118 (P < 0.084) | -0.0489 (P < 0.16) |
| Arg | -0.00708 (P < 0.47) | -0.0599 (P < 0.28) | 0.00803 (P < 0.47) | -0.0883 (P < 0.038) |
| Asn | -0.0869 (P < 0.19) | -0.0708 (P < 0.24) | 0.160 (P < 0.032) | -0.0121 (P < 0.40) |
| Asp | -0.127 (P < 0.10) | -0.157 (P < 0.058) | -0.332 (P < 4.8 × 10⁻⁵) | -0.155 (P < 0.00045) |
| Cys | 0.0288 (P < 0.41) | -0.0334 (P < 0.37) | 0.0420 (P < 0.36) | -0.00400 (P < 0.46) |
| Gln | 0.171 (P < 0.054) | 0.0300 (P < 0.39) | -0.0316 (P < 0.36) | 0.103 (P < 0.020) |
| Glu | -0.175 (P < 0.043) | 0.0159 (P < 0.43) | -0.0641 (P < 0.22) | -0.0392 (P < 0.21) |
| Gly | -0.0731 (P < 0.23) | 0.156 (P < 0.061) | 0.0145 (P < 0.44) | 0.0372 (P < 0.23) |
| His | -0.0178 (P < 0.41) | -0.176 (P < 0.041) | -0.0218 (P < 0.39) | -0.129 (P < 0.0046) |
| Ile | 0.103 (P < 0.17) | 0.108 (P < 0.14) | 0.241 (P < 0.0024) | 0.0827 (P < 0.049) |
| Leu | -0.0469 (P < 0.31) | -0.107 (P < 0.14) | -0.106 (P < 0.11) | -0.0806 (P < 0.052) |
| Lys | -0.0802 (P < 0.21) | 0.0549 (P < 0.30) | -0.0912 (P < 0.14) | 0.0337 (P < 0.25) |
| Met | 0.259 (P < 0.0067) | 0.289 (P < 0.0022) | -0.0801 (P < 0.17) | 0.234 (P < 1.3 × 10⁻⁶) |
| Phe | -0.00113 (P < 0.13) | -0.168 (P < 0.048) | -0.231 (P < 0.0033) | 0.102 (P < 0.020) |
| Pro | -0.134 (P < 0.091) | -0.0370 (P < 0.35) | 0.154 (P < 0.036) | -0.0730 (P < 0.069) |
| Ser | 0.130 (P < 0.1) | 0.197 (P < 0.025) | 0.0592 (P < 0.25) | 0.128 (P < 0.0050) |
| Thr | 0.183 (P < 0.043) | 0.131 (P < 0.10) | -0.0357 (P < 0.34) | 0.0667 (P < 0.093) |
| Trp | -0.107 (P < 0.11) | -0.176 (P < 0.028) | -0.174 (P < 0.0033) | -0.111 (P < 0.0066) |
| Tyr | -0.176 (P < 0.042) | -0.131 (P < 0.094) | 0.130 (P < 0.064) | -0.0598 (P < 0.10) |
| Val | 0.179 (P < 0.045) | 0.0552 (P < 0.29) | -0.0177 (P < 0.42) | 0.0755 (P < 0.065) |
| CB [d] | 0.284 (P < 0.0033) | 0.301 (P < 0.0030) | 0.186 (P < 0.01) | 0.230 (P < 5 × 10⁻⁷) |

a. Values in bold are those that are significant to a level higher than 99% and r > 0.20.
b. n represents the number of enzyme activities used to calculate the correlations. It can be determined from Table 2.
c. P represents the probability that the correlation is non-significant (see Experimental procedures).
d. CB indicates cognate amino acid bias.

Table 5. Correlation between number of proteins that make up the enzymes of each amino acid biosynthetic pathway and their minimum amino acid bias.

| Minimum bias | E. coli r (n = 20) [a,b] | S. typhimurium r (n = 20) [a,b] | B. subtilis r (n = 20) [a,b] |
|---|---|---|---|
| Cognate | -0.49 (P < 0.003) | -0.72 (P < 0.004) | -0.23 (P < 0.21) |
| Non-cognate | -0.12 (P < 0.18) | -0.22 (P < 0.14) | -0.06 (P < 0.39) |
| Overall | -0.13 (P < 0.16) | -0.22 (P < 0.14) | -0.04 (P < 0.44) |

a. n represents the number of enzyme activities used to calculate the correlations. It can be determined from Table 2.
b. P represents the probability that the correlation is non-significant (see Experimental procedures).



Profiles of cognate bias in individual amino acid biosynthetic enzymes

Although the cognate bias of individual biosynthetic pathways is above the 50th percentile for some of the amino acids, this does not necessarily mean that selection for a low bias is too weak to be observed for these amino acids. It may be an indication that a subset of the enzymes in the pathway has a dominant effect on the rate of synthesis for the cognate amino acid and is critical for recovery of amino acid levels during derepression. Only this subset of enzymes would then be sufficiently sensitive to selection for a low cognate bias to be observed. To further analyse this implication of the cognate bias hypothesis, we examine the cognate bias of individual biosynthetic enzymes within each pathway for each of the three organisms.

Figure 5 shows the cognate bias for the individual enzyme that has the lowest value within each amino acid biosynthetic pathway. One sees immediately for each of the organisms that approximately 75% of the amino acid pathways exhibit at least one enzyme that has a low cognate bias. In these pathways, the enzyme with the lowest cognate bias has either the lowest or the second lowest molecular activity in the pathway. Whenever genetic and biochemical data are available, one finds this enzyme to be regulated both by end-product inhibition at the level of enzyme activity and by repression at the level of gene expression (shown in Table 6 for the E. coli pathways). This type of regulation is an additional indication that the enzyme has a dominant influence over the flux through the pathway, particularly during the critical early phase of recovery.
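The caption of Fig. 5 attaches to each lowest-bias enzyme the probability that so low a value would occur in a random set of proteins of the same size as the pathway. The sketch below shows one simple way such a probability could be obtained; it is not necessarily the procedure used for the figure. It assumes that the bias of an enzyme is expressed as its percentile q within the proteome and that the n proteins are drawn independently, so that the chance that the lowest of n draws falls at or below q is 1 - (1 - q)^n. The function names, the seed and the example numbers are hypothetical.

```python
import random


def prob_lowest_at_or_below(q, n_proteins):
    """Probability that the lowest of n independent random percentiles is <= q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a fraction between 0 and 1")
    if n_proteins < 1:
        raise ValueError("a pathway needs at least one protein")
    return 1.0 - (1.0 - q) ** n_proteins


def simulate_lowest_at_or_below(q, n_proteins, trials=20000, seed=0):
    """Monte Carlo check: draw n uniform percentiles and test the minimum."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lowest = min(rng.random() for _ in range(n_proteins))
        if lowest <= q:
            hits += 1
    return hits / trials


# Hypothetical example: an enzyme at the 5th percentile in a five-protein pathway.
exact = prob_lowest_at_or_below(0.05, 5)
approx = simulate_lowest_at_or_below(0.05, 5)
print(f"analytical: {exact:.4f}, simulated: {approx:.4f}")
```

The point of the sketch is that a low percentile is less remarkable in a long pathway than in a short one, because more proteins give more chances of finding a low value by accident.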

This more detailed analysis further supports the notion that selection for low cognate bias in enzymes within a given amino acid biosynthesis pathway is strong enough

Fig. 4. The compositional bias of biosynthetic pathways with respect to their cognate amino acid.
[Figure: three panels (S. typhimurium, E. coli, B. subtilis). x-axis: Pathway (Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val); y-axis: Pathway bias, from 0 to 1.]

Fig. 5. Enzyme with the lowest value for cognate bias in the biosynthetic pathway of each amino acid. The order of the pathways is the same as that in Fig. 4. The x-axis shows the name of the gene coding for the enzyme. The numbers indicate the probability that this lowest bias occurs in a set of proteins, containing the same number of proteins as the pathway, and drawn randomly from the proteome of each organism.
[Figure: three panels. x-axis: Enzyme; y-axis: Protein bias, from 0 to 1. Genes shown on the x-axis, in pathway order:
S. typhimurium: avtA, argC, asnA, aspA, cysK, gdhP, gdhA, glyA, hisC, ilvA, leuA, lysA, metE, pheA, proB, serB, thrB, trpE, tyrA, avtA.
E. coli: avtA, argC, asnA, aspA, cysK, glnA, gdhA, glyA, hisC, ilvA, leuC, lysA, metE, pheA, proA, serB, thrB, trpA, tyrA, avtA.
B. subtilis: ywaA, argD, asnO, aspB, cysK, ypcA, rocG, glyA, hisA, acoA, leuC, dapF, metC, aroH, proJ, serA, yclM, trpG, aroA, pdhA.]



to influence the relative composition of biosynthetic enzymes. Nevertheless, for each organism there are some pathways in which the lowest cognate bias is greater than 0.1. These give rise to profile features that are fairly similar for the enteric bacteria E. coli and S. typhimurium, but different from that for the more distantly related bacterium B. subtilis.

For E. coli and S. typhimurium there are three amino acid biosynthetic pathways (Asp, Phe and Tyr) in which selection for low cognate bias appears to be masked by functional requirements.

Asp biosynthesis

In E. coli, AspC is the homodimeric protein that is traditionally thought to catalyse Asp biosynthesis from Glu and oxaloacetate, while AspA is the homotetrameric enzyme thought by many to produce fumarate and NH3 from the catabolism of Asp. However, genetic and biochemical evidence suggests that AspA is likely to catalyse the reverse reaction (i.e. fumarate + NH3 → Asp) under normal conditions in vivo. In industrial processes, purified AspA, or E. coli cells overexpressing the enzyme, are used to produce Asp (for a discussion, see Herrmann and Somerville, 1983; Neidhardt, 1999 and references therein). The cognate bias of both enzymes is higher than the 10th percentile.

The protein AspC has a cognate bias that is around the 50th percentile. The analysis of the dimeric structure of AspC shows that three Asp residues are involved in the interface contact surface of the monomers. These three residues are also conserved (data not shown) in all AspC bacterial homologues from the SWISSPROT database (Boeckmann et al., 2003), suggesting an important functional role. If these three residues are discounted, the cognate bias of AspC drops to the 27th percentile. This is still above the 0.1 significance level, but is nevertheless a lower cognate bias.¹

Surprisingly, the protein AspA has a lower cognate bias (approximately 12th percentile) than AspC, although it is still above the 10-percentile threshold. This protein is active as a tetramer in which several of the Asp residues are involved in forming potential salt bridges between the monomers (Fig. S1). If one discounts these residues, which are selected for functional reasons, then the cognate bias of the enzyme is well below the first percentile. Thus, in this case it appears that functional considerations are responsible for masking the selection for low cognate bias.

Phe and Tyr biosynthesis

Although there is some variation regarding the lowest cognate bias in the Tyr and Phe biosynthetic pathways of

¹ The cognate bias of a protein after removing a given residue needs to be recalculated. Using AspC as an example, the recalculation is performed in the following way. (i) Calculate the number of Asp residues in an average protein with the same length as AspC. (ii) Discount the number of residues with established functional roles (three residues in this case) from both AspC and the average proteins and recalculate the average frequency of the different amino acids. (iii) Use these new probabilities to recalculate the cognate bias of the protein. In other words, we remove the same number of Asp residues from the control proteins and recalculate the probabilities.

Table 6. Correlation between low cognate amino acid bias and regulation of gene expression and enzyme activity. [a]

| Biosynthetic pathway | Enzymes with low cognate amino acid bias (below 10% quantile) | Enzymes repressed by cognate amino acid addition | Enzymes derepressed upon cognate amino acid depletion | End-product-inhibited enzymes |
|---|---|---|---|---|
| Ala | AvtA | AvtA | – | – |
| Arg | ArgBCFI | ArgABCDEFGHI | ArgECBH | ArgA, ArgB |
| Asn | AsnA, AsnB | AsnA | – | AsnA, AsnB |
| Asp | – | – | – | – |
| Cys | CysK-Z, CysM | CysK-Z, CysM, CysE | CysK-Z, CysM, CysE | CysK-Z, CysM, CysE |
| Gln | GltB, GltD | GltB, GltD | GltD, GltB | GltD, GltB |
| Glu | – | GdhA | GdhA | GdhA |
| Gly | – | GlyA | GlyA | – |
| His | HisC, HisD | All | All | HisC, HisG |
| Ile | IlvGA | IlvGEDA | IlvGEDA | IlvA |
| Leu | IlvE, LeuABCD | LeuABCD | LeuABCD | LeuA |
| Lys | LysC, DapE, DapF, LysA | LysC, DapE, LysA | LysC, DapA, LysA | LysC, DapA |
| Met | MetA, MetE, MetB | MetA, MetH, MetE, MetB, MetL | MetA, MetH, MetE | MetA |
| Phe | PheA slightly above 10% quantile | PheA | PheA | PheA |
| Pro | ProA, ProB | ProB | ProC | ProA |
| Ser | SerB | – | – | SerA |
| Thr | ThrAB | ThrAB | ThrABC | ThrAB |
| Trp | TrpABCDE | TrpABCDE | TrpABD | TrpED |
| Tyr | – | TyrA | TyrA | TyrA |
| Val | AvtA | AvtA, IlvG | – | IlvIHGM |

a. This information has been compiled from Herrmann and Somerville (1983), Neidhardt (1999), Khodursky et al. (2000) and references therein.



E. coli and S. typhimurium, they have in common two qualitative features of interest. First, it is the initial enzyme in each pathway that has the lowest cognate bias. These enzymes are repressed at the level of gene expression and end-product inhibited at the level of enzyme activity (Neidhardt, 1999). As noted above, this is an indication that they catalyse a rate-determining step in the corresponding biosynthetic pathway. The correlation between the properties of the regulatory enzyme in the relevant pathway and its low cognate bias suggests that selection for low cognate bias might still be present but masked for functional reasons.

The second feature shared by both pathways is that a single gene, tyrB, encodes both the second and last enzymes in each pathway. Although the relevance of this second feature for our argument is less obvious, as shown below there is reason to believe that selection for low cognate bias is operating on these enzymes.

Table 1 shows that Phe and Tyr are among the least abundant amino acids in proteins, at approximately 4% and 3% respectively. However, their particular chemical properties, due to their side-chain aromatic ring, make them especially important in active centres and in substrate interaction sites (Pedersen and Finazzi-Agro, 1993; Frey, 2001; Rogers and Dooley, 2003). The number of residues that comprise an active centre is usually small compared with the total number of residues in the protein. A conservative estimate would suggest that fewer than 10 residues would account for most active sites. As the smaller active enzymes have between 100 and 150 residues, this would predict that up to approximately 10% of the residues in such a protein are involved in the formation of an active centre. If proteins include a low percentage of Phe or Tyr residues, and if Phe or Tyr residues are necessary in the active centre, then these features will provide a strong selection for a high compositional bias for these amino acids in any protein.

A more detailed analysis of the enzymes in the Phe and Tyr biosynthetic pathways indicates selection for low cognate bias when the functional role of the amino acid in active centres is factored out. The gene that encodes the enzyme catalysing the second step in each pathway is tyrB. The protein product has 397 amino acids, of which 17 are Phe residues (4.3%) and 15 are Tyr residues (3.8%). A crystal structure for this protein has been deposited by Ko et al. in the protein databank (PDB code 3TAT), although a paper analysing the structure has not been published yet. This protein is a dimer and our own analysis shows that at least five Phe residues and six Tyr residues are involved in the active centre and in the interaction between monomers respectively (Fig. S1). Furthermore, these residues are conserved in homologous proteins from other organisms (data not shown). If we discount these residues, then the cognate bias of TyrB is below the 10th percentile for both the Phe pathway and the Tyr pathway.²

The functional analysis of the first enzyme in each pathway cannot be accomplished as easily. PheA, the first enzyme in the Phe biosynthetic pathway, is a 386-amino-acid-residue, bifunctional protein. Eleven of the 386 residues are Phe (2.9%). This is below the average cognate bias, but nevertheless above our 10-percentile threshold. The Protein Databank entry for this protein, file 1ECM, shows that there are no Phe residues involved in the active centre of the chorismate mutase activity of PheA. This is unlike the case of other chorismate mutases, such as that of B. subtilis. Bacterial homologues of PheA from the SWISSPROT database (Boeckmann et al., 2003) show perfect conservation for two of the 11 Phe residues (Fig. S2). This suggests an important functional role for these residues. If these residues are discounted, then the cognate amino acid bias falls below the sixth percentile. Furthermore, three of the other Phe residues are perfectly conserved in all but two of the proteins. This may be taken as an indication that these residues are important for the function or structure of the protein. Again, we find that the higher-than-expected cognate bias may result from the functional requirements of the protein for this specific type of amino acid residue.

We now consider the TyrA protein. This protein is composed of 373 amino acid residues, 10 of which are Tyr residues (2.7%). There is no known structure for this enzyme or for any of its homologues. Bacterial homologues of TyrA from the SWISSPROT database (Boeckmann et al., 2003) show perfect conservation for five out of the 10 Tyr residues. An additional Tyr residue is conserved in all but one of the homologues, where it is replaced by a Phe residue (Fig. S2). If these residues are discounted, the cognate amino acid bias of TyrA drops well below the 6th percentile. Furthermore, two of the additional Tyr residues are conserved in all but one of the homologous proteins.

As a control for these cases in which cognate amino acid residues are discounted, one can discount other conserved (non-cognate amino acid) residues and recalculate the compositional bias. When this is done, the compositional bias with respect to these amino acids does not drop as low as that for the cognate amino acid (data not shown).

For B. subtilis there are four amino acid biosynthetic pathways (Ala, Ser, Thr and Val) in which selection for low

² A procedure similar to that described in the previous footnote for Asp is used to recalculate the bias of proteins containing Tyr and Phe residues that are functionally important. In the case of these two amino acids, the functional residues are, in many cases, involved in catalysis. Therefore, one could also discount all conserved Tyr or Phe residues from the active centre of enzymes to recalculate the probabilities of amino acid occurrence. However, the structure for most enzymes is still unknown and so is the actual composition of their active centres, which precludes such an approximation at this time.



cognate bias appears either to be weaker or to be masked by other requirements.

Ala and Val biosynthesis

Of the 20 amino acids, the biosynthesis of Ala is probably the least well studied. It is likely that more enzymes yet to be identified could contribute to the synthesis of this amino acid. There are no available data that hint at possible explanations for the high cognate bias of Ala biosynthesis in B. subtilis. Regarding the biosynthesis of Val, although the cognate bias is not low, one finds that, for B. subtilis, the enzyme with the lowest cognate bias catalyses the first step committed to Val biosynthesis. Additionally, the second enzyme with the lowest bias is the one that catalyses the first step in the pathway common to the biosynthesis of Leu and Val. This suggests that other factors may be masking the selection for low cognate bias in these biosynthetic pathways.

Ser and Thr biosynthesis

In B. subtilis, the enzyme of the Ser biosynthetic pathway with the lowest cognate bias is encoded by the gene serA, and it is the first enzyme in the pathway. After careful comparative sequence and structure analysis with the homologue from E. coli, we could find no functional justification for the excess of Ser residues in the B. subtilis enzyme. Additional comparative sequence analysis with homologues from other Gram-positive bacteria shows that only two Ser residues are perfectly conserved. If we discount these residues and recalculate the cognate bias, this bias is still around the 40th percentile.

The enzyme of the Thr biosynthetic pathway with the lowest cognate bias is also the first enzyme in the pathway. A sequence alignment of the relevant Thr enzymes from different Gram-positive bacteria shows that there are four fully conserved Thr residues (data not shown), which implies an important functional role for these residues. When they are discounted, the cognate bias for Thr falls below the fifth percentile.

Finally, for all three organisms there are two biosynthetic pathways in which selection for low cognate bias appears to be completely overridden by some unknown mechanism that actually yields a higher-than-average cognate bias.

Gly and Glu biosynthesis

The pathways for Gly and Glu biosynthesis each involve a small number of enzymes (as low as one per pathway, depending on the organism). In the E. coli case, a careful analysis of the three-dimensional crystal structure and of the fully conserved residues between homologous proteins involved in Gly or Glu biosynthesis does not reveal any special function for the relevant amino acid. Therefore, other reasons must account for the higher cognate bias of Gly and Glu biosynthetic pathways. The biosynthetic pathways for these amino acids, which are composed of only one enzyme each, fall into the category of short pathways for which there is less intense selection for low cognate bias. However, this factor alone would not account for their higher-than-average cognate bias.

Correlation between bias in biosynthetic enzymes and environmental abundance of the cognate amino acid

Selection for low cognate bias is expected to be more intense for those amino acid biosynthetic pathways that must undergo the greatest range and frequency of derepression. This is likely to be associated with low and infrequent abundance of the cognate amino acid in the organism's environment. Although the environments of E. coli, S. typhimurium and B. subtilis are complex, heterogeneous and difficult to characterize, there are data (Table 1) that suggest at least a relative ranking for the abundance of the amino acids in the human colon (a principal habitat of E. coli and S. typhimurium) and in soil (a principal habitat of B. subtilis). In attempting to compare the abundance of a given amino acid in these two environments, it is problematic if its relative abundance is the same in the two environments, either high or low, because these qualitative assessments do not deal with the absolute concentrations. Either environment could have an abundance that is either higher or lower than the other. This is less of a problem if qualitative comparisons are made in cases where the abundance is qualitatively different between environments, for example, high in one environment and low in the other. The qualitative result of such a comparison is likely to be valid, even if there is uncertainty in the absolute concentrations.

To perform such a qualitative comparison of environmental abundance with cognate bias, we apply a qualitative rank correlation test. The amino acids are given a score of one if the abundance is low, two if the abundance is intermediate and three if the abundance is high. Then, for each of the 10 amino acids that have a qualitatively different abundance in the two environments, and for each organism, we identify the enzyme in their biosynthetic pathway that has the lowest cognate bias. For each amino acid we ranked the cognate bias in the three organisms in the following way: when comparing the cognate bias of B. subtilis and E. coli, the lowest ranked organism was given the number 1, the other the number 2. A similar comparison was made between B. subtilis and S. typhimurium (Table 7). We then built a table of pairs of values, where the first element of the pair is the rank of the amino acid in the environment for the relevant organism(s) and the second element is the rank of the cognate bias. Calculating the Spearman rank correlation between the two sets of values for B. subtilis and E. coli suggests a positive correlation between high cognate bias and the amount of amino acid in the environment (r = 0.61, P < 0.004). The rank correlation calculated for B. subtilis and S. typhimurium is even stronger (r = 0.83, P < 0.00025).

Table 7 shows that four of the amino acids with cognate bias that is higher in E. coli than in B. subtilis also have a relative abundance that is higher in the colon than in soil. Similarly, three of the amino acids with cognate bias that is higher in B. subtilis than in E. coli also have a relative abundance that is higher in soil than in the colon. Asp is a clear outlier; its cognate bias is higher in E. coli but its relative abundance is higher in soil. The two remaining cases, Gly and Trp, have essentially the same cognate bias in E. coli and B. subtilis, even though the relative abundance of Trp is greater in the colon and that of Gly is greater in soil.
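The rank correlation test described above can be retraced from the ranks listed in Table 7. The sketch below is an illustration rather than the original analysis code: it pools the 10 (soil rank, B. subtilis bias rank) pairs with the 10 (colon rank, enteric bias rank) pairs to give n = 20, assigns average ranks to ties, and computes the Spearman coefficient as the Pearson correlation of those ranks. The function and variable names are ours, and the P values are not recomputed.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Ranks transcribed from Table 7, in column order:
# soil, colon, B. subtilis, E. coli, B. subtilis, S. typhimurium
TABLE7 = {
    "Ala": (2, 1, 2, 1, 2, 1), "Arg": (1, 3, 1, 2, 1, 2),
    "Asp": (2, 1, 1, 2, 1, 2), "Glu": (1, 3, 1, 2, 1, 2),
    "Gly": (3, 1, 1, 2, 2, 1), "Lys": (1, 3, 1, 2, 1, 2),
    "Ser": (3, 1, 2, 1, 2, 1), "Thr": (2, 1, 2, 1, 2, 1),
    "Trp": (1, 3, 1, 2, 1, 2), "Tyr": (1, 3, 1, 2, 1, 2),
}


def pooled_pairs(bs_col, enteric_col):
    """Pool (soil, B. subtilis) and (colon, enteric) rank pairs into two lists."""
    environment, bias = [], []
    for row in TABLE7.values():
        environment += [row[0], row[1]]
        bias += [row[bs_col], row[enteric_col]]
    return environment, bias


r_ecoli = spearman(*pooled_pairs(2, 3))
r_styph = spearman(*pooled_pairs(4, 5))
print(round(r_ecoli, 2), round(r_styph, 2))  # 0.61 0.83
```

Both coefficients agree with the values reported in Table 7, which suggests that the pooling of pairs assumed here matches the description in the text.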

Discussion

Prototrophic microorganisms like E. coli, S. typhimurium and B. subtilis are capable of synthesizing all of the amino acids. A considerable fraction of the bacterial genome is devoted to the encoding of enzymes involved in the biosynthesis of the amino acids (Neidhardt, 1999). However, these organisms exist in changing environments and when they encounter an exogenous source of a particular amino acid they typically repress the enzymes for its endogenous biosynthesis. This creates a particular dilemma when attempting to derepress a pathway for which the cognate amino acid has been depleted.

Cells have evolved various strategies for dealing with a sudden amino acid depletion. Proteases are able to reconfigure the complement of proteins (Reeve et al., 1984; Matin, 1991; Weichart et al., 2003; Nystrom, 2004) and liberate a supply of the limiting amino acid. The stringent response (Foster and Spector, 1995; Magnusson et al., 2003), by shutting down the synthesis of other proteins, and stimulating the synthesis of amino acid biosynthetic enzymes, can contribute to the replenishment of the limiting amino acid. These are likely to be rather general solutions to the problem of protein synthesis and not specific to a particular subset of amino acid biosynthetic enzymes.

Another strategy that addresses the problem of specificity is the following. With lowered amino acid concentrations, there is a shift in charging from isoacceptor tRNAs with lower affinity for the amino acid to ones with higher affinity, thereby allowing those proteins whose mRNA is enriched for the high-affinity isoaccepting species to be synthesized at a faster rate than would be possible without the enrichment. At least 10 of the amino acid biosynthetic pathways in E. coli show this enrichment specifically for the cognate amino acid (Elf et al., 2003). Although differences in the charging of isoacceptor tRNAs can account for the relative usage of the different synonymous codons, these differences cannot fully account for the total relative amount of a given amino acid in the proteins that synthesize that amino acid. Recovery from the repressed state when the exogenous supply of a given amino acid is no

Table 7. Correlation between high bias of the cognate amino acid in the biosynthetic enzymes and high relative concentration of the cognate amino acid in the environment.

| Amino acid | Concentration rank [a], soil | Concentration rank [a], colon | Minimum cognate bias rank [b], B. subtilis | Minimum cognate bias rank [b], E. coli | Minimum cognate bias rank [c], B. subtilis | Minimum cognate bias rank [c], S. typhimurium |
|---|---|---|---|---|---|---|
| Ala | 2 | 1 | 2 | 1 | 2 | 1 |
| Arg | 1 | 3 | 1 | 2 | 1 | 2 |
| Asp | 2 | 1 | 1 | 2 | 1 | 2 |
| Glu | 1 | 3 | 1 | 2 | 1 | 2 |
| Gly | 3 | 1 | 1 | 2 | 2 | 1 |
| Lys | 1 | 3 | 1 | 2 | 1 | 2 |
| Ser | 3 | 1 | 2 | 1 | 2 | 1 |
| Thr | 2 | 1 | 2 | 1 | 2 | 1 |
| Trp | 1 | 3 | 1 | 2 | 1 | 2 |
| Tyr | 1 | 3 | 1 | 2 | 1 | 2 |

Rank correlation (n = 20) [d]: r = 0.61 (P < 0.004)
Rank correlation (n = 20) [e]: r = 0.83 (P < 0.00025)

a. If the concentration of amino acid in the environment is low, then the rank is 1; if it is intermediate, then the rank is 2; if it is high, then the rank is 3.
b. If cognate bias is lower in B. subtilis, then the rank is 1 for B. subtilis and 2 for E. coli; if cognate bias is higher in B. subtilis, then the rank is 2 for B. subtilis and 1 for E. coli.
c. If cognate bias is lower in B. subtilis, then the rank is 1 for B. subtilis and 2 for S. typhimurium; if cognate bias is higher in B. subtilis, then the rank is 2 for B. subtilis and 1 for S. typhimurium.
d. Rank correlation calculated using the pair of values (rank of amino acid concentration, rank of minimum cognate bias) for B. subtilis and E. coli.
e. Rank correlation calculated using the pair of values (rank of amino acid concentration, rank of minimum cognate bias) for B. subtilis and S. typhimurium.



longer available would be difficult if the enzymes of thebiosynthetic pathway had a composition that was high inthe cognate amino acid, independently of the relativecodon usage for that amino acid.

In addressing this issue we have hypothesized that the enzymes of specific amino acid biosynthetic pathways, or at least those with the greatest influence on the rate of the pathway, should be biased towards low values of the cognate amino acid when compared with the entire proteome of the organism. In this article, we have presented several lines of evidence that support this cognate bias hypothesis.

First, a computer model of the tryptophan biosynthetic pathway in E. coli showed that derepression of the tryptophan biosynthetic enzymes would be more compromised if Trp residues were more abundant in these enzymes. The results of this simulation (Fig. 2) suggest that the extent and rapidity of response may well be selective pressures responsible for low cognate bias.

Second, the prediction of a direct correlation between molecular activity of amino acid biosynthetic enzymes and their cognate bias was tested by direct calculation using information from databases for enzyme activities and genome sequences. A statistically significant direct correlation between molecular activity and bias is found for cognate (Table 3) but not for non-cognate amino acids (Table 4).

Third, the prediction of an inverse correlation between number of enzymes in the amino acid biosynthetic pathways and their cognate bias was tested by direct calculation using information from databases for metabolic pathways and genome sequences. As expected, there was a statistically significant inverse correlation between pathway length and bias for cognate but not for non-cognate amino acids (Table 5).

Fourth, a more detailed enzyme-by-enzyme, pathway-by-pathway and organism-by-organism analysis found strong evidence for low cognate bias in approximately 75% of the amino acid biosynthetic pathways (Fig. 5). For four of the remaining pathways the selection for low bias appears to be masked by other factors, and becomes evident when the influence of these factors is removed. For example, certain biosynthetic enzymes have their cognate amino acid located at highly conserved positions that are key in determining protein structure and function. When these residues are discounted in the calculation of the cognate bias, the residual composition of the enzyme is distinctly biased towards low values of the cognate amino acid. This is the case for Asp, Phe and Tyr biosynthetic enzymes in E. coli (Fig. S2) and for the first enzyme of the Thr biosynthetic pathway in B. subtilis.
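The discounting of conserved cognate residues described above can be illustrated with a minimal sketch. It assumes a gapped multiple alignment given as equal-length strings; the function names and the 90% conservation threshold are hypothetical choices made for illustration and are not taken from the analysis reported here.

```python
def conserved_columns(alignment, residue, min_fraction=0.9):
    """Return alignment columns where `residue` occurs in at least
    `min_fraction` of the sequences (threshold is an illustrative assumption)."""
    n_seqs = len(alignment)
    width = len(alignment[0])
    return [
        j for j in range(width)
        if sum(seq[j] == residue for seq in alignment) / n_seqs >= min_fraction
    ]


def residual_fraction(aligned_query, residue, skip_columns):
    """Relative content of `residue` in the query after dropping gap
    characters and the columns listed in `skip_columns`."""
    skip = set(skip_columns)
    kept = [c for j, c in enumerate(aligned_query) if c != "-" and j not in skip]
    return kept.count(residue) / len(kept)


# Toy alignment: column 1 is an invariant Asp, column 3 is only partly conserved.
toy = ["MDKD-A", "MDRDWA", "LDKEWA"]
cols = conserved_columns(toy, "D")
residual = residual_fraction(toy[0], "D", cols)
```

Comparing `residual` with the reference composition, rather than the raw Asp content, is one way to ask whether the non-essential positions of an enzyme avoid the cognate residue.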

For three cases in B. subtilis the evidence for cognate bias is less clear. As expected, the first enzyme has the lowest cognate bias in the biosynthetic pathway for Ser, Thr and Val. However, only in the case of Thr are there highly conserved cognate residues, which when discounted result in a significantly low cognate bias for the enzyme. In the case of Ala, the enzymes are still poorly characterized and there is no evidence for low cognate bias. In two pathways, Glu and Gly, additional factors appear to completely override the selection for low cognate bias and yield higher-than-average cognate bias.

Clearly, this type of bias is a general principle that applies with varying degrees to any system that exhibits this form of positive feedback. For example, an earlier study has shown that the atomic composition of some biosynthetic enzymes is biased against atoms that are fixed in metabolism by those enzymes (Baudouin-Cornu et al., 2001). Also, preliminary results from the analysis of the E. coli and S. cerevisiae proteomes (R. Alves and A. Salvador, preliminary unpublished results) suggest that proteins involved in detoxification of reactive oxygen species are biased towards low relative content of highly oxidizable amino acid residues, thus allowing these proteins to remain active for longer periods in an oxidizing environment. The same preliminary results clearly indicate that the relative amount of highly oxidizable amino acid residues in proteins expressed under anaerobic conditions is significantly greater than that in proteins expressed exclusively under aerobic conditions.

Finally, in comparisons of E. coli with S. typhimurium, another closely related enteric Gram-negative organism, and B. subtilis, a more distantly related Gram-positive organism, we have observed differences in the detailed profile of cognate bias (Fig. 5) that might reflect differences in the intensity of selection for low cognate bias. We have argued that selection for low cognate bias is expected to be more intense for those amino acid biosynthetic pathways that must undergo the greatest range and frequency of derepression. This is likely to be associated with low and infrequent abundance of the cognate amino acid in the organism’s environment.

Although there is great heterogeneity in the amino acid measurements, the profile of their relative abundance in the colon appears to exhibit a number of differences from that in soil (Table 1). If one accepts the 10 cases in which there appears to be a qualitative difference, and the argument that a higher relative concentration implies weaker selection for low cognate bias, then one can examine whether these data are consistent with those for cognate bias in Fig. 5. The results of our comparisons show a positive qualitative correlation (Table 7) that further supports the selectionist explanation for low cognate amino acid bias in amino acid biosynthetic enzymes.
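As an illustration of how the rank correlations of Table 7 can be obtained, the following minimal sketch pools the 10 (soil rank, B. subtilis rank) pairs with the 10 (colon rank, enteric organism rank) pairs, giving n = 20. Two choices are assumptions of this sketch rather than statements about the original scripts: tied values receive their average rank, and significance uses the one-sided large-sample approximation z = r·sqrt(n − 1). All function names are hypothetical.

```python
from math import erfc, sqrt

# Ranks transcribed from Table 7, per amino acid:
# (soil, colon, B. subtilis vs E. coli, E. coli, B. subtilis vs S. typhimurium, S. typhimurium)
TABLE7 = {
    "Ala": (2, 1, 2, 1, 2, 1), "Arg": (1, 3, 1, 2, 1, 2),
    "Asp": (2, 1, 1, 2, 1, 2), "Glu": (1, 3, 1, 2, 1, 2),
    "Gly": (3, 1, 1, 2, 2, 1), "Lys": (1, 3, 1, 2, 1, 2),
    "Ser": (3, 1, 2, 1, 2, 1), "Thr": (2, 1, 2, 1, 2, 1),
    "Trp": (1, 3, 1, 2, 1, 2), "Tyr": (1, 3, 1, 2, 1, 2),
}


def midranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in order[pos:end + 1]:
            ranks[k] = average
        pos = end + 1
    return ranks


def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    sx, sy = sum(rx), sum(ry)
    sxy = sum(a * b for a, b in zip(rx, ry))
    sxx = sum(a * a for a in rx)
    syy = sum(b * b for b in ry)
    return (sxy - sx * sy / n) / sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))


def one_sided_p(r, n):
    """Upper-tail normal probability for z = |r| * sqrt(n - 1) (approximation)."""
    z = abs(r) * sqrt(n - 1)
    return 0.5 * erfc(z / sqrt(2))


def table7_correlation(bs_col, enteric_col):
    """Pool (soil, B. subtilis) and (colon, enteric organism) rank pairs."""
    rows = list(TABLE7.values())
    env = [r[0] for r in rows] + [r[1] for r in rows]
    bias = [r[bs_col] for r in rows] + [r[enteric_col] for r in rows]
    r = spearman(env, bias)
    return r, one_sided_p(r, len(env))


r_d, p_d = table7_correlation(2, 3)  # B. subtilis and E. coli
r_e, p_e = table7_correlation(4, 5)  # B. subtilis and S. typhimurium
```

Under these assumptions the two coefficients round to 0.61 and 0.83, in agreement with the values reported in Table 7.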

In summary, we have presented several lines of evidence showing that cognate bias plays a highly significant role in shaping the amino acid composition for a large class of cellular proteins. The profiles of cognate amino acid bias are similar for two closely related organisms, E. coli and S. typhimurium; they differ between more distantly related organisms (E. coli or S. typhimurium versus B. subtilis) in ways that show a qualitative relationship to the environments of these organisms. Such differences, if substantiated with a broader group of organisms, may serve as a ‘fingerprint’ that reflects their different evolutionary history and ecological niche.


Model organisms

We use E. coli K12, S. typhimurium and B. subtilis as our model organisms. The proteome and genome information for these organisms was downloaded from the KEGG database release 28.0 (Kanehisa et al., 2002).

Proteins involved in amino acid biosynthesis

We use pathway information available on Ecocyc (Karp et al., 2002), KEGG (Kanehisa et al., 2002), WIT (Overbeek et al., 2000), Herrmann and Somerville (1983) and Neidhardt (1999), and cross-correlate this information to determine the biosynthetic pathway for each amino acid in each of the organisms. Table 2 summarizes this information in terms of gene name, enzyme activity and EC number. Figure 3 shows how the network of amino acid biosynthetic reactions is connected.

Calculation of molecular activity

We have used the database BRENDA and the references therein to obtain estimates for the specific activity of the enzymes involved in amino acid biosynthesis. If this activity was not available for the specific organism of interest, we used the available value for the organism whose protein had the strongest homology to the target enzyme. The molar weight of each enzyme was estimated by adding the individual weights of all residues of the protein and subtracting the weight of one water molecule per peptide bond. Using this molar weight we converted specific activity into molecular activity (Table 2). Whenever an enzyme is known to be formed by multiple subunits, the molecular weight of the enzyme was calculated by adding together the weights of its constituent subunits.
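The conversion described above can be sketched as follows. The sketch assumes approximate average masses of the free amino acids, a specific activity expressed in µmol min⁻¹ mg⁻¹, and a molecular activity expressed in min⁻¹; these units and all function names are assumptions made for illustration.

```python
WATER = 18.015  # g/mol, lost once per peptide bond

# Approximate average molar masses of the free amino acids (g/mol).
FREE_AA_MASS = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}


def molar_mass(sequence):
    """Sum of free amino acid masses minus one water per peptide bond."""
    peptide_bonds = len(sequence) - 1
    return sum(FREE_AA_MASS[aa] for aa in sequence) - WATER * peptide_bonds


def enzyme_molar_mass(subunits):
    """Mass of a multisubunit enzyme: sum over every constituent chain.
    Repeat a sequence in the list once per copy present in the complex."""
    return sum(molar_mass(chain) for chain in subunits)


def molecular_activity(specific_activity, mass):
    """Convert specific activity (umol min^-1 mg^-1) to turnover (min^-1).
    `mass` is in g/mol, i.e. mass / 1000 is mg of enzyme per umol of enzyme."""
    return specific_activity * mass / 1000.0
```

For example, an enzyme of 50 000 g/mol with a specific activity of 10 µmol min⁻¹ mg⁻¹ corresponds to `molecular_activity(10.0, 50000.0)`, i.e. 500 min⁻¹.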

Analysis of proteome and genome data

The analysis of relative amino acid composition is performed from cDNAs and peptide strings using locally developed PERL scripts.
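The original analysis used PERL scripts that are not reproduced here. As a minimal Python sketch of the same kind of calculation, the code below computes the relative amino acid composition of a peptide string and of a pooled reference set. The ratio returned by `relative_bias` is only an illustrative measure, assumed for this example; it is not the formal definition of cognate bias used in this article.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def relative_composition(sequence):
    """Fraction of each of the 20 standard amino acids in one peptide string.
    Non-standard characters are ignored."""
    counts = Counter(sequence)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def pooled_composition(sequences):
    """Composition of a reference group, pooling residues over all proteins."""
    return relative_composition("".join(sequences))


def relative_bias(sequence, reference, amino_acid):
    """Content of `amino_acid` in the protein relative to the reference group
    (illustrative measure; values below 1 indicate a lower-than-reference content)."""
    return relative_composition(sequence)[amino_acid] / reference[amino_acid]


# Toy example with a hypothetical two-protein reference group.
reference = pooled_composition(["AAWW", "AAAA"])
bias = relative_bias("AAAW", reference, "W")
```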

Statistical analysis of the data

Monte Carlo simulations and statistical analysis of the data are performed using locally developed PERL scripts and Mathematica (Wolfram, 1999) notebooks.

The Spearman rank correlation coefficient tests for monotonic, not necessarily linear, correlations between sets of data (Cohen and Holliday, 1998). This correlation coefficient is given by

$$
r = \frac{\sum_{i=1}^{n} R(x_i)\,R(y_i) - \frac{1}{n}\sum_{i=1}^{n} R(x_i)\sum_{i=1}^{n} R(y_i)}
{\sqrt{\left[\sum_{i=1}^{n} R(x_i)^2 - \frac{1}{n}\left(\sum_{i=1}^{n} R(x_i)\right)^2\right]
\left[\sum_{i=1}^{n} R(y_i)^2 - \frac{1}{n}\left(\sum_{i=1}^{n} R(y_i)\right)^2\right]}}
\tag{5}
$$

where $R(x_i)$ and $R(y_i)$ represent the rank of $x_i$ and $y_i$ in the sample, respectively, and $n$ is the number of pairs in the sample.

To test the significance of r we use the Fisher z-statistic with the null hypothesis that the correlation coefficient is zero (and thus that there is no correlation in the population for the tested variables). The z-test makes no assumptions about the specific distribution of the data being analysed. It is well known that the variable z, defined as

$$
z = \frac{\sum_{i=1}^{n}\left[R(x_i) - R(y_i)\right]^2 - \dfrac{n(n-1)(n+1)}{6}}
{\sqrt{\dfrac{n^2(n-1)(n+1)^2}{36}}} \sim N(0, 1)
\tag{6}
$$

has approximately a normal distribution with mean 0 and standard deviation 1 under the null hypothesis. The P-value is calculated by determining the quantile for the absolute value of z in the normal distribution. If P < α (0 < α < 1), the null hypothesis is rejected at significance level α: the probability of obtaining a correlation at least this strong when there is in fact no correlation between the y-values and the x-values is less than α. We have also calculated the t-statistics for the different coefficients. However, here we present only P-values as determined from the z-statistics, because the significance is lower for our samples, thus providing a more conservative estimate of significance.

Kinetic modelling

The kinetic modelling is performed using the program PLAS (Voit and Ferreira, 2000).

Calculating statistical significance of amino acid bias

Determining whether a protein is significantly biased towards low relative composition for any given amino acid is a two-step process. First, we compare the composition of the protein with that of a reference group. Second, we calculate the significance of the difference between the protein and the reference group. We use three different approaches to calculate this significance. Two involve Monte Carlo (MC) simulations to determine the significance of the bias in the relative composition of a protein with respect to a given amino acid in the context of the E. coli proteome, and the third involves an analytical calculation.

In the first MC approach, we randomly generate 1000 protein sequences having the same length as our protein of interest, assuming a relative amino acid composition that is, on average, the same as that of the reference group. The relative composition of the individual protein sequences in this set of 1000 random sequences is then ordered with respect to the amino acid of interest. Finally, our protein is considered to be significantly biased towards low levels of a given amino acid if its composition with respect to that amino acid is lower than that of 90% of the random protein sequences in our set, i.e. if it is below the 10th percentile of bias.
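The first MC approach can be sketched as follows. This is a minimal illustration, not the original PERL implementation: the function names, the fixed random seed and the toy reference composition are assumptions. Because each position of a random sequence is drawn independently, the number of residues of one amino acid follows a binomial distribution, so the sketch also computes the exact binomial lower tail, which is the quantity used by the analytical approach described later in this section.

```python
import random
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mc1_low_bias(count, length, reference, amino_acid,
                 n_random=1000, cutoff=0.90, seed=0):
    """Generate `n_random` random sequences of the given length from the
    reference composition and report whether the observed `count` is lower
    than that of at least `cutoff` of them."""
    rng = random.Random(seed)
    weights = [reference[aa] for aa in AMINO_ACIDS]
    higher = 0
    for _ in range(n_random):
        sequence = rng.choices(AMINO_ACIDS, weights=weights, k=length)
        if sequence.count(amino_acid) > count:
            higher += 1
    fraction_higher = higher / n_random
    return fraction_higher >= cutoff, fraction_higher


def binomial_lower_tail(count, length, p):
    """Exact probability of observing no more than `count` residues."""
    return sum(comb(length, r) * p**r * (1 - p)**(length - r)
               for r in range(count + 1))


# Toy reference group in which all 20 amino acids are equally frequent.
uniform = {aa: 0.05 for aa in AMINO_ACIDS}
depleted, _ = mc1_low_bias(0, 400, uniform, "W")   # no Trp in 400 residues
enriched, _ = mc1_low_bias(40, 400, uniform, "W")  # twice the expected Trp
```

With a real reference group, `reference` would hold the average composition of that group, and `count` and `length` would come from the enzyme being tested.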

In the second MC approach, we draw from the reference group of proteins, randomly, a set of 1000 proteins, allowing for repetition. We then order the relative composition of this set of random proteins with respect to the amino acid of interest. Our protein is considered to be significantly biased towards low levels of a given amino acid if its composition with respect to that amino acid is lower than that of 90% of the random proteins in our set, i.e. if it is below the 10th percentile of bias.
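The second MC approach differs from the first only in the source of the random set: real proteins are resampled with replacement instead of being generated from an average composition. A minimal sketch, with hypothetical function names, a fixed seed and a toy reference group chosen for illustration:

```python
import random


def mc2_low_bias(protein, reference_proteins, amino_acid,
                 n_draws=1000, cutoff=0.90, seed=0):
    """Draw `n_draws` proteins with replacement from the reference group and
    report whether the protein of interest has a lower relative content of
    `amino_acid` than at least `cutoff` of the draws."""
    rng = random.Random(seed)
    target = protein.count(amino_acid) / len(protein)
    draws = rng.choices(reference_proteins, k=n_draws)  # sampling with repetition
    higher = sum(seq.count(amino_acid) / len(seq) > target for seq in draws)
    fraction_higher = higher / n_draws
    return fraction_higher >= cutoff, fraction_higher


# Toy reference group; every member contains some Trp (W).
toy_reference = ["AW" * 50, "AAW" * 30, "WWA" * 20]
low, frac_low = mc2_low_bias("A" * 100, toy_reference, "W")
high, frac_high = mc2_low_bias("W" * 10, toy_reference, "W")
```

Unlike the first approach, this version preserves the length distribution and the residue correlations of the real proteins, at the cost of depending on the size of the reference group.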

The third approach to estimating significance involves an analytical calculation. Consider a protein P of length L with a relative composition of $c_1, \ldots, c_{20}$ for each of the 20 amino acid types. The average composition for the set of the control proteins is given by $p_1, \ldots, p_{20}$, with $\sum_{i=1}^{20} p_i = 1$. If the relative composition of a protein with respect to a given amino acid is independent of all other amino acids (which must be verified), then the probability that a protein of length L (belonging to a set of proteins that, on average, has a relative amount $p_i$ of amino acid i) has N residues of amino acid i is given by

$$
p_i(N) = \frac{L!}{N!\,(L-N)!}\; p_i^{N} (1 - p_i)^{L-N}
\tag{7}
$$

The cumulative probability that a protein of L residues has no more than N residues of amino acid i is then given by

$$
P_i(N) = \sum_{r=0}^{N} \frac{L!}{r!\,(L-r)!}\; p_i^{r} (1 - p_i)^{L-r}
\tag{8}
$$

Thus if $P_i(N) < 0.1$ for our protein of interest, we consider the protein to be significantly biased towards low values of amino acid i with respect to the control group of proteins: a protein drawn from that group would have so few residues of amino acid i less than 10% of the time.

Homology comparisons of bacterial enzymes

When structural information was unavailable, we have performed sequence homology studies to investigate the possibility that a sufficient number of cognate residues might be involved in important functional roles. To evaluate this possibility, we used PSI-BLAST (Altschul et al., 1997) to search for all the bacterial homologues of the relevant protein in the SWISSPROT database (Boeckmann et al., 2003) that are both classified as having the same function and have an E-value smaller than $10^{-4}$. These sequences were then aligned using CLUSTALW (Chenna et al., 2003) and conservation of cognate residues was studied.

Acknowledgements

We thank Dr Armindo Salvador for a critical review of an earlier version of this manuscript and for fruitful discussions. We thank three anonymous reviewers for suggestions that improved the clarity of this article. This work was supported in part by a grant to M.A.S. from the US Public Health Service (RO1-GM30054) and fellowships to R.A. from the Spanish Ministerio de Educacion, Cultura y Deporte (SB2000-031) and the Portuguese FCT (BPD 11533/2002).

Supplementary material

The following material is available from http://www.blackwellpublishing.com/products/journals/suppmat/mmi.mmi4566/mmi4566sm.htm

Fig. S1. Important functional residues in different biosynthetic enzymes of E. coli.

Fig. S2. Alignment of E. coli proteins with the lowest cognate bias for Phe and Tyr to homologous bacterial proteins from the SWISSPROT database.

Table S1. Spearman rank correlation coefficients.

References

Akashi, H., and Gojobori, T. (2002) Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc Natl Acad Sci USA 99: 3695–3700.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.

Baudouin-Cornu, P., Surdin-Kerjan, Y., Marliere, P., and Thomas, D. (2001) Molecular evolution of protein atomic composition. Science 293: 297–300.

Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.-C., Estreicher, A., Gasteiger, E., et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31: 365–370.

Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T.J., Higgins, D.G., and Thompson, J.D. (2003) Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 31: 3497–3500.

Cohen, L., and Holliday, M. (1998) Statistics for the Social Scientists. New York: Addison-Wesley.

Cootes, A.P., Curmi, P.M., Cunningham, R., Donnelly, C., and Torda, A.E. (1998) The dependence of amino acid pair correlation on structural environment. Proteins: Struct Function Genet 32: 175–189.

Dufton, M.J. (1997) Genetic code synonym quotas and amino acid complexity: cutting the cost of proteins? J Theor Biol 187: 165–173.

Elf, J., Nilsson, D., Tenson, T., and Ehrenberg, M. (2003) Selective charging of tRNA isoacceptors explains patterns of codon usage. Science 300: 1718–1722.

Foster, J.W., and Spector, M.P. (1995) How Salmonella survive against the odds. Ann Rev Microbiol 49: 145–174.

Frey, P.A. (2001) Radical mechanisms of enzymatic catalysis. Ann Rev Biochem 70: 121–148.

Herrmann, K.M., and Somerville, R.L. (1983) Amino Acids: Biosynthesis and Genetic Regulation. Reading, MA: Addison-Wesley Publishing.




Jansen, R., and Gerstein, M. (2000) Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins. Nucleic Acids Res 28: 1481–1488.

Kanehisa, M., Goto, S., Sato, K., Fijibuchi, W., and Nakaya, A. (2002) The KEGG database at Genomenet. Nucleic Acids Res 30: 42–46.

Karlin, S., and Bucher, P. (1992) Correlation analysis of amino acid usage in protein classes. Proc Natl Acad Sci USA 89: 12165–12169.

Karp, P.D., Riley, M., Saier, M., Paulsen, I.T., Paley, S., and Pellegrini-Toole, A. (2002) The Ecocyc Database. Nucleic Acids Res 30: 56–58.

Khodursky, A.B., Peter, B.J., Cozzarelli, N.R., Botstein, D., Brown, P.O., and Yanofsky, C. (2000) DNA microarray analysis of gene expression in response to physiological and genetic changes that affect tryptophan metabolism in Escherichia coli. Proc Natl Acad Sci USA 97: 12170–12175.

King, J.L., and Jukes, T.H. (1969) Non darwinian evolution. Science 164: 788–798.

Li, W.-H. (1997) Molecular Evolution. New York: Sinauer Associates.

Lobry, J.R. (1997) Influence of genomic G+C content on average amino acid composition of proteins from 59 bacterial species. Gene 205: 309–316.

Lobry, J.R., and Gautier, C. (1994) Hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in 999 Escherichia coli chromosome-encoded genes. Nucleic Acids Res 22: 3174–3180.

Maaløe, O., and Kjeldgaard, N.O. (1966) Control of Macromolecular Synthesis; a Study of DNA, RNA, and Protein Synthesis in Bacteria. New York: W.A. Benjamin.

Magnusson, L.U., Nystrom, T., and Farewell, A. (2003) Underproduction of sigma 70 mimics a stringent response. A proteome approach. J Biol Chem 278: 968–973.

Matin, A. (1991) The molecular basis of carbon-starvation-induced general resistance in Escherichia coli. Mol Microbiol 5: 3–10.

Mazel, D., and Marliere, P. (1989) Adaptive eradication of methionine and cysteine from cyanobacterial light-harvesting proteins. Nature 341: 245–248.

Neidhardt, F.C. (1999) Escherichia coli and Salmonella: Cellular and Molecular Biology. Washington, DC: American Society for Microbiology.

Neidhardt, F.C., Ingraham, J.L., and Schaechter, M. (1990) Physiology of the Bacterial Cell: A Molecular Approach. Sunderland, MA: Sinauer Ass.

Nystrom, T. (2004) Stationary-phase physiology. Ann Rev Microbiol 58: 161–181.

Overbeek, R., Larsen, N., Pusch, G., D’Souza, M., Selkov, E., Jr, Kyrpides, N., et al. (2000) WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res 28: 123–125.

Pedersen, J.Z., and Finazzi-Agro, A. (1993) Protein-radical enzymes. FEBS Lett 325: 53–58.

Pramanik, J., and Keasling, J.D. (1998) Effect of Escherichia coli biomass composition on central metabolic fluxes predicted by a stoichiometric model. Biotech Bioeng 60: 230–238.

Reeve, C.A., Bockman, A.T., and Matin, A. (1984) Role of protein degradation in the survival of carbon-starved Escherichia coli and Salmonella typhimurium. J Bacteriol 157: 758–763.

Richmond, R.C. (1970) Non darwinian evolution: a critique. Nature 225: 1025–1028.

Rogers, M.S., and Dooley, D.M. (2003) Copper-tyrosyl radical enzymes. Curr Opin Chem Biol 7: 189–196.

Sauer, U., Hatzimanikatis, V., Hohmann, H.P., Manneberg, M., van Loon, A.P., and Bailey, J.E. (1996) Physiology and metabolic fluxes of wild-type and riboflavin-producing Bacillus subtilis. Appl Environ Microbiol 62: 3687–3696.

Savageau, M.A. (1983) Escherichia coli habitats, cell types, and molecular mechanisms of gene control. Am Nat 122: 732–744.

Seligmann, H. (2003) Cost-minimization of amino acid usage. J Mol Evol 56: 151–161.

Singer, G.A., and Hickey, D.A. (2000) Nucleotide bias causes a genomewide bias in the amino acid composition of proteins. Mol Biol Evol 17: 1581–1588.

Trifonov, E.N. (1987) Translation framing code and frame-monitoring mechanism as suggested by the analysis of mRNA and 16 S rRNA nucleotide sequences. J Mol Biol 194: 643–652.

Voit, E.O., and Ferreira, A.E.N. (2000) Computational Analysis of Biochemical Systems, a Practical Guide for Biochemists and Molecular Biologists. Cambridge, UK: Cambridge University Press.

Weichart, D., Querfurth, N., Dreger, M., and Hengge-Aronis, R. (2003) Global role for ClpP-containing proteases in stationary phase adaptation of Escherichia coli. J Bacteriol 185: 115–125.

Wolfram, S. (1999) The Mathematica Book. New York: Cambridge University Press.

Xiu, Z.-L., Chang, Z.-Y., and Zeng, A.-P. (2002) Nonlinear dynamics of regulation of bacterial trp operon: model analysis of integrated effects of repression, feedback inhibition, and attenuation. Biotechnol Prog 18: 686–693.
