directional mutation pressure andneutral molecular evolution · directional mutation pressure,...

5
Proc. Natl. Acad. Sci. USA Vol. 85, pp. 2653-2657, April 1988 Evolution Directional mutation pressure and neutral molecular evolution (guanine-plus-cytosine content/selective constraints/non-Darwinian evolution) NOBORU SUEOKA Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO 80309 Communicated by Norman H. Horowitz, November 9, 1987 (received for review August 13, 1987) ABSTRACT A quantitative theory of directional muta- tion pressure proposed in 1962 explained the wide variation of DNA base composition observed among different bacteria and its small heterogeneity within individual bacterial species. The theory was based on the assumption that the effect of mutation on a genome is not random but has a directionality toward higher or lower guanine-plus-cytosine content of DNA, and this pressure generates directional changes more in neutral parts of the genome than in functionally significant parts. Now that DNA sequence data are available, the theory allows the estimation of the extent of neutrality of directional mutation pressure against selection. Newly defined parameters were used in the analysis, and two apparently universal constants were discovered. Analysis of DNA sequence has revealed that practically all organisms are subject to directional mutation pressure. The theory also offers plausible explanations for the large heterogeneity in guanine-plus-cytosine content among different parts of the vertebrate genome. Questions on relative roles of Darwinian selection and of non-Darwinian or neutral mutation on molecular evolution have been the focus of discussion for the last two decades between neutralists and selectionists. It is now widely ac- cepted that the neutral concept of evolution propounded by Kimura since 1968 (1, 2) and by King and Jukes since 1969 (3) is as important in evolution as selection is. The neutral theory of evolution is based mainly on two premises: (i) functionally neutral changes of the genome by nucleotide- substitution mutation escape the editing effects of Darwinian selection (1, 3) and (ii) these changes are fixed in the population by random genetic drift (1, 2) as formulated by Wright (4). The theory explains the much higher rates of DNA base substitution compared to the mutation rates previously estimated from phenotypically observable muta- tions (1). The two premises were reasonable; the first one had previously been pointed out (5, 6). These premises are not sufficient, however, to explain a number of the major features of the neutral evolution of the genome. For exam- ple, the neutral theory elaborated by Kimura (2) does not explain the main features of DNA base composition: the relatively small heterogeneity within a microbial species (7-10) in contrast to the extremely wide variation, ranging from approximately 25% to 75% guanine-plus-cytosine (G + C) content, among different species of bacteria, algae, and protozoa (11-13). By contrast, the quantitative theory of directional mutation pressure, proposed in 1962 (5, 6), can explain a number of evolutionary changes that have oc- curred in DNA base composition and nucleotide sequence that are not possible to understand by the principles of the neutral theory alone. Moreover, as shown in this paper, use of parameters previously defined in the directional mutation theory allows us to estimate the degree of neutrality of mutational change in terms of DNA base composition (G + C content) in bacteria or in different genes of individual verte- brates. Existence of nonrandom overall mutation pressure- namely, directional mutation pressure toward the a pair (A-T or T-A) or toward the 'y pair (G-C or C-G), was first suspected from the variation of DNA base composition [expressed as G + C content, 'y/(a + y)] among DNAs of different species and heterogeneity within the genome of an individual species of bacteria (9, 10). Variation of G + C content is also reflected in the total amino acid composition in bacteria and Tetrahymena (14, 15). As predicted, when the genetic code was deciphered, it became evident that the correlation of each amino acid content with DNA G + C content depends on the G + C content of the codon set for the particular amino acid. It is therefore important to note that the directional mutation pressure can exert some directional (nonrandom) changes on the amino acid composition of proteins as well. Results of some recent analyses of DNA sequence and codon usage have been explained by directional mutation pressure (16-23). Theory of Directional Mutation Pressure. Based on the major features of variation and heterogeneity of DNA base composition, Sueoka (5) formulated a quantitative theory of directional mutation pressure in 1962. In the same year, Freese (6) also proposed a theory to explain some composi- tional features of DNA. The two theories are similar in principle and conclusions but are different in treatment. The directional mutation theory and rationale for it in the form presented by Sueoka (5) are briefly stated below. The major cause for a change in DNA G + C content of an organism is the mutation (5) between an a pair and a y pair. When there are no selective constraints, u y a (p) V (1 - p) [1] where p is the fractional G + C content of DNA and u and v are mutation rates per generation per base. At equilibrium (5), v v p P = , or - = u + v u 1- - [2] Here, p is the G + C content at equilibrium. The large variation of G + C content among DNAs of different bacteria was explained mainly by differences in v/u rather than by selection (5, 6). Directional Mutation Pressure. A definition of directional mutation pressure is necessary for the quantitative analysis of the effect of directional mutation on the G + C content of DNA. The directional mutation pressure (AD) is now defined as (5) v AD = u + V [3] 2653 The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact. Downloaded by guest on October 14, 2020

Upload: others

Post on 01-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Directional mutation pressure andneutral molecular evolution · directional mutation pressure, proposed in 1962 (5, 6), can explain a number of evolutionary changes that have oc-curred

Proc. Natl. Acad. Sci. USAVol. 85, pp. 2653-2657, April 1988Evolution

Directional mutation pressure and neutral molecular evolution(guanine-plus-cytosine content/selective constraints/non-Darwinian evolution)

NOBORU SUEOKADepartment of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO 80309

Communicated by Norman H. Horowitz, November 9, 1987 (received for review August 13, 1987)

ABSTRACT A quantitative theory of directional muta-tion pressure proposed in 1962 explained the wide variation ofDNA base composition observed among different bacteria andits small heterogeneity within individual bacterial species. Thetheory was based on the assumption that the effect of mutationon a genome is not random but has a directionality towardhigher or lower guanine-plus-cytosine content of DNA, andthis pressure generates directional changes more in neutralparts of the genome than in functionally significant parts. Nowthat DNA sequence data are available, the theory allows theestimation of the extent of neutrality of directional mutationpressure against selection. Newly defined parameters wereused in the analysis, and two apparently universal constantswere discovered. Analysis of DNA sequence has revealed thatpractically all organisms are subject to directional mutationpressure. The theory also offers plausible explanations for thelarge heterogeneity in guanine-plus-cytosine content amongdifferent parts of the vertebrate genome.

Questions on relative roles of Darwinian selection and ofnon-Darwinian or neutral mutation on molecular evolutionhave been the focus of discussion for the last two decadesbetween neutralists and selectionists. It is now widely ac-cepted that the neutral concept of evolution propounded byKimura since 1968 (1, 2) and by King and Jukes since 1969(3) is as important in evolution as selection is. The neutraltheory of evolution is based mainly on two premises: (i)functionally neutral changes of the genome by nucleotide-substitution mutation escape the editing effects of Darwinianselection (1, 3) and (ii) these changes are fixed in thepopulation by random genetic drift (1, 2) as formulated byWright (4). The theory explains the much higher rates ofDNA base substitution compared to the mutation ratespreviously estimated from phenotypically observable muta-tions (1). The two premises were reasonable; the first onehad previously been pointed out (5, 6). These premises arenot sufficient, however, to explain a number of the majorfeatures of the neutral evolution of the genome. For exam-ple, the neutral theory elaborated by Kimura (2) does notexplain the main features of DNA base composition: therelatively small heterogeneity within a microbial species(7-10) in contrast to the extremely wide variation, rangingfrom approximately 25% to 75% guanine-plus-cytosine(G + C) content, among different species of bacteria, algae,and protozoa (11-13). By contrast, the quantitative theory ofdirectional mutation pressure, proposed in 1962 (5, 6), canexplain a number of evolutionary changes that have oc-curred in DNA base composition and nucleotide sequencethat are not possible to understand by the principles of theneutral theory alone. Moreover, as shown in this paper, useof parameters previously defined in the directional mutationtheory allows us to estimate the degree of neutrality of

mutational change in terms ofDNA base composition (G + Ccontent) in bacteria or in different genes of individual verte-brates.

Existence of nonrandom overall mutation pressure-namely, directional mutation pressure toward the a pair (A-Tor T-A) or toward the 'y pair (G-C or C-G), was first suspectedfrom the variation of DNA base composition [expressed asG + C content, 'y/(a + y)] among DNAs of different speciesand heterogeneity within the genome of an individual speciesof bacteria (9, 10).

Variation of G + C content is also reflected in the totalamino acid composition in bacteria and Tetrahymena (14,15). As predicted, when the genetic code was deciphered, itbecame evident that the correlation of each amino acidcontent with DNA G+ C content depends on the G+Ccontent of the codon set for the particular amino acid. It istherefore important to note that the directional mutationpressure can exert some directional (nonrandom) changes onthe amino acid composition of proteins as well. Results ofsome recent analyses of DNA sequence and codon usagehave been explained by directional mutation pressure(16-23).Theory of Directional Mutation Pressure. Based on the

major features of variation and heterogeneity of DNA basecomposition, Sueoka (5) formulated a quantitative theory ofdirectional mutation pressure in 1962. In the same year,Freese (6) also proposed a theory to explain some composi-tional features of DNA. The two theories are similar inprinciple and conclusions but are different in treatment. Thedirectional mutation theory and rationale for it in the formpresented by Sueoka (5) are briefly stated below.The major cause for a change in DNA G + C content of an

organism is the mutation (5) between an a pair and a y pair.When there are no selective constraints,

u

y a

(p) V (1 -p)[1]

where p is the fractional G + C content of DNA and u and vare mutation rates per generation per base. At equilibrium (5),

v v pP = , or - =

u + v u 1--[2]

Here, p is the G+C content at equilibrium. The largevariation ofG + C content among DNAs of different bacteriawas explained mainly by differences in v/u rather than byselection (5, 6).

Directional Mutation Pressure. A definition of directionalmutation pressure is necessary for the quantitative analysis ofthe effect of directional mutation on the G + C content ofDNA.The directional mutation pressure (AD) is now defined as (5)

vAD =

u + V[3]

2653

The publication costs of this article were defrayed in part by page chargepayment. This article must therefore be hereby marked "advertisement"in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Dow

nloa

ded

by g

uest

on

Oct

ober

14,

202

0

Page 2: Directional mutation pressure andneutral molecular evolution · directional mutation pressure, proposed in 1962 (5, 6), can explain a number of evolutionary changes that have oc-curred

Proc. Natl. Acad. Sci. USA 85 (1988)

Table 1. Major statistics for the organisms referred to in this article

Avg. no. G + C contentNo. of of codons

Organism genes per gene Total* P1 P2 P3 V/utBacteria

Bacillus subtilis 10 308.5 0.42 0.516 ± 0.027 0.382 ± 0.065 0.427 ± 0.038 0.75 ± 0.12Escherichia coli 110 438.2 0.50 0.607 ± 0.045 0.404 ± 0.032 0.564 ± 0.059 1.34 ± 0.31

FungusSaccharomyces cerevisiae 49 333.2 0.35 0.465 ± 0.051 0.376 ± 0.048 0.392 ± 0.050 0.66 ± 0.15

ProtozoanHypotrichst 3 295.3 0.42 0.518 ± 0.031 0.401 ± 0.025 0.523 ± 0.063 1.10 ± 0.06

InvertebratesCaenorhabditis elegans§ 8 486.5 0.36 0.579 ± 0.096 0.398 ± 0.136 0.489 ± 0.087 1.01 ± 0.31Drosophila melanogaster 11 272.3 0.40 0.548 ± 0.047 0.398 ± 0.061 0.788 ± 0.034 3.83 ± 0.73Sea urchin1 8 148.3 0.42 0.581 ± 0.053 0.462 ± 0.054 0.621 ± 0.048 1.68 ± 0.34

VertebratesXenopus laevis 5 126.6 0.42 0.567 ± 0.018 0.370 ± 0.058 0.451 ± 0.144 1.01 ± 0.76Chicken 28 257.6 0.44 0.588 ± 0.060 0.410 ± 0.059 0.692 ± 0.138 3.12 ± 2.49Mouse 31 302.4 0.44 0.549 ± 0.049 0.431 ± 0.064 0.619 ± 0.105 1.84 ± 0.84Rat 43 247.8 0.43 0.560 ± 0.065 0.408 ± 0.068 0.654 ± 0.102 2.15 ± 0.95Human 90 311.4 0.43 0.552 ± 0.061 0.414 ± 0.052 0.631 ± 0.165 2.43 ± 2.12

PlantZea mays 8 269.3 0.46 0.616 ± 0.039 0.377 ± 0.017 0.468 ± 0.055 0.90 ± 0.23Unless otherwise stated, codon-frequency data compiled by Maruyama et al. (27) were used. Each value for the three codon positions

represents the average G +C content of genes with standard deviation.*G + C content of the whole genome generally accepted (28).tRatio of mutation rates, calculated by Eq. 4b.tCiliates (data collection assisted by David Prescott).§Nematodes (data collection assisted by Thomas Blumenthal).IDifferent species are combined (27).

Here, gD > 0.5 indicates that the mutation pressure favors yover at and AD < 0.5 indicates a preference for a over y. Atequilibrium, the G + C content of a neutral nucleotide posi-tion equals mutation pressure D (5); p is therefore termedthe G + C content at "mutational equilibrium."

Analysis of directional mutation pressure is based on therelative value of mutation rates, u and v, not on theirabsolute values. Since the probability of fixation in thepopulation is similar for individual neutral mutations, thevalues of v/u and g also apply to the evolution (change inpopulation) of P3, as far as P3 is in equilibrium with direc-tional mutation pressure. On the other hand, the fixation ofP,2 and P123 in a population is subject to selective con-straints, the extent of which will be examined in this paper,assuming that P12 and P123 are values in equilibrium withdirectional mutation pressure and selective constraints. Inother words, the rate of directional change of DNA G +Ccontent is not our concern in the present analysis.

In 1965, Cox and Yanofsky (24) reported experimentalresults indicating that the directional mutation pressureindeed changes the G + C content of Escherichia coli. Theyused a mutator strain, mutT (25); mutT enhances mutationrate 103-fold over the spontaneous level and all mutationalchanges due to mutT are A-T to C-G (26). The continuousculture of the mutT strain for 1200-1600 generations showedchanges in G + C content by 0.2-0.5%.G + C Content of the Three Codon Positions. In the present

analysis, observed G + C contents of the first, second, andthird codon positions (P1, P2, and P3, respectively) arecorrected average G+C contents of the three codon posi-tions that, are calculated from 56 triplets out of 64. Becauseof the inequality of a and 'y at the third codon position, thethree stop codons (TAA, TAG, and TGA) and the threecodons for isoleucine (ATT, ATC, and ATA) were excludedin calculation of P3, and two single codons for methionine(ATG) and tryptophan (TGG) were excluded in all three (P1,P2, and P3). The G+ C contents are calculated for indivi-

dual genes or for the sum of genes of an organism (Table 1).Unless otherwise specified, we analyze the data of codon-usage frequency recently calculated by Maruyama et al. (27)from the sequence data of GenBank* (29).

Analysis of Equilibrium. The G + C content of the thirdcodon position (P3) was used for estimating the directionalmutation pressure (PD) in the present work. Because of thepartially silent nature of the third codon position (30), P3represents one of the most neutral nucleotides within thegenome, so far as the G + C content is concerned (8, 31). Themarked response of P3 to directional mutation pressure isalso manifested in its extreme values in bacteria (9% and95%; Fig. la), in some genes of vertebrates (95%; refs. 32and 33), and in mitochondrial genes of Drosophila yakuba(6.2%; ref. 19). These results indicate close neutrality of thethird codon positions in G + C content. The near neutrality ofP3 does not mean that any change of the third codon positionis neutral. Obviously, some transversions are not neutral,since they cause amino acid changes. However, transitionsshould be sufficient to bring P3 into equilibrium with AD.Nevertheless, changes in P3 may not be completely neutral.If an independent method to assess the extent of neutrality ofP3 becomes available in the future, small corrections maybecome necessary.

Unlike P3, P1 and P2 are subject to functional constraintsagainst change because a mutation at these positions usuallyleads to an amino acid change, except between some codonsof arginine, leucine, or serine. Other regions such as intronsand intergenic spacers are abundant only in eukaryotes andtheir function is not well understood and may differ widely intheir degree of functional neutrality.Assuming that P3 is neutral with regard to directional

mutation pressure and that a near equilibrium has been

*EMBL/GenBank Genetic Sequence Database (1985) GenBank(Bolt, Beranek, and Newman Laboratories, Cambridge, MA),Tape Release 38.0.

2654 Evolution: Sueoka

Dow

nloa

ded

by g

uest

on

Oct

ober

14,

202

0

Page 3: Directional mutation pressure andneutral molecular evolution · directional mutation pressure, proposed in 1962 (5, 6), can explain a number of evolutionary changes that have oc-curred

Proc. Natl. Acad. Sci. USA 85 (1988) 2655

attained for P3, currently the best estimate of AD and v/u canbe obtained by approximating j with P3 in Eqs. 2 and 3:

D= P3 [4a]

V P3

U (1 - P3) [4b]

where P3 is the G + C content of the third codon position atequilibrium with the directional mutation pressure. In real-ity, P3 can be approximated by P3. In this article, P valueswith circumflex indicate values at equilibrium either withmutation alone (P) or with both mutation pressure andselection (P1, P2, P12, and P123).

Extent of Directional Mutation Pressure and Equilibrium.In order to analyze the relation between directional mutationpressure and selective constraints, the data used by Mutoand Osawa (16) are presented in Fig. la with some modifi-cation. The G + C content of the first and the second codonpositions (ordinate) are plotted against the G + C content ofthe third codon position (P3; abscissa), instead of against thetotal DNA G+ C content as originally reported (16). Thetotal G+ C content is not an ideal variable for quantitativearguments because it includes all three letters of the codonand also other noncoding areas. Their data on the codingareas and spacer regions between genes of the same set ofbacteria are also replotted against P3 (Fig. lb).As Muto and Osawa (16) pointed out, it is evident that

bacteria with lower and higher values of G + C content havemore extreme values in the third codon position than thetotal G + C content or than those of the first and secondcodon positions (Fig. la). It is also noted that even amongbacteria whose DNA G + C content is well within the limitsof 25% and 75% G + C, there is the tendency that specieswith a high G + C content have an even higher G + C contentin the third codon position (G + C pressure) and species witha high A + T content have an even higher A + T content(A + T pressure). The consistently higher G + C content and

1.0a b

°0X00

00.5 1.0 0.5 1.0G+C content of third position

FIG. 1. Regression of G + C content of the first and secondpositions with the third codon position among bacterial species. (a)Regression lines were drawn by linear least-squares analysis. G + Ccontents of the first codon position (o), those of the second codonposition (A), and those of the average of the first and second codonpositions (W) are shown. Data compiled by Muto and Osawa (16)were replotted against the G + C content of the third codon position(P3). Because of large differences in the number of genes fordifferent bacteria, regression analysis was performed using theaverage value for each species without weighting by the number ofgenes. Regression coefficients (or E values) and their standarddeviations for the first codon position, the second codon position,and the average of the two are 0.374 ± 0.013, 0.163 ± 0.012, and0.269 ± 0.011, respectively. (b) Data of Muto and Osawa (16) on theG + C content of coding areas (o) and noncoding areas (spacers, x)were replotted against the G + C content of the third codon positionof bacteria. Regression coefficients for the coding areas and thespacers are 0.53 ± 0.02 and 0.66 ± 0.01, respectively.

the slope of the first codon position compared with thesecond come from the fact that more abundant amino acidshave more y pairs at the first codon positions than at thesecond codon positions (3, 14, 33).The situation is shown diagrammatically in Fig. 2. The

extent of discrepancy in G+C content between the thirdcodon position and the average of first and second codonpositions (P12) (the average line in Figs. la and 2), isproportionately larger on both sides of the intersect (Ep), asindicated by the two thick open arrows, one toward higherA+T content and the other toward higher G+C content.This discrepancy is interpreted to mean that the directionalmutation pressure is counteracted by selective constraints,thus forming a new type of equilibrium. As the parameterthat represents the overall situation ofG + C equilibrium, themutation-selection equilibrium coefficient E is defined as theregression coefficient against P3. The equilibrium coefficientE is 0 for no effect of directional mutation pressure (completeselective constraints) and 1 for the complete equilibriumwith .tD (complete neutrality). Thus, E12 represents theextent of equilibrium of P12 by directional mutation pressureand selective constraints. The values of E and 1 - E areconvenient measures of relative effects of neutrality andselective constraints, respectively.

Mutation-Selection Equilibrium. With the assumption ofneutrality of P3 and equilibrium between the directionalmutation pressure and selective constraints, the relationshipbetween P3 and P12 presented in Fig. 2 can be expressed as

12 = Ep + r12(P3 -Ep),or P12 = r12P3 + (1 -E2)Ep.

[5a]

[5b]

At the equilibrium point Ep, the average (P12) of P1 andP2 equals P~; note that if E equals 0 (complete selectiveconstraint), P12 equals Ep independent of P3 or AD (E%. 5b).Note that exactly the same Ep value is applicable to P12 aswell as the whole protein-coding area, including all threecodon positions (P123; see Table 2 legend). Ep thus representsthe proportion of G + C nucleotides in positions 1 and 2 or for

1.0

~00.5-.+

Amm - --- - - 1...0

G+C content of third position

FIG. 2. Diagrammatic representation of the equilibrium coeffi-cient, E12, the equilibrium point (Ep), and the minimal (Ami.) andmaximal (Am.) values of the average G + C content of the first andsecond codon positions. Solid line represents the regression line ofthe average G + C content of the first and the second codon positionsof various bacteria (Fig. la). Broken line is drawn parallel to theregression line through the origin to visualize the equilibriumcoefficient, E12 (degree of neutrality), and the disequilibrium coeffi-cient, 1 - E12 (degree of selective constraints). Open arrows indicatedeviation of the regression line from complete equilibrium with thedirectional mutation pressure, tD = v/(u + v) (dotted line).

Evolution: Sueoka

Dow

nloa

ded

by g

uest

on

Oct

ober

14,

202

0

Page 4: Directional mutation pressure andneutral molecular evolution · directional mutation pressure, proposed in 1962 (5, 6), can explain a number of evolutionary changes that have oc-curred

Proc. Natl. Acad. Sci. USA 85 (1988)

Table 2. Relative contributions of neutrality (e) for P12 and P123, Ep common for P12 and P123, and Amin and Ara for P123 for coding areasof various organisms

Neutrality,* % Aifor A for- A~~~~~~ ~ ~~~~~~~~minforaxfoSystem 612 X 100 E123 X 100 Ept P123.% P123, %

Bacterial speciest 26.3 ± 6.6 50.8 ± 4.4 0.464 ± 0.076 22.8 ± 4.6 73.6 ± 4.6Vertebrates§Xenopus 19.3 ± 12.9 46.2 ± 13.0 0.473 ± 0.114 25.4 ± 1.5 71.6 ± 1.5Chicken 15.4 ± 5.0 44.6 ± 5.4 0.464 ± 0.044 26.2 ± 2.3 69.8 ± 2.3Mouse 27.4 ± 5.4 51.6 ± 5.6 0.442 ± 0.051 21.4 ± 2.8 73.0 ± 2.8Rats 22.6 ± 5.8 40.3 ± 5.7 0.428 ± 0.051 22.1 ± 2.6 70.5 ± 2.6Human 19.4 ± 2.0 46.3 ± 2.2 0.447 ± 0.018 24.0 ± 2.3 70.3 ± 2.3

Coding areas represent the areas of DNA that encode amino acid sequences, including all three codon positions. For a visual presentationof these parameters, see Fig. 2.6123 is the equilibrium coefficient for the coding area. Relative effects are expressed in percent neutrality (E123) and selective constraints (1 -

6123) on the effective directional mutation pressure (P3- Ep) (Fig. 2; Eq. 5a).tEp values were calculated from the regression of P12 or P123 [(P1 + P2 + P3)/3] against P3 without weighting by the number of genes for eachspecies of bacteria and with weighting by the number of codons for each gene for vertebrates. The Ep value for the whole coding area (P123)is the same as the Ep value for P12, because at Ep for P12, P12 = P3, and consequently, at Ep, P123 = (2P12 + P3)/3 = P3, which is the samefor P12. Standard deviations for Ep were calculated as [(1 - E)-2 Var(Amin) + (Amin)2(1 - E) Var(E)V1/2, where Var(Amin) and Var(E) arevariances for Arnin and for E, respectively (see ref. 34).$Data compiled by Muto and Osawa (16) were used for calculation (Fig. lb).§Genes of each vertebrate (27) were used for calculation.$A parotid gland proline-rich protein gene was eliminated from the calculation.

all three codon positions; where P12 and P123 are at equilib-rium with directional mutation pressure (AD) and are notsubject to selective constraints. The EP values calculated for

different bacteria and the EP values for different genes ofindividual vertebrates are similar (Table 2). EP may, there-fore, be a universal constant for P12 and for P123 (the entirecoding area).

Heterogeneity of P3 Within Vertebrate Species. As is evi-dent from Fig. 3 and Tables 1 and 2, the intraspecific

1.1

-

0.

r. O.0

+

1..510 .51

E. ccli Human

t4

~0.5 ~0

0

0.5 1.0 0.5

G+C content of third position

FIG. 3. Comparison of G+C content distributions of genesbetween E. coli and human. Regression lines of the G + C content ofthe first (Pl; o), the second (P2; x), and the average of the first andsecond codon positions (P12; *) against the G + C content of the thirdcodon position (P3) among different genes of human are also shown.The E. coli data do not have enough spread in P3 to show significantregression coefficient. The regression coefficient (612) of the averageG+C content of the first and second codon positions (P12) forhuman data is 0.194 + 0.020 and the equilibrium point (EP) iS 0.447+ 0.011 (see also Table 2). Each datum point was weighted by thenumber of codons of the gene for calculating the regression coeffi-cient.

compositional heterogeneity of vertebrate DNA is known tobe larger than that of bacterial DNA. In vertebrates, P3 ofindividual genes of a species shows much wider distribution,covering the P3 value between 0.3 and 0.9, than intraspecificP3 values of bacteria, yeast, nematode, Drosophila, andmaize (Table 1). Thus, different genes of individual verte-brates have widely different base compositions, particularlyin the third codon position. Moreover, as reported byBernardi et al. (32) and Ikemura (33), a stretch of DNA witha certain G+ C content usually covers larger regions thanindividual genes, probably including several genes. In thisconnection, Hartley and Callan (35) reported that the giantchromatin loop of a newt lampbrush chromosome has a basecomposition different from those of other loops. Thesedifferent domains ('200 kilobases) were termed "isochore"by Bernardi et al. (32). Within a domain, the third codonposition, introns, and 5' and 3' flanking regions show con-certedly high or low G + C contents (32, 33). In this sense,the situation is very similar to the interspecies relationshipfound in bacteria. These results indicate that each domain ofa vertebrate chromosome may have a common, but unique,*LD value. Recently, Nomura et al. (36) reported that thebacterial intraspecific heterogeneity of DNA G + C content,although considerably smaller than that of vertebrate, is alsosignificant and regional.

DISCUSSIONBoth the directional mutation theory and the neutral theorybase their arguments on the selectively neutral changes bymutation that play a major role in changing DNA basesequence. In that sense, the two theories belong to the samecategory. The most fundamental difference between the twotheories lies in the treatment of the mutation effect itself.The directional mutation theory treats mutations as bidirec-tional events as indicated in Eqs. 1 and 2. In addition, thetheory incorporates the effect of selective constraints as an

integral part. For each mutational event, the principle ofrandom drift should be applicable. In the neutral theory,however, directionality and uniformity of mutation withinspecies or within a domain are not taken into account.The three constants 612, 6123, and EP appear to be univer-

sal as shown by the fact that, in the coding areas, they are

similar both among different bacterial species and amongdifferent genes of individual vertebrates (Table 2). The near

E.c.li Human

'; 0

a~~~~~

2656 Evolution: Sueoka

Dow

nloa

ded

by g

uest

on

Oct

ober

14,

202

0

Page 5: Directional mutation pressure andneutral molecular evolution · directional mutation pressure, proposed in 1962 (5, 6), can explain a number of evolutionary changes that have oc-curred

Proc. Natl. Acad. Sci. USA 85 (1988) 2657

a

-)

+

0.5 1.0G+C content of third position

FIG. 4. Combined distribution of P12 of yeast, nematode, andDrosophila. Distributions of P12 values of yeast (o), nematode (x),and D. melanogaster (A) are combined. E12, E123, and Ep valuescalculated by the least-squares method without weighting eachdatum point with the number of codons are 0.137 ± 0.030, 0.425 ±

0.020, and 0.428 ± 0.033, respectively. The point marked with anarrow represents the average value of collagen genes (col-l andcol-2) of C. elegans (37); this point was eliminated from thecalculation of the regression coefficient.

constancy of these values suggests that E12, 6123, and Epreflect the nature of the code and the necessary physico-chemical constraints of proteins to be stable and functionalin the cytoplasm. This point is further supported by the dataon yeast, nematode, and Drosophila presented in Fig. 4,where combination of the data for the three organisms formsanother set of E12 (0.137 + 0.030), E123 (0.425 + 0.020), andEp (0.428 ± 0.033) that are within the range found in bacteriaand vertebrates (Table 2). The extreme values (-25% and-75%) in the G + C content of the coding area in bacteria,protozoa, and algae (10-13) are those corresponding to uD =

0 and 4D = 1. This can be explained as the naturalconsequence of the universal values of 6123 and Ep and therelatively small proportion of noncoding areas in theseorganisms (Fig. 1).To explain the G + C heterogeneity, Cox (38) wrote in 1972

that mutation rates may vary in different parts of thechromosome and the genes may be organized on the chro-mosome by selection according to the extent of their func-tional constraints. Two plausible explanations based ondirectional mutation pressure for the wide intraspecific het-erogeneity of P3 among proteins of higher vertebrates are asfollows. (i) There might be several different directionalmutation pressures (not necessarily mutation rates) in differ-ent locations on the genome, and the cause for this differ-ence might reside in the local structural elements of thechromatin. Thus, major mutagenic events (DNA replicationand repair) may act differently in different domains. (ii) DNAreplication and DNA repair synthesis may make replicationerrors differently and the extent of DNA repair synthesismay vary among various domains of the chromatin becauseof the different susceptibility of DNA to damage and repairdue to differences in chromatin structure. Clarification ofchromosome domains is an important future consideration.Whether or not domains are equivalent to loop structures ofthe chromatin (39) remains to be seen.

I am grateful to those people who have helped me by supplyingdata as acknowledged in this article and to Shawn Elliott, CharlesHambleton, and Todd Devine for their patient assistance in datacalculation. I am much indebted also to those in my laboratory and

in the Department for their help in improving the manuscript.Constructive comments by Drs. James F. Crow and Norman H.Horowitz on this paper are gratefully acknowledged. This workcame out of the research supported by a National Science Founda-tion Grant G-15080 (to N.S.) during the period 1959-1962.

1. Kimura, M. (1968) Nature (London) 217, 624-626.2. Kimura, M. (1983) The Neutral Theory ofMolecular Evolution

(Cambridge Univ. Press, Cambridge, U.K.).3. King, J. L. & Jukes, T. H. (1969) Science 164, 788-798.4. Wright, S. (1938) Science 87, 430-431.5. Sueoka, N. (1962) Proc. Natl. Acad. Sci. USA 48, 582-592.6. Freese, E. (1962) J. Theor. Biol. 3, 82-101.7. Sueoka, N., Marmur, J. & Doty, P. (1959) Nature (London)

183, 1429-1431.8. Rolfe, R. & Meselson, M. (1959) Proc. Natl. Acad. Sci. USA

45, 1039-1043.9. Sueoka, N. (1959) Proc. Natl. Acad. Sci. USA 45, 1480-1490.

10. Sueoka, N. (1961) J. Mol. Biol. 3, 31-40.11. Lee, K. Y., Wahl, R. & Barbu, E. (1956) Ann. Inst. Pasteur

Paris 91, 212-224.12. Belozersky, A. N. & Spirin, A. S. (1958) Nature (London)

182, 111-112.13. Bak, A. L., Atkins, J. F. & Meyer, S. A. (1972) Science 175,

1391-1393.14. Sueoka, N. (1961) Proc. Natl. Acad. Sci. USA 47, 1141-1149.15. Sueoka, N. (1961) Cold Spring Harbor Symp. Quant. Biol. 26,

35-43.16. Muto, A. & Osawa, S. (1987) Proc. NatI. Acad. Sci. USA 84,

161-165.17. Bibb, M. J., Bibb, B. J., Wand, J. M. & Cohan, S. N. (1985)

Mol. Gen. Genet. 199, 26-36.18. Jukes, T. H. & Bhushan, V. (1986) J. Mol. Evol. 24, 39-44.19. Clary D. 0. & Wolstenholme, D. R. (1985) J. Mol. Evol. 22,

252-271.20. Helftenbein, E. (1985) Nucleic Acids Res. 13, 415-433.21. Preer, J. R., Preer, L. B., Rudman, B. M. & Barnett, A. J.

(1985) Nature (London) 314, 188-190.22. Kuchino, Y., Hanyu, N., Tashiro, F. & Nishimura, S. (1985)

Proc. Natl. Acad. Sci. USA 82, 4758-4762.23. Jukes, T. H., Osawa, S. & Muto, A. (1987) Nature (London)

325, 668.24. Cox, E. C. & Yanofsky, C. (1967) Proc. NatI. Acad. Sci. USA

58, 1895-1902.25. Treffers, H. P., Sinelli, V. & Belser, N. 0. (1954) Proc. Natl.

Acad. Sci. USA 40, 1064-1071.26. Yanofsky, C., Cox, E. C. & Horn, V. (1966) Proc. Natl. Acad.

Sci. USA 55, 274-281.27. Maruyama, T., Gojobori, T., Aota, S. & Ikemura, T. (1986)

Nucleic Acids Res. 14, Suppl., rlS1-r197.28. Normore, W. N. & Brown, J. R. (1970) in Handbook of

Biochemistry: Selected Data for Molecular Biology, ed. West,R. C. (Chemical Rubber Co., Cleveland, OH), 2nd Ed., pp.H24-H103.

29. Bilofsky, H. S., Burks, C., Fickett, J. W., Goad, W. B.,Lewitter, F. L., Rindone, W. P., Swindell, C. D. & Tung,C. S. (1986) Nucleic Acids Res. 14, 1-4.

30. Jukes, T. H. (1965) Am. Sci. 53, 477-487.31. Yanofsky, C. & vanCleemput, M. (1982) J. Mol. Biol. 154,

235-246.32. Bernardi, G., Olfsson, B., Filipski, J., Zerial, M., Salinas, J.,

Cuny, G., Meunier-Rotival, M. & Rodier, F. (1985) Science228, 953-958.

33. Ikemura, T. (1985) Mol. Biol. Evol. 2, 13-34.34. Meyer, S. L. (1975) Data Analysisfor Scientists and Engineers

(Wiley, New York), pp. 39-41.35. Hartley, S. E. & Callan, H. G. (1978) J. Cell Sci. 34, 279-288.36. Nomura, M., Sor, F., Yamagishi, M. & Lawson, M. (1987)

Cold Spring Harbor Symp. Quant. Biol. 50, in press.37. Kramer, J. M., Cox, G. N. & Hirsh, D. (1982) Cell 30,

599-606.38. Cox, E. C. (1972) Nature (London) 239, 133-134.39. Gasser, S. M. & Laemmli, U. K. (1987) Trends Genet. 3,

16-22.

Evolution: Sueoka

Dow

nloa

ded

by g

uest

on

Oct

ober

14,

202

0