
Evolution of the Major Histocompatibility Complex: Independent Origin of Nonclassical Class I Genes in Different Groups of Mammals¹

Austin L. Hughes and Masatoshi Nei Center for Demographic and Population Genetics, The University of Texas Health Science Center at Houston

The class I major histocompatibility complex genes are composed of classical and nonclassical genes, the latter being largely nonfunctional. To understand the evolutionary relationships of the two groups of class I genes, a phylogenetic analysis of DNA sequences was conducted using 45 genes from six mammalian and one avian species. The results indicate that nonclassical genes in one species are more closely related to classical genes from the same species than to nonclassical genes from a species belonging to a different order or family. This indicates that the differentiation of classical and nonclassical genes occurs rather rapidly in the genome. Classical genes are apparently duplicated with a high frequency in the evolutionary process, and many of the duplicated genes seem to degenerate into nonclassical genes as a result of deleterious mutation. The nonclassical Qa genes in the mouse have sequences homologous to regulatory sequences involved in the universal expression of classical class I genes, but they have accumulated numerous nucleotide substitutions in these sequences. The pattern of nucleotide substitution in nonclassical genes is different from that in classical genes. In nonclassical genes, the rate of nonsynonymous substitution is higher in the antigen recognition site than in other gene regions, as is true of classical genes. However, unlike the case of classical genes, the nonsynonymous rate does not always exceed the synonymous rate in the antigen recognition site. Nonclassical proteins further differ from classical proteins in having amino acid replacements in conserved antigen recognition site positions. These observations are consistent with the hypothesis that nonclassical genes have originated from classical genes but have lost classical class I function because of deleterious mutation.

Introduction

The major histocompatibility complex (MHC) genes in vertebrates offer a number of intriguing questions to the evolutionary biologist. One such question concerns the relationships among member genes of a multigene family. In the present paper, we address this question for the class I MHC gene family. On the basis of degree of gene expression, class I genes can be divided into three groups: (1) the classical class I genes, which are expressed on all nucleated somatic cells, (2) the nonclassical class I genes, which have a more limited expression and are found principally on certain immune system cells, and (3) nonexpressed pseudogenes (Klein 1986, pp. 127-129; Klein and Figueroa 1986; Mellor 1987). Classical class I genes are further distinguished from nonclassical genes in that the former are highly polymorphic whereas the latter are essentially monomorphic (Tewarson et al. 1983; Klein and Figueroa 1986) and in that classical genes can present foreign antigens to cytotoxic T-cells whereas nonclassical genes cannot (Howard 1987). A major unsolved question regarding the class I MHC concerns the evolutionary relationship between classical and nonclassical genes and the origin of the nonclassical genes.

1. Key words: gene expression, multigene families, nucleotide substitution, pseudogenes, regulatory sequences.

Address for correspondence and reprints: Dr. Masatoshi Nei, Center for Demographic and Population Genetics, Graduate School of Biomedical Sciences, The University of Texas Health Science Center at Houston, P.O. Box 20334, Houston, Texas 77225.

Mol. Biol. Evol. 6(6):559-579. 1989. © 1989 by The University of Chicago. All rights reserved. 0737-4038/89/0606-0001$02.00

Rogers (1985b) presented evidence, from hybridization with labeled DNA probes, that distantly related mammalian species generally do not possess orthologous nonclassical genes. In the case of the rat and mouse [which diverged ≥10 Myr ago (Mya)], the nonclassical genes in each species are more closely related to its classical genes than to the nonclassical genes of the other species. On the other hand, the human and chimpanzee (separated only 5-7 Mya) seem to share at least one nonclassical locus (Lawlor et al. 1988). Such results have led to the hypothesis that nonclassical genes have arisen independently in separate lineages as a result of separate unequal crossover events (Rogers 1985b; Klein and Figueroa 1986; Howard 1987). On this hypothesis, one copy of a duplicated classical locus may become nonclassical as a result of mutations which reduce its expression. Once nonclassical genes are produced, they may again be duplicated; but over long periods of evolutionary time, old nonclassical genes eventually degenerate into pseudogenes because of deleterious mutations or are removed from the genome by unequal crossing-over. Thus it is predicted that distantly related species will share no nonclassical genes, all nonclassical genes having arisen relatively recently.

We study this question by considering three types of data. First, we construct a phylogenetic tree for classical and nonclassical gene sequences from various mammalian species and examine their evolutionary relationships. Phylogenetic trees for the class I MHC genes have previously been constructed by a number of authors (e.g., Klein and Figueroa 1986; Mellor 1987), but their reliability is low because they were either produced by subjective methods or refer to a small number of genes. Second, we examine the pattern of nucleotide substitution in regions of nonclassical genes which are homologous to regulatory sequences known to be important in regulating the universal expression of classical class I genes. Comparison of these regions between classical and nonclassical genes may provide evidence regarding the hypothesis that nonclassical genes have evolved from classical genes. Third, we examine the rates of synonymous and nonsynonymous (amino acid-altering) nucleotide substitutions in codons of nonclassical genes corresponding to the antigen recognition site (ARS) of classical genes. In classical class I molecules, codons in the ARS are known to show a higher rate of nonsynonymous substitution than of synonymous substitution because of adaptive evolution (Hughes and Nei 1988). If nonclassical genes do not have a function similar to that of classical genes, this relationship would not be observed.

DNA Sequences Used

Classical class I molecules consist of three extracellular domains (α1, α2, and α3), a transmembrane portion, and a cytoplasmic tail. Typically, the α1 domain contains 90 amino acid residues, α2 contains 92 residues, α3 contains 92 residues, the transmembrane portion contains 39 or 40 residues, and the cytoplasmic tail contains 25-35 residues. The ARS consists of 57 residues, all located in α1 and α2 (Bjorkman et al. 1987a, 1987b). The α3 domain, which is highly conserved, associates in vivo with β2-microglobulin. The protein is encoded by seven or eight exons. The first of these encodes the leader peptide; exons 2, 3, and 4 encode α1, α2, and α3, respectively; exon 5 encodes the transmembrane portion; and exons 6-8 encode the cytoplasmic tail. Nonclassical class I genes have a similar organization, and nonclassical class I proteins have the same three extracellular domains. However, in a number of nonclassical genes, the transmembrane portion and cytoplasmic tail are shortened in comparison with those of classical genes. In the mouse nonclassical genes Q4b and Q8b, the transmembrane portion contains only 31 codons; and only the first 21 of these can be aligned with other mouse class I sequences. Since exons 1 and 5-8 are not readily aligned in comparisons between distantly related genes and since the latter four exons may be short or absent in nonclassical genes, we used sequences from only exons 2-4 in constructing an evolutionary tree.

Table 1
DNA Sequences Used in Analyses

Species        Genes and Alleles

Mouse (a)      Classical: Kb, Kd, Kk, Kw29.7, Ld, Lb, Lk, Lp, Dd
               Nonclassical: Q4, Q7b, Q7d, Q8, Q10b, three Tla genes, Mb1, 37d
               Pseudogene: T2Aa
Hamster (b)    Nonclassical(?): Hm1.6
Human (c)      Classical: A2, A3, A11, Aw24, Aw68, B13, B27, B44, Bw58, Cw1, Cw2, Cw3
               Nonclassical: E, JTW15, 6.0
               Pseudogene: p12.4
Rabbit (d)     Classical: pR9
               Pseudogene: pR14
Pig (e)        Classical: PD1, PD14
               Nonclassical: PD6
Bovine (f)     Classical: BL3-6, BL3-7
Chicken (g)    Classical: B-F12

(a) References: Moore et al. (1981); Steinmetz et al. (1981); Kvist et al. (1983); Weiss et al. (1983); Arnold et al. (1984); Mellor et al. (1984); Chen et al. (1985); Devlin et al. (1985); Fisher et al. (1985); Morita et al. (1985); Obata et al. (1985); Taylor Sher et al. (1985); Schepart et al. (1986); Transy et al. (1987); Watts et al. (1987); Robinson et al. (1988); Singer et al. (1988); Widmark et al. (1988).
(b) Reference: McGuire et al. (1986).
(c) References: Malissen et al. (1982); Sodoyer et al. (1984); Strachan et al. (1984); Holmes and Parham (1985); Koller and Orr (1985); N'Guyen et al. (1985); Ways et al. (1985); Kottman et al. (1986); Szots et al. (1986); Cowan et al. (1987); Geraghty et al. (1987); Gussow et al. (1987); Koller et al. (1988); Mizuno et al. (1988); Zemmour et al. (1988).
(d) Reference: Tykocinski et al. (1984).
(e) References: Satz et al. (1985); Ehrlich et al. (1987).
(f) Reference: Ennis et al. (1988).
(g) Reference: Guillemot et al. (1988).
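As a quick check on the region used for tree construction, the domain lengths quoted above can be converted to a nucleotide count. The short sketch below does only this arithmetic; the dictionary name and keys are illustrative labels, not notation from the paper. The result agrees with the number of nucleotides listed for exons 2-4 in table 2.

```python
# Domain lengths (amino acid residues) quoted in the text for a typical
# classical class I molecule; each domain is encoded by one exon.
DOMAIN_CODONS = {
    "alpha1 (exon 2)": 90,
    "alpha2 (exon 3)": 92,
    "alpha3 (exon 4)": 92,
}

total_codons = sum(DOMAIN_CODONS.values())
total_nucleotides = 3 * total_codons  # three nucleotides per codon

print(total_codons, total_nucleotides)
```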

We collected from the literature 44 DNA sequences for class I MHC genes (26 classical, 15 nonclassical, and 3 supposed pseudogenes) from six mammalian species, the majority being from the mouse (H-2 complex) and human (HLA complex) (table 1). In the mouse, complete maps of the class I MHC genes are available for the H-2b and H-2d haplotypes; there are 26 class I genes in the former haplotype and 33 in the latter (fig. 1; Klein and Figueroa 1986). The classical class I loci in the mouse are H-2K, -L, and -D. The D locus is present in the H-2d haplotype but not in H-2b or in most other known haplotypes. Alleles traditionally designated D in haplotypes other than H-2d are in fact allelic to the L locus of the H-2d haplotype (Duran et al. 1987; Watts et al. 1987); in the present paper we thus designate alleles at this locus as L alleles. There are a number of pseudogene or nonclassical loci in the vicinity of the K (K1) and D loci. In the H-2d haplotype these include the loci designated K2, D2, D3, and D4 (fig. 1); not all of these loci are present in other haplotypes. However, in all mouse haplotypes, the majority of nonclassical genes are located in the Qa (or Q) and Tla regions.

The Qa region includes eight loci in the H-2d haplotype and 10 in the H-2b haplotype, while the Tla region includes 18 genes and two gene fragments in the H-2d haplotype and 13 genes in the H-2b haplotype. Of the Qa genes, published sequences are available for four loci (Q4, Q7, Q8, and Q10); two allelic sequences are available for the Q7 locus (table 1). The Tla region is named for the serologically detectable thymus leukemia antigens, which were initially discovered in radiation-induced leukemias in the mouse (Chen et al. 1985). Our sample includes three genes encoding such antigens; these are probably not allelic to one another but encoded by separate loci. Other genes in the Tla region (e.g., T2Aa, Mb1, and 37d) are only distantly related to those encoding the thymus leukemia antigens. On the basis of hybridization with labeled probes, Rogers (1985a) proposed three major families of class I MHC genes. (1) Family 1 includes the three classical loci, plus K2, D2, D3, D4, and Qa genes. (2) Family 2 includes the expressed Tla genes plus certain related pseudogenes. (3) Family 3 contains certain other genes from the Tla region, such as the T2Aa pseudogene sequenced by Widmark et al. (1988), which are not closely related to the expressed Tla genes. Other studies suggest that at least two other families of mouse class I genes exist in addition to the three identified by Rogers. From the Tla region, Singer et al. (1988) have sequenced a nonclassical gene which they call Mb1; they propose that this gene represents a fourth family. The previously sequenced gene H-2-37d, also from the Tla region (Lalanne et al. 1985), cannot be placed in any of the other families on the basis of DNA sequence similarity (Widmark et al. 1988).

In the Syrian hamster, it has been difficult to detect polymorphism at class I loci by serological methods (Streilein 1987). Therefore, it is difficult to distinguish between classical and nonclassical genes in the hamster, but the single sequenced gene Hm1.6 is believed to be nonclassical or a pseudogene (McGuire et al. 1986).

The human MHC contains some 20 class I genes, but most of these have not been mapped in detail (Trowsdale and Campbell 1988). In addition to the three classical loci HLA-A, -B, and -C, one nonclassical locus (E) has been mapped. Only a few human nonclassical genes have been sequenced (JTW15, HLA-E, and HLA-6.0), and our sample also includes a putative pseudogene sequence (p12.4). The cDNA clone JTW15 represents an allele at the locus given the name HLA-E by Koller et al. (1988).

In the case of the rabbit, a single classical sequence and a putative pseudogene sequence are available (table 1). For the pig, there are two sequences from two separate classical loci and one nonclassical sequence. For the bovine, there are two classical sequences; it is uncertain whether these are alleles at the same locus or derived from different loci. The single classical chicken sequence is used as an outgroup in tree making.

[FIG. 1: map of the class I MHC genes in the mouse H-2b and H-2d haplotypes; image not recoverable.]

Our sample contains three genes which have been reported as pseudogenes. Different authors have used different criteria in defining pseudogenes. In the case of the mouse gene T2Aa, there is abundant evidence that this is a pseudogene, including both deletion and insertion of single bases in the coding region causing frameshifts, four stop codons in exon 3, and another stop codon in exon 4 (Widmark et al. 1988). Malissen et al. (1982) suggest that the human gene p12.4 is a pseudogene because in amino acid position 164 the cysteine involved in a disulfide bridge has been replaced by phenylalanine. Furthermore, p12.4 apparently lacks transcription initiation signals. The rabbit gene R14 is considered to be a pseudogene because there is no evidence of translation, although mRNA is produced. Also, one possible initiation codon might lead to production of a peptide only 16 amino acids long (Tykocinski et al. 1984).

Results

Relationships among Class I Genes

To construct a phylogenetic tree for the 45 genes in table 1, we computed the number of nucleotide substitutions per site (d) in exons 2-4 for all pairs of genes by using Jukes and Cantor's (1969) method. From the d values thus obtained, we constructed a tree by using Saitou and Nei's (1987) neighbor-joining method (fig. 2). Essentially the same tree was obtained by using the number of synonymous substitutions per site (dS; Nei and Gojobori 1986) or by using d values obtained from exons 2-4, excluding the 57 codons for the ARS, which are apparently subject to overdominant selection (Hughes and Nei 1988). Figure 2 shows that in all species a nonclassical gene is more closely related to classical genes of its own species than it is to any other genes of any other species. Mouse nonclassical genes such as Qa and Tla are more closely related to mouse nonclassical or classical genes than they are to any other genes, including human and pig nonclassical genes. Thus the tree supports the hypothesis that nonclassical genes have arisen independently in separate evolutionary lineages.
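The two steps described above, a Jukes-Cantor distance for each pair of sequences followed by neighbor-joining on the resulting distance table, can be sketched as follows. This is a minimal illustration and not the program used for the paper. The function names, the toy four-taxon distances, and the tie-breaking behavior are assumptions made for the example.

```python
import math

def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3), where p is the
    proportion of aligned sites that differ (gaps and ambiguous bases skipped)."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    p = sum(a != b for a, b in pairs) / len(pairs)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)  # undefined for p >= 0.75

def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance table.
    dist[(a, b)] must be defined for every ordered pair of distinct names.
    Returns a list of joins (node_a, node_b, branch_a, branch_b); the last
    entry connects the final two nodes by one branch (branch_b is 0)."""
    nodes = list(names)
    d = dict(dist)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {i: sum(d[(i, k)] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing (n - 2) d_ij - r_i - r_j (ties broken by name).
        _, i, j = min(((n - 2) * d[(a, b)] - r[a] - r[b], a, b)
                      for x, a in enumerate(nodes) for b in nodes[x + 1:])
        branch_i = 0.5 * d[(i, j)] + (r[i] - r[j]) / (2.0 * (n - 2))
        branch_j = d[(i, j)] - branch_i
        new = f"({i},{j})"
        for k in nodes:
            if k not in (i, j):
                d[(new, k)] = d[(k, new)] = 0.5 * (d[(i, k)] + d[(j, k)] - d[(i, j)])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
        joins.append((i, j, branch_i, branch_j))
    a, b = nodes
    joins.append((a, b, d[(a, b)], 0.0))
    return joins

# Toy additive distances (hypothetical, not from table 1).
raw = {("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
       ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9}
dist = {}
for (a, b), v in raw.items():
    dist[(a, b)] = dist[(b, a)] = float(v)

joins = neighbor_joining(["A", "B", "C", "D"], dist)
for join in joins:
    print(join)
```

In practice the distance table would be filled by calling `jukes_cantor` on every pair of aligned exon 2-4 sequences; the toy table is used here only so that the joining step can be followed by hand.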

In the mouse, the tree reveals two major clusters, one containing the classical and Qa genes and 37d and the other containing Tla region genes. The T2Aa pseudogene and Mb1 cluster with the Tla genes, although in both cases the branches are very long, indicating a high degree of divergence at the nucleotide level. Thus, the tree does not support the hypothesis that T2Aa (Rogers 1985a) or Mb1 forms a separate family of mouse class I genes, apart from Tla. The branch lengths of nonclassical genes and pseudogenes are in general longer in the mouse than in nonrodent mammals. This is consistent with other observations that the rate of nucleotide substitution is higher in rodents than in certain other mammalian orders such as primates (Kohne 1970; Wu and Li 1985).

The human classical class I genes differ from the mouse classical genes in that each of the three classical loci (A, B, and C) forms a separate cluster. In the mouse, the K, L, and D loci do not form separate clusters (fig. 2). That is, the branch points are very close to each other, so that no significant clusters are found among mouse classical genes. It is possible that this lack of differentiation among loci in the mouse is a result of past interlocus genetic exchange in this species (Mellor et al. 1983). On the other hand, the results suggest that interlocus genetic exchange must occur rarely, if at all, in humans (also see Lawlor et al. 1988).

Our tree supports the hypothesis that the HLA-B and -C loci are more closely related to each other than either is to A (Klein and Figueroa 1986). There is evidence that the present-day A locus is a hybrid, with exons 4 and 5 having been donated by an E-like gene some 20 Mya (Hughes and Nei 1989a). In this tree, however, A clusters with other classical genes rather than with E and JTW15, presumably because the number of nucleotides in exons 2 and 3, which are more closely related to B and C than to E, is greater than the number of nucleotides in exon 4, which is more closely related to E than to B and C.

FIG. 2.-Phylogenetic tree constructed by the neighbor-joining method for mammalian classical and nonclassical class I MHC genes, using the chicken class I gene B-F12 as an outgroup. The species abbreviations, in parentheses, are as follows: M = mouse gene; Ha = hamster gene; R = rabbit gene; H = human gene; P = pig gene; B = bovine gene; C = chicken gene. * = Nonclassical gene or pseudogene. The branch for Mb1 is unusually long (d = 0.40) and is shortened in the figure.

In all cases, known pseudogenes are no more greatly diverged from classical genes than are nonclassical genes. The mouse pseudogene T2Aa is less divergent than Mb1 and shows approximately the same degree of divergence from the classical genes as do the expressed Tla genes. The human pseudogene p12.4 clusters with the A-locus alleles, having evidently evolved from that locus fairly recently. In the rabbit, the classical gene R9 and the pseudogene R14 are approximately as distant from each other as are alleles at different classical loci in other species.

5' Regulatory Elements in Mouse Classical and Nonclassical Genes

In a study of the gene regulation of class I MHC genes in the mouse, Korber et al. (1988) identified three short nucleotide sequences located 5' to the transcription initiation site of the H-2Dd gene which are believed to be important in regulating gene expression. Two are sites which bind the mammalian transcription factor AP-1 (Lee et al. 1987); these are designated bind A/AP-1 and bind B/AP-1. The former is located at nucleotide positions -205 to -188 (relative to the initiation of transcription), and the latter is located at -108 to -90. A site denoted the interferon response sequence (IRS), which plays a role in response to interferons, is located between -157 and -140. Interferons are powerful inducers of expression of class I MHC molecules and of the associated β2-microglobulin molecule (Korber et al. 1987). These three sites are also known to bind murine nuclear factors (Korber et al. 1988). Just 5' to the IRS site is a 23-bp sequence identified as an enhancer of transcription of mouse classical class I genes (Kimura et al. 1986). It has been shown that full expression of classical class I genes in response to interferons is dependent on the presence of this sequence, which is called Enhancer A. The homologous sequence from the nonclassical Q10 gene is approximately one-eighth to one-quarter less responsive to interferons than are Enhancer A sequences from classical genes (Israel et al. 1986).

Sequences homologous to these four regulatory elements are readily identified in all mouse classical class I genes and nonclassical Qa region genes for which the sequence of the 5'-untranslated (5' UT) region is available. We compared these sequences within and between mouse classical class I genes and Qa genes in order to obtain evidence regarding the evolution of nonclassical class I genes.

Figure 3 compares sequences of four 5' regulatory elements (bind A/AP-1, Enhancer A, IRS, and bind B/AP-1) for three K-locus alleles, three L-locus alleles, and four Qa-region genes. These sequences are virtually identical in all classical genes illustrated (and in others not shown, such as H-2Dd). The Qa-region genes show considerable similarity with classical genes in these regions, suggesting that the Qa genes must have evolved fairly recently from genes having the universal expression characteristic of classical genes. In spite of their evident homology with classical genes in these sequences, each of the Qa genes shows nucleotide substitutions which are found in none of the classical genes. The Q4 gene has a large deletion in the Enhancer A sequence, while others have numerous nucleotide substitutions in this and other regulatory elements (fig. 3).

It is not possible, by sequence comparison alone, to pinpoint which of the sequence changes in this region are responsible for the reduced expression of Qa genes. However, comparison of the rate of nucleotide substitution in Qa genes with that in classical genes can provide evidence regarding the role of natural selection in reducing expression of nonclassical genes. In table 2, we compare rates of nucleotide substitution in the following gene regions: (1) the 5' regulatory elements bind A/AP-1, Enhancer A, IRS, and bind B/AP-1; (2) the remainder of the 5' UT region, including a total of 160 nucleotides to the 5' side of the CAAT box; (3) introns 1, 2, and 4 (excluding the very long third intron, which contains numerous repeats and cannot be aligned in interlocus comparisons); and (4) exons 2-4. In the comparison among classical genes, the 5' regulatory elements show a low rate of nucleotide substitution in comparison with the other gene regions examined. In the case of the Qa genes, however, there is no evidence of a reduced rate of nucleotide substitution in the 5' regulatory sequences. The rate of nucleotide substitution in the 5' regulatory elements is significantly higher in the comparison among Qa genes and between Qa and classical genes than in the comparison among classical genes (table 2). Thus, while the 5' regulatory elements are highly conserved in the classical genes, numerous substitutions have occurred in this region in Qa genes, suggesting that selection maintaining the conserved 5' regulatory elements has been relaxed in nonclassical genes.

FIG. 3.-DNA sequence of 5' regulatory elements (Bind A/AP-1, Enhancer A, IRS, Bind B/AP-1) of six mouse class I classical alleles and four Qa region genes. The full sequence is given only for the K(b) (i.e., Kb) allele; for other alleles, only nucleotides differing from K(b) are given. * = Gap in sequence relative to K(b).
[Alignment rows for K(b), K(k), K(w29.7), L(b), L(k), L(p), Q4, Q7(b), Q7(d), and Q10; sequence text not recoverable.]

Table 2
Mean ± SE Nucleotide Substitutions/100 Sites in Comparisons of Different Gene Regions in Mouse Classical Class I Genes (K and L Loci) and Four Nonclassical Genes from the Qa Region

GENE REGION               NO. OF         CLASSICAL VS.    Qa VS. Qa        Qa VS.
                          NUCLEOTIDES    CLASSICAL                         CLASSICAL

5' Regulatory elements    65             5.2 ± 2.1        18.7 ± 4.2**     14.8 ± 3.5*
5' UT (remainder)         160            12.2 ± 2.2       15.8 ± 2.5       19.8 ± 2.3*
Introns 1, 2, and 4       383            8.0 ± 1.0        10.8 ± 1.3       13.9 ± 1.4***
Exons 2-4                 822            7.8 ± 0.6        9.2 ± 0.8        10.9 ± 0.7***

NOTE.-Number of sequences used: three alleles from K (K1) locus, three alleles from L locus, two alleles from Q7 locus, one sequence from Q4, Q8, and Q10 loci. Superscript asterisks denote that the mean for Qa comparisons is significantly different from the mean for corresponding comparison among classical genes, at the following levels:
* P < 0.05.
** P < 0.01.
*** P < 0.001.
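The text does not say which test lies behind the significance levels in table 2. As a rough illustration only, the sketch below applies a two-tailed z test to the difference between two means, using the reported standard errors and treating the two estimates as independent. Both the choice of test and the independence assumption belong to this sketch and are not a statement of the authors' procedure. For the 5' regulatory elements, the resulting probabilities fall in the same ranges as the asterisks shown in the table.

```python
import math

def z_test(mean1, se1, mean2, se2):
    """Two-tailed z test for the difference between two means with given SEs,
    treating the two estimates as independent (an assumption of this sketch)."""
    z = (mean2 - mean1) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed normal probability
    return z, p

# 5' regulatory elements, values from table 2 (substitutions per 100 sites).
classical = (5.2, 2.1)
comparisons = {
    "Qa vs. Qa": (18.7, 4.2),
    "Qa vs. classical": (14.8, 3.5),
}

results = {}
for label, (mean, se) in comparisons.items():
    results[label] = z_test(classical[0], classical[1], mean, se)
    z, p = results[label]
    print(f"{label}: z = {z:.2f}, P = {p:.4f}")
```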

Synonymous and Nonsynonymous Nucleotide Substitutions in the ARS of Nonclassical Genes

Hughes and Nei (1988) previously showed that in the ARS of classical class I genes the rate of nonsynonymous nucleotide substitution is two to three times as high as that of synonymous substitution, apparently because of positive Darwinian (overdominant) selection favoring amino acid substitution. To examine whether similar positive selection is still operating in the ARS of nonclassical genes, we computed the rate of synonymous substitution per synonymous site (dS) and the rate of substitution per nonsynonymous site (dN) in pairwise comparisons among nonclassical genes by using Nei and Gojobori's (1986) method I. In tables 3 and 4 we present means of dS and dN for various sets of comparisons; the SEs of these means were computed by the method of Nei and Jin (1989).
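For readers unfamiliar with this kind of calculation, the following is a minimal sketch of an unweighted-pathway estimate of dS and dN in the spirit of Nei and Gojobori (1986). It is not the authors' program. The treatment of stop codons (changes to a stop counted as nonsynonymous when counting sites, pathways through stops dropped, codon pairs containing a stop skipped) and the fallback when every pathway is blocked are conventions assumed here. The function names and the toy sequences are hypothetical.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def syn_sites(codon):
    """Number of synonymous sites in one codon: for each position, the
    fraction of the three possible changes that leave the amino acid unchanged."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1.0 / 3.0
    return s

def count_diffs(c1, c2):
    """(synonymous, nonsynonymous) differences between two codons, averaged
    over all orders of the single-base changes; pathways through stops dropped."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    paths = []
    for order in permutations(diff_pos):
        cur, sd, nd, ok = c1, 0, 0, True
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                ok = False
                break
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        if ok:
            paths.append((sd, nd))
    if not paths:  # assumed fallback: count every difference as nonsynonymous
        return 0.0, float(len(diff_pos))
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))

def nei_gojobori(seq1, seq2):
    """Return (dS, dN) for two aligned, in-frame coding sequences."""
    S = Sd = Nd = 0.0
    ncodons = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip gaps, ambiguous bases, and stop codons
        S += (syn_sites(c1) + syn_sites(c2)) / 2.0
        sd, nd = count_diffs(c1, c2)
        Sd += sd
        Nd += nd
        ncodons += 1
    N = 3.0 * ncodons - S
    jc = lambda p: -0.75 * math.log(1.0 - 4.0 * p / 3.0)  # Jukes-Cantor correction
    return jc(Sd / S), jc(Nd / N)

# Toy example: one synonymous (TTT/TTC) and one nonsynonymous (AAA/AGA) difference.
dS, dN = nei_gojobori("TTTGGAAAA", "TTCGGAAGA")
print(round(dS, 4), round(dN, 4))
```

With only three codons the estimates are of course meaningless biologically; the example is sized so that the site counts and differences can be checked by hand.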

In the case of nonclassical loci, the pattern of nucleotide substitution differs in a number of ways from that seen in classical loci. In the case of the Qa genes of the mouse, the pattern of nucleotide substitution seen in comparisons between classical genes and Qa genes and in comparisons among Qa genes from the four different loci in our sample is reminiscent of that seen in comparisons among classical genes (table 3). In all these cases, dN exceeds dS by a ratio of ~1.5:1 in the ARS. However, when we look at the two allelic sequences available for the Q7 locus, we see no evidence of positive selection on the ARS, though the number of nucleotide substitutions observed is small.
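The ratio of about 1.5:1 can be checked directly from the ARS means in table 3. The sketch below assumes that the first two values in each of the relevant rows of the flattened table are the ARS dS and dN, in that order; the variable names are illustrative.

```python
# ARS means read from table 3 (substitutions per 100 sites), as (dS, dN).
# The column assignment is inferred from the flattened table layout.
ars_means = {
    "classical vs. classical": (13.8, 21.4),
    "classical vs. Qa": (15.1, 24.0),
    "Qa vs. Qa": (12.5, 18.0),
}

ratios = {label: dn / ds for label, (ds, dn) in ars_means.items()}
for label, ratio in ratios.items():
    print(f"{label}: dN/dS = {ratio:.2f}")
```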

In the case of the Qa loci, which have evolved from classical loci relatively recently (fig. 2), values of dN in excess of dS in the ARS can still be seen in the comparison among these genes and in the comparison between them and classical genes. However, when we consider nonclassical genes and pseudogenes which are more distantly related to classical genes, dN and dS tend to be approximately the same. In comparisons between mouse classical genes and Tla, T2Aᵃ, Mb1, and 37d, mean dS exceeds mean dN in the ARS. The same pattern is seen in comparisons among Tla, T2Aᵃ, Mb1, and 37d (table 3). Nonetheless, even in comparisons involving these very distantly related genes, dN is higher in the ARS than in other gene regions, suggesting that these genes also may have arisen from classical genes which were once subject to positive selection. Again there is no evidence of positive selection at the present time; in the comparison among the closely related Tla genes, dS actually is significantly higher than dN in the ARS (table 3).

Table 3
Mean ± SE Nucleotide Substitutions/100 Synonymous Sites and /100 Nonsynonymous Sites, in Comparisons among Mouse Classical and Nonclassical Class I MHC Loci

                      ARS                           REMAINING CODONS IN           EXON 4
                      (N = 54-57)                   EXONS 2 AND 3 (N = 122-125)   (N = 92)
COMPARISONS (N)       dS             dN             dS             dN             dS             dN

Classical:
  vs. Classical (36)  13.8 ± 3.2     21.4 ± 2.4     8.3 ± 1.7      6.5 ± 0.8      1.9 ± 1.0      4.9 ± 0.9*
  vs. Qa (45)         15.1 ± 3.3     24.0 ± 2.5     14.1 ± 2.5     8.8 ± 1.0*     6.7 ± 2.3      5.5 ± 0.9
  vs. Tla (27)        60.4 ± 15.0    38.2 ± 5.5     59.1 ± 10.1    23.9 ± 2.9***  16.1 ± 4.7     7.5 ± 1.5
  vs. T2Aᵃ (9)        41.8 ± 11.5    36.2 ± 5.2     35.4 ± 7.0     15.9 ± 2.3**   16.1 ± 4.7     7.5 ± 1.5
  vs. Mb1 (9)         98.0 ± 26.9    97.8 ± 14.2    132.1 ± 28.1   41.6 ± 4.6**   51.7 ± 11.7    25.0 ± 3.7*
  vs. 37d (9)         68.9 ± 18.1    44.9 ± 6.3     38.3 ± 7.4     13.9 ± 2.1     9.3 ± 3.7      3.7 ± 1.1
Q7ᵇ:
  vs. Q7ᵈ (1)         2.6 ± 2.6      0.0 ± 0.0      3.5 ± 2.0      0.0 ± 0.0      1.5 ± 1.5      1.0 ± 0.7
Qa:
  vs. Qa (10)         12.5 ± 4.2     18.0 ± 2.7     14.2 ± 2.6     6.5 ± 1.1*     8.4 ± 2.5      3.4 ± 0.9
  vs. Tla (15)        61.3 ± 15.5    39.0 ± 5.9     69.1 ± 11.6    22.5 ± 2.8***  20.0 ± 5.0     7.3 ± 1.6*
  vs. T2Aᵃ (5)        40.0 ± 11.6    37.6 ± 5.7     38.2 ± 7.1     15.3 ± 2.3**   23.9 ± 6.2     17.7 ± 3.1
  vs. Mb1 (5)         98.0 ± 27.6    88.8 ± 13.6    121.4 ± 24.2   39.7 ± 4.5***  62.0 ± 13.2    25.3 ± 3.9**
  vs. 37d (5)         77.6 ± 20.8    46.4 ± 6.9     38.2 ± 7.1     12.4 ± 2.0***  13.5 ± 4.1     4.3 ± 1.2*
Tla:
  vs. Tla (3)         8.6 ± 3.9      0.5 ± 0.5*     11.0 ± 3.1     4.4 ± 1.0*     7.3 ± 2.8      3.6 ± 1.1
  vs. T2Aᵃ (3)        61.6 ± 17.2    46.4 ± 7.8     71.1 ± 13.2    27.7 ± 3.5**   29.7 ± 7.4     15.1 ± 2.8
  vs. Mb1 (3)         102.6 ± 29.7   89.9 ± 14.5    211.2 ± 78.6   46.1 ± 5.1*    66.8 ± 14.5    24.5 ± 3.8**
  vs. 37d (3)         149.7 ± 53.0   76.1 ± 14.2    72.9 ± 13.5    28.8 ± 3.6**   20.4 ± 5.4     6.2 ± 1.6*
T2Aᵃ:
  vs. Mb1 (1)         92.5 ± 29.8    96.7 ± 16.1    103.4 ± 21.1   46.8 ± 5.3**   71.9 ± 16.5    28.5 ± 4.3*
  vs. 37d (1)         98.1 ± 46.2    46.2 ± 7.8     58.0 ± 11.2    23.1 ± 3.3**   29.8 ± 7.8     16.4 ± 3.1
Mb1:
  vs. 37d (1)         74.1 ± 22.3    87.2 ± 13.9    117.8 ± 25.0   42.6 ± 4.9**   58.7 ± 13.6    24.4 ± 3.9*

NOTE.-N = number of codons compared. In ARS, N = 54 for T2Aᵃ, 56 for Mb1, and 57 for other genes; in the remainder of exons 2 and 3, N = 122 for Tlaᵃ, 123 for T2Aᵇ, 124 for the classical allele Lᵈ, and 125 for all other genes. Comparisons among classical genes are from Hughes and Nei (1988). Superscript asterisks denote that the difference between dS and dN is significant at the following levels:
* P ≤ 0.05. ** P < 0.01. *** P < 0.001.

In the case of human genes, nonclassical genes and pseudogenes can again be divided into those which have evolved recently from classical genes and still show evidence of past positive selection and those which evolved in the more distant past and show much weaker traces of past positive selection. The pseudogene p12.4, which is very closely related to the A locus, shows evidence of recent positive selection in the form of enhanced dN in comparison with classical genes (table 4). In the case of the more distantly related E and 6.0 genes, dN and dS do not differ significantly; the same is true of the available swine and rabbit genes (table 4). Again, there is no evidence that nonclassical genes are currently subject to positive Darwinian selection. In the ARS and other regions of the two HLA-E alleles, dS and dN do not differ significantly, and the rate of substitution is very low in comparison with that for alleles at classical loci (table 4).

Table 4
Mean ± SE Nucleotide Substitutions/100 Synonymous Sites and /100 Nonsynonymous Sites, in Comparisons among Human, Swine, and Rabbit MHC Class I Classical, Nonclassical, and Pseudogene Loci

                          ARS                           REMAINING CODONS IN           EXON 4
                          (N = 57)                      EXONS 2 AND 3 (N = 125)       (N = 92)
COMPARISONS (N)           dS             dN             dS             dN             dS             dN

H:
  Classical:
    vs. Classical (66)    6.8 ± 2.3      20.8 ± 2.3***  11.6 ± 2.1     5.2 ± 0.8**    22.1 ± 4.4     2.4 ± 0.7***
    vs. E (24)            29.1 ± 9.0     38.6 ± 5.5     14.0 ± 3.2     9.8 ± 1.7      19.6 ± 5.4     4.0 ± 1.1*
    vs. 6.0 (12)          16.6 ± 6.2     27.0 ± 3.7     14.5 ± 3.4     6.6 ± 1.2*     19.2 ± 5.4     5.4 ± 1.5*
    vs. p12.4 (12)        8.0 ± 2.2      26.2 ± 3.7***  16.1 ± 3.7     6.8 ± 1.3*     18.9 ± 5.1     1.9 ± 0.6***
  E:
    vs. E (1)             2.6 ± 0.0      0.0 ± 0.0      0.0 ± 0.0      0.7 ± 0.7      0.0 ± 0.0      0.0 ± 0.0
    vs. 6.0 (2)           39.2 ± 12.3    28.8 ± 5.5     16.3 ± 4.6     10.0 ± 2.0     8.8 ± 3.8      6.7 ± 1.9
    vs. p12.4 (2)         19.0 ± 7.5     47.1 ± 7.8*    19.3 ± 5.1     11.6 ± 2.1     11.4 ± 4.4     2.9 ± 1.2
  6.0 vs. p12.4 (1)       18.5 ± 7.5     35.9 ± 6.4     15.6 ± 4.5     9.4 ± 1.9      10.6 ± 4.2     4.7 ± 1.5
Swine:
  Classical:
    vs. Classical (1)     17.3 ± 7.0     25.1 ± 5.0     4.4 ± 2.2      6.6 ± 1.6      6.4 ± 3.3      1.4 ± 0.8
    vs. PD6 (2)           132.2 ± 41.5   68.9 ± 10.1    46.7 ± 9.0     28.4 ± 3.6     27.1 ± 7.2     8.1 ± 2.0*
Rabbit:
  Classical vs. pR14 (1)  36.8 ± 11.6    27.6 ± 5.4     9.6 ± 3.4      9.3 ± 1.9      4.5 ± 2.6      1.9 ± 1.0

NOTE.-N = number of codons. Comparisons among classical genes are from Hughes and Nei (1988). Superscript asterisks denote that the difference between dS and dN is significant at the following levels:
* P ≤ 0.05. ** P ≤ 0.01. *** P ≤ 0.001.

Overall, the pattern of nucleotide substitution in nonclassical genes and pseudogenes is very similar. The factor that determines the dN/dS ratio in the ARS seems to be the time since divergence of a nonclassical gene or pseudogene from classical genes rather than whether it encodes a protein that is translated.
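The asterisks in tables 3 and 4 mark comparisons in which dS and dN differ significantly. The test procedure is not restated in this section, so the Python sketch below is only an approximation under stated assumptions: the two means are treated as independent and normally distributed, and a two-sided z-test is applied to their difference. The function name is hypothetical; the input values are the human ARS means and SEs from table 4.

```python
import math


def z_test(d_s: float, se_s: float, d_n: float, se_n: float) -> tuple[float, float]:
    """Approximate two-sided z-test for the difference dN - dS.

    Assumes the two estimates are independent and approximately normal,
    which is a simplification made for illustration.
    """
    z = (d_n - d_s) / math.sqrt(se_s ** 2 + se_n ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p


# Human classical vs. classical, ARS (table 4): dS = 6.8 ± 2.3, dN = 20.8 ± 2.3
z_classical, p_classical = z_test(6.8, 2.3, 20.8, 2.3)

# Human classical vs. E, ARS (table 4): dS = 29.1 ± 9.0, dN = 38.6 ± 5.5
z_e, p_e = z_test(29.1, 9.0, 38.6, 5.5)

print(f"classical vs. classical: z = {z_classical:.2f}, P = {p_classical:.2g}")
print(f"classical vs. E:         z = {z_e:.2f}, P = {p_e:.2g}")
```

Under these assumptions the classical comparison is significant well below the 0.001 level, whereas the comparison with E is not significant, in agreement with the pattern of asterisks in table 4.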

In comparisons among nonclassical genes and pseudogenes, dS values tend to be considerably higher in the ARS and in the remainder of exons 2 and 3 than they are in exon 4 in all species (tables 3, 4). This trend is associated with a trend in G+C content, particularly at the third codon position, which is seen in nonclassical class I genes and in pseudogenes for most of the species for which data are available. In almost every case, as genes become nonclassical (or pseudogenes), the third-position G+C content in the ARS and in the remainder of exons 2 and 3 tends to be reduced in comparison with that in classical genes; the reduction in G+C content is greatest in genes (such as Mb1 in the mouse or PD6 in the pig) that are highly divergent from classical genes (table 5). In exon 4, there may be some reduction of G+C content relative to that in classical genes, but the tendency is not so marked as in exons 2 and 3. Biased codon usage is known to be associated with low dS (Wolfe et al. 1989), but it is so far unknown what factors cause variations in codon usage between genes and between gene regions. It is tempting to suggest that selection is responsible for maintaining high G+C content of exons 2 and 3 in classical genes and that this selection is relaxed once the genes become nonclassical, but it is uncertain what the basis of such selection might be.

Table 5
G+C Content (in %) at All Positions and at Third-Codon Positions (3d) in MHC Class I Classical, Nonclassical, and Pseudogene Loci

                    ARS                   REMAINING CODONS IN   EXON 4
                                          EXONS 2 AND 3
SEQUENCES (N)       3d         All        3d         All        3d         All

Mouse:
  Classical (9)     79.7       59.5       87.1       65.6       69.7       58.5
  Qa (5)            74.7       57.0       84.6       64.8       68.4       58.5
  Tla (3)           66.1       51.1       66.2       54.6       63.4       56.8
  T2Aᵃ (1)          72.2       50.6       73.2       59.6       62.2       57.4
  Mb1 (1)           44.6       31.0       48.8       56.0       58.7       50.0
  37d (1)           68.4       53.2       81.6       63.7       67.5       58.3
Hamster:
  Hm1.6 (1)         63.2       61.4       82.4       62.1       68.5       58.3
Human:
  Classical (12)    82.3       60.4       92.6       69.0       72.3       61.4
  E (2)             70.2       55.0       90.4       66.7       75.0       63.0
  6.0 (1)           73.7       56.1       90.4       68.5       78.3       63.0
  p12.4 (1)         78.9       58.5       90.4       67.5       78.3       63.8
Pig:
  Classical (2)     78.9       59.1       89.6       67.5       82.6       64.5
  PD6 (1)           56.1       49.7       76.8       61.6       77.2       60.9
Rabbit:
  Classical (1)     91.2       62.0       94.4       69.1       83.7       68.5
  R14 (1)           87.7       63.2       96.0       71.5       87.0       67.8
Bovine:
  Classical (2)     78.1       57.9       91.6       68.7       73.4       61.6
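Percentages of the kind reported in table 5 can be computed from an in-frame coding sequence as in the Python sketch below. It is a minimal illustration: the function names are hypothetical, the example sequence is a made-up nine-nucleotide string rather than an MHC sequence, and ambiguous bases are simply ignored.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc3_content(coding_seq: str) -> float:
    """Percentage of G and C at third codon positions of an in-frame sequence."""
    if len(coding_seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    third_positions = coding_seq[2::3]  # every third base, starting at codon position 3
    return gc_content(third_positions)


# Made-up in-frame example: codons GCC, ATG, GAC
example = "GCCATGGAC"
gc_all = gc_content(example)
gc_third = gc3_content(example)
print(f"all positions: {gc_all:.1f}%, third positions: {gc_third:.1f}%")
```

Applying both functions separately to the ARS codons, the remaining codons of exons 2 and 3, and exon 4 would give the three pairs of columns used in table 5.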

In classical class I molecules, the ARS contains many highly polymorphic amino acid residues, but there are a few which are highly conserved. Purifying selection evidently eliminates any mutation at these conserved positions. In nonfunctional pseudogenes, such conservative selection is presumably absent, and replacements may occur in these positions. Similarly, if nonclassical genes are either nonfunctional or have a function different from that of classical genes, replacements may occur in these positions of nonclassical genes. Comparison of all available class I classical sequences (references in table 1; also see Mayer et al. 1988; Parham et al. 1988; Radojcic et al. 1989; Yuhki et al. 1989) revealed nine amino acid residues in the ARS which are conserved in all known mammalian and avian classical alleles (positions 7, 26, 59, 64, 72, 146, 154, 159, and 165). There is an additional residue (position 84) which is conserved in all known mammalian classical alleles. Also, there are 14 residues which are conserved in all known mouse classical alleles, and there is a nonoverlapping set of 12 additional residues which are conserved in all human classical alleles. Both nonclassical genes and pseudogenes typically show amino acid replacements at these conserved ARS positions (table 6). Many of these are replacements expected to have a major effect on protein structure; examples of such nonconservative replacements at the nine positions conserved in all classical molecules include Lys for Val at position 165 in Tlaᵃ and Arg for Gln at position 72 in 37d. Except for the fact that the mouse pseudogene T2Aᵃ has a stop codon in one of these positions, there is, with respect to substitution in these positions, little difference between nonclassical genes and known pseudogenes. In fact, the number of substitutions in these positions seems to be roughly a function of the time since a gene diverged from classical genes. Thus the mouse Qa genes and the human pseudogene p12.4, both of which are closely related to classical genes, show relatively few replacements in comparison with more diverged genes such as Mb1 in the mouse.

Table 6
Numbers of Amino Acid Replacements in the ARS of Nonclassical Class I Genes and Pseudogenes at Amino Acid Positions Which are 100% Conserved in Classical Class I Genes

SITES CONSERVED: All Vertebrates (N = 9); Within Speciesᵇ

SPECIES AND GENE

Mouse: ;;b : ; : : Q7ᵈ Q8ᵇ Q10ᵇ Tlaᵃ Tlaᵇ Tlaᶜ Mb1 37b T2Aᵃ

H: E JTW15 6.0 p12.4

Rabbit: pR14

Pig: PD6

0 1 1

0 (1)

5 (1) 2 2

1 (1)

2 2 3 3 3 8 8 8

11ᶜ 6 5ᵈ

3

ᵃ Numbers in parentheses refer to replacements at an additional site which is conserved in all mammalian classical sequences.
ᵇ N = 14 in mouse; N = 12 in human.
ᶜ Also, one conserved amino acid is deleted.
ᵈ Includes one stop codon.
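The tabulation summarized in table 6 amounts to two steps: finding the aligned positions at which every classical sequence carries the same residue, and then counting how many of those positions differ in a nonclassical sequence or pseudogene. The Python sketch below illustrates the two steps with short made-up peptide strings; the sequences, position numbers, and function names are hypothetical and do not correspond to real class I alignments.

```python
def conserved_positions(classical: list[str], positions: list[int]) -> dict[int, str]:
    """Map each 1-based position that is invariant among classical sequences to its residue."""
    conserved = {}
    for pos in positions:
        residues = {seq[pos - 1] for seq in classical}
        if len(residues) == 1:  # all classical sequences agree at this position
            conserved[pos] = residues.pop()
    return conserved


def count_replacements(seq: str, conserved: dict[int, str]) -> int:
    """Count conserved positions at which seq differs from the classical residue."""
    return sum(1 for pos, residue in conserved.items() if seq[pos - 1] != residue)


# Made-up aligned sequences of equal length (illustrative only).
classical_seqs = ["MKVLAQ", "MRVLTQ", "MKVLSQ"]
nonclassical_seq = "MKKLAR"

conserved = conserved_positions(classical_seqs, positions=[1, 2, 3, 4, 5, 6])
n_replaced = count_replacements(nonclassical_seq, conserved)
print(sorted(conserved), n_replaced)
```

In this toy example positions 1, 3, 4, and 6 are invariant among the classical sequences, and the nonclassical sequence differs at two of them.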

FIG. 4.-Schematic diagram illustrating relationships among classical (represented by open circles) and nonclassical (solid circles) class I loci in a hypothetical MHC. The figure shows the composition of the MHC at four times, which are separated by long periods (~20 Myr).

Discussion

Our results support the hypothesis that nonclassical class I MHC loci have evolved from classical loci in separate mammalian orders. Further, they suggest that, unlike classical loci, nonclassical loci are not subject to positive Darwinian selection acting on the ARS and that all nonclassical genes have amino acid replacements in the ARS positions which are conserved in classical genes. In these respects, nonclassical genes are indistinguishable from known or putative pseudogenes. Although experimental evidence is needed before a final decision can be reached regarding functionality of nonclassical class I gene products, the data presented here favor the hypothesis that nonclassical genes are largely nonfunctional and are essentially pseudogenes (Klein and Figueroa 1986; Howard 1987). It is possible, however, that certain nonclassical genes have acquired new functions different from that of classical genes. The CD1 antigens represent a case in which duplicated class I genes have diverged greatly and assumed a new function. These genes do not map to the MHC; but they show evidence of homology to class I genes, suggesting that they arose by duplication from class I genes and have since assumed a new function as differentiation antigens (Martin et al. 1987; Bradbury et al. 1988). The FcRn gene in the rat, which functions in uptake of maternal immunoglobulin G by neonates, seems to have evolved by a similar process (Simister and Mostov 1989). At the nucleotide level this gene is much more divergent from the class I MHC than are the CD1 antigen genes.

If nonclassical loci evolve from classical loci, there are a number of ways this might happen (fig. 4). Initially, a classical locus might duplicate, and one copy might subsequently become nonclassical. The loss of classical function by one copy of a duplicated locus might occur shortly after duplication or after a long period during which both daughter loci functioned as classical loci. Once having become nonclassical, a locus might again be duplicated one or more times, giving rise to a family of related nonclassical loci. Over long periods of time, if nonclassical loci are largely nonfunctional, they would gradually be lost by accumulation of destructive mutations or unequal crossing-over. This would explain why distantly related mammalian species do not share nonclassical loci, while closely related species (such as human and chimpanzee) may share nonclassical loci.


Nonclassical class I molecules typically lack three traits characteristic of classical molecules: universal expression, high polymorphism, and antigen-presenting function. In the evolution of a nonclassical locus, all three of these traits are not likely to be lost simultaneously. It is therefore expected that nonclassical loci will occasionally retain one or more of the characteristics of classical loci. For example, the RT1.C locus in the rat appears to be a formerly classical locus which has lost universal expression prior to losing the ability to present antigens. There is evidence of polymorphism at this locus, yet some alleles are known to have a very limited tissue distribution (Butcher and Howard 1986). (Polymorphism may also arise owing to neutral mutations without positive Darwinian selection.) At least one allele, RT1.Cd, retains the ability to present antigen to T-cells, even though it is poorly expressed (Stephenson et al. 1985; Howard 1987).

The number of classical class I genes is one to three in all mammalian species studied so far. Humans and chimpanzees each have three classical loci. In the mouse, most known haplotypes have only two classical loci, whereas some (such as H-2d) have three. In the miniature swine there are two classical loci (Singer et al. 1987), whereas in the rat (Howard 1987) and in some rabbit strains (Tykocinski et al. 1984) there is apparently only one fully classical class I locus. The fact that the numbers of classical loci are limited may appear somewhat paradoxical in light of the usual arguments regarding the mechanism of overdominant selection on the MHC. It has been argued that, at a given classical class I locus, a heterozygote will have an advantage over either homozygote because it produces two different antigens that can bind and present to T-cells different classes of foreign peptides (Doherty and Zinkernagel 1975; Hughes and Nei 1988). If so, it would seem that an individual expressing a large number of MHC loci would have an advantage over one expressing only a few. In practice, however, there is natural selection that acts to limit the number of expressed MHC loci, because, for each self-MHC antigen that exists, a certain proportion of the T-cell repertoire must be eliminated as self-reactive. If the proportion of the T-cell repertoire that is eliminated becomes very large, this would impose a prohibitive burden on the organism (Howard 1987; Hughes and Nei 1989b).

The high rates of nucleotide substitution in regulatory elements of Qa-region genes (table 2) are consistent with the hypothesis that selection limits the number of classical class I genes. Note that under this hypothesis, if a classical locus is duplicated, the chromosome bearing the duplication would initially be under no disadvantage, because the duplicated loci would be identical. Only if mutation produced one or more amino acid replacements in positions responsible for T-cell recognition would there be a disadvantage. In the meantime, mutations would also occur in regulatory elements reducing expression of one copy of the duplicated loci. Genes in which such mutations had occurred would then escape purifying selection, because the genes are no longer expressed. This explains the retention of a large number of nonclassical genes in the genome.

One difference between class I and class II MHC genes is that in the latter each locus in a given mammalian species has an orthologue (homologue) in other species, an orthologue to which it is more closely related than it is to other loci of its own species. Thus, DQ in humans is orthologous to A in the mouse. That is, DQ sequences are more similar to A sequences than they are to human DR sequences (Klein and Figueroa 1986; Hughes and Nei 1989b). Except with closely related species such as the human and chimpanzee (Mayer et al. 1988), this is not true of class I genes. There are no orthologous relationships, for example, between HLA-A, -B, and -C in humans and H-2K, -D, and -L in the mouse. Furthermore, in the mouse it has been observed that alleles at a particular classical class I locus are not necessarily more closely related to each other than to alleles at other loci (Mellor 1987). Such observations have been taken as evidence that gene conversion acts to homogenize classical class I loci within a species (Mellor 1987). Relatively rare events of gene conversion cannot be ruled out, but there is by now substantial evidence against its widespread occurrence (Hughes and Nei 1988). In humans, the classical A, B, and C loci form separate clusters, clearly suggesting that no conversion occurs among them (fig. 1; Lawlor et al. 1988; Hughes and Nei 1989a). Even if some homogenization occurs among classical class I genes in the mouse, it certainly does not involve such highly divergent nonclassical genes as the Tla genes and Mb1.
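The criterion for orthology used in this paragraph (a locus is more closely related to some locus of another species than to the other loci of its own species) can be expressed as a simple distance comparison. The Python sketch below uses the uncorrected proportion of differing sites as the distance and entirely made-up sequences and locus names; it illustrates the criterion only and is not the phylogenetic method used for fig. 1.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def has_orthologue(locus: str, own_loci: list[str], other_loci: list[str],
                   seqs: dict[str, str]) -> bool:
    """True if locus is closer to some locus of the other species than to
    every other locus of its own species."""
    nearest_other = min(p_distance(seqs[locus], seqs[o]) for o in other_loci)
    nearest_own = min(p_distance(seqs[locus], seqs[o]) for o in own_loci if o != locus)
    return nearest_other < nearest_own


# Hypothetical pattern resembling class II: each locus predates the species split.
class2_like = {"sp1_X": "AAAAAAAAAA", "sp2_X": "AAAAAAAAAT",
               "sp1_Y": "CCCCCAAAAA", "sp2_Y": "CCCCCAAAAT"}
# Hypothetical pattern resembling class I: loci duplicated after the species split.
class1_like = {"sp1_X": "AAAAAAAAAA", "sp1_Y": "AAAAAAAAAT",
               "sp2_X": "CCCCCAAAAA", "sp2_Y": "CCCCCAAAAT"}

own, other = ["sp1_X", "sp1_Y"], ["sp2_X", "sp2_Y"]
class2_result = has_orthologue("sp1_X", own, other, class2_like)
class1_result = has_orthologue("sp1_X", own, other, class1_like)
print(class2_result, class1_result)
```

In the first toy data set the locus finds its nearest relative in the other species, as described for DQ and A; in the second its nearest relative is a paralogous locus of the same species, as described for class I loci of distantly related mammals.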

On the view presented here, the composition of the class I MHC is not stable over evolutionary time. If a classical locus duplicates, both daughter loci may, as a result of a chance mutation, continue as classical loci, each binding a somewhat different class of foreign peptides. In some instances, however, one daughter locus may become nonclassical as a result of a mutation reducing its expression. Retention of more than approximately three classical class I loci seems to be disadvantageous, and one to three loci are maintained by a balance between the gain by duplication and the loss due to mutation or unequal crossing-over. It is as yet uncertain why the class I MHC should have this dynamic property, whereas the class II MHC is much more stable in composition over time. In the class II MHC, α- and β-chain genes are arranged in alternating fashion on the chromosome, and the mature glycoprotein is a heterodimer consisting of noncovalently linked α and β chains. Perhaps the need for synchronous transcription of linked α- and β-chain genes gives rise to selection against duplication of class II loci in most cases, whereas this constraint is absent in the case of class I.

Acknowledgments

This study was supported by research grants from the National Institutes of Health and the National Science Foundation.

LITERATURE CITED

ARNOLD, B., H. C. BURGERT, A. L. ARCHIBALD, and S. KVIST. 1984. Complete sequence of the murine H-2Kk gene: comparison of three H-2K locus alleles. Nucleic Acids Res. 12: 9473-9487.

BJORKMAN, P. J., M. A. SAPER, B. SAMRAOUI, W. S. BENNETT, J. L. STROMINGER, and D. C. WILEY. 1987a. The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens. Nature 329:512-518.

-. 1987b. Structure of the human class I histocompatibility antigen, HLA-A2. Nature 329:506-512.

BRADBURY, A., K. T. BELT, T. M. NERI, C. MILSTEIN, and F. CALABI. 1988. Mouse CD1 is distinct from and coexists with TL in the same thymus. EMBO J. 7:3081-3086.

BUTCHER, G. W., and J. C. HOWARD. 1986. The MHC of the laboratory rat, Rattus norvegicus. Pp. 101.1-101.18 in L. HERZENBERG and D. M. WEIR, eds., Handbook of experimental immunology. 4th ed. Blackwell, Oxford.

CHEN, Y.-T., Y. OBATA, E. STOCKERT, and L. J. OLD. 1985. Thymus-leukemia (TL) antigens of the mouse: analysis of TL mRNA and TL cDNA from TL+ and TL- strains. J. Exp. Med. 162:1134-1148.

COWAN, E. P., M. L. JELACHICH, W. E. BIDDISON, and J. E. COLIGAN. 1987. DNA sequence of HLA-A11: remarkable homology with HLA-A3 allows identification of residues involved in epitopes recognized by antibodies and T cells. Immunogenetics 25:241-250.


DEVLIN, J. J., E. H. WEISS, M. PAULSON, and R. A. FLAVELL. 1985. Duplicated gene pairs and alleles of class I genes in the Qa2 region of the murine major histocompatibility complex: a comparison. EMBO J. 12:3203-3207.

DOHERTY, P. C., and R. M. ZINKERNAGEL. 1975. Enhanced immunological surveillance in mice heterozygous at the H-2 gene complex. Nature 256:50-52.

DURAN, L. W., J. C. ZELLER, J. K. LUNDY, A. CHANG-MILLER, C. J. KRCO, C. S. DAVID, and L. R. PEASE. 1987. Genetic analysis of the H-2D region using a new intra-D-region recombinant mouse strain. J. Immunol. 139:2818-2824.

EHRLICH, R., R. LIFSHITZ, M. D. PESCOVITZ, S. RUDIKOFF, and D. S. SINGER. 1987. Tissue-specific expression and structure of a divergent member of a class I MHC gene family. J. Immunol. 139:593-602.

ENNIS, P. D., A. P. JACKSON, and P. PARHAM. 1988. Molecular cloning of bovine class I MHC cDNA. J. Immunol. 141:642-651.

FISHER, D. A., S. W. HUNT, and L. HOOD. 1985. The structure of a gene encoding a murine thymus leukemia antigen, and organization of Tla genes in the BALB/c mouse. J. Exp. Med. 162:528-545.

GERAGHTY, D. E., B. H. KOLLER, and H. T. ORR. 1987. A human major histocompatibility complex class I gene that encodes a protein with a shortened cytoplasmic segment. Proc. Natl. Acad. Sci. USA 84:9145-9149.

GUILLEMOT, F., A. BILLAULT, O. POURQUIÉ, G. BÉHAR, A.-M. CHAUSSÉ, R. ZOOROB, G. KREIBICH, and C. AUFFRAY. 1988. A molecular map of the chicken major histocompatibility complex: the class II β genes are closely linked to the class I genes and the nucleolar organizer. EMBO J. 9:2775-2785.

GÜSSOW, D., R. S. REIN, I. MEIJER, W. DE HOOG, G. H. A. SEEMANN, F. M. HOCHSTENBACH, and H. L. PLOEGH. 1987. Isolation, expression, and the primary structure of HLA-Cw1 and HLA-Cw2 genes: evolutionary aspects. Immunogenetics 25:313-322.

HOLMES, N., and P. PARHAM. 1985. Exon shuffling in vivo can generate novel HLA class I molecules. EMBO J. 4:2849-2854.

HOWARD, J. C. 1987. MHC organization in the rat: evolutionary considerations. Pp. 397-427 in G. KELSOE and D. H. SCHULZE, eds. Evolution and vertebrate immunity. University of Texas Press, Austin.

HUGHES, A. L., and M. NEI. 1988. Pattern of nucleotide substitution at major histocompatibility complex loci reveals overdominant selection. Nature 335:167-170.

-. 1989a. Ancient interlocus exon exchange in the history of the HLA-A locus. Genetics 122:681-686.

-. 1989b. Nucleotide substitution at major histocompatibility complex class II loci: evidence for overdominant selection. Proc. Natl. Acad. Sci. USA 86:958-962.

ISRAEL, A., A. KIMURA, A. FOURNIER, M. FELLOWS, and P. KOURILSKY. 1986. Interferon response sequence potentiates activity of an enhancer in the promoter region of a mouse H-2 gene. Nature 322:743-746.

JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. MUNRO, ed. Mammalian protein metabolism. Academic Press, New York.

KIMURA, A., A. ISRAEL, O. LE BAIL, and P. KOURILSKY. 1986. Detailed analysis of the mouse H-2Kb promoter: enhancer-like sequences and their role in the regulation of class I gene expression. Cell 44:261-272.

KLEIN, J. 1986. Natural history of the major histocompatibility complex. Wiley, New York.

KLEIN, J., and F. FIGUEROA. 1986. Evolution of the major histocompatibility complex. CRC Crit. Rev. Immunol. 6:295-386.

KOHNE, D. E. 1970. Evolution of higher-organism DNA. Q. Rev. Biophys. 3:327-375.

KOLLER, B. H., D. E. GERAGHTY, Y. SHIMIZU, R. DEMARS, and H. T. ORR. 1988. HLA-E: a novel HLA class I gene expressed in resting T lymphocytes. J. Immunol. 141:897-904.

KOLLER, B. H., and H. T. ORR. 1985. Cloning and complete sequence of an HLA-A2 gene: analysis of the HLA-A alleles at the nucleotide level. J. Immunol. 134:2727-2733.


KORBER, B., L. HOOD, and I. STROYNOWSKI. 1987. Regulation of murine class I genes by interferons is controlled by regions located both 5′ and 3′ to the transcription initiation site. Proc. Natl. Acad. Sci. USA 84:3380-3384.

KORBER, B., N. MERMOD, L. HOOD, and I. STROYNOWSKI. 1988. Regulation of gene expression by interferons: control of H-2 promoter responses. Science 239:1302-1306.

KOTTMANN, A. H., G. H. A. SEEMANN, H. D. GUESSOW, and M. H. ROOS. 1986. DNA sequence of the coding region of the HLA-B44 gene. Immunogenetics 23:396-400.

KVIST, S., L. ROBERTS, and B. DOBBERSTEIN. 1983. Mouse histocompatibility genes: structure and organization of a Kd gene. EMBO J. 2:245-254.

LALANNE, J.-L., C. TRANSY, S. GUERIN, S. DARCHE, P. MEULIEN, and P. KOURILSKY. 1985. Expression of class I genes in the major histocompatibility complex: identification of eight distinct mRNAs in DBA/2 mouse liver. Cell 41:469-478.

LAWLOR, D. A., F. E. WARD, P. D. ENNIS, A. P. JACKSON, and P. PARHAM. 1988. HLA-A and -B polymorphisms predate the divergence of humans and chimpanzees. Nature 335:268-271.

LEE, W., A. HASLINGER, M. KARIN, and R. TJIAN. 1987. Activation of transcription by two factors that bind promoter and enhancer sequences of the human metallothionein gene and SV40. Nature 325:368-372.

MCGUIRE, K. L., W. R. DUNCAN, and P. W. TUCKER. 1986. Structure of a class I gene from Syrian hamster. J. Immunol. 137:366-371.

MALISSEN, M., B. MALISSEN, and B. R. JORDAN. 1982. Exon/intron organization and complete nucleotide sequence of an HLA gene. Proc. Natl. Acad. Sci. USA 79:893-897.

MARTIN, L. H., F. CALABI, F.-A. LEFEBVRE, C. A. G. BILSLAND, and C. MILSTEIN. 1987. Structure and expression of the human thymocyte antigens CD1a, CD1b, and CD1c. Proc. Natl. Acad. Sci. USA 84:9189-9193.

MAYER, W. E., M. JONKER, D. KLEIN, P. IVANYI, G. VAN SEVENTER, and J. KLEIN. 1988. Nucleotide sequences of chimpanzee MHC class I alleles: evidence for trans-species mode of evolution. EMBO J. 7:2765-2774.

MELLOR, A. L. 1987. Molecular genetics of class I genes in the mammalian major histocompatibility complex. Oxf. Surv. Eukaryotic Genet. 3:95-140.

MELLOR, A. L., E. H. WEISS, M. KRESS, G. JAY, and R. A. FLAVELL. 1984. A nonpolymorphic class I gene in the murine major histocompatibility complex. Cell 36:139-144.

MELLOR, A. L., E. H. WEISS, K. RAMACHANDRAN, and R. A. FLAVELL. 1983. A potential donor gene for the bm1 gene conversion event in the C57BL mouse. Nature 306:792-795.

MIZUNO, S., J. A. TRAPANI, B. H. KOLLER, B. DUPONT, and S. Y. YANG. 1988. Isolation and nucleotide sequence of a cDNA clone encoding a novel HLA class I gene. J. Immunol. 140:4024-4030.

MOORE, K. W., B. TAYLOR SHER, H. SUN, K. A. EAKLE, and L. HOOD. 1981. DNA sequence of a gene encoding a BALB/c mouse Ld transplantation antigen. Science 212:679-682.

MORITA, T., C. DELARBRE, M. KRESS, P. KOURILSKY, and G. GACHELIN. 1985. An H-2K gene of the tw32 mutant at the T/t complex is a close parent of an H-2Kq gene. Immunogenetics 21:367-383.

NEI, M., and T. GOJOBORI. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.

NEI, M., and L. JIN. 1989. Variances of the average numbers of nucleotide substitutions within and between populations. Mol. Biol. Evol. 6:290-300.

N’GUYEN, C., R. SODOYER, J. TRUCY, T. STRACHAN, and B. R. JORDAN. 1985. The HLA-Aw24 gene: sequence, surroundings and comparison with the HLA-A2 and HLA-A3 gene. Immunogenetics 21:479-489.

OBATA, Y., Y.-T. CHEN, E. STOCKERT, and L. J. OLD. 1985. Structural analysis of TL genes of the mouse. Proc. Natl. Acad. Sci. USA 82:5475-5479.

PARHAM, P., C. E. LOMEN, D. A. LAWLOR, J. P. WAYS, N. HOLMES, H. L. COPPIN, R. D. SALTER, A. M. WAN, and P. D. ENNIS. 1988. Nature of polymorphism in HLA-A, -B, and -C molecules. Proc. Natl. Acad. Sci. USA 85:4005-4009.

RADOJCIC, A., K. S. STRANICK, J. LOCKER, H. W. KUNZ, and T. J. GILL III. 1988. Nucleotide sequence of a rat class I cDNA clone. Immunogenetics 29: 134- 137.

ROBINSON, P. J., D. BEREC, A. L. MELLOR, and E. H. WEISS. 1988. Sequence of the mouse e4 class I gene and characterization of the gene product. Immunogenetics 27:79-86.

ROGERS, J. H. 198%. Family organization of mouse H-2 class I genes. Immunogenetics 21: 343-353.

ROGERS, J. H. 1985b. Mouse histocompatibility-related genes are not conserved in other mammals. EMBO J. 4:749-753.

SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

SATZ, M. L., L. C. WANG, D. S. SINGER, and S. RUDIKOFF. 1985. Structure and expression of two porcine genomic clones encoding class I MHC antigens. J. Immunol. 135:2167-2175.

SCHEPART, B. S., H. TAKAHASHI, K. M. COZARD, R. MURRAY, K. OZATO, E. APPELLA, and J. A. FRELINGER. 1986. The nucleotide sequence and comparative analysis of the H-2DP class I H-2 gene. J. Immunol. 136:3489-3495.

SIMISTER, N. E., and K. E. MOSTOV. 1989. An Fc receptor structurally related to MHC class I antigens. Nature 337:184-187.

SINGER, D. S., R. EHRLICH, L. SATZ, W. FRELS, J. BLUESTONE, R. HODES, and S. RUDIKOFF. 1987. Structure and expression of class I MHC genes in the miniature swine. Vet. Immunol. Immunopathol. 17:211-221.

SINGER, D. S., J. HARE, H. GOLDING, L. FLAHERTY, and S. RUDIKOFF. 1988. Characterization of a new subfamily of class I genes in the H-2 complex of the mouse. Immunogenetics 28:13-21.

SODOYER, R., M. DAMOTTE, T. L. DELOVITCH, J. TRUCY, B. R. JORDAN, and T. STRACHAN. 1984. Complete nucleotide sequence of a gene encoding a functional human class I histocompatibility antigen (HLA-CW3). EMBO J. 3:879-885.

STEINMETZ, M., K. W. MOORE, J. G. FRELINGER, B. TAYLOR SHER, F.-W. SHEN, E. A. BOYSE, and L. H. HOOD. 1981. A pseudogene homologous to mouse transplantation antigens: transplantation antigens are encoded by eight exons that correlate with protein domains. Cell 25:683-692.

STEPHENSON, S. P., R. C. MORLEY, and G. W. BUTCHER. 1985. Genetics of the rat CT system: its apparent complexity is a consequence of cross-reactivity between the distinct MHC class I antigens RT1.C and RT1.A. J. Immunogenet. 12:101-114.

STRACHAN, T., R. SODOYER, M. DAMOTTE, and B. R. JORDAN. 1984. Complete nucleotide sequence of a functional class I HLA gene, HLA-A3: implications for the evolution of HLA genes. EMBO J. 3:887-894.

STREILEIN, J. W. 1987. Studies on an MHC exhibiting limited polymorphism. Pp. 379-395 in G. KELSOE and D. H. SCHULZE, eds. Evolution and vertebrate immunity. University of Texas Press, Austin.

SZOTS, H., G. RIETHMÜLLER, E. WEISS, and T. MEO. 1986. Complete sequence of HLA-B27 cDNA identified through the characterization of structural markers unique to the HLA-A, -B, and -C allelic series. Proc. Natl. Acad. Sci. USA 83:1428-1432.

TAYLOR SHER, B., R. NAIRN, J. E. COLIGAN, and L. E. HOOD. 1985. DNA sequence of the mouse H-2Dd transplantation antigen gene. Proc. Natl. Acad. Sci. USA 82:1175-1179.

TEWARSON, S., Z. ZALESKA-RUTCZYNSKA, F. FIGUEROA, and J. KLEIN. 1983. Polymorphism of Qa and Tla loci of the mouse. Tissue Antigens 22:204-212.

TRANSY, C., S. R. NASH, D. DAVID-WATINE, M. COCHET, S. W. HUNT III, L. E. HOOD, and P. KOURILSKY. 1987. A low polymorphic mouse H-2 class I gene from the Tla complex is expressed in a broad variety of cell types. J. Exp. Med. 166:341-361.

TROWSDALE, J. R., and R. D. CAMPBELL. 1988. Physical map of the human HLA region. Immunol. Today 9:34-35.

TYKOCINSKI, M. L., P. N. MARCHE, E. E. MAX, and T. J. KINDT. 1984. Rabbit class I MHC genes: cDNA clones define full-length transcripts of an expressed gene and a putative pseudogene. J. Immunol. 133:2261-2269.

WATTS, S., J. M. VOGEL, W. D. HARRIMAN, T. ITOH, H. J. STAUSS, and R. S. GOODENOW. 1987. DNA sequence analysis of the C3H H-2Kk and H-2Dk loci: evolutionary relationships to H-2 genes from four other mouse strains. J. Immunol. 139:3878-3885.

WAYS, J. P., H. L. COPPIN, and P. PARHAM. 1985. The complete primary structure of HLA-Bw58. J. Biol. Chem. 260:11924-11933.

WEISS, E., L. GOLDEN, R. ZAKUT, A. MELLOR, K. FAHRNER, S. KVIST, and R. A. FLAVELL. 1983. The DNA sequence of the H-2Kb gene: evidence for gene conversion as a mechanism for the generation of polymorphism in histocompatibility antigens. EMBO J. 2:453-462.

WIDMARK, E., H. RONNE, V. HAMMERLING, B. SERVENIUS, D. LARHAMMAR, K. GUSTAFSSON, J. BÖHME, P. A. PETERSON, and L. RASK. 1988. Family relationships of murine major histocompatibility complex class I genes: sequence of the T2A” pseudogene, a member of gene family 3. J. Biol. Chem. 263:7055-7059.

WOLFE, K. H., P. M. SHARP, and W.-H. LI. 1989. Mutation rates vary among regions of the mammalian genome. Nature 337:283-285.

WU, C.-I., and W.-H. LI. 1985. Evidence for higher rates of nucleotide substitution in rodents than in man. Proc. Natl. Acad. Sci. USA 82:1741-1745.

YUHKI, N., G. F. HEIDECKER, and S. J. O’BRIEN. 1989. Characterization of MHC cDNA clones in the domestic cat. J. Immunol. 142:3676-3682.

ZEMMOUR, J., P. D. ENNIS, P. PARHAM, and B. DUPONT. 1988. Comparison of the structure of HLA-Bw47 to HLA-B13 and its relationship to 21-hydroxylase deficiency. Immunogenetics 27:281-287.

JAN KLEIN, reviewing editor

Received March 22, 1989; revision received July 11, 1989

Accepted July 20, 1989