
  • Immunological Reviews 1985, No. 84. Published by Munksgaard, Copenhagen, Denmark. No part may be reproduced by any process without written permission from the author(s).

    Molecular Organization of the Class I Genes of Human Major Histocompatibility Complex

    RAKESH SRIVASTAVA, BARRY W. DUCEMAN, PAUL A. BIRO¹, ASHWANI K. SOOD² & SHERMAN M. WEISSMAN

    Department of Human Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA. ¹Present address: Biological Laboratories, Harvard University, Cambridge, MA 02138, USA. ²Present address: Dept. of Molecular Immunology, Roswell Park Memorial Institute, 666 Elm Street, Buffalo, NY 14263, USA.

    INTRODUCTION

    The MHC (Major Histocompatibility Complex) is a set of linked genes encoding cell surface and plasma proteins that function in various ways to interact with the immune system. The MHC has been most extensively studied in mouse and man, where it encodes 3 types of products: the Class III genes for certain components of the complement system, the Class II genes that encode both chains of surface antigens that correspond to the immune response genes of mice, and the Class I genes that encode the heavy chains of a group of surface antigens that include, but are not limited to, the classic major transplantation antigens. In both man and mouse certain of these genes, such as those for the classic HLA or H2 transplantation antigens, respectively, are among the most polymorphic genes known within either species. In addition, in both species there are other genes interspersed within the MHC. For example, in man the steroid 21-hydroxylase gene is closely linked to the complement genes of the MHC (Carroll et al. 1985), and there appears to be a gene related to the iron storage disorder hemochromatosis located within the complex (Dawkins et al. 1983). At present, it is not clear whether all these other genes are also present in the MHC of the mouse.

    The Class I genes or gene products have been studied in several other vertebrate species, including chickens, dogs, pigs and hamsters (Gotze 1977). They have very similar structure, in so far as the data are available, in all these species, and it is likely that in higher vertebrates they represent a strongly conserved gene structure. Although speculative and tentative experimental considerations suggest that Class I genes may be more widely spread in the animal kingdom (Hildemann et al. 1981, Scofield et al. 1982), no structural data have been published for material from other than vertebrate sources.

    Where studied, the Class I genes encode peptide chains of somewhat more than 300 amino acids that associate on the cell surface with a smaller peptide chain, the β-2 microglobulin, consisting of about 100 amino acids. While the genes for the heavy chains of the Class I antigens and for both chains of the Class II antigens are linked within the MHC (on the short arm of chromosome 6 in man and chromosome 17 in mouse), the gene for the β-2 microglobulin is located on a different chromosome (chromosome 15 in man and chromosome 2 in mouse). Unlike the major Class I antigen heavy chain genes, the β-2 microglobulin gene shows little or no polymorphism, and will not be discussed further here.

    The general features of the structure of Class I genes are well known, but there are many interesting open questions at a number of levels, including gene evolution, the regulation of gene expression, and the enumeration and functions of genes other than the classic transplantation antigens of man. Specific aspects of some of these questions will be the topic of this brief review.

    THE STRUCTURE OF A CLASS I HEAVY CHAIN GENE

    The size, exon-intron organization, and the encoded products of all the human and murine genes whose sequences have been determined show strong similarities. The typical gene has 8 exons that, in general, correspond well to the "domains" predicted by the protein analyses done before genomic clones were available. The first exon encodes a 5' untranslated sequence of approximately 18 nucleotides in the human genes, followed by the codons for a hydrophobic leader peptide responsible for the insertion of the heavy chain through the cell surface membrane. Typically this is about 24 amino acids long, although the A3 allele at the HLA-A locus has a considerably longer leader due to the insertion of 5 additional residues near its amino terminus (Strachan et al. 1984). Translation of the leader peptide in man could be initiated at either of the 2 AUG codons, separated by 6 nucleotide residues (see Fig. 1). The 2nd initiation codon is in a position analogous to that of the translation initiation codon for the mouse genes. A notable exception is the gene contained in our cosmid clone CosRS5. This gene does not have an additional start codon and resembles mouse H2 genes in this respect.

    The 2nd, 3rd and 4th exons contain about 270 nucleotides each. The 2nd and 3rd are the most polymorphic of the large exons, and the 4th is the most conserved and most resembles a domain of an immunoglobulin constant region. The 5th exon (about 122 nucleotides) encodes a transmembrane segment of hydrophobic amino acids and immediately flanking residues, including basic amino acids that may interact with the phosphates of membrane phospholipids.


    The 6th and 7th exons are small coding regions (about 11 and 15 codons, respectively). The encoded peptides show conservative similarities, including the presence of cysteines, and serine residues which are sites for phosphorylation in the cell.

    The 8th exon is the largest, containing over 400 residues. A minor variation in Class I genes is that in some, such as the HLA B7 gene, the translation termination codon lies at the end of the 7th exon, while in other human genes and in at least some of the mouse genes, translation extends across the first few residues of exon 8.

    Conserved and Polymorphic Regions

    Exons: Although no generalized differences are observed in the base composition of exons versus introns (Fig. 2B) of a given Class I MHC gene (HLA-B7 in the present case), the observation that the 5' half of the gene is GC rich, as opposed to the 3' half, holds true for other class I genes too (not shown). As shown in Table I, with the exception of the signal peptide exon (exon 1), all other coding sequences show, on average, no more than 10% divergence in human genes and 20% divergence in the mouse gene, with respect to the derived consensus sequence (Fig. 1). The homology between 2 genes may be as high as 100%, as in the case of exon 6 of HLA-CW3 and HLA-328 (see Table II), although the same pair of genes diverges in other exons. At the HLA-A locus, 2 allelic genes have been sequenced (HLA-A2 and -A3), and they differ by about 6% in exon 6. Two other human genes, P12.4 and LN11, show 100% similarity in this exon. No other exon of any 2 genes shows 100% similarity. The diversity in the sequences of exons coding for cytoplasmic domains of the protein (exons 6, 7, and 8) may be as critical as that of the exons for extracellular domains. While extracellular domains play a direct role in immune interactions, the cytoplasmic domains are of unknown function but have been speculated to be the carrier or transmitter of the cytoplasmic signals of immune response and/or anchoring devices for proper orientation in the cytoskeletal matrix. A role of the cytoplasmic domains in the transport of the heavy chain to the cell surface or its integration into the cell membrane has been ruled out on the basis that functional H-2L^d antigen could be expressed upon transfection of mouse cells with cloned H-2L^d genes having altered or deleted exons which code for the cytoplasmic domains (Zuniga et al. 1983).

    The exons which code for the signal peptide and the extracellular domains of the protein (exons 1, 2, 3 and 4) show a gradient of decreasing sequence heterogeneity. Thus, when calculated for human genes, the average sequence homologies increase from 89.1% for exon 1 through 91.3% (exon 2) and 91.8% (exon 3) to 94% for exon 4. This gradient of heterogeneity starts further up in the 5' flanking regions of the genes which, as shown in Table I, show only about 84% homology. Although the gross structure of the exons can be anticipated to be conserved in order to maintain the framework of the coded antigens, the scattered nucleotide changes in clone 328 alter the structure of the encoded protein in such a way that it associates only weakly with mouse β-2 microglobulin (Srivastava et al. 1985). This has not been observed for any of the known class I antigens of man or mouse. A non-β-2 microglobulin-associated class I MHC molecule has, however, been reported in the rabbit (Wilkinson et al. 1982).

    TABLE I
    Comparative analysis of the % divergence of nucleotides among various class I genes

    Region/gene     No. of nucleotides  No. of          % divergence
                    compared*           differences†    from consensus

    5' untranslated
    HLAB7           182                 28              15.3
    HLAA2           [illegible]         [illegible]     9.8
    HLAA3           182                 22              12.0
    HLACW3          [illegible]         31              18.6
    HLA328          182                 34              18.6
    LN11            192                 34              17.7
    12.4            192                 34              17.7
    CosRS5          163                 59              36.2
    H2LD            179                 52              29.0

    EXON 1
    HLAB7           88                  4               4.5
    HLAA2           88                  7               7.9
    HLAA3           88                  21              23.8
    HLACW3          88                  10              11.3
    HLA328          88                  7               7.9
    12.4            88                  3               3.4
    LN11            88                  10              11.3
    CosRS5          88                  25              28.4
    H2LD            88                  27              30.6

    EXON 2
    HLAB7           270                 14              5.1
    HLAA2           270                 30              11.1
    HLAA3           270                 26              9.6
    HLACW3          270                 23              8.5
    HLA328          270                 [illegible]     6.3
    12.4            270                 [illegible]     8.9
    LN11            270                 28              10.3
    H2LD            270                 48              17.7

    EXON 3
    HLAB7           277                 23              8.3
    HLAA2           277                 18              6.5
    HLAA3           277                 21              7.5
    HLACW3          277                 22              7.9
    HLA328          277                 25              9.0
    12.4            277                 25              9.0
    LN11            277                 [illegible]     9.0
    H2LD            277                 48              17.3

    EXON 4
    HLAB7           277                 20              7.2
    HLAA2           277                 20              7.2
    HLAA3           277                 9               3.2
    HLACW3          277                 21              7.5
    HLA328          277                 19              6.8
    12.4            277                 8               2.9
    LN11            277                 12              4.3
    H2LD            277                 41              14.8

    EXON 5
    HLAB7           123                 12              9.7
    HLAA2           123                 7               5.6
    HLAA3           123                 8               6.5
    HLACW3          123                 18              14.6
    HLA328          123                 16              13.0
    12.4            123                 10              8.1
    LN11            123                 10              8.1
    H2LD            123                 43              34.9

    EXON 6
    HLAB7           34                  4               11.7
    HLAA2           34                  3               8.8
    HLAA3           34                  5               14.7
    HLACW3          34                  5               14.7
    HLA328          34                  5               14.7
    12.4            34                  3               8.8
    LN11            34                  3               8.8
    H2LD            34                  10              29.4

    EXON 7
    HLAB7           49                  4               8.1
    HLAA2           49                  4               8.1
    HLAA3           49                  5               10.0
    HLACW3          49                  6               12.2
    HLA328          49                  6               12.2
    12.4            49                  6               12.2
    LN11            49                  6               12.2
    H2LD            39                  9               23.0

    EXON 8 and 3' untranslated
    HLAB7           423                 70              16.5
    HLAA2           423                 56              13.2
    HLAA3           423                 60              14.1
    HLACW3          423                 72              17.0
    HLA328          423                 69              16.3
    12.4            423                 80              18.9
    LN11            423                 67              15.8
    H2LD            423                 215             50.0

    INTRON 1
    HLAB7           134                 21              15.6
    HLAA2           134                 15              11.2
    HLAA3           134                 26              19.4
    HLACW3          134                 13              9.7
    HLA328          134                 15              11.2
    12.4            134                 25              18.6
    LN11            134                 25              18.6
    H2LD            134                 51              38.0

    INTRON 2
    HLAB7           260                 36              13.8
    HLAA2           260                 30              11.5
    HLAA3           260                 29              11.1
    HLACW3          260                 26              10.0
    HLA328          260                 [illegible]     18.4
    12.4            260                 [illegible]     8.8
    LN11            260                 [illegible]     8.8
    H2LD            260                 [illegible]     47.6

    INTRON 3
    HLAB7           686                 [illegible]     19.4
    HLAA2           686                 140             20.4
    HLAA3           686                 [illegible]     19.6
    HLACW3          [illegible]         [illegible]     18.5
    HLA328          [illegible]         [illegible]     17.5
    12.4            686                 102             14.8
    LN11            686                 94              13.7
    H2KD            686                 227             33.0

    INTRON 4
    HLAB7           140                 [illegible]     20.7
    HLAA2           140                 [illegible]     15.7
    HLAA3           140                 17              12.1
    HLACW3          140                 40              28.5
    HLA328          140                 40              28.5
    12.4            140                 22              15.7
    LN11            [illegible]         [illegible]     16.4
    H2LD            140                 59              42.1

    INTRON 5
    HLAB7           455                 71              15.6
    HLAA2           455                 78              17.1
    HLAA3           455                 65              14.2
    HLACW3          455                 74              16.2
    HLA328          455                 69              15.1
    12.4            455                 63              13.8
    LN11            455                 63              13.8
    H2LD            455                 350             76.9

    INTRON 6
    HLAB7           [illegible]         82              50.0
    HLAA2           163                 35              21.4
    HLAA3           163                 37              22.6
    HLACW3          163                 76              46.6
    HLA328          163                 76              46.6
    12.4            163                 40              24.5
    LN11            163                 45              27.6
    H2LD            163                 61              37.4

    INTRON 7
    HLAB7           188                 65              34.5
    HLAA2           188                 47              25.0
    HLAA3           188                 49              26.0
    HLACW3          188                 56              29.7
    HLA328          188                 53              28.2
    12.4            188                 49              26.0
    LN11            188                 55              29.2
    H2LD            188                 91              48.4

    * These numbers are as represented in the consensus sequence, and a gap (.) in the consensus is also counted as a number.

    † A deletion at a given position is also counted as a difference and, therefore, a gap (.) in the aligned sequence is also added to the final number of differences.

    Ambiguity symbols, i.e. symbols other than a, g, t and c in the consensus, are ignored as consensus nucleotides, and differences at those places are taken into account.
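    The counting rules in the footnotes to Table I can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for the table: the function names and the toy sequences are hypothetical, and the treatment of ambiguity symbols (every consensus symbol other than a, g, t or c is scored as a difference) is one possible reading of the third footnote.

```python
def count_differences(consensus: str, aligned: str) -> tuple[int, int]:
    """Return (positions compared, differences) for two aligned sequences.

    Assumed rules, read from the Table I footnotes:
      - every consensus column counts as a position, gaps (".") included;
      - a gap in the aligned sequence (a deletion) counts as a difference;
      - a consensus symbol other than a, g, t, c is never treated as a
        match, so the column is scored as a difference (an assumption).
    """
    if len(consensus) != len(aligned):
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    for cons_base, seq_base in zip(consensus.lower(), aligned.lower()):
        if cons_base == "." and seq_base == ".":
            continue  # gap in both rows: position counted, no difference
        if cons_base not in "agtc" or cons_base != seq_base:
            differences += 1
    return len(consensus), differences


def percent_divergence(compared: int, differences: int) -> float:
    """Differences expressed as a percentage of the positions compared."""
    return 100.0 * differences / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from Fig. 1.
    compared, differences = count_differences("atggc.gtnc", "atgac.g.ac")
    print(compared, differences, percent_divergence(compared, differences))
```

    Applied to the HLAB7 5' untranslated row of Table I (28 differences in 182 positions), `percent_divergence` gives about 15.4%, which the table lists as 15.3.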

    Comparison of all Class I sequences from the human shows several regions whose nucleotide sequences are markedly conserved not only between the alleles at a single locus but also between the loci, and even in mildly deteriorated pseudogenes. Certain of these conserved regions are retained relatively unaltered even in the mouse genes whose sequences have been published. Such regions may be seen readily on inspection of the consensus derived from the aligned sequences (Fig. 1). In the exons these regions include, for example, exon 2 positions 75-116, exon 3 in about the same region and from about 223-243, major internal portions of exon 4, and perhaps internal portions of exons 6 and 7. Portions of this apparent conservation may well be due to chance distribution of the current set of data and of gene conversion-like events (see below) during evolution. Other portions of the conservation appear to be partly a result of constraints on the amino acid sequence. For example, in exon 4 over 80% of the observed nucleotide substitutions are in the silent third position of codons. Nevertheless, the marked conservation of nucleotide sequence between all alleles of man and mouse is striking and suggests that more than chance or constraints on amino acid sequence may be operative.
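    The exon 4 observation, that most substitutions fall in the third position of codons, amounts to a tally of mismatches by codon position. A minimal sketch follows, assuming two gap-free sequences that are aligned and start in reading frame; the function names and the example sequences are hypothetical and are not taken from Fig. 1. Note that a third-position change is not always silent, so this tally is only a first approximation to counting silent substitutions.

```python
from collections import Counter


def substitutions_by_codon_position(seq_a: str, seq_b: str) -> Counter:
    """Count mismatches at codon positions 1, 2 and 3.

    Both sequences are assumed to be the same length, gap-free,
    and to begin at the first base of a codon.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    tally = Counter({1: 0, 2: 0, 3: 0})
    for index, (base_a, base_b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if base_a != base_b:
            tally[index % 3 + 1] += 1
    return tally


def third_position_share(tally: Counter) -> float:
    """Fraction of all mismatches that lie at codon position 3."""
    total = sum(tally.values())
    return tally[3] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical 4-codon example: four third-position changes, one first-position change.
    tally = substitutions_by_codon_position("GCTGAAAAGCTG", "GCCGAGAAAATT")
    print(dict(tally), third_position_share(tally))
```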

    Analysis of the translational products of the available class I gene sequences has led Sood et al. (1985) to conclude that the exon sequences can be further divided in terms of "variable" and "homology" regions. Up to 80% of the base changes that occur in the variable regions cause amino acid substitutions, as opposed to the substitutions in homology regions, which have no more than 35% nonsilent substitutions, even though the overall frequency of mutational events is the same for both variable and homology regions. Except for the transmembrane domain, a predominance of non-conservative amino acid substitutions was noticed in both variable and homology regions. This comparison is based in part on genes at different loci, but the overall homology of the genes suggests that they encode proteins of similar 3-dimensional structure and function.

    Introns: The corresponding introns of different genes exhibit relatively weak homology in general, and particularly in inter-species comparisons. Possible exceptions include the 5' end of introns 4, 3, and 1. Several other regions of introns show strong homology within the human genes but diverge considerably from the mouse. On the whole, the corresponding introns of the human and mouse genes are quite different from each other although, with the exception of intron 3, corresponding introns may be of similar lengths. Within a species there is also considerable divergence in intron sequences and, particularly in the mouse, in intron lengths. However, more stretches of patchy resemblance between genes may be noted. Against this background it is noteworthy that the first nucleotides within several introns are preserved within a species, and in a few cases perhaps between mouse and man (see Fig. 1). These conserved sequences extend beyond the "canonical" 5' and 3' splice site sequences that have been noted (Mount & Steitz 1983). A weak correlation of this sort has also been noted for globin genes, but it remains to be determined whether there is any functional constraint beyond general splicing recognition signals in the nucleotides adjacent to these splice sites.
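    Returning to the exon comparison above, the distinction between "variable" and "homology" regions rests on the share of base changes that alter the encoded amino acid. The sketch below shows one way such a share could be estimated; it is not the method of Sood et al. (1985). It assumes gap-free, in-frame sequences, uses the standard genetic code, and simplifies by scoring whole codons rather than individual base changes. All names and example sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def nonsilent_fraction(seq_a: str, seq_b: str) -> float:
    """Fraction of differing codons that encode different amino acids.

    Simplification: each differing codon is scored once, even if it
    carries more than one base change.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal-length multiples of 3")
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    changed = nonsilent = 0
    for start in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[start:start + 3], seq_b[start:start + 3]
        if codon_a == codon_b:
            continue
        changed += 1
        if CODON_TABLE[codon_a] != CODON_TABLE[codon_b]:
            nonsilent += 1
    return nonsilent / changed if changed else 0.0


if __name__ == "__main__":
    # Hypothetical example: Ala-Glu-Lys-Leu versus Ala-Asp-Lys-Met.
    print(nonsilent_fraction("GCTGAAAAGCTG", "GCCGATAAGATG"))
```

    In the example, three codons differ and two of them change the amino acid, so the function returns two thirds.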

    The disposition of the exons in class I genes has been studied specifically by comparing the structure of cloned genes and cDNA molecules and by primer extension studies of mRNA in HLA-B7 expressing cells (Sood et al. 1984). Several examples are known where the same gene may be transcribed from two different promoters in different tissues or at different stages of development and, in the major Class I antigens that are expressed in so many tissues, this remains a hypothetical possibility, but without experimental evidence. On the other hand, heterogeneity of the mRNA and its encoded protein could, and in at least one case does, arise by alternative splicing of the primary transcript of a single Class I gene. Kress et al. (1983) have demonstrated alternative RNA splicing in an H2K gene. Kourilsky and his colleagues (Lalanne et al. 1983) found two types of H2-K^d cDNA clones in mouse liver cells. One corresponded to the "classical" intron-exon organization referred to above. The other used an alternative splicing pattern in which a sequence within intron 1 was used as a 3' splice site and joined to the 5' splice site at the end of exon 1. The normal 3' splice site at the end of intron 1 was retained in the cDNA, but cryptic 3' and 5' splice sites in exon 2 became active so that a portion of this exon was spliced out. These cryptic splice sites in exon 2 fall in conserved regions. It is quite possible that this alternative splicing is a recurrent event, although its physiologic significance is unknown. However, the alternate 3' splice site within intron 1 of the mouse gene lies in a segment of the intron that is deleted from the human genes.

    TABLE II
    Homologies between nucleotide sequences of HLACW3 and HLA328

    Region                        % Homology     Region      % Homology
    5' untranslated               81.9           INTRON 1    86.5
    EXON 1                        86.3           INTRON 2    86.2
    EXON 2                        [illegible]    INTRON 3    94.0
    EXON 3                        [illegible]    INTRON 4    85.8
    EXON 4                        [illegible]    INTRON 5    92.8
    EXON 5                        [illegible]    INTRON 6    70.4
    EXON 6                        100            INTRON 7    95.0
    EXON 7                        [illegible]
    EXON 8 and 3' untranslated    90.9

    Sequences resembling splice sites may be found elsewhere in Class I genes. For example, the sequence YYYYYCACAG is found both at the demonstrably functional splice site at the 3' end of intron 5 and also in several human Class I genes about 18 nucleotides upstream from the "true" splice site, while another 39 nucleotides upstream in the intron a second potential 3' splice sequence (TCCTCTAG) is common to several of the sequenced genes. As depicted in Fig. 2, many "acceptor" and "donor" splice sequences exist at alternative positions in the HLA-B7 gene, and the occurrence of alternatively spliced mRNAs that encode variant cytoplasmic domains would not be surprising. The possibility exists that these alternative splices are functional in some cell types or at some stages of development, and that this provides additional constraints on the nucleotide sequence.
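    A search for splice-like sequences such as YYYYYCACAG (Y standing for a pyrimidine, C or T) can be sketched with a regular expression. The code below is an illustration only: the function names are hypothetical, the symbol table covers just a few IUPAC codes, and the test sequence is invented rather than taken from any Class I gene.

```python
import re

# Partial IUPAC nucleotide code table; extend as needed.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]",    # pyrimidine
    "R": "[AG]",    # purine
    "N": "[ACGT]",  # any base
}


def motif_to_regex(motif: str) -> str:
    """Translate a motif written with IUPAC symbols into a regex string."""
    return "".join(IUPAC_TO_REGEX[symbol] for symbol in motif.upper())


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of all matches, overlapping ones included."""
    # A lookahead keeps the scan from skipping overlapping matches.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Invented sequence containing one copy of each motif from the text.
    sequence = "GGTCCTCTAGAATTCCTCCACAGGT"
    print(find_motif(sequence, "YYYYYCACAG"))
    print(find_motif(sequence, "TCCTCTAG"))
```

    Running the same function over each intron of a real gene sequence would list candidate acceptor-like sites, which could then be compared with the known splice junctions.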

    From recent studies in yeast it is becoming clearer that sequences other than just the splice site consensus sequences may be operative in splicing (Langford & Gallwitz 1983, Pikielny et al. 1983). Such sequences, termed "splice signals", are conserved in many higher eukaryotes including mouse and man (Keller & Noon 1984). When a search for such putative splice signals was made in the HLA-B7 gene sequence (using human globin splice signals as consensus, Keller & Noon 1984), no definitive pattern of distribution of these sequences emerged. The signal sequence CTGAC lay about 50 base pairs (bp) 5' to the 3' splice site of intron 2, twice in the middle of intron 3 and near the 3' end of intron 6. The sequence CTGAT was present twice in the middle of both introns 3 and 5, and 23 nucleotides 3' to intron 6. The sequence CTCAA is represented only once, at about 100 nucleotides upstream from the end of exon 8. There is no representation of the signal sequence TTAAC. Experimental evidence is needed
