

J. Mol. Biol. (1995) 246, 74–81

Structural Repertoire in VH Pseudogenes of Immunoglobulins: Comparison with Human Germline Genes and Human Amino Acid Sequences

Enrique Vargas-Madrazo 1, Juan C. Almagro 2 and Francisco Lara-Ochoa 2*

1 Instituto de Investigaciones Biológicas, Universidad Veracruzana, Xalapa, Veracruz, México
2 Instituto de Química, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, C.P. 04510 México, D.F.

*Corresponding author

In the pool of human immunoglobulin VH gene segments, pseudogenes amount to roughly 30% of the total number of genes. Some of them are highly conserved among unrelated individuals. These facts suggest a possible functional role for pseudogenes in the diversity of the human immune response. This paper provides additional information about the structure of VH pseudogene sequences in order to evaluate the possible role of pseudogenes in the immune response.

Mutations capable of altering framework stability in human VH pseudogenes were analyzed. Results indicate that VH pseudogenes are about 14 times as divergent as human VH functional germline genes and about four times as divergent as human VH amino acid sequences. The high number of disruptive mutations in pseudogenes is an expected result because of the lack of functionality of these genes. In the second part of the work we analyze whether or not the same takes place in the positions that determine the existence of canonical structures in the hypervariable loops of VH pseudogenes. This analysis is extended to all species with reported VH pseudogenes. In contrast with the results concerning framework positions, 69% of known human VH pseudogenes have canonical structures in the first hypervariable loop (H1), while 48% do so in the second one (H2). Comparison of these results with those found in human VH functional germline genes shows that as many as 100% (H1) and 96% (H2) of these genes have canonical structures. In VH amino acid sequences the result for H1 is similar to that for pseudogenes. For H2, the value lies between the percentage for germline genes (96%) and the percentage for pseudogenes (48%). The possible significance of the existence of canonical structures in the hypervariable loops of VH pseudogenes is discussed.

Keywords: canonical structures; canonical structure classes; gene conversion; antigen-binding site; VH locus

Introduction

Antibody molecules are highly antigen-specific receptors of the immune system. Antigen–antibody interaction involves the variable domains of the antibody, each composed of a two β-sheet framework (Amzel & Poljak, 1979). The antigen binding site is composed of six hypervariable loops: three from the variable light domain (VL) and three from the variable heavy domain (VH), denoted L1, L2, L3 and H1, H2, H3, respectively (Wu & Kabat, 1970; Poljak et al., 1973). Genetically, L1 and L2 are produced by the VL gene, while L3 is produced by the recombination of an additional gene segment, JL. In a similar way, H1 and H2 are produced by the VH gene, and H3 is a result of the recombination of two additional gene segments, D and JH (Tonegawa, 1983).

In addition to functional germline genes, most species retain a pool of VH pseudogenes estimated to be approximately 30% of the total number of VH genes (Kodeira et al., 1986; Hsu et al., 1989). In the human VH3 family, half of the genes are pseudogenes (Pascual & Capra, 1991). Polymorphism studies of human VH pseudogenes report high sequence conservation among unrelated individuals (Pascual & Capra, 1991; Tomlinson et al., 1992). This might be an indication of functionality (Blankenstein et al., 1987; Pascual & Capra, 1991). For chicken (Reynaud et al., 1989) and rabbit (Becker & Knight, 1990) it has been established that one or at most a few functional genes are expressed. The main source of immune response diversity in chicken and rabbit is the donation of pseudogene segments to the functional genes, possibly by a somatic gene conversion mechanism (Wysocki & Gefter, 1989; Pascual & Capra, 1991).

Abbreviation used: Ig, immunoglobulin.

0022–2836/95/060074–08 $08.00/0 © 1995 Academic Press Limited

Analysis of antibodies of known atomic structure has revealed a small number of main-chain conformations, or canonical structures, for L1, L2, L3 as well as H1 and H2 (Chothia & Lesk, 1987; Chothia et al., 1989; Tramontano et al., 1990; Brunger et al., 1991; Strong et al., 1991; Fischmann et al., 1991; Fan et al., 1992; Garcia et al., 1992; Tormo et al., 1992; Rose et al., 1993). The existence of canonical structures in five of the six hypervariable loops implies that only a few main-chain conformations are present in a large set of antibody molecules with different loop sequences.

A canonical structure is determined by (1) the loop size and (2) the presence of certain residues at key positions in both the loop and the framework regions (Chothia & Lesk, 1987; Chothia et al., 1989; Tramontano et al., 1990). On this basis, Chothia et al. (1992) analyzed the VH functional germline gene sequences of human immunoglobulin (Ig) and outlined the structural repertoire for H1 and H2. They found that most functional germline gene sequences have canonical structures for H1 and H2, and proposed the concept of canonical structure classes for the combinations of H1 and H2 canonical structure types.

The finding that a large percentage of human VH functional germline genes have canonical structures provides evidence of structural restrictions at work in the process of antigen recognition. If human pseudogenes play some functional role in the immune response, canonical structures might also be present in these sequences. Following this proposition, this paper analyzes (1) amino acid patterns in positions responsible for the stability of the framework in VH sequences and (2) amino acid patterns in positions responsible for the canonical structures of the hypervariable loops in VH sequences. (3) Canonical structures, their frequency of occurrence and canonical structure classes are compared among human pseudogenes, human functional germline genes and human VH amino acid sequences. (4) The canonical structure analysis is extended to the pseudogenes from all other species with reported sequences.

Results

Structural divergence measurement of sequence data bases

The results of the structural divergence measurement for pseudogenes, functional germline genes and the amino acid sequences are the following (standard deviation in parentheses):

Human          Human functional    Amino acid    Other species
pseudogenes    germline genes      sequences     pseudogenes

6.8 (4.5)      0.5 (0.3)           1.8 (1.1)     6.1 (4.8)

As can be seen, the number of mutations at positions governing framework stability in human VH pseudogenes is about 14 times larger than in human VH functional germline genes and about four times larger than in the VH amino acid sequences. The divergence of the amino acid sequences (about two mutations per sequence) with respect to the functional germline genes (less than one mutation per sequence) might be an effect of the hypermutation process. Pseudogenes from other species, mainly mouse sequences, are very similar to human pseudogenes (roughly six mutations per sequence).
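The fold differences quoted above follow directly from the mean values reported in this section. The short sketch below only reproduces that arithmetic; the dictionary keys and the function name are illustrative labels, not part of the original analysis.

```python
# Mean number of defective framework positions per sequence,
# copied from the values reported in the text.
mean_divergence = {
    "human_pseudogenes": 6.8,
    "human_germline": 0.5,
    "human_amino_acid": 1.8,
    "other_species_pseudogenes": 6.1,
}


def fold_difference(sample, reference, table=mean_divergence):
    """Ratio of the mean divergence of `sample` to that of `reference`."""
    return table[sample] / table[reference]


vs_germline = fold_difference("human_pseudogenes", "human_germline")
vs_amino_acid = fold_difference("human_pseudogenes", "human_amino_acid")

print(f"pseudogenes vs functional germline genes: {vs_germline:.1f}-fold")
print(f"pseudogenes vs amino acid sequences: {vs_amino_acid:.1f}-fold")
```

The two ratios, 13.6 and about 3.8, correspond to the rounded figures of "about 14 times" and "four times" used in the text.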

Furthermore, the standard deviation is large in pseudogenes from human and from the other species studied when compared with the functional germline genes and amino acid sequences. For human pseudogenes the high standard deviation is attributed to 12 sequences having more than ten mutations. The mutations in the remaining sequences are distributed as follows: ten sequences with seven to nine mutations and 31 sequences with zero to six mutations. The dispersion for pseudogenes from other species is similar: six sequences with more than ten mutations, six sequences with seven to nine mutations and 20 sequences with zero to six mutations. In contrast with pseudogenes, all the amino acid sequences have, at most, four mutations (the only exception being sequence DR12910-2F8, which has nine mutations). Human functional germline genes have at most one mutation per sequence.
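As a consistency check, the bin counts listed above can be summed and expressed as fractions of each sample. This is a minimal sketch using only the counts given in the text; the bin labels and the function name are illustrative.

```python
# Number of sequences per mutation-count bin, as listed in the text.
bins = {
    "human_pseudogenes": {"0-6": 31, "7-9": 10, ">10": 12},
    "other_species_pseudogenes": {"0-6": 20, "7-9": 6, ">10": 6},
}


def bin_fractions(counts):
    """Return the sample size and the fraction of sequences in each bin."""
    total = sum(counts.values())
    return total, {label: n / total for label, n in counts.items()}


for sample, counts in bins.items():
    total, fractions = bin_fractions(counts)
    summary = ", ".join(f"{label}: {frac:.0%}" for label, frac in fractions.items())
    print(f"{sample} (n = {total}): {summary}")
```

The totals, 53 human sequences and 32 sequences from other species, agree with the sample sizes given in Materials and Methods.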

Therefore, pseudogenes are not only noisier than functional germline genes and the amino acid sequences at framework positions; the number of mutations is also more variable among sequences. These observations prompted us to study the significance of sequence patterns compatible with canonical structures in human pseudogenes and in those from other species, as shown in the following sections.

Canonical structures in VH sequences

Sequence patterns of canonical structures are found in 69% of the human pseudogene sequences for H1 and in 48% for H2 (Table 1). The same analysis in human VH functional germline genes shows that as many as 100% and 96% have canonical structures in H1 and H2, respectively. The percentage of H1 canonical structures in the amino acid sequences (71%) is similar to that in pseudogenes, while for H2 the value lies between that of functional germline genes (96%) and that of pseudogenes (48%).

Table 1
Percentage of loops in the samples with canonical structures

        Human         Human functional   Human amino      Mouse         Rabbit        Xenopus
Loop    pseudogenes   germline genes     acid sequences   pseudogenes   pseudogenes   pseudogenes

H1      69            100                71               86            100           100
H2      48            96                 70               86            80            25

Considering that the structural divergence measured in pseudogenes is significantly larger than in the amino acid sequences, and larger still when compared with functional germline genes, the percentage of human VH pseudogenes with sequence patterns compatible with canonical structures is striking. For mouse pseudogenes the frequency of canonical structures is larger than for human pseudogenes, and even larger than for the human amino acid sequences (see Table 1). All five sequences from rabbit have canonical structure patterns in H1, and four of them in H2. All four sequences of Xenopus present a canonical structure in H1, and one has a canonical structure in H2. The sequence of elops (Pa8a) has a canonical structure in H1, but in H2 its sequence pattern does not match any canonical structure.
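The rabbit and Xenopus percentages in Table 1 can be recovered from the sequence counts quoted in this paragraph. The sketch below is only an illustration of that conversion; the function and dictionary names are assumptions made for the example.

```python
def percent_with_canonical(n_canonical, n_total):
    """Percentage of sequences in a sample whose loop matches a canonical pattern."""
    if n_total <= 0:
        raise ValueError("sample must contain at least one sequence")
    return 100.0 * n_canonical / n_total


# Counts quoted in the text: sequences per species and how many of them
# show a canonical structure pattern in each loop.
counts = {
    "rabbit": {"total": 5, "H1": 5, "H2": 4},
    "xenopus": {"total": 4, "H1": 4, "H2": 1},
}

percentages = {
    species: {
        loop: percent_with_canonical(c[loop], c["total"]) for loop in ("H1", "H2")
    }
    for species, c in counts.items()
}

for species, values in percentages.items():
    print(species, values)
```

With such small samples a single sequence changes the percentage by 20 to 25 points, which is worth keeping in mind when comparing these columns with the human and mouse values.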

Canonical structure types for H1

Considering only those sequences with canonical structures in the first VH hypervariable loop, the percentage of each canonical structure type for H1 is shown in the first half of Table 2. Canonical structure type 1 represents 94% of human pseudogenes, 100% of mouse pseudogenes, about 75% of human functional germline genes and 82% of the amino acid sequences.

It is worth mentioning that while human pseudogenes present type 1 for H1 in most cases (only two sequences have type 3; see Table 2), functional germline genes possess a significant proportion of types 2 and 3 of H1 (12% and 13%, respectively; see Table 2). Type 2 has a low frequency in the amino acid sequences (3.7%). The observation that human and mouse pseudogenes present type 1 for H1 in most cases contrasts with rabbit and Xenopus pseudogenes, which also have the other two canonical structure types.

Canonical structure types for H2

The second half of Table 2 reports the frequency of the canonical structure types for the second VH hypervariable region. The frequencies of canonical structures for this loop vary from one sample to another, although canonical structure type 4 is the least represented in all the samples.

Frequencies of canonical structure types from human sequences are similar in the functional germline genes and in the amino acid sequences. Human pseudogenes do not present sequences with canonical structure type 4. In mouse, canonical structure type 2 is the most frequent. The above indicates that unlike H1, where type 1 is the most representative type of canonical structure and the most frequent in human pseudogenes, H2 displays a heterogeneous distribution of canonical structure types.

Table 2
Percentage of occurrence in the samples of different canonical structures

                    Human        Human
        Canonical   functional   amino        Pseudogenes
        structure   germline     acid
Loop    type†       genes        sequences    Human   Mouse   Rabbit   Xenopus   Elops   Shark

H1                  (77)‡        (161)        (35)    (18)    (5)      (4)       (1)     -
        1           75           82           94      100     80       50        100     -
        2           12           4            0       0       20       0         0       -
        3           13           14           6       0       0        50        0       -

H2                  (74)         (140)        (25)    (18)    (4)      (1)       -       -
        1           39           28           28      11      25       0         -       -
        2           18           18           16      78      25       0         -       -
        3           39           48           56      11      50       100       -       -
        4           4            6            0       0       0        0         -       -

† Canonical structures described by Chothia et al. (1992).
‡ Number of sequences analyzed.

Canonical structure classes

The frequency of canonical structure classes in the three sequence samples is shown in Table 3. As previously discussed, H1 type 1 occurs in a large number of sequences in the three sequence data bases (see Table 2), whereas H2 types are rather variable among the different samples. Thus, most of the combinations occur between canonical structure type 1 for H1 and any one of the canonical structure types of H2. In pseudogenes this rule holds for sequences from all species, with the sole exceptions of one human sequence (5% of the 3–1 class in Table 3) and one rabbit sequence.

For human functional germline genes and the amino acid sequences, the 2–1 and 3–1 classes occur with a relatively high percentage (see Table 3). The correlation between these two samples is good for all classes.

Discussion

The finding of a large number of pseudogenes among the VH genes of human and of the other species studied has posed several questions about their possible function in the immune response (Wysocki & Gefter, 1989; Pascual & Capra, 1991). It has been proposed that the number of mutations accumulated in pseudogenes with respect to functional germline genes is principally a function of: (1) the time elapsed since inactivation of the gene (Blankenstein et al., 1987); and (2) possible functional restrictions acting on the new state of the gene (Pascual & Capra, 1991; Tomlinson et al., 1992; McCormack et al., 1993). In pseudogenes of all the species analyzed (principally human), a high divergence was observed in the residues determining the Ig fold when compared with the functional germline genes and amino acid sequences.

In contrast with the fact that pseudogenes have an average of seven destructive mutations per sequence outside the antigen binding site, the finding that 69% of their sequences have canonical structures for H1 and 48% for H2 is quite surprising. In the case of H2, the smaller number of sequences presenting canonical structures might be related to the fact that this hypervariable loop is, in general, less conserved (Chothia et al., 1992). In addition, the amino acid sequences have canonical structure frequencies similar to those in pseudogenes. These observations, combined with the results showing that pseudogenes from mouse and other species (as distant from human as Xenopus and elops) also present a large proportion of canonical structures, reinforce the main result of this report: the presence of canonical structure sequence patterns in human pseudogenes.

Pseudogenes are not only noisier than functional germline genes and amino acid sequences; the number of mutations is also more variable among sequences. This observation points to the need to analyze whether the presence of a large number of destructive mutations in the framework implies the absence of sequence patterns of canonical structures. Therefore, the percentage of human pseudogene sequences with canonical structures was recalculated after discarding from the analysis all sequences with more than ten mutations (12 sequences). The percentages obtained were 68% for H1 (69% considering all sequences; see Table 1) and 52% for H2 (48% considering all sequences). For H1 the percentage decreases slightly, while for H2 it increases. These results indicate that there is no strong correlation between a large number of framework mutations and the absence of canonical structure patterns in the hypervariable loops of pseudogenes.
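The recalculation described above amounts to filtering the sample by framework mutation count and recomputing the two percentages. The following is a minimal sketch of that procedure; the record type, the function names and the five toy records are hypothetical and do not reproduce the sequences analyzed in this work.

```python
from dataclasses import dataclass


@dataclass
class Pseudogene:
    name: str
    framework_mutations: int  # defective framework positions in this sequence
    h1_canonical: bool        # H1 matches a canonical structure pattern
    h2_canonical: bool        # H2 matches a canonical structure pattern


def percent_canonical(records, loop):
    """Percentage of records whose loop ('h1' or 'h2') has a canonical pattern."""
    if not records:
        return 0.0
    hits = sum(getattr(r, f"{loop}_canonical") for r in records)
    return 100.0 * hits / len(records)


def drop_highly_mutated(records, max_mutations=10):
    """Discard sequences with more than `max_mutations` framework mutations."""
    return [r for r in records if r.framework_mutations <= max_mutations]


# Hypothetical toy sample, for illustration only.
toy = [
    Pseudogene("toy-1", 2, True, True),
    Pseudogene("toy-2", 5, True, False),
    Pseudogene("toy-3", 8, False, True),
    Pseudogene("toy-4", 12, True, False),
    Pseudogene("toy-5", 15, False, False),
]

kept = drop_highly_mutated(toy)
for loop in ("h1", "h2"):
    print(loop, percent_canonical(toy, loop), round(percent_canonical(kept, loop), 1))
```

If heavy framework damage went together with loss of the loop patterns, the filtered percentages would rise clearly for both loops; the nearly unchanged values reported above (68% versus 69% for H1, 52% versus 48% for H2) are what motivates the conclusion of no strong correlation.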

Sequence comparisons have shown the existence of a strong evolutionary link between the functional germline genes and the pseudogenes belonging to the same family (Pascual & Capra, 1991). Thus, the consistency of the results reported in the previous sections can be tested by analyzing the correlation between canonical structures and the families of functional germline genes and pseudogenes. The distribution of canonical structures encoded by each gene family was analyzed. In almost all cases, the canonical structures encoded by the functional germline genes of a family are also encoded by the pseudogenes that are members of the same family (results not shown). That is to say, there is a close correlation between functional germline genes and pseudogenes in the canonical structures encoded within each sequence family.

It has been proposed that the sequence conservation observed in pseudogenes is due to germline gene conversion (Haino et al., 1994). As discussed previously, almost all the VH functional germline genes present canonical structures (Chothia et al., 1992). Considering this, it could be proposed that the presence of canonical structures in VH pseudogenes reported here is due to germline gene conversion events between functional germline genes and pseudogenes. At present no firm evidence has been presented for somatic gene conversion in human Ig genes (Berek, 1993). However, if this mechanism really contributes to the diversity of the functional repertoire in humans, then the presence of canonical structures in the hypervariable loops of the donor pseudogenes could be convenient for maintaining the restrictions that have been established to operate within the structural repertoire generated by the functional germline genes (Chothia et al., 1992).

Table 3
Percentage of canonical structure classes

Canonical    Human         Human functional   Human amino      Mouse
structure    pseudogenes   germline genes     acid sequences   pseudogenes
class        (19)†         (74)               (116)            (15)

1–1          16.0          14.7               13.8             6.7
1–2          11.0          17.3               19.8             86.7
1–3          69.0          38.7               47.4             6.7
1–4          13.3          4.0                0.9              0.0
2–1          0.0           12.0               5.2              0.0
2–2          0.0           0.0                0.0              0.0
2–3          0.0           0.0                0.0              0.0
2–4          0.0           0.0                0.0              0.0
3–1          5.0           12.0               11.2             0.0
3–2          0.0           0.0                0.0              0.0
3–3          0.0           0.0                0.9              0.0
3–4          0.0           0.0                0.0              0.0

† Number of sequences analyzed.

Figure 1. Multiple sequence alignment of pseudogenes. (a) Notation for canonical structure classes: the first number corresponds to the H1 canonical structure type and the second to the H2 canonical structure type; ? indicates that the loop has an uncertain conformation; U indicates an undetermined sequence for this loop. (b) Positions that are primarily responsible for the conserved features of the immunoglobulin fold (B, buried in β-sheets; T, in turns; I, inter-domain; Chothia et al., 1988). (c) Numbering as in Chothia et al. (1992). * indicates a stop codon. Insertions and deletions that produce frameshifts in the amino acid sequence were eliminated to obtain the most correct Ig-like sequences. The human pseudogene sequence V71-7 has, in H1, the length and sequence pattern corresponding to canonical structure 1 but, because the second half of the sequence is lacking, residue 94, which is important for the conformation, is not known. The identity of this sequence with the other available pseudogene sequences was calculated, and a sequence having 82% identity with V71-7 was found. That sequence has an Arg at position 94. Thus, we conclude that V71-7 has canonical structure type 1 for H1.

Materials and Methods

Ig VH Sequences

In order to study the canonical structures in Ig VH pseudogene sequences, a multiple alignment was constructed from the human pseudogene sequences and all pseudogene sequences reported for other species in the GenBank data base, release 83 (Figure 1). The main source of Homo sapiens (human) pseudogene sequences was the directory of pseudogene sequences reported by Tomlinson et al. (1992). Human sequences obtained by searching the nucleotide sequence data base were compared with this directory. Those sequences that have 95% or more identity with one or more sequences of the directory of Tomlinson et al. (1992) were discarded from the analysis. Incomplete sequences that do not encode at least one of the two hypervariable loops (H1 and H2) were also discarded. We found 14 additional VH sequences, for a total of 53 sequences. Concerning other species, the numbers of pseudogene sequences found were: Mus musculus (mouse), 21; Oryctolagus cuniculus (rabbit), 5; Xenopus laevis (Xenopus), 4; Heterodontus francisci (shark), 1; and Elops saurus (elops), 1.
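The redundancy filter described above can be sketched as follows. The text does not state how identity was computed, so this example assumes pre-aligned sequences of equal length and a simple per-column comparison; the function names and the toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of identical columns between two aligned sequences.

    Assumes both sequences come from the same alignment (equal length).
    Columns where both sequences carry a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def filter_new_sequences(candidates, directory, threshold=95.0):
    """Keep candidates whose identity to every directory entry is below threshold."""
    kept = {}
    for name, seq in candidates.items():
        if all(percent_identity(seq, ref) < threshold for ref in directory.values()):
            kept[name] = seq
    return kept


# Hypothetical toy sequences, for illustration only.
directory = {"dir-1": "EVQLVESGGGLVQPGGSLRL"}
candidates = {
    "cand-near": "EVQLVESGGGLVQPGGSLRV",  # 19 of 20 columns identical (95%)
    "cand-far": "QVQLQQSGAELVQPGGSLRL",   # 15 of 20 columns identical (75%)
}

print(sorted(filter_new_sequences(candidates, directory)))
```

In this toy case the candidate at exactly 95% identity is discarded, following the "95% or more" criterion, and only the more divergent candidate is retained.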

Two additional multiple alignments were constructed for comparison with the results for human pseudogenes: one for human VH functional germline genes, and the other for human VH amino acid sequences. The multiple alignment of human functional germline genes was obtained from Tomlinson et al. (1992). The multiple alignment of human VH amino acid sequences was constructed from all human VH mature sequences reported by Kabat et al. (1991), making a total of 325 sequences.

Classification of pseudogene sequences by residue identities

Sequences of VH functional germline genes have been classified into six established families (Pascual & Capra, 1991). Tomlinson et al. (1992) have assigned human pseudogenes to the functional germline gene families. This classification was implemented according to the amino acid identity of each pseudogene sequence with members from different functional germline gene families. Applying this method, the 14 additional human pseudogene sequences and the pseudogenes from the other species were assigned to the functional germline gene families established by Tomlinson et al. (1992) (Figure 1).
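A family assignment by amino acid identity can be sketched as below. The text does not specify whether the best single match or an average over family members was used; this example assumes the best single match. The function names, family labels and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def assign_family(query, family_members):
    """Return (family, identity) for the family holding the closest member.

    family_members maps a family label to a list of aligned member sequences.
    Assumption: the family of the single most similar member is chosen.
    """
    best_family, best_identity = None, -1.0
    for family, members in family_members.items():
        for member in members:
            identity = percent_identity(query, member)
            if identity > best_identity:
                best_family, best_identity = family, identity
    return best_family, best_identity


# Hypothetical toy families, for illustration only.
families = {
    "family-A": ["QVQLVQSGAE"],
    "family-B": ["EVQLVESGGG", "EVQLLESGGG"],
}

print(assign_family("EVQLVESGGA", families))
```

For the toy query the closest member belongs to family-B, with 9 of 10 identical positions.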

Structural divergence measurement

Analysis of Ig of known three-dimensional structure has shown that VL and VH domains contain a largely conserved β-sheet framework (Amzel & Poljak, 1979). The residues mainly responsible for the conserved structural features of Ig variable domains have been proposed by Chothia et al. (1988) on the basis of structural considerations (see top of Figure 1). For each framework position, several amino acid residues normally appear in the VH domains of Ig (Chothia et al., 1988). For the purposes of the present analysis, positions not having any of these amino acid residues were considered defective or mutated. This procedure was carried out for every sequence in each sample. The average number of defective positions per sequence in each sample was calculated by dividing the total number of defective positions by the total number of sequences in the sample. Such a description of the structural divergence of sequences might be considered an oversimplification of the substitution process, since it is well established that proteins are capable, at least partially, of accommodating the effects of mutations (Chothia et al., 1985). In this analysis, however, the purpose was to estimate the divergence of each sample with respect to a standard of reference.
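The counting procedure described above can be sketched in a few lines. The allowed-residue table used here is a hypothetical placeholder with four positions, not the list of Chothia et al. (1988), and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def count_defective_positions(sequence, allowed):
    """Count framework positions whose residue is not among those allowed.

    sequence: dict mapping position number -> one-letter residue
    allowed:  dict mapping position number -> set of accepted residues
    A position missing from the sequence is counted as defective.
    """
    return sum(
        1 for pos, residues in allowed.items() if sequence.get(pos) not in residues
    )


def structural_divergence(sample, allowed):
    """Mean and standard deviation of defective positions per sequence."""
    counts = [count_defective_positions(seq, allowed) for seq in sample]
    spread = stdev(counts) if len(counts) > 1 else 0.0
    return mean(counts), spread


# Hypothetical placeholder table and toy sequences, for illustration only.
allowed = {4: {"L"}, 22: {"C"}, 36: {"W"}, 92: {"C"}}
sample = [
    {4: "L", 22: "C", 36: "W", 92: "C"},  # no defective positions
    {4: "L", 22: "S", 36: "W", 92: "C"},  # one defective position
    {4: "P", 22: "C", 36: "R", 92: "*"},  # three defective positions
]

average, spread = structural_divergence(sample, allowed)
print(f"{average:.2f} ({spread:.2f})")
```

The output format mirrors the "mean (standard deviation)" presentation used for the divergence values in Results.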

Determination of canonical structures and canonical structure classes

Canonical structures for H1 and H2 in pseudogene sequences, functional germline genes and amino acid sequences were determined on each multiple alignment from the sequence patterns defined by Chothia and co-workers (Chothia & Lesk, 1987; Chothia et al., 1989, 1992; Tramontano et al., 1990). In order to search for these patterns in the three multiple sequence alignments, all conventions of numbering, placement of insertions, and length and location of the hypervariable loops in the sequences were used as reported by Chothia et al. (1992).
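The pattern search can be pictured as a two-step test: the loop length must match, and each key position must carry one of the accepted residues. The sketch below illustrates that logic only; the two patterns are hypothetical placeholders and are not the patterns published by Chothia et al., and all names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CanonicalPattern:
    loop: str            # "H1" or "H2"
    structure_type: int  # canonical structure type number
    loop_length: int     # required number of residues in the loop
    key_residues: dict   # position -> set of accepted residues


def match_canonical(loop_length, residues, patterns):
    """Return the first matching canonical structure type, or None.

    residues maps position numbers (loop and framework) to residues.
    """
    for pattern in patterns:
        if loop_length != pattern.loop_length:
            continue
        if all(
            residues.get(pos) in accepted
            for pos, accepted in pattern.key_residues.items()
        ):
            return pattern.structure_type
    return None


# Hypothetical placeholder patterns, for illustration only.
toy_h1_patterns = [
    CanonicalPattern("H1", 1, 7, {26: {"G"}, 94: {"R", "K"}}),
    CanonicalPattern("H1", 2, 8, {26: {"G"}}),
]

print(match_canonical(7, {26: "G", 94: "R"}, toy_h1_patterns))
print(match_canonical(7, {26: "G", 94: "S"}, toy_h1_patterns))
```

A loop of the right length that fails at a key position, as in the second call, is reported as having no canonical structure, which is how sequences outside the canonical repertoire would be counted.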

Canonical structure classes, as defined by Chothia et al. (1992), are the different combinations of canonical structure types for H1 and H2. These classes are numbered in the form N–M, N being the number of the H1 canonical structure and M the corresponding one for the H2 structure.
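Given the H1 and H2 types of each sequence, class labels and their frequencies follow mechanically. The sketch below assumes that only sequences with a canonical structure in both loops receive a class, and it writes the label with a plain hyphen; the function names and the toy input are hypothetical.

```python
from collections import Counter


def structure_class(h1_type, h2_type):
    """Class label 'N-M' for H1 type N and H2 type M, or None if either is missing."""
    if h1_type is None or h2_type is None:
        return None
    return f"{h1_type}-{h2_type}"


def class_percentages(type_pairs):
    """Percentage of each class among sequences with both loops assigned."""
    labels = [structure_class(h1, h2) for h1, h2 in type_pairs]
    labels = [label for label in labels if label is not None]
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: 100.0 * n / len(labels) for label, n in counts.items()}


# Hypothetical (H1 type, H2 type) pairs; None marks a non-canonical loop.
toy_pairs = [(1, 3), (1, 3), (1, 1), (3, 1), (1, None)]

print(class_percentages(toy_pairs))
```

Restricting the denominator to sequences with both loops assigned is one possible reading of how the sample sizes of Table 3 were obtained.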

Acknowledgements

We gratefully acknowledge Dr Mario L. Amzel and Dr Carlos Larralde for the critical revision of the manuscript. E. V. thanks V. Hernandez-Mendiola, R. Lopez-Hernandez, M. Ramírez-Benítez and P. Reidy for technical assistance. We also thank Hector Cecena for useful comments on the preparation of the manuscript. E. V. was supported by CONACyT, SNI and FOMES; J. C. A. was supported by DGAPA grant no. IN-206093.

References

Amzel, L. M. & Poljak, R. J. (1979). Three-dimensional structure of immunoglobulins. Annu. Rev. Biochem. 48, 961–997.

Becker, R. S. & Knight, K. L. (1990). Somatic diversification of immunoglobulin heavy chain VDJ genes: evidence for somatic gene conversion in rabbits. Cell, 63, 987–997.

Berek, C. (1993). Somatic mutation and memory. Curr. Opin. Immunol. 5, 218–222.

Blankenstein, T., Bonhomme, F. & Krawinkel, U. (1987). Evolution of pseudogenes in the immunoglobulin VH-gene family of the mouse. Immunogenetics, 26, 237–248.

Brunger, A. T., Leahy, D. J., Hynes, T. R. & Fox, R. O. (1991). 2.9 Å resolution structure of an anti-dinitrophenyl-spin-label monoclonal antibody Fab fragment with bound hapten. J. Mol. Biol. 221, 239–256.

Chothia, C. & Lesk, A. M. (1987). Canonical structures for the hypervariable regions of immunoglobulins. J. Mol. Biol. 196, 901–917.

Chothia, C., Novotny, J., Bruccoleri, R. & Karplus, M. (1985). Domain association in immunoglobulin molecules. The packing of variable domains. J. Mol. Biol. 186, 651–663.

Chothia, C., Boswell, R. & Lesk, A. M. (1988). The outline structure of the T cell receptor. EMBO J. 7, 3745–3755.

Chothia, C., Lesk, A. M., Tramontano, A., Levitt, M., Smith-Gill, S. J., Air, G., Sheriff, S., Padlan, E. A., Davies, D., Tulip, W. R., Colman, P. M., Spinelli, S., Alzari, P. M. & Poljak, R. J. (1989). Conformations of immunoglobulin hypervariable regions. Nature (London), 342, 877–883.

Chothia, C., Lesk, A. M., Gherardi, E., Tomlinson, I. M., Walter, G., Marks, J. D., Llewelyn, M. B. & Winter, G. (1992). Structural repertoire of the human VH segments. J. Mol. Biol. 227, 799–817.

Fan, Z. C., Shan, L., Guddat, L. W., He, X. M., Gray, W. R., Raison, R. L. & Edmunson, A. B. (1992). Three-dimensional structure of an Fv from a human IgM immunoglobulin. J. Mol. Biol. 228, 188–207.

Fischmann, T. O., Bentley, G. A., Bhat, T. N., Boulot, G., Mariuzza, R. A., Phillips, S. E. V., Tello, D. & Poljak, R. J. (1991). Crystallographic refinement of the three-dimensional structure of the Fab D1.3 lysozyme complex at 2.5 Å resolution. J. Biol. Chem. 266, 12915–12920.

Garcia, K. C., Desiderio, S. V., Ronco, P. M., Verroust, P. J. & Amzel, L. M. (1992). Recognition of angiotensin II: antibodies at different levels of an idiotypic network are superimposable. Science, 257, 528–531.

Haino, M., Hayasida, H., Miyata, T., Shin, E. K., Matsuda, F., Nagaoka, H., Matsumura, R., Takaishi, S., Fukita, Y., Fujikura, J. & Honjo, T. (1994). Comparison and evolution of human immunoglobulin VH segments located in the 3' 0.8-Megabase region. J. Biol. Chem. 269, 2619–2626.

Hsu, E., Schwager, J. & Alt, F. W. (1989). Evolution of immunoglobulin genes: VH families in the amphibian Xenopus. Proc. Nat. Acad. Sci., U.S.A. 86, 8010–8014.

Kabat, E. A., Wu, T. T., Perry, H. M., Gottesman, K. S. & Foeller, C. (1991). Sequences of Proteins of Immunological Interest, 5th edit., Public Health Service, N.I.H., Washington, DC.

Kodeira, M., Kinashi, T., Umemura, I., Matsuda, F., Noma, T., Ono, Y. & Honjo, T. (1986). Organization and evolution of the variable region genes of the human immunoglobulin heavy chain. J. Mol. Biol. 190, 529–541.

McCormack, W. T., Hurley, E. A. & Thompson, C. B. (1993). Germ line maintenance of the pseudogene donor pool for somatic immunoglobulin gene conversion in chickens. Mol. Cell. Biol. 13, 821–830.

Pascual, V. & Capra, D. (1991). Human immunoglobulin heavy-chain variable region genes: organization, polymorphism, and expression. Advan. Immunol. 49, 1–74.

Poljak, R. J., Amzel, L. M., Avey, H. P., Chen, B. L., Phizacherley, R. P. & Saul, F. (1973). Three-dimensional structure of the Fab' fragment of a human immunoglobulin at 2.8-Å resolution. Proc. Nat. Acad. Sci., U.S.A. 70, 3305–3310.

Reynaud, C.-A., Dahan, A., Anquez, V. & Weill, J.-C. (1989). Somatic hyperconversion diversifies the single VH gene of the chicken with a high incidence in the D region. Cell, 59, 171–183.

Rose, D. R., Przybylska, M., To, R. J., Kayden, C. S., Oomen, R. P., Vorberg, E., Young, N. M. & Bundle, D. R. (1993). Crystal structure to 2.45 Å resolution of a monoclonal Fab specific for the Brucella A cell wall polysaccharide antigen. Protein Sci. 2, 1106–1113.

Strong, R. K., Campbell, R., Rose, D. R., Petsko, G. A., Sharon, J. & Margolies, M. N. (1991). Three-dimensional structure of murine anti-p-azophenylarsonate Fab 36-71. 1. X-ray crystallography, site-directed mutagenesis, and modeling of the complex with hapten. Biochemistry, 30, 3739–3748.

Tomlinson, I. M., Walter, G., Marks, J. D., Llewelyn, M. B. & Winter, G. (1992). The repertoire of human germline VH segments reveals 50 groups of VH segments with different hypervariable loops. J. Mol. Biol. 227, 776–798.

Tonegawa, S. (1983). Somatic generation of antibody diversity. Nature (London), 302, 575–581.

Tormo, J., Stadler, E., Skern, T., Auer, H., Kanzler, O., Betzel, C., Blaas, D. & Fita, I. (1992). Three-dimensional structure of the Fab fragment of a neutralizing antibody to human rhinovirus serotype 2. Protein Sci. 1, 1154–1161.

Tramontano, A., Chothia, C. & Lesk, A. M. (1990). Framework residue 71 is a major determinant of the position and conformation of the second hypervariable region in the VH domains of immunoglobulins. J. Mol. Biol. 215, 175–182.

Wu, T. T. & Kabat, E. A. (1970). An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J. Exp. Med. 132, 211–250.

Wysocki, L. J. & Gefter, M. L. (1989). Gene conversion and the generation of antibody diversity. Annu. Rev. Biochem. 58, 509–531.

Edited by J. Karn

(Received 20 May 1994; accepted in revised form 4 November 1994)