
Am. J. Hum. Genet. 55:1247-1254, 1994

Multiple DNA Variant Association Analysis: Application to the Insulin Gene Region in Type I Diabetes

C. Julier,1,4 A. Lucassen,5 P. Villedieu,1 M. Delepine,1,4 C. Levy-Marchal,2 P. M. Danze,7 F. Bianchi,7 C. Boitard,3 P. Froguel,1 J. Bell,5,6 and G. M. Lathrop1,4

1INSERM U358, 2INSERM CJF93-13, Hôpital Robert Debré, and 3INSERM U125, Hôpital Necker, Paris; 4Wellcome Trust Centre for Human Genetics, and 5Institute of Molecular Medicine and 6Nuffield Department of Clinical Medicine, John Radcliffe Hospital, Oxford; and 7Laboratoire de Biochimie, Hôpital B, CHRU Lille, Lille

Summary

Association and linkage studies have shown that at least one of the genetic factors involved in susceptibility to insulin-dependent diabetes mellitus (IDDM) is contained within a 4.1-kb region of the insulin gene. Sequence analysis has led to the identification of 10 DNA variants in this region that are associated with increased risk for IDDM. These variants are in strong linkage disequilibrium with each other, and previous studies have failed to distinguish between the variant(s) that cause increased susceptibility to IDDM and others that are associated with the disease because of linkage disequilibrium. To address this problem, we have undertaken a large population study of French diabetics and controls and have analyzed genotype patterns for several of the variant sites simultaneously. This has led to the identification of a subset consisting of four variants (-2733AC, -23HphI, -365VNTR, and +1140AC), at least one of which appears to be directly implicated in disease susceptibility. The multiple-DNA-variant association-analysis approach that is applied here to the problem of identifying potential susceptibility variants in IDDM is likely to be important in studies of many other multifactorial diseases.

Introduction

Insulin-dependent diabetes mellitus (IDDM) is a multifactorial disease with polygenic susceptibility (Thomson et al. 1988). The HLA class II region of the major histocompatibility complex on chromosome 6 contains one or more of the genetic determinants of the disease. Another susceptibility locus has been shown to reside near the insulin gene (INS) on chromosome 11p. The existence of this susceptibility locus was initially suggested by the demonstration of association between genotypes of the INS 5' VNTR locus (-365VNTR) and IDDM (Bell et al. 1984). This was later confirmed by the demonstration of both linkage and association between INS and IDDM (Julier et al. 1991; Bain et al. 1992). All the DNA variants in the INS region that are associated with increased risk for IDDM have been identified through sequence analysis (Julier et al. 1991; Lucassen et al. 1993). The disease-associated variants reside within a 4.1-kb segment that includes INS; other DNA variants within the region of INS but outside this segment are not associated with increased risk of disease. Strong linkage disequilibrium is observed between all the disease-associated variants, which have similar relative risks when IDDM patients are compared with controls. In previous population studies, it has not been possible to determine which variant(s) is directly responsible for disease susceptibility.

Received April 11, 1994; accepted for publication August 3, 1994. Address for correspondence and reprints: Dr. Cecile Julier, Wellcome Trust Centre for Human Genetics, Windmill Road, Headington, Oxford OX3 7BN, England. © 1994 by The American Society of Human Genetics. All rights reserved. 0002-9297/94/5506-0023$02.00

Positive association between disease and polymorphic markers has been shown in several monogenic diseases with low mutation rates, such as cystic fibrosis and Huntington disease. Such associations can help to define the region of interest for positional cloning of the gene, as in the case of Huntington disease (MacDonald et al. 1992), or to dissect the genetic variability and subphenotypes of disease, as in the case of cystic fibrosis (Kerem et al. 1989). In this situation, the association results from the short distance between the marker and the disease mutation and is modulated by recombination events and rare new mutations. Although association is suggestive of proximity between the marker and the disease mutation, there is not necessarily a direct relationship between the degree of association and the distance to the disease mutation.

Association studies have also been used to test candidate genes for susceptibility to multifactorial diseases, as illustrated by INS and IDDM. The situation here is quite different from that observed in most monogenic disorders, which are characterized by a low frequency of the disease allele and high penetrance. In the case of IDDM, the INS alleles that are associated with increased susceptibility are the most frequent in the general population, but they have low penetrance. The polymorphism(s) causing the increased susceptibility is thought to be among the DNA variants that have already been identified. It should be possible to distinguish the causative variants from other polymorphisms, because the former should exhibit greater association with the disease, although this may be difficult to determine in regions of very strong linkage disequilibrium such as that observed in the 4.1-kb INS segment. Identification of susceptibility variants among a series of frequent polymorphisms within a small region of DNA is likely to be a common problem in the study of multifactorial diseases.

Functional investigations involving in vitro expression or regulation experiments provide one approach to identifying the susceptibility variant(s) in the region. These are likely to be complex and time-consuming, particularly in situations such as those seen in the INS region, where potential interactions between variants at different sites must be taken into account and where one of the candidate variants, -365VNTR, is itself a complex polymorphism with more than two alleles. Therefore, it is of interest to pursue further population investigations to try to reduce the number of variants that will need to be explored in subsequent functional studies.

We have undertaken a study of a large sample of French IDDM patients and controls, in which this problem was addressed by a multiple-DNA-variant-association analysis that takes into account genotypes at several of the IDDM-associated INS sites simultaneously. Our study provides new information on the relationship between INS polymorphisms and disease susceptibility. In particular, we demonstrate that increased risk of IDDM is conferred by one or more of the variants at four sites within the INS region. A single variant may be solely responsible for increased risk or may interact with combinations of variants at other sites.

Subjects and Methods

Diabetic Patients and Controls

Isolated (i.e., nonfamilial) diabetics were ascertained from three different sources: 185 from the Hôpital Necker in Paris, 125 from the Hôpital Robert Debré in Paris, and 95 from the Hospital Regional of Lille. The study included 194 additional diabetics who were probands from multiplex families. All diabetics were of French origin, were ketone positive at the time of diagnosis, and required daily insulin replacement. An additional criterion for inclusion in the diabetic population was either age at onset <45 years or the presence of islet cell antibodies. Seventy-five percent of diabetics had onset at <18 years of age. Isolated diabetics had no siblings with IDDM at the time of entry into the study, and probands from multiplex families had at least one affected first-degree relative. The control population consisted of 192 healthy French subjects from different regions in France. Parts of the patient and control populations have been described previously by Julier et al. (1991) and Lucassen et al. (1993).

Genotype Characterization

Polymorphisms in the INS region were characterized by Southern blot analysis or by PCR as described by Lucassen et al. (1993), with a slight modification for -2733AC, as explained below. The nomenclature and the positions of the variant sites have also been given by Lucassen et al. (1993). Because previous studies had shown that some variants were always in complete association (see below), INS genotypes were determined at only a subset of the sites. In isolated diabetics, the following variants were characterized: -2733AC, -2221MspI, -23HphI, +1127PstI, and +805DraIII. In multiplex families, the -2221MspI, -365VNTR, +1127PstI, and +805DraIII variants were characterized in all available family members, including unaffected individuals, and the -2733AC variant was determined in probands. In a portion of the control and diabetic population, the -2733AC site was characterized by ARMS amplification with a single pair of primers, INS68R/INS68C (C-specific amplification for this variant), which distinguishes homozygotes for the frequent allele (A/A) from carriers of the other allele (A/C heterozygotes and C/C homozygotes) but does not distinguish A/C from C/C.

Choice of Variant Sites

Previous exploration of polymorphisms in the INS region (Lucassen et al. 1993) revealed that, among the 10 variants associated with IDDM, some had exactly the same genotypic patterns in a panel of 156 diabetics and 96 controls from France. On the basis of this observation, we divided the variant sites into five groups, two of which contain multiple sites that had equivalent genotype patterns: (1) -2733AC; (2) -2221MspI; (3) -23HphI/-365VNTR/+1140AC; (4) +805DraIII; and (5) +1127PstI/+1355TC/+1404Fnu4HI/+1428FokI. Data on the complete sample of patients and controls were obtained by characterizing the variant at one of the sites in each of the groups. Although we cannot exclude the possibility that a small number of genotypes could be discordant for one or more of the sites within groups 3 and 5, such occurrences appear to be rare in the French population, and they will not affect the results given below.
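Dividing sites into groups with "exactly the same genotypic patterns" amounts to partitioning them into equivalence classes keyed by each site's vector of genotype calls across the panel. The sketch below assumes the calls are already coded as strings listed in the same individual order for every site; the function name `group_equivalent_sites` and the four-person panel are hypothetical illustrations, not the authors' software or data.

```python
from collections import defaultdict


def group_equivalent_sites(genotypes):
    """Partition variant sites into groups with identical genotype calls.

    genotypes maps a site name to a sequence of genotype calls, one per
    individual, listed in the same individual order for every site.
    Returns lists of site names, ordered by first appearance.
    """
    groups = defaultdict(list)  # genotype vector -> sites sharing it
    for site, calls in genotypes.items():
        groups[tuple(calls)].append(site)
    return list(groups.values())


# Hypothetical four-person panel (not data from the study).
panel = {
    "-23HphI": ("+/+", "+/-", "-/-", "+/+"),
    "-365VNTR": ("+/+", "+/-", "-/-", "+/+"),
    "+1140AC": ("+/+", "+/-", "-/-", "+/+"),
    "+805DraIII": ("+/+", "+/-", "+/-", "+/+"),
}

print(group_equivalent_sites(panel))
```

A single discordant individual is enough to split a group, so identical patterns in a finite panel show only that no discordance was observed, which is why the text allows for rare discordant genotypes within groups 3 and 5.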

Statistical Methods

The disease-associated allele was found to be the most frequent allele at each site. We denote this generically as the "+" variant; other variants are denoted as "-." The frequency of +/+ homozygotes was increased in IDDM patients compared with controls, whereas the relative frequencies of +/- heterozygotes and -/- homozygotes were similar in the two groups at all sites where the three genotypes were distinguished (table 1). Therefore, +/- and -/- genotypes have been combined and compared with +/+ genotypes in most of the statistical analyses. For simplicity, we refer to the +/+ homozygote as the "disease-associated" or "positive" genotype (denoted as "+" in the tables) and to the others as the "non-disease-associated" or "negative" genotypes (denoted as "-" in the tables). Patients and controls were grouped by the pattern of positive and negative genotypes exhibited at one or more of the INS sites, and χ² contingency-table analysis was used to compare frequencies in different groups.

[Table 1: Genotype frequencies in IDDM patients and controls for the five groups of INS variant sites]
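The recessive coding described under Statistical Methods (only +/+ counts as positive) and the grouping of individuals by their pattern across sites can be sketched as below. This is a minimal illustration under the assumption that each person's genotypes are listed in group order 1-5; a two-class ARMS result at -2733AC can be represented by any string other than "+/+". The function names and records are hypothetical.

```python
from collections import Counter


def is_positive(genotype):
    """Recessive coding: only the +/+ homozygote counts as 'positive';
    +/- and -/- are pooled as 'negative'."""
    return genotype == "+/+"


def site_pattern(genotypes):
    """Collapse genotypes for variant groups 1-5 (in order) into a
    pattern string, e.g. ['+/-', '+/+', '-/-', '+/+', '+/+'] -> '-+-++'."""
    return "".join("+" if is_positive(g) else "-" for g in genotypes)


def tally_patterns(individuals):
    """Count (status, pattern) pairs; status is e.g. 'case' or 'control'."""
    return Counter((status, site_pattern(gts)) for status, gts in individuals)


# Hypothetical records, not the study's data.
records = [
    ("case", ["+/+", "+/+", "+/+", "+/+", "+/+"]),
    ("case", ["+/+", "+/+", "+/+", "+/+", "+/+"]),
    ("control", ["+/-", "+/+", "-/-", "+/+", "+/+"]),
]
print(tally_patterns(records))
```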

Results

Single-Site Associations

Table 1 shows genotype frequencies in IDDM patients and controls for the five groups of polymorphic sites described above. χ² contingency-table analysis showed that genotype frequencies in IDDM patients from multiplex families and from nonfamilial cases ascertained in three different centers did not differ (results not shown), and so they have been pooled in table 1. As described elsewhere (Bell et al. 1985; Julier et al. 1991), the +/+ genotype is associated with an increased risk of IDDM compared with either the +/- genotype or the -/- genotype. In our data, the relative frequencies of the +/- and -/- genotypes were not different in patients and controls in the four groups of variants where the three genotypes were determined (table 1); therefore, these genotypes were pooled to calculate the relative risks shown in table 1.

The estimated relative risks associated with the positive (+/+) genotype are shown for each group of variants in table 1. For the variants in groups 2 (-2221MspI), 4 (+805DraIII), and 5 (+1127PstI/+1355TC/+1404Fnu4HI/+1428FokI), the estimates range from 2.8 to 2.9, with a 95% confidence interval (CI) of 2.0-4.0. The relative risks were slightly greater for groups 1 (-2733AC) and 3 (-23HphI/-365VNTR/+1140AC): 3.2, with 95% CIs of 2.3-4.5 and 2.3-4.4, respectively. The statistical tests of association between the positive genotype and diabetes are highly significant for all five groups of polymorphisms. The risk estimates for the different sites cannot be distinguished, because the magnitude of the differences is small compared with the width of the CIs.
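In a case-control design, the "relative risk" for a genotype is usually estimated by the odds ratio of the 2×2 table (positive vs. negative genotype, patients vs. controls). The paper does not state its formula or CI method, so the sketch below assumes the odds ratio with a Woolf (logit) interval; the function name is hypothetical. Because table 1 is not available here, the usage reuses the pattern A vs. pattern D counts from table 2.

```python
import math


def odds_ratio_woolf(case_pos, ctrl_pos, case_neg, ctrl_neg, z=1.96):
    """Odds ratio for a 2x2 case-control table with a Woolf (logit) CI.

    case_pos/ctrl_pos: patients/controls in the exposed category (e.g. +/+);
    case_neg/ctrl_neg: patients/controls in the reference category.
    All four counts must be positive.
    """
    or_ = (case_pos * ctrl_neg) / (ctrl_pos * case_neg)
    se_log = math.sqrt(1 / case_pos + 1 / ctrl_pos + 1 / case_neg + 1 / ctrl_neg)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Table 2, pattern A vs. pattern D; the single "++---" control is
# pooled into A, as the text describes for rare patterns.
or_, lo, hi = odds_ratio_woolf(398, 73, 129, 83)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

This gives an odds ratio of about 3.51 (Woolf interval about 2.42-5.09), matching the printed point estimate of 3.5; the small difference from the printed interval (2.5-5.1) presumably reflects a different CI method.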

Multiple-Variant Association Analysis

In order to distinguish the effects of variants at different sites, we classified individuals by the pattern of + and - genotypes for the five different groups of sites. Four patterns were found to account for >98% of the observations in the patient and control populations combined (table 2). The principal patterns, which we denote as "A"-"D," constitute a hierarchy in which the disease-associated genotype is present at a decreasing number of sites. The hierarchy of the principal genotype patterns is as follows: positive genotypes at all sites in the five groups (pattern A); a negative genotype at -2733AC (group 1) and positive genotypes at sites elsewhere (pattern B); negative genotypes at -2733AC (group 1) and -23HphI/-365VNTR/+1140AC (group 3), otherwise positive (pattern C); and negative genotypes at all sites (pattern D). This hierarchy allows the effects of certain genotypes or combinations of genotypes to be dissected as described below.

As in the case of single sites, the frequencies of the principal genotype patterns were similar in the four different diabetic groups (χ² = 5.0; P = .091), and consequently these were pooled in subsequent analyses. In contrast, diabetic and control frequencies differed significantly (χ² = 51.4; P < .00001). As shown in table 2, the frequency of pattern A (disease-associated or positive genotypes at all sites) is 67% in diabetics, compared with 38% in controls, whereas pattern D (negative genotypes at all sites) is found in 21% of diabetics, compared with 43% of controls, which gives a relative risk of 3.5 for pattern A relative to pattern D. For the other patterns, which are composed of combinations of positive and negative genotypes, the risks were not different from that for pattern D.

To further dissect the effects of different sites, we calculated the risk for pattern C (negative genotypes at the -2733AC and -23HphI/-365VNTR/+1140AC sites) compared with pattern A (positive genotypes at all sites). The A pattern had a significantly higher risk than did the C pattern (χ² = 14.3; P = .0001). This result indicates that positive genotypes at the -2221MspI, +1127PstI, +1355TC, +1404Fnu4HI, +1428FokI, and +805DraIII sites do not confer increased susceptibility to IDDM in the absence of positive genotypes at -2733AC (group 1) and -23HphI/-365VNTR/+1140AC (group 3).

The risk for the B pattern (negative genotype at the -2733AC site and positive genotypes elsewhere) is 1.6 relative to the D pattern; this does not differ significantly from 1 (table 2). Moreover, the risk for the B pattern relative to the C pattern, which differ by the presence or absence of positive genotypes at -23HphI/-365VNTR/+1140AC (group 3), is also not different from 1. These results suggest that the -23HphI/-365VNTR/+1140AC sites do not increase the risk of diabetes in the absence of a positive genotype at the -2733AC site. However, the risks for patterns A and B are not significantly different, despite the difference in the magnitude of the risk estimates shown in table 2. Therefore, the data do not permit us to reach a conclusion regarding the relative effects of the -2733AC (group 1) and -23HphI/-365VNTR/+1140AC (group 3) sites.
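Each pattern comparison above (for example, pattern C vs. pattern A, or pattern B vs. pattern D) reduces to a 2×2 table of patients and controls, tested with a Pearson χ² statistic on 1 df. A minimal sketch follows; the optional Yates continuity correction and the function names are assumptions, since the paper does not say whether a correction was used or exactly which rare patterns entered each test, so recomputed values need not match table 2 exactly. The P value uses the identity that a 1-df χ² is the square of a standard normal, so P = erfc(sqrt(χ²/2)).

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    With yates=True, each |observed - expected| is reduced by 0.5
    (but not below zero) before squaring.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff * diff / expected
    return stat


def p_value_1df(stat):
    """Upper-tail P for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Pattern A vs. pattern C from table 2 (rows: patterns; columns: patients, controls).
stat = chi2_2x2(398, 73, 34, 20)
print(round(stat, 1), p_value_1df(stat))
```

Comparisons across all four principal patterns at once (the χ² = 51.4 test) need an r×c table with more degrees of freedom, which this 2×2 sketch does not cover.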

Other genotype patterns were observed in <2% of the French diabetics and controls. In table 2, one of the rare patterns has been included with pattern A because of the presence of a positive genotype at -2733AC; a second has been included with pattern B because of a negative genotype at -2733AC and positive genotypes at -23HphI/-365VNTR/+1140AC. Two other rare patterns are included with class C because of negative genotypes at the above sites and positive genotypes at sites from one or two of the other variant classes. The results of the statistical analysis remain unchanged regardless of whether these genotype patterns are added to the totals for the principal groups or are excluded. Since no individuals carried positive genotypes at -2733AC or -23HphI/-365VNTR/+1140AC in the absence of positive genotypes at the other sites, it was impossible to determine whether the risk of diabetes is increased by group 1 or group 3 sites in isolation.

Julier et al.: DNA Variant Association Analysis in IDDM

Table 2
Multiple-Site Genotype Counts (Frequencies) for Different Association Patterns of the Five Groups of Variants: (1) -2733A/C, (2) -2221MspI, (3) -23HphI/-365VNTR/+1140AC, (4) +805DraIII, and (5) +1127PstI/+1355TC/+1404Fnu4HI/+1428FokI

Association pattern   Diabetics    Controls    Relative risk(a)   Test 1(b)   Test 2(c)
(groups 1-5)
A: +++++              398 (67%)    72 (38%)    3.5 (2.5-5.1)      45.9        -
   ++---                0           1 (<1%)
B: -++++               29 (5%)     11 (6%)     1.6 (.8-3.4)        1.5        3.2
   --+++                0           1 (<1%)
C: -+-++               34 (6%)     20 (10%)    1.1 (.6-2.0)         .0       14.3
   -+--+                6 (1%)      2 (1%)
   -+---                3 (<1%)     2 (1%)
D: -----              129 (21%)    83 (43%)

a Calculated for the principal patterns with respect to pattern D.
b χ² statistic vs. pattern D.
c χ² statistic vs. pattern A.
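The rule used to pool rare patterns into the principal classes can be written as a short decision procedure over the five-symbol pattern: all negative gives D; otherwise a positive genotype at group 1 gives A; a negative group 1 with a positive group 3 gives B; and negative groups 1 and 3 with a positive genotype elsewhere give C. This is a sketch read from the text, not the authors' code; patterns never observed in the study (for example "+----") are classified here only by extrapolating that rule.

```python
def principal_class(pattern):
    """Assign a groups 1-5 '+'/'-' pattern to principal class A-D, following
    the rule the text describes for pooling rare patterns."""
    if len(pattern) != 5 or set(pattern) - {"+", "-"}:
        raise ValueError(f"expected five '+'/'-' symbols, got {pattern!r}")
    if "+" not in pattern:
        return "D"  # negative at every group
    if pattern[0] == "+":
        return "A"  # positive at -2733AC (group 1)
    if pattern[2] == "+":
        return "B"  # negative at group 1, positive at group 3
    return "C"  # negative at groups 1 and 3, positive somewhere else


# The eight patterns listed in table 2.
for p in ["+++++", "++---", "-++++", "--+++", "-+-++", "-+--+", "-+---", "-----"]:
    print(p, principal_class(p))
```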

Haplotypes

The observed genotype patterns suggest that variants at different sites in the INS region are in strong linkage disequilibrium and that a small number of haplotypes account for the majority of our observations. To examine haplotype frequencies in greater detail, we divided individuals into the three genotype classes (+/+, +/-, and -/-) at each site (table 3A), with the exception of -2733AC, for reasons discussed above.

The observations could be explained by hypothesizing eight different haplotype patterns, as shown in table 3B. Two possible haplotype interpretations were given for negative genotypes at the -2733AC site. Except for this case, exact haplotypes could be deduced for all multiplex probands on the basis of family data, and these were all compatible with one of the hypothesized patterns (table 3A). The frequencies given in table 3B were estimated from the control population by assuming Hardy-Weinberg equilibrium. The test for deviation from Hardy-Weinberg equilibrium was not significant (χ² = 17.1).

From these results, we estimate that 86% of haplotypes in the French control group fall into one of two patterns, consisting of either the positive variants at all sites (haplotype 1) or the negative variants at all sites (haplotype 2). In addition, 13% fall into patterns involving either the presence of a negative variant at the -2733AC site only (positive variants elsewhere) (haplotype 4) or the presence of negative variants at this site and at -23HphI/-365VNTR/+1140AC (positive variants elsewhere) (haplotype 3). The other four haplotypes in table 3B account for <1% of the observations. Other rare haplotypes may be present in the control population or among diabetics for whom family data were not available.
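The link between tables 3A and 3B can be made concrete: a haplotype pair determines the genotype at each variant group (both + gives ++, mixed gives +-, both - gives --), and under Hardy-Weinberg equilibrium the expected number of i/j carriers among n controls is n·p_i² if i = j and n·2·p_i·p_j otherwise. The sketch below uses the haplotype strings and rounded frequencies as read from table 3B; haplotypes 6-8 and the frequency of haplotype 5 (given only as <.01) are left out, and the function names are hypothetical.

```python
# Haplotypes as '+'/'-' strings over variant groups 1-5, read from table 3B
# (only haplotypes 1-5 are listed here).
HAPLOTYPES = {1: "+++++", 2: "-----", 3: "-+-++", 4: "-++++", 5: "++---"}
# Rounded control frequencies from table 3B (haplotype 5 is given only as <.01).
FREQ = {1: 0.61, 2: 0.25, 3: 0.08, 4: 0.05}


def genotype_from_pair(h1, h2):
    """Per-group genotypes for a haplotype pair, e.g. '++ +- --'."""
    # sorted() puts '+' (ASCII 43) before '-' (ASCII 45), so mixed sites read '+-'.
    return " ".join("".join(sorted(a + b)) for a, b in zip(h1, h2))


def expected_count(i, j, n):
    """Expected number of i/j haplotype carriers among n people under HWE."""
    p, q = FREQ[i], FREQ[j]
    return n * p * p if i == j else n * 2 * p * q


print(genotype_from_pair(HAPLOTYPES[1], HAPLOTYPES[5]))
print(round(expected_count(1, 1, 192), 1))
```

For the 192 controls, the 1/1 expectation computed from the rounded frequency (about 71.4) sits slightly above the 71.1 printed in table 3A, as expected if the unrounded estimate of haplotype 1 is a little below .61.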

Discussion

Previous studies have shown that DNA variants at 10 sites within a 4.1-kb region of INS are associated with increased risk of IDDM. The region is characterized by strong linkage disequilibrium, and two haplotypes are estimated to account for >85% of haplotypes in the French control population. One of these is composed of all the disease-associated variants, and the other is composed of non-disease-associated variants. In previous studies, it has not been possible to determine which of the 10 disease-associated variants described here are directly implicated in disease susceptibility, because of the strong linkage disequilibrium throughout the region. However, Owerbach and Gabbay (1993) concluded that two other polymorphisms in the 3' region of INS were not responsible for susceptibility, because they appeared in both + and - haplotypes.

It is interesting to compare this situation with that of HLA association in IDDM. For HLA, several methods have been proposed for evaluating the effects of haplotypes involving different combinations of alleles (Thomson 1984; Kockum et al. 1993). However, the HLA region appears to contain multiple IDDM susceptibility factors, some of which are likely to be unknown. Therefore, it is not possible to discriminate between all possible variants that could be associated with disease susceptibility. For the 4.1-kb INS region, the problem consists of identifying a single polymorphism, or a combination of polymorphisms, that is responsible for susceptibility when all the variants that are potentially implicated in the disease are known. This is likely to become a frequent problem in investigations of other multifactorial diseases, and we refer to the method used here to address it as "multiple-DNA-variant-association analysis."

Table 3
Haplotype Analysis of INS Variants

A. Multiple-Site Genotype Counts, Haplotype Interpretations, and Expected Numbers (Frequencies) in the Control Population, under Hardy-Weinberg Equilibrium(a)

             GENOTYPE AT VARIANT GROUP(a)                      DIABETICS
             1      2   3   4   5    HAPLOTYPE(b)        Multiplex  Isolated   CONTROLS
                                                         Probands   Patients
Pattern A    ++     ++  ++  ++  ++   1/1                   133        265      72 (71.1)
             ++     ++  +-  +-  +-   1/5                     0          0       1 (1.9)
Pattern B    +-/--  ++  ++  ++  ++   1/4 or 4/4              3         26      11 (12.9)
             +-/--  +-  ++  ++  ++   1/6 or 4/6              0          0       1 (.7)
Pattern C    +-/--  ++  +-  ++  ++   1/3 or 3/4             12         21      16 (20.0)
             --     ++  --  ++  ++   3/3                     0          1       4 (1.2)
             +-/--  ++  +-  +-  ++   1/7 or 4/7              0          6       2 (1.3)
             +-     ++  +-  +-  +-   1/8 or 4/5              1          2       2 (.2)
Pattern D    +-/--  +-  +-  +-  +-   1/2, 5/6, or 2/4       38         72      66 (62.2)
             --     +-  --  +-  +-   2/3                     3          3       6 (7.4)
             --     --  --  --  --   2/2                     4          9      11 (11.5)

B. Haplotype Characterization and Frequencies Estimated in the Control Population

             VARIANT FOR GROUP       FREQUENCY ESTIMATES
HAPLOTYPE    1  2  3  4  5           FROM CONTROLS
1            +  +  +  +  +           .61
2            -  -  -  -  -           .25
3            -  +  -  +  +           .08
4            -  +  +  +  +           .05
5            +  +  -  -  -           <.01
6            -  -  +  +  +           <.01
7            -  +  -  -  +           <.01
8            -  +  -  -  -           <.01

a Labels for the groups are as defined in table 2.
b Alternative haplotype interpretations are given either in instances where the genotype at the -2733A/C site could be either +/- or -/- (shown as +-/--) or in other instances of ambiguity.

Multiple-variant-association analysis in French IDDM patients and healthy controls allowed us to examine the effects of some variants or combinations of variants at different sites in the 4.1-kb region of INS that is associated with IDDM. Because of strong linkage disequilibrium, this approach requires that a large number of patients and controls be studied in order to identify a sufficient number who are discordant at different sites in the region (i.e., not all + or all -). In the French population, the 10 variant sites were divided into five groups based on the absence, in previous studies, of discordant genotypes between any two sites in the same group; variants within the same group have equivalent risks. Comparison of risks for different combinations of groups led to the conclusion that +/+ genotypes at six sites (-2221MspI, +805DraIII, +1127PstI, +1355TC, +1404Fnu4HI, and +1428FokI) do not lead to increased risk for IDDM in the absence of +/+ genotypes at four other sites (-2733AC, -23HphI, -365VNTR, and +1140AC). These results illustrate the power of the approach to discriminate between variants in strong linkage disequilibrium on the sole basis of association data. On the other hand, we did not have a sufficient number of discordant observations to reach a conclusion about the effect of +/+ genotypes for the -23HphI/-365VNTR/+1140AC group in the absence of the +/+ genotype at -2733AC.

The four variants that are associated with increased risk for IDDM in the present study are -2733AC, -365VNTR, -23HphI, and +1140AC. Although no clear functional or physiological evidence yet exists for suspecting these or any of the other IDDM-associated variants (Lucassen et al. 1993), we must now consider these four variants more carefully for their potential specific biological significance. The first, -2733AC, lies 3' of the tyrosine hydroxylase gene (TH), in the interval between TH and INS, in a segment that has been shown to contain regulatory elements of TH (Gandelman et al. 1990); because of its location 5' of INS, it could also be involved in INS regulation, although it does not affect any known regulatory element. TH is the rate-limiting enzyme in the synthesis of catecholamines. As these are involved in a wide range of functions in the CNS and in peripheral sympathetic neurons, which exert control on pancreatic islet cells and particularly on insulin secretion, TH could be considered a candidate gene in IDDM.

The second variant site, -23HphI, is located in the first intron of INS, 6 bp before the intron A/exon 2 splice site, and may affect the efficiency of mRNA processing: the 3' splice site is TCCCAG for the - allele and ACCCAG for the + allele, so that the pyrimidine tract of the consensus 3' splice site, (Py)nNPyAG, is altered in the latter. This type of alteration has been shown to be particularly critical for efficient splicing of small exons (Dominski and Kole 1992). The third variant site, +1140AC, is located in the 3' UTR of INS mRNA and could affect mRNA stability.
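The splice-site argument can be checked mechanically on the two six-base sequences quoted above: both alleles keep the N-Py-AG core, but the + allele puts a purine (A) into the pyrimidine run just upstream of it. The sketch below is a hypothetical helper that looks only at the bases supplied; the real pyrimidine tract extends further 5' into the intron, and counting contiguous pyrimidines is a simplification, not a predictor of splicing efficiency.

```python
PYRIMIDINES = frozenset("CT")


def check_acceptor(seq):
    """Inspect a DNA string ending at an intron's terminal AG.

    Returns (core_ok, py_run): core_ok is True if the last four bases fit
    N-Py-A-G; py_run counts consecutive pyrimidines immediately 5' of that
    core. Only the bases supplied are examined.
    """
    seq = seq.upper()
    core = seq[-4:]
    core_ok = len(core) == 4 and core[1] in PYRIMIDINES and core[2:] == "AG"
    py_run = 0
    for base in reversed(seq[:-4]):
        if base not in PYRIMIDINES:
            break
        py_run += 1
    return core_ok, py_run


for allele, site in [("-", "TCCCAG"), ("+", "ACCCAG")]:
    print(allele, site, check_acceptor(site))
```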

With regard to the -365VNTR, it has been suggested that it has negative regulatory activity (Takeda et al. 1989; Docherty 1992), which could be responsible for a lower level of expression of INS adjacent to long VNTR alleles (i.e., class 3). In agreement with this hypothesis, Hammond-Kosack and colleagues have shown that the INS VNTR adopts a quadruplex structure in vitro (Hammond-Kosack et al. 1993) and that it exists in different structural conformations in INS-secreting and nonsecreting cells (Hammond-Kosack et al. 1992), suggesting that a particular VNTR structure may be linked to INS expression. It has been shown that long and short VNTR alleles differ in their repeat composition as well as in the number of repeated units (Owerbach and Aagard 1983; Owerbach and Gabbay 1993); this heterogeneity may also alter the conformation of the VNTR and, possibly, the level of expression of the adjacent INS.

Some arguments about the functional significance of the variants can also be derived from the interspecies conservation of the surrounding sequences. Sequences around -23HphI are highly conserved in other primates and show some conservation in most other species, as would be expected from the site's location at a splice site and its proximity to coding sequences. VNTR sequences flanking INS have been found in chimpanzee but not in other primates (owl monkey and green monkey), nor near either of the two expressed insulin genes in rodents. Sequences around +1140AC show very little conservation between species, including primates, and a segment of approximately 50 bp including this site is deleted in chimpanzee. These observations show that neither the VNTR nor the +1140AC region is essential for INS function. However, both the regulation of INS expression and the involvement of insulin in susceptibility to IDDM could be species dependent, so we cannot draw any firm conclusions from these observations.

Finally, the increased susceptibility may result from the interaction between several of the polymorphisms. The predominance of two haplotypes with opposite alleles at all 10 variant sites in the region could indicate that these combinations have been maintained by selection, and it suggests that combinations of alleles could have functional significance. The study of different ethnic groups could lead to the detection of other combinations that are not found in the French population and may provide further insight into IDDM susceptibility associated with other combinations of variants, and into the mechanisms responsible for producing and maintaining haplotype variation. Since the different groups of completely associated variants used here were defined in the French population, studies in other ethnic groups should consider the full collection of polymorphisms to obtain data on possible interactions.

Although linkage and association studies have been able to show the existence of a susceptibility factor in the INS region and now point to four variants as candidates for the determination of IDDM susceptibility, these methods alone, applied to Caucasians, will not allow further discrimination between the remaining variants, because of their strong association with one another. Similar association studies of populations of different ethnic origins, in which the associations between INS variants may differ, could help further discrimination, provided that a positive association between some INS variant and IDDM is first demonstrated in the corresponding population. Finally, functional studies or considerations may be able to discriminate between the variants and may lead to an understanding of the mechanism responsible for the IDDM susceptibility encoded by the INS region.
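Complete association between two biallelic sites corresponds to a pairwise r² of 1: every haplotype carrying a given allele at one site carries the same partner allele at the other, so a case-control comparison gives identical results at both sites and association data alone cannot indicate which one acts. The minimal Python sketch below computes r² from phased haplotypes and merges sites linked by r² = 1 into groups. It assumes phase-known haplotypes coded as '+'/'-' strings, which is a simplification (the study analyzed genotype patterns), and r² is a standard pairwise measure, not necessarily the statistic used in this study. The function names, site labels, and haplotypes are hypothetical and are not data from this study.

```python
import math
from itertools import combinations


def r_squared(haplotypes, i, j):
    """Pairwise r^2 between biallelic sites i and j, from phased '+'/'-' haplotypes."""
    n = len(haplotypes)
    p_i = sum(h[i] == "+" for h in haplotypes) / n
    p_j = sum(h[j] == "+" for h in haplotypes) / n
    p_ij = sum(h[i] == "+" and h[j] == "+" for h in haplotypes) / n
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    # A monomorphic site makes r^2 undefined; report 0 in that case
    return d * d / denom if denom > 0 else 0.0


def complete_association_groups(site_names, haplotypes, tol=1e-9):
    """Group sites connected by r^2 == 1 (within tol); order follows site_names."""
    group_of = {name: {name} for name in site_names}
    for (i, a), (j, b) in combinations(enumerate(site_names), 2):
        if math.isclose(r_squared(haplotypes, i, j), 1.0, abs_tol=tol):
            merged = group_of[a] | group_of[b]
            for name in merged:  # every member now points at the merged set
                group_of[name] = merged
    groups, seen = [], set()
    for name in site_names:
        members = frozenset(group_of[name])
        if members not in seen:
            seen.add(members)
            groups.append(sorted(members, key=site_names.index))
    return groups


if __name__ == "__main__":
    # Hypothetical sites and haplotypes, for illustration only
    sites = ["s1", "s2", "s3", "s4"]
    haps = ["+++-", "---+", "++-+", "--+-"]
    print(complete_association_groups(sites, haps))
    # [['s1', 's2'], ['s3', 's4']]
```

In this toy example s3 and s4 carry opposite alleles on every haplotype (D < 0), yet they still form one completely associated group, just as coupled sites do. Because r² is computed only over the haplotypes supplied, groups defined in one population need not hold in another.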

Acknowledgments

We thank J. Mallet for information and helpful discussions on the TH gene. This work was supported by the Wellcome Trust, the JDFI, and Centre Hospitalier et Universitaire de Lille grant 91-11.

References

Bain SC, Prins JB, Hearne CM, Rodrigues NR, Rowe BR, Pritchard LE, Ritchie RJ, et al (1992) Insulin gene region-encoded susceptibility to type I diabetes is not restricted to HLA-DR4-positive individuals. Nature Genet 2:212-215

Bell GI, Horita S, Karam JH (1984) A polymorphic locus near the human insulin gene is associated with insulin-dependent diabetes mellitus. Diabetes 33:176-183

Bell GI, Karam JH, Raffel LJ, Hitman GA, Yen PH, Galton DJ, Bottazzo GF, et al (1985) Recessive inheritance for the insulin-linked IDDM predisposing gene. Am J Hum Genet Suppl 37:A188

Docherty K (1992) 1992 RD Lawrence lecture: the regulation of insulin gene expression. Diabet Med 9:792-798

Dominski Z, Kole R (1992) Cooperation of pre-mRNA sequence elements in splice site selection. Mol Cell Biol 12:2108-2114

Gandelman K-Y, Coker GT III, Moffat M, O'Malley KL (1990) Species and regional differences in the expression of cell-type specific elements at the human and rat tyrosine hydroxylase gene loci. J Neurochem 55:2149-2152

Hammond-Kosack M, Kilpatrick M, Docherty K (1992) Analysis of DNA structure in the insulin gene-linked polymorphic region in vivo. J Mol Endocrinol 9:221-225

Hammond-Kosack M, Kilpatrick M, Docherty K (1993) The human insulin gene-linked polymorphic region adopts a G-quartet structure in chromatin assembled in vitro. J Mol Endocrinol 10:121-126

Julier C, Hyer RN, Davies J, Merlin F, Soularue P, Briant L, Cathelineau G, et al (1991) Insulin-IGF2 region on chromosome 11p encodes a gene implicated in HLA-DR4-dependent diabetes susceptibility. Nature 354:155-159

Kerem BS, Rommens JM, Buchanan JA, Markiewicz D, Cox TK, Chakravarti A, Buchwald M, et al (1989) Identification of the cystic fibrosis gene: genetic analysis. Science 245:1073-1080

Kockum I, Wassmuth R, Holmberg E, Michelsen B, Lernmark Å (1993) HLA-DQ primarily confers protection and HLA-DR susceptibility in type I (insulin-dependent) diabetes studied in population-based affected families and controls. Am J Hum Genet 53:150-167

Lucassen AM, Julier C, Beressi JP, Boitard C, Froguel P, Lathrop M, Bell JI (1993) Susceptibility to insulin dependent diabetes mellitus maps to a 4.1 kb segment of DNA spanning the insulin gene and associated VNTR. Nature Genet 4:305-310

MacDonald ME, Novelletto A, Lin C, Tagle D, Barnes G, Bates G, Taylor S, et al (1992) The Huntington's disease candidate region exhibits many different haplotypes. Nature Genet 1:99-103

Owerbach D, Aagard L (1983) Analysis of a 1963-bp polymorphic region flanking the human insulin gene. Genetics 32:475-479

Owerbach D, Gabbay KH (1993) Localization of a type I diabetes susceptibility locus to the variable tandem repeat region flanking the insulin gene. Diabetes 42:1708-1714

Takeda J, Ishii S, Seino Y, Imamoto F, Imura H (1989) Negative regulation of human insulin gene expression by the 5'-flanking region in non-pancreatic cells. FEBS Lett 247:41-45

Thomson G (1984) HLA DR antigens and susceptibility to insulin-dependent diabetes mellitus. Am J Hum Genet 36:1309-1317

Thomson G, Robinson WP, Kuhner MK, Joe S, MacDonald MJ, Gottschall JL, Barbosa J, et al (1988) Genetic heterogeneity, modes of inheritance, and risk estimates for a joint study of Caucasians with insulin-dependent diabetes mellitus. Am J Hum Genet 43:799-816