deciphering the ancient and complex evolutionary history ...francesca luca, antti sajantila, doron m...

16
Deciphering the ancient and complex evolutionary history of human arylamine N-acetyltransferase genes. Etienne Patin, Luis B Barreiro, Pardis C Sabeti, Fr´ ed´ eric Austerlitz, Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version: Etienne Patin, Luis B Barreiro, Pardis C Sabeti, Fr´ ed´ eric Austerlitz, Francesca Luca, et al.. De- ciphering the ancient and complex evolutionary history of human arylamine N-acetyltransferase genes.. American Journal of Human Genetics, Elsevier (Cell Press), 2006, 78 (3), pp.423-36. <10.1086/500614>. <pasteur-00169326> HAL Id: pasteur-00169326 https://hal-pasteur.archives-ouvertes.fr/pasteur-00169326 Submitted on 10 Sep 2007 HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destin´ ee au d´ epˆ ot et ` a la diffusion de documents scientifiques de niveau recherche, publi´ es ou non, ´ emanant des ´ etablissements d’enseignement et de recherche fran¸cais ou ´ etrangers, des laboratoires publics ou priv´ es.

Upload: others

Post on 15-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

Deciphering the ancient and complex evolutionary

history of human arylamine N-acetyltransferase genes.

Etienne Patin, Luis B Barreiro, Pardis C Sabeti, Frederic Austerlitz,

Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj

Sakuntabhai, Nicole Guiso, et al.

To cite this version:

Etienne Patin, Luis B Barreiro, Pardis C Sabeti, Frederic Austerlitz, Francesca Luca, et al.. De-ciphering the ancient and complex evolutionary history of human arylamine N-acetyltransferasegenes.. American Journal of Human Genetics, Elsevier (Cell Press), 2006, 78 (3), pp.423-36.<10.1086/500614>. <pasteur-00169326>

HAL Id: pasteur-00169326

https://hal-pasteur.archives-ouvertes.fr/pasteur-00169326

Submitted on 10 Sep 2007

HAL is a multi-disciplinary open accessarchive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come fromteaching and research institutions in France orabroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, estdestinee au depot et a la diffusion de documentsscientifiques de niveau recherche, publies ou non,emanant des etablissements d’enseignement et derecherche francais ou etrangers, des laboratoirespublics ou prives.

Page 2: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:
Page 3: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

www.ajhg.org Patin et al.: Evolutionary Patterns in the NATs Region 423

Deciphering the Ancient and Complex Evolutionary History of HumanArylamine N-Acetyltransferase GenesEtienne Patin,1,5 Luis B. Barreiro,1 Pardis C. Sabeti,6 Frederic Austerlitz,7 Francesca Luca,8Antti Sajantila,9 Doron M. Behar,10 Ornella Semino,11 Anavaj Sakuntabhai,2 Nicole Guiso,1Brigitte Gicquel,3 Ken McElreavey,4 Rosalind M. Harding,12 Evelyne Heyer,5 andLluıs Quintana-Murci1

1Centre National de la Recherche Scientifique (CNRS) FRE 2849, Unit of Molecular Prevention and Therapy of Human Diseases, 2Unite desMaladies Infectieuses et Autoimmunes, 3Unite de Genetique Mycobacterienne, and 4Unit of Reproduction, Fertility and Populations, InstitutPasteur, and 5CNRS UMR 5145, Musee de l’Homme, Paris; 6Whitehead Institute/MIT Center for Genome Research, Cambridge, MA;7University of Paris-Sud, Orsay, France; 8University of Calabria, Rende, Italy; 9University of Helsinki, Helsinki, Finland; 10Bruce RappaportFaculty of Medicine and Research Institute, Technion and Rambam Medical Center, Haifa, Israel; 11University of Pavia, Pavia, Italy; and12University of Oxford, Oxford, United Kingdom

The human N-acetyltransferase genes NAT1 and NAT2 encode two phase-II enzymes that metabolize various drugsand carcinogens. Functional variability at these genes has been associated with adverse drug reactions and cancersusceptibility. Mutations in NAT2 leading to the so-called slow-acetylation phenotype reach high frequencies world-wide, which questions the significance of altered acetylation in human adaptation. To investigate the role ofpopulation history and natural selection in shaping NATs variation, we characterized genetic diversity through theresequencing and genotyping of NAT1, NAT2, and the pseudogene NATP in a collection of 13 different populationswith distinct ethnic backgrounds and demographic pasts. This combined study design allowed us to define a detailedmap of linkage disequilibrium of the NATs region as well as to perform a number of sequence-based neutralitytests and the long-range haplotype (LRH) test. Our data revealed distinctive patterns of variability for the twogenes: the reduced diversity observed at NAT1 is consistent with the action of purifying selection, whereas NAT2functional variation contributes to high levels of diversity. In addition, the LRH test identified a particular NAT2haplotype (NAT2*5B) under recent positive selection in western/central Eurasians. This haplotype harbors themutation 341TrC and encodes the “slowest-acetylator” NAT2 enzyme, suggesting a general selective advantagefor the slow-acetylator phenotype. Interestingly, the NAT2*5B haplotype, which seems to have conferred a selectiveadvantage during the past ∼6,500 years, exhibits today the strongest association with susceptibility to bladdercancer and adverse drug reactions. On the whole, the patterns observed for NAT2 well illustrate how geographicallyand temporally fluctuating xenobiotic environments may have influenced not only our genome variability but alsoour present-day susceptibility to disease.

Received September 9, 2005; accepted for publication December 21, 2005; electronically published January 13, 2006.Address for correspondence and reprints: Dr. Lluis Quintana-Murci, CNRS FRE 2849, Unit of Molecular Prevention and Therapy of Human

Diseases, Institut Pasteur, 25 rue Dr. Roux, 75724 Paris Cedex 15, France. E-mail: [email protected]. J. Hum. Genet. 2006;78:423–436. � 2006 by The American Society of Human Genetics. All rights reserved. 0002-9297/2006/7803-0011$15.00

The two human N-acetyltransferase genes, NAT1 (MIM108345) and NAT2 (MIM 243400), represent one ofthe first and clearest examples of the importance of ge-netic variation among individuals and across popula-tions in drug response (Weber 1987). The two homolo-gous genes are situated within a 200-kb region in 8p22,together with the NATP pseudogene (fig. 1). Both genesencode phase-II enzymes named “arylamine N-acetyl-transferases” (NATs), which catalyze the transfer of anacetyl group to different arylhydrazines and arylaminedrugs (Blum et al. 1990). Both genes carry functionalpolymorphisms whose effects on enzymatic activity havebeen well studied (Hein et al. 2000). Whereas the var-iants associated with reduced activity attain only lowfrequencies in NAT1, they constitute common poly-morphisms in NAT2 (Upton et al. 2001). Two main

classes of NAT2 phenotypes are therefore observed: the“fast-acetylation” phenotype, which refers to the wild-type acetylation activity, and the “slow-acetylation”phenotype, which results in reduced protein activity. Inaddition, NAT1 and NAT2 metabolize numerous com-mon carcinogens, and variation in these genes can resultin varying susceptibility to cancer (for a review, see thework of Hein [2002]). For example, the slow-acetylatorNAT2 phenotype has been associated with side effectsto the commonly used antitubercular isoniazid (Huanget al. 2002) and with higher risk for bladder cancer(Cartwright et al. 1982; Garcia-Closas et al. 2005). Nev-ertheless, most NAT2 mutations leading to the slow phe-notype are found at high frequencies worldwide, callinginto question the role of altered acetylation in humanadaptation. Moreover, the function of NATs in medi-

Page 4: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

424 The American Journal of Human Genetics Volume 78 March 2006 www.ajhg.org

Figure 1 Schematic representation of the NATs region spanning1200 kb. Sequenced loci are represented by boxes (black boxes pcoding regions; gray boxes p flanking regions; white boxes p inter-genic regions), and arrows indicate the positions of genotyped SNPs.

ating the interactions between humans and their xeno-biotic environment, which varies depending on diet andlifestyle, makes them excellent targets for the action ofnatural selection. Indeed, several studies have identifiedthe signature of different selective pressures in genes in-volved in the metabolism of exogenous substances, in-cluding the members of the CYP3A family (Thompsonet al. 2004), CYP1A2 (Wooding et al. 2002), LCT (Ber-saglieri et al. 2004), TAS2R16 (Soranzo et al. 2005),PTC (Wooding et al. 2004), HFE (Toomajian and Kreit-man 2002; Toomajian et al. 2003), MDR1 (Tang et al.2004), and MRP1 (Wang et al. 2005).

The main objective of the present study was to in-vestigate the evolutionary history of the NATs region byunraveling the relative influences of natural selection andhuman demography in determining its present-day var-iability. With this goal in mind, we first resequencedNAT1, NAT2, and the pseudogene NATP in a multi-ethnic panel of 80 individuals (referred to as the “re-sequencing panel”). To further investigate the globallinkage disequilibrium (LD) patterns in the NATs region,we selected 21 SNPs—including 5 NAT1 and 7 NAT2SNPs retrieved from the initial sequence-based data setas well as 9 intergenic SNPs—to cover the entire 200-kb region (fig. 1). These markers were all genotyped inan extended collection of 563 individuals (referred to asthe “genotyping panel”) originating from 13 differentethnologically well-defined human populations. Coales-cent methods, sequence-based neutrality tests, and thelong-range haplotype (LRH) test (Sabeti et al. 2002)were performed to provide insight into the role of thesegenes in human adaptation to geographically and his-torically fluctuating xenobiotic environments.

Subjects and Methods

DNA Samples

The resequencing panel consisted of 80 individuals (160chromosomes) from eight populations representing major geo-graphic regions; sub-Saharan African chromosomes were rep-resented by Bakola Pygmies from Cameroon (20) and by Bantuspeakers from Gabon (20); western Eurasian samples wererepresented by Ashkenazi Jews (20), Sardinians (12), French

(20), and Saami from Finland (20); and eastern Eurasian sam-ples were represented by Indians from Gujarat (20) and byThai (28). One chimpanzee (Pan troglodytes) was also se-quenced to define the ancestral state of each mutation. Thegenotyping panel consisted of 563 individuals (1,126 chro-mosomes) from 13 populations. Sub-Saharan African chro-mosomes were represented by Bakola Pygmies (80) and BakaPygmies (60) from Cameroon, Ateke Bantu speakers from Ga-bon (100), and Somali (48); North African and western andcentral Eurasian samples were represented by Morrocans (88),Ashkenazi Jews (80), Sardinians (98), Swedes (100), Saamifrom Finland (96), and Turkmen from Uzbekistan (100); andeastern Eurasian samples were represented by Gujarati fromIndia (100), Chinese from the Hunan and Zhejang regions(88), and Thai (88). All individuals were healthy donors fromwhom informed consent was obtained.

PCR and Sequence Determination

Six different regions were PCR amplified, for a total of ∼8.5kb per chromosome (fig. 1): the entire coding exon of theNAT1 gene (870 bp) and 1,735 bp of noncoding flanking parts(1,122 bp in 5′ end and 613 bp in 3′ end); the entire codingexon of the NAT2 gene (870 bp) and 1,950 bp of noncodingflanking parts, including 1,603 bp surrounding its first non-coding exon; the pseudogene NATP (2,145 bp); and two in-tergenic noncoding regions at 10 kb and 100 kb (1,068 bp)from NAT1 5′ end. Details about PCR and sequencing con-ditions are available on request. As a measure of quality con-trol for the data, individuals presenting singletons or ambig-uous polymorphisms were reamplified and resequenced. Se-quences were analyzed using the GENALYS software (Taka-hashi et al. 2003).

Selection and Genotyping of Polymorphisms

The newly discovered sequence-based variation was used todetermine the minimal number of SNPs able to distinguish thehaplotypic diversity (haplotype-tagging SNPs [htSNPs]) ofNAT1 and NAT2 loci in a given population. Five SNPs andone (TAA)n microsatellite were typed in NAT1 by either ge-notyping or sequencing, and seven SNPs were genotyped inthe NAT2 coding region. In addition, we genotyped nine in-tergenic SNPs selected because they were polymorphic in allhuman populations (fig. 1). These SNPs were chosen eitherfrom dbSNP (when dbSNPs met the previous criterion) or fromthe intergenic regions sequenced here. Genotyping was per-formed by either fluorescence polarization (VICTOR-2TMtechnology) or TaqMan (ABI Prism-7000 Sequence DetectionSystem) assays.

Sequence-Based Data Analysis

Allele frequencies were determined by gene counting, anddeviations from Hardy-Weinberg equilibrium were tested byArlequin v.2.001 (Schneider et al. 2000). Haplotype recon-struction was performed using the Bayesian method imple-mented in PHASE v.2.1.1 (Stephens and Donnelly 2003), andhtSNPs were defined using BEST v.1.0 (Sebastiani et al. 2003),after the exclusion of singletons because they could not bepositioned with certainty on a given haplotypic context. With

Page 5: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

www.ajhg.org Patin et al.: Evolutionary Patterns in the NATs Region 425

Table 1

Polymorphisms Identified through the ResequencingSurvey of the NATs Region

The table is available in its entirety in the onlineedition of The American Journal of Human Genetics.

the use of phased data, the neutral parameter vML and the timesince the most recent common ancestor (TMRCA) were estimatedby maximum likelihood with GENETREE (Griffiths and Ta-vare 1994), under a standard coalescent model. Since thismodel assumes no recombination, for this particular analysiswe had to exclude a few SNPs or rare recombinant haplotypes(in NAT1, the first four 5′ SNPs; in NATP, three singletonhaplotypes; in NAT2, two singleton haplotypes). Time, scaledin 2Ne units, was converted into years by use of a 25-yeargeneration time and an Ne value obtained as vML divided by4m. The mutation rate per gene per generation (m) was deducedfrom Dxy, the average number of nucleotide substitutions persite between human and chimpanzee (Nei 1987, equation10.20), calculated by DnaSP v.4.0 (Rozas et al. 2003), withconsideration that the two species diverged 200,000 genera-tions ago. Simulations were performed to estimate the prob-ability of a TMRCA greater than a given value, under a Wright-Fisher model. Fifty thousand simulations were performed usinga version of the MS program modified to obtain TMRCA values(R. Hudson, personal communication).

Using DnaSP, we calculated the nucleotide diversity (p) andWatterson’s estimator of v (vW) (Watterson 1975), and we per-formed a number of statistical tests: Tajima’s D (TD) (Tajima1989), Fu and Li’s F* (Fu and Li 1993), Fay and Wu’s H (Fayand Wu 2000), KA/KS (Kimura 1968), the Hudson-Kreitman-Aguade (HKA) test (Hudson et al. 1987), and the McDonald-Kreitman (MK) test (McDonald and Kreitman 1991). A neu-trality test based on the expected heterozygosity was alsoperformed with the Bottleneck program (Cornuet and Luikart1996) on the NAT1 3′ UTR microsatellite, by the use of co-alescent simulations (10,000 runs) and with the assumptionof different mutational models (stepwise mutation model andtwo-phased mutation model with 0%–40% of multistepchanges).

Genotyping-Based Data Analysis

Pairwise LD between the 21 genotyped SNPs was estimatedafter the exclusion, in each population, of SNPs with a minor-allele frequency (MAF) !0.10. Using DnaSP, we calculated thestatistics D′ (Lewontin 1964) and r2 (Hill and Robertson 1968)and tested their statistical significance, using a Fisher’s exacttest followed by Bonferroni corrections. To perform the LRHtest, we selected two core regions (in NAT1, SNPs 445, 1088,1095, and 1191; in NAT2, SNPs 341, 481, 590, 803, and 857)identified as haplotype blocks, following the criteria of Gabrielet al. (2002), and we assessed, for each core haplotype, itsrelative extended haplotype homozygozity (REHH) 200 kbapart. To test the significance of potentially selected core hap-lotypes, we first compared our sub-Saharan African and non-African data sets with coalescent simulations of 1-Mb regions,assuming a neutral model of evolution with recombination(Hudson 2002). Model parameters (including demographyand recombination rate) were consistent with current estimatesfor African and non-African populations (Schaffner et al.2005). Similarly, our sub-Saharan African and non-Africandata sets were compared with the empirical distribution of“core haplotype frequencies versus REHH” obtained from thescreening of the entire chromosome 8 in Yoruban and Euro-pean-descent populations, respectively (HapMap database).

To infer the population growth rate, r, and the age, g, ofNAT2 nonsynonymous mutations, we used a joint maximum-likelihood estimation of these parameters, as described in Aus-terlitz et al. (2003). We compared these results with coalescent-based estimations of the two parameters: the growth rateestimation of Slatkin and Bertorelle (2001) and the Reeve andRannala (2002) age estimation using the DMLE� v.2.2 soft-ware. One million iterations were performed for each esti-mation. The recombination parameter required for these anal-yses was estimated by comparing deCODE and Marshfieldgenetic and physical distances in the NATs region (UCSC Ge-nome Bioinformatics). The coefficient of selection, s, of theNAT2 mutation 341TrC was estimated using the determin-istic equation 3.29 of Wright (1969), which relates the fre-quency of an allele in generation to its frequency in gen-t � 1eration t. We stated the degree of dominance, h, to 0.0(recessivity) and 0.5 (codominance). We assumed the fre-quency, p0, of the C allele before selection to vary between0.05 and 0.15 (corresponding to the allele frequency in Pyg-mies and eastern Eurasians). Making these assumptions, wecalculated the s values that would yield a frequency of 0.50(the present-day frequency of the 341C allele in western Eu-rasians) from its initial p0 frequency in g generations.

Results

NATs Nucleotide Sequence Variation

The initial sequencing screening of the resequencingpanel yielded a total of 111 mutations, including 68transitions, 34 transversions, 8 insertions/deletions, and1 triallelic microsatellite (table 1) (GenBank accessionnumbers DQ305496–DQ305975). In NAT1, we ob-served 2 nonsynonymous and 4 synonymous SNPs in itscoding region and 26 SNPs and the triallelic (TAA)n

microsatellite in its flanking regions. In the NAT2 codingregion, we found two synonymous and eight nonsynon-ymous mutations, three of which were newly identified(L24I, T193M, and Y208H). These three variants weresingletons and were restricted to sub-Saharan samples.In addition, 14 SNPs and 3 indels in NAT2 flankingregions were observed. In NATP, we identified 32 SNPsand 5 indels. For all the SNPs, only 1.54% of the testsdeparted significantly from Hardy-Weinberg equilib-rium. However, these few tests would become nonsig-nificant after a correction for multiple testing.

The nucleotide diversity, p, in the NAT1 and NAT2coding and flanking regions as well as in the pseudogeneNATP is reported in table 2. Interestingly, the two du-plicated NAT genes, which share 87.3% of nucleotide

Page 6: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

426 The American Journal of Human Genetics Volume 78 March 2006 www.ajhg.org

Table 2

Diversity Indices of NAT1, NAT2, and NATP Sequences

POPULATION

NAT1 CODING

REGION

(870 bp)

NAT1 CODING

AND FLANKING

REGIONS

(2,605 bp)

NAT2 CODING

REGION

(870 bp)

NAT2 CODING

AND FLANKING

REGIONS

(2,820 bp)NATP PSEUDOGENE

(2,145 bp)

vW

(%)p

(%) HDvW

(%)p

(%) HDvW

(%)p

(%) HDvW

(%)p

(%) HDvW

(%)p

(%) HD

Bakola .065 .023 .100 .108 .118 .821 .194 .195 .768 .110 .113 .863 .263 .223 .863Bantu .000 .000 .000 .087 .101 .811 .259 .283 .789 .140 .136 .837 .368 .247 .884Ashkenazi .000 .000 .000 .043 .043 .353 .162 .295 .611 .130 .143 .742 .092 .107 .821Sardinian .000 .000 .000 .076 .091 .667 .190 .277 .621 .141 .135 .636 .093 .085 .758French .097 .034 .100 .238 .126 .516 .162 .270 .721 .130 .136 .789 .079 .079 .653Saami .000 .000 .000 .043 .065 .563 .194 .278 .747 .140 .179 .853 .092 .100 .789Gujarati .097 .034 .100 .227 .110 .432 .194 .277 .732 .130 .171 .837 .079 .089 .837Thai .012 .056 .204 .227 .170 .733 .177 .227 .728 .146 .164 .812 .084 .106 .870

Total .012 .021 .073 .217 .112 .662 .203 .275 .767 .151 .157 .841 .297 .134 .855

NOTE.—HD p haplotype diversity.

Table 3

Allelic Composition and Frequency of NAT1Haplotypes in Our Resequencing Panel

The table is available in its entirety in the onlineedition of The American Journal of Human Genetics.

identity in their coding region, showed completely dif-ferent diversity levels, with the NAT2 coding region( ) being 13 times more diverse than thep p 0.275%NAT1 coding region ( ). To evaluate wheth-p p 0.021%er these differences were due to the local variation insubstitution rates, we estimated the mutation rates pergeneration per nucleotide of NAT1 and NAT2 codingregions as well as that of NATP, which equaled 3.73 #10�8, 3.96 # 10�8, and 5.94 # 10�8, respectively.

Haplotype Diversity and TMRCA Estimates of NAT Loci

Haplotypes of the NAT genes were reconstructed firstby use of the sequence data obtained from the resequ-encing panel (tables 3 and 4). The most unusual obser-vation was the haplotype diversity of the NAT1 locus,made of two highly divergent haplotype clusters sepa-rated by 17 mutations. One cluster contained most ofNAT1 haplotype diversity (97.5%), whereas the othercontained a unique haplotype, called “NAT1*11A,”that was observed in just three non-African individuals(one heterozygous French, one heterozygous Gujarati,and one homozygous Thai). Consequently, diversity es-timates of the NAT1 locus were inflated in these threepopulations, as attested by the much higher vW valuesin the French, Gujarati, and Thai samples, as comparedwith all other populations (table 2).

As for the genotyping panel, NAT1 and NAT2 hap-lotype frequencies are reported in tables 5 and 6. TwoNAT1 haplotypes, NAT1*10 and NAT1*4, account for85%–100% of NAT1 diversity. In addition, genotypingresults confirmed the sequence data, in that the diver-gent and low-frequency NAT1*11A haplotype is re-stricted to Eurasian populations. As for NAT2, the an-cestral haplotype NAT2*4 and the remaining haplotypesassociated with the fast-acetylator phenotype (NAT2*12and NAT2*13) were most frequent in Bakola Pygmies

and eastern Eurasians. Among the derived haplotypesassociated with the slow-acetylator phenotype, NAT2*14(191GrA) was African specific, NAT2*7 (857GrA)was observed mainly in eastern Eurasians, NAT2*5(341TrC) was common in western and central Eura-sians as well as in sub-Saharan Africans (Pygmies ex-cepted), and NAT2*6 (590GrA) was found ubiqui-tously at intermediate frequencies.

To investigate the tree topology and time depth of thethree NAT loci, we next estimated the gene tree and theTMRCA of NAT1, NAT2, and NATP. The two divergentNAT1 lineages coalesced 2.01 � 0.29 million years ago(MYA) (fig. 2), one of the highest estimated TMRCA valuesin the human genome (Excoffier 2002). By contrast, theTMRCA values of NAT2 (1.01 � 0.27 MYA) and NATP(1.05 � 0.24 MYA) were in agreement with neutralexpectations, since most human neutral loci should co-alesce ∼4Ne generations ago (i.e., ∼1 MYA) (Takahata1993).

Population Variation in LD Patterns

To determine global LD patterns in the 13 populationsof our genotyping panel, we estimated D′ and r2 for theNATs region (data not shown). Both NAT1 and NAT2genes showed significant and strong intragenic LD levels:the proportion of SNP pairs in significant LD equaled87.5% and 89.6% in Africans and non-Africans, re-spectively, at the NAT1 locus and 73.7% and 84.0% atNAT2. The genomic structure of the entire 200-kb NATs

Page 7: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

www.ajhg.org Patin et al.: Evolutionary Patterns in the NATs Region 427

Table 4

Allelic Composition and Frequency of NAT2Haplotypes in Our Resequencing Panel

The table is available in its entirety in the onlineedition of The American Journal of Human Genetics.

Table 5

Allelic Composition and Frequency of NAT1Haplotypes in Our Genotyping Panel

The table is available in its entirety in the onlineedition of The American Journal of Human Genetics.

region was made of two independent haplotype blocks,one corresponding to NAT1 and the other to NATP andNAT2. Further, we observed strong population variationin LD levels when plotting the proportion of SNP pairsin significant LD against physical distance for each pop-ulation separately (fig. 3). Sub-Saharan Africans showedlower LD levels than non-Africans, with the clear ex-ception of Bakola Pygmies. Both western and easternEurasians exhibited similar LD patterns, excluding theSaami, who were, by far, the population with the highestdegree of allelic association.

Tests of the Standard Neutral Model

The absence of LD between the two NAT genes in allpopulations enabled us to study the evolutionary forcesindependently shaping NAT1 and NAT2 diversity. Forthe NAT1 coding region, tests were not feasible in fourof eight populations because of a complete absence ofexonic variation (table 2). In the remaining populations,most Tajima’s, Fu and Li’s, and Fay and Wu’s tests gavesignificant negative values (table 7). When these analyseswere extended to NAT1 flanking regions, the same testslost significance in Bakola Pygmies, whereas they turnedout to be even more significant in French, Gujarati, andThai because of an excess of singletons when mutationsare not orientated for their ancestral state (TD and F*)or because of an excess of highly frequent derived var-iants when their ancestral state is considered (H). Theseresults are mainly due to the presence of the low-fre-quency and highly divergent NAT1*11A haplotype inthe three Eurasian populations (fig. 2). As for the NAT2coding region, both Tajima’s D and Fu and Li’s F* inthe Ashkenazi, Sardinians, and French and Fu and Li’sF* in the Saami were significantly positive (table 7).However, all tests were not significant anymore whenboth flanking and exonic NAT2 variation was consid-ered. As for NATP, although it was found to be in LDwith NAT2, all tests yielded nonsignificant values.

To test significant differences in diversity levels amongthe three NAT loci, we performed the HKA test. Thecomparison between NAT1 and NATP was significantonly among western and eastern Eurasians (P p .046and .043, respectively), resulting from an excess of poly-morphisms in NAT1, compared with fixed mutations.Again, these results are the consequence of the binaryhaplotype pattern observed at this locus (fig. 2). By con-trast, the HKA tests comparing NAT1 versus NAT2 andNAT2 versus NATP yielded nonsignificant results. At

the protein level, we compared the number of synony-mous and nonsynonymous mutations in the two ho-mologous genes. NAT1 exhibited a deficit in nonsynon-ymous mutations (KA/KS p 0.242). By contrast, NAT2presented a KA/KS value closer to 1 (KA/KS p 0.802).

LRH Test for Recent Selection

We next performed the LRH test, which is designedto identify mutations/haplotypes under recent positiveselection by comparing the frequency of a given allelewith the breakdown of LD around it (Sabeti et al. 2002).Our results for both NAT1 and NAT2 are reported sep-arately for African and Eurasian populations (fig. 4Aand 4B). P values were estimated for all core haplotypesin all populations against both simulations and the em-pirical distributions of the HapMap in Yoruban and Eu-ropean-descent populations. A single haplotype ofNAT2 (NAT2*5B) appeared to deviate from neutralityin western and central Eurasian populations: P psim

( ) in Ashkenazi Jews, .0062 (.0085).0001 P p .0006emp

in Saami, .0363 (.0063) in Turkmen, .0124 (.0607) inMoroccans, and .0464 (.1836) in Swedish, with the samehaplotype in Sardinians being close to significance(.0576 [.2178]). Interestingly, the NAT2*5B haplotype,which exhibited the highest frequencies (150%) in west-ern Eurasians (table 6), bears the nonsynonymous mu-tation 341TrC (I114T), which has been shown to leadto a slow-acetylation status (Zang et al. 2004). In ad-dition, a single NAT1 haplotype (NAT1*4) appeared todeviate from the simulated distribution in the same pop-ulations that showed signals of positive selection forNAT2*5B: ( ) in AshkenaziP p .0008 P p .0319sim emp

Jews, .0115 (.1346) in Saami, .0445 (.2498) in Turkmen,.0293 (.2694) in Moroccans, and .0090 (.1987) in Swed-ish. In contrast to NAT2*5B, the protein encoded bythe NAT1*4 haplotype does not differ in enzyme activitywith those encoded by the other NAT1 haplotypes ob-served in this study (e.g., NAT1*10, NAT1*3, andNAT1*11) (Hughes et al. 1998; de Leon et al. 2000).

Growth Rate and Age of the NAT2 341TrC Mutation

The LD breakdown at the surrounding sites of a mu-tation is very informative for inferring allelic age esti-mates, through consideration of the recombination rateas a “genetic clock” (Labuda et al. 1997). The significantsignal of selection detected for the NAT2 341TrC mu-tation, together with the functional consequences asso-

Page 8: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

428 The American Journal of Human Genetics Volume 78 March 2006 www.ajhg.org

Table 6

Allelic Composition and Frequency of NAT2Haplotypes in Our Genotyping Panel

The table is available in its entirety in the onlineedition of The American Journal of Human Genetics.

Figure 2 NAT1 gene tree. Time is scaled in millions of years.Mutations are named for their physical positions along the NAT1locus. Lineage absolute frequencies in Africa and western and easternEurasia are reported. Nonsynonymous mutations are highlighted ingray.

ciated with this variant (i.e., reduced acetylation activ-ity), prompted us to estimate the age of this mutationby use of both maximum-likelihood and coalescent-based methods, both of which gave similar results. Toprovide a comparison, we performed the same analysesfor the 590GrA mutation, which is located 250 bp from341TrC and is never observed on the same haplotype.Our results indicated that these two mutations startedto increase in frequency at similar times and with com-parable growth rates in all populations (table 8). Theonly exception to this pattern was the 341TrC mutationin western/central Eurasians. Our estimations showedthat this mutation started to increase in frequency 6,315years ago (95% CI 5,797–7,005 years) at a growth rate(0.062) twice as big as the values observed for the samemutation in eastern Eurasians (0.031) and for 590GrAin the entire Eurasian sample. Indeed, the growth rateof 341TrC in western/central Eurasians was signifi-cantly different from all the others, since their 95% CIsdid not overlap (table 8).

Acetylation Phenotype Inference

In view of the 95% NAT2 genotype-phenotype con-cordance (Cascorbi et al. 1995), we inferred the distri-bution of fast/slow–acetylation phenotypes across pop-ulations from our NAT2 genotyping panel and com-pared it with the phenotyping results of 23 healthy pop-ulations worldwide, reviewed for this occasion (fig. 5).To make both data sets comparable, we considered fast/slow heterozygotes to be fast acetylators because, evenif they present a mean intermediate activity significantlydifferent from that of fast homozygotes, they are mostlyobserved in the “fast acetylator activity peak” (Cascorbiet al. 1995). Phenotype frequencies showed strong var-iation among ethnic groups (fig. 5). The slow-acetylatorphenotype is present at the lowest frequencies in easternAsian and Native American populations as well as inthe Pygmy groups studied here for the first time, whereasit exhibits the highest frequencies in Middle Eastern,European, central/south Eurasian, and African popula-tions. The highest frequencies worldwide of the slow-acetylator phenotype are observed in the Ashkenazi pop-ulation, in which it reaches 80%.

Discussion

The direct interaction of NAT1 and NAT2 gene productswith the human chemical environment makes them po-

tential targets of natural selection, provided that the ex-posure to xenobiotics has significantly influenced pop-ulation fitness over time. Our population-based geneticstudy revealed distinctive patterns of variability for thetwo NAT genes, reflecting two very different evolution-ary histories.

Reduced Variation and Deep-Rooting Genealogy inthe NAT1 Gene

The NAT1 coding region is characterized by a reducedgenetic diversity ( ), with four populationsp p 0.021%of eight showing no variation at all (table 2). In addition,the nearby 3′ UTR (TAA)n microsatellite of NAT1, whichwas also typed in the genotyping panel, presented lowlevels of heterozygosity ( ), with the alleleHz p 0.073(TAA)8 accounting for 96.3% of the overall diversity(table 5). This deficit in heterozygosity was significantunder both the stepwise mutation model and the two-phased mutation model ( ). Also, in strong con-P ! .05trast to NAT2, the functional mutations identified inNAT1 are present at very low frequencies (Upton et al.2001). These observations, together with the KA/KS valueof 0.242, are compatible with the action of purifying

Page 9: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

www.ajhg.org Patin et al.: Evolutionary Patterns in the NATs Region 429

Figure 3 Proportion of SNP pairs in significant LD against physical distance in the NATs 200-kb region. In each population, genotypedSNPs were selected to have a MAF 110%. The Fisher’s exact test was used to assess LD significance, followed by Bonferroni corrections. SNPpairs were grouped into 20-kb bins.

selection in shaping NAT1 diversity, in agreement witha previous study stating that the majority of humangenes may be under weak negative selection (Bustamanteet al. 2005). In this view, and considering that NAT1 isexpressed in many tissues early in development and mayplay an additional role in the metabolism of folate (Simet al. 2000), our genetic results suggest that its involve-ment in endogenous metabolic pathways might be moreimportant than previously thought.

Purifying selection may not be the only evolutionaryforce that has influenced NAT1 diversity. Indeed, oneof the most salient observations of this study is the highlydivergent tree topology and high TMRCA (2.01 � 0.29MYA) of this locus (fig. 2). This binary pattern is trans-lated into significant departures from neutrality in pop-ulations presenting the divergent haplotype NAT1*11A(see table 7 and results of the HKA test). The probabil-ity of finding such a high TMRCA under a Wright-Fishermodel was found to be low ( ). Different hy-P p .029potheses can be proposed to explain such long basalbranches in the NAT1 gene tree. First, long-termbalancing selection can result in divergent haplotypeclusters, by maintaining two or more alleles over time,provided that they result in functional differences. Nev-ertheless, our data do not support this hypothesis, sincethe two nonsynonymous mutations separating the twoclusters (fig. 2) have been shown to have no significanteffects on the in vivo protein activity in human cells

(Hein 2002) or on the stability and activity of the re-combinant protein in yeast (Hughes et al. 1998). Anykind of selection due to a hitchhiking effect with neigh-bor genes is equally unlikely, because the two closestgenes (ASAH1 located 5′ and NAT2 located 3′) behaveas independent haplotype blocks (this study and theHapMap database). Furthermore, our sequence datafrom the NAT1 coding region are consistent with theaction of purifying selection rather than balancing se-lection, with the first selective regime having a minorinfluence on tree topologies (Williamson and Orive2002). Second, gene conversion could also lead to suchdivergent haplotype patterns by the replacement of asegment of NAT1 with a tract from its nearby paralogs(NAT2 and/or NATP). This alternative is unlikely, how-ever, since the 17 SNPs separating the two divergentNAT1 lineages are not physically clustered (fig. 2) as onewould expect after gene conversion between duplicatedloci (Innan 2003). Thus, if gene conversion formed thebasis of such a haplotype pattern, multiple conversionevents must be invoked, with some tracts of lengths !5bp. Yet, the conversion-tract lengths have been estimatedto range from 55 bp to 290 bp, through sperm-typinganalyses (Jeffreys and May 2004).

In this view, an alternative and most likely scenarioto explain our data is a demographic event such as an-cient population structure. A number of studies haverecently reported gene genealogies that present not only

Page 10: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

430 The American Journal of Human Genetics Volume 78 March 2006 www.ajhg.org

Table 7

Sequenced-Based Neutrality Tests in NAT1, NAT2, and NATP

POPULATION

NAT1 CODING REGIONa

(870 bp)

NAT1 CODING AND

FLANKING REGIONS

(2,605 bp)NAT2 CODING REGION

(870 bp)

NAT2 CODING AND

FLANKING REGIONS

(2,820 bp)NATP PSEUDOGENE

(2,145 bp)

TD F* H TD F* H TD F* H TD F* H TD F* H

Bakola �1.513b �2.189b .190 .323 �.298 .579 .007 .456 .653 .090 .852 1.726 �.572 .228 �.947Bantu NA NA NA .563 .828 .632 .303 �.193 .179 �.091 �.282 1.273 �1.287 �1.205 �.011Ashkenazi NA NA NA �.044 .131 �1.674 2.516c 1.797c .505 .374 .080 1.557 .545 1.255 .284Sardinian NA NA NA .723 .295 .364 1.677b 1.548b �.091 �.164 �.523 1.090 �.295 .000 .576French �1.723b �2.535b �3.505b �1.799b �2.925b �17.947c 2.048b 1.646b 1.021 .176 .014 1.789 .026 �.119 �1.011Saami NA NA NA 1.443 1.383 �.305 1.365 1.480b 1.358 1.017 1.281 3.136 .290 .135 .926Gujarati �1.723b �2.535b �3.505b �1.974b �3.143c �18.600c 1.355 .895 1.242 1.135 .653 2.673 .393 .582 .326Thai �1.384 �.410 �3.106b �.904 .560 �15.016c .818 1.281 .767 .441 .828 2.386 .809 .206 .344

a NA p not applicable (no variation observed).b ..01 ! P ! .05c .P ! .01

unexpectedly old coalescent times (∼2 MYA) but alsolong basal branches (Harris and Hey 1999; Webster etal. 2003; Barreiro et al. 2005; Garrigan et al. 2005;Hayakawa et al. 2005). Our observations at NAT1, to-gether with these studies, further support the view thatsome diversity in the genome of modern humans mayhave persisted from a structured ancestral population(Harding and McVean 2004). In addition, NAT1*11Aappears to be absent in sub-Saharan Africa, since it wasnot detected in either our genotyping panel of 144 sub-Saharan Africans from distinct geographic locations or600 African American individuals reported elsewhere(Upton et al. 2001). Therefore, the observation that theNAT1 gene tree is rooted in Eurasia questions the geo-graphic location of such a structured ancestral popula-tion (Takahata et al. 2001). The origins of NAT1*11Acould thus be placed either in sub-Saharan Africa, fromwhere it must have subsequently disappeared, or in Eur-asia. Should the latter be the case, the NAT1 gene treeis at odds with the commonly accepted replacement hy-pothesis (Lewin 1987) and is more parsimoniously ex-plained by the occurrence of partial hybridization be-tween modern humans expanding from Africa and pre-existing hominids in Eurasia, as recently sustained bythe RRM2P4 locus (Garrigan et al. 2005). However,such inferences require further support from the analysesof multiple independent loci in increased numbers ofsamples and human populations.

The NAT2 Gene: An Advantage to Be a SlowAcetylator?

The significantly positive values observed for most se-quence-based neutrality tests of NAT2 in western Eu-rasians (table 7) suggest the action of natural selection.Likewise, population size reductions—such as that prob-ably experienced by non-African populations during theout-of-Africa exit (Marth et al. 2003) or, more recently,

by Ashkenazi Jews (Behar et al. 2004)—could have alsoinflated TD and F* values (Przeworski et al. 2000).However, these demographic events should have equallyinfluenced neutrality statistics for other non-Africanpopulations (i.e., eastern Eurasians), which is not thecase. These observations argue thus in favor of the actionof natural selection. Conversely, interspecific tests (i.e.,KA/KS and MK tests) do not rule out the neutral evo-lution of NAT2 diversity, which does not show any clearexcess or lack of nonsynonymous mutations. In view ofthese apparently contradictory results, it is plausible thatthe frequencies, rather than the number of these muta-tions, have been influenced by selection, suggesting moresubtle and recent fluctuations in the selective pressuresoperating on NAT2.

Such changes can be detected by the LRH test, whichaims to identify haplotypes under recent positive selec-tion. Moreover, this approach should be robust to theconfounding effects of demography, since it corrects theextended haplotype homozygosity (EHH) of a given corehaplotype by the EHH of all other haplotypes at thesame core region (Sabeti et al. 2002). Overall, the LRHtests were more significant when NAT1 and NAT2 corehaplotypes were compared with the simulated than withthe empirical distribution. Manifestly, the empirical dis-tribution also includes genes under selection that biasREHH toward higher values, and it allows the detectionof selected haplotypes in a conservative way. Indepen-dently of which background distribution was used (e.g.,simulated or empirical), the same NAT2 haplotype,NAT2*5B, was detected to depart from neutrality inwestern and central Eurasians (fig. 4B). These popula-tions also showed significant P values for a single NAT1haplotype, NAT1*4. It is worth mentioning that ∼80%of NAT2*5B haplotypes are associated with NAT1*4in western/central Eurasians (vs. 43% and 60% in sub-Saharan Africans and eastern Eurasians, respectively).

Page 11: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

Figure 4 REHH plotted against core haplotype frequencies. Circles represent NAT1 and NAT2 core haplotypes. The 95th and 99thpercentiles were calculated from both simulated data (gray lines) and an empirical distribution obtained from the screening of the entirechromosome 8 (black lines). Circles above the 95th percentile of simulated and/or empirical distributions are blackened. A, NAT1 and NAT2sub-Saharan African core haplotypes are plotted against both simulated data and the empirical distribution of ∼40,000 Yoruban core haplotypesfrom the HapMap. B, NAT1 and NAT2 core haplotypes in western/central and eastern Eurasian populations are plotted against both thesimulated data and the empirical distribution of ∼40,000 European-descent core haplotypes from the HapMap. Numbers affiliated with significantcore haplotypes refer to: (1) Moroccan NAT2*5B, (2) Swedish NAT2*5B, (3) Turkmen NAT1*4, (4) Moroccan NAT1*4, (5) Saami NAT1*4,and (6) Swedish NAT1*4.

Page 12: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

432 The American Journal of Human Genetics Volume 78 March 2006 www.ajhg.org

Table 8

Estimated Growth Rate and Age of the Mutations 341TrC and 590GrA (95% CI) of the NAT2Gene

Mutationand Parameter Sub-Saharan Africans

Western/Central Eurasians Eastern Eurasians

341TrC:p .270 .477 .181r .019 (.016–.024) .062 (.055–.075) .031 (.028–.037)g 15,652 (13,797–18,435) 6,315 (5,797–7,005) 12,627 (11,427–14,685)

590GrA:p .185 .262 .362r .023 (.020–.030) .031 (.028–.037) .034 (.031–.041)g 12,497 (10,932–14,958) 12,762 (11,657–14,385) 11,987 (10,940–13,643)

NOTE.—p p relative frequency of the mutation; r p growth rate; g p age of the mutation inyears.

Thus, the signals detected in both NAT1 and NAT2 maybe the result of a single event of selection targeting along-range haplotype composed of both NAT1*4 andNAT2*5B. Several lines of evidence, however, stronglysupport that the NAT1 haplotype does not harbor thefunctional cause of such a selective event: P values areglobally less significant for NAT1*4 than for NAT2*5B,and, most importantly, all the NAT1 haplotypes ob-served in this study encode proteins with identical en-zymatic activity (Hughes et al. 1998; de Leon et al.2000). In sharp contrast, NAT2*5B bears the 341TrCmutation that is well known to encode an altered slow-acetylator protein (Zang et al. 2004).

Further support for the action of selection on the slow-encoding 341TrC NAT2 mutation comes from itsgrowth-rate estimate, which showed that this mutationhas increased in frequency twice as quickly as expected( ) (table 8) only among western and centralr p 0.062Eurasians. Previous estimations (Slatkin and Bertorelle2001), together with our own, indicate that Eurasianpopulations have grown at a rate of ∼0.030. Becausepopulation growth and selection have additive effects onthe growth rate of a mutation (Slatkin and Rannala1997), these observations suggest that the 341C allelehas been driven by selection with a selective coefficientof ∼ . Using a more accurate ap-0.062 � 0.030 p 0.032proach (Wright 1969), we estimated the 95% CI of thisselective coefficient as 0.0124–0.0913. These figures re-inforce the idea of a selective advantage of the 341Callele, even if weaker than other examples of recent butstrong positive selection, such as lactase persistence (s∼0.09–0.19) (Bersaglieri et al. 2004) or the G6PD A�alleles ( ) (Saunders et al. 2005).s 1 0.1

Altogether, both the LRH test and the growth rate of341TrC argue in favor of positive selection acting onthis slow-encoding mutation, which would imply a gen-eral selective advantage for the slow-acetylator pheno-type. However, three other NAT2 variants (191GrA,590GrA, and 857GrA) also encode slow proteins but

do not show any significant departure from neutrality.Actually, the 341TrC mutation involves the greatest re-duction in NAT2 enzymatic activity (Hein et al. 2000).In this context, it is very plausible that all NAT2 slow-acetylating variants have been subject to weak selectivepressures, the signal of selection being detectable onlyin the mutation 341TrC causing the “slowest-acetyla-tion” phenotype. Thus, the predicted increase in fre-quency of all slow mutations, in response to directionalselection, would explain the observed excess of inter-mediate-frequency alleles in western Eurasian popula-tions, as depicted by the significant positive values ofTD (table 7).

The footprints of natural selection identified in west-ern/central Eurasians raise the question of which event(s)may have provoked fluctuations in the spectrum of xe-nobiotics inactivated/activated by NAT2 (e.g., NAT2 ac-tivates heterocyclic carcinogens found in well-cookedmeat [Hein et al. 2000; Hein 2002]) in these populations.In this context, given the geographic distribution of theslow-acetylator phenotype and the estimated expansiontime of the slowest-encoding 341TrC mutation (5,797–7,005 years ago in western/central Eurasians), it istempting to hypothesize that the emergence of agricul-ture in western Eurasia could be at the basis of suchenvironmental changes. Indeed, there is accumulatingevidence that this major transition resulted in a profoundmodification of human diets and lifestyles (Cordain etal. 2005) and, consequently, in the exposure of humansto chemical environments (Ferguson 2002). Moreover,the highest frequencies of slow acetylators are observedin the Middle East (fig. 5), one of the first regions whereagriculture originated ∼10,000 years ago, and these fre-quencies decrease toward western Europe, North Africa,and India, three regions where agriculture was subse-quently diffused from the Fertile Crescent (Harris 1996).However, the hypothesis that the transition to agricul-ture influenced both the human exposure to xenobioticenvironments and, consequently, the selective pressures

Page 13: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

www.ajhg.org Patin et al.: Evolutionary Patterns in the NATs Region 433

Figure 5 Worldwide distribution of NAT2 acetylation phenotypes in healthy individuals. Each pie represents the population proportionof fast and slow acetylators. Populations numbered from 1 to 13 were analyzed in this study: (1) Bakola Pygmies, (2) Baka Pygmies, (3) AtekeBantus, (4) Somali, (5) Morrocans, (6) Ashkenazi Jews, (7) Sardinians, (8) Swedes, (9) Saami, (10) Turkmen, (11) Gujarati, (12) Thai, and (13)Chinese. Numbers 14 to 36 refer to a reviewed population: (14) Yorubas (Jeyakumar and French 1981), (15) Zimbabweans (Nhachi 1988),(16) South Africans (Hodgkin et al. 1979), (17) Libyans (Karim et al. 1981), (18) Saudi Arabians (El-Yazigi et al. 1992), (19) Emiratis (Woolhouseet al. 1997), (20) Iranians (Sardas et al. 1993), (21) Jordanians (Irshaid et al. 1992), (22) Turkmen (Bozkurt et al. 1990), (23) Greeks (Asprodiniet al. 1998), (24) Germans (Cascorbi et al. 1995), (25) Russians (Lil’in et al. 1984), (26) Pakistanis (Saleem et al. 1989), (27) Bangladeshi (Zaidet al. 2004), (28) Thai (Kukongviriyapan et al. 1984), (29) Malaysians (Ong et al. 1990), (30) Chinese (Zhao et al. 2000), (31) Koreans (Leeet al. 2002), (32) Japanese (Hashiguchi and Ebihara 1992), (33) Papua New Guineans (Hombhanje 1990), (34) Australian Aborigines (Ilett etal. 1993), (35) Eskimos (Eidus et al. 1974), and (36) Amerindians (Jorge-Nebert et al. 2002).

at NAT2 remains tentative and requires a better char-acterization of the naturally occurring substrates of theNAT2 enzyme.

Conclusion

The diversity patterns observed at the NATs regionclearly illustrate our current vision of the human genomeas a “mosaic of discrete segments,” each with its ownindividual evolutionary history (Paabo 2003). WhereasNAT1 could belong to a small proportion of nuclear locithat kept traces of an ancient population structure, theNAT2 gene gives some insights into the evolutionaryprocesses that could make some present-day detrimentalmutations frequent. The theory of the “thrifty genotype”proposes that common diseases, such as obesity and di-abetes, are the result of a past advantage to efficientlymetabolize rare food sources that are no longer restricted(Neel 1962). By analogy, the slow NAT2 acetylator hap-lotype (NAT2*5B), which is found at high frequenciesworldwide and which we propose conferred some se-lective advantage, at least in western/central Eurasianpopulations, exhibits today the strongest association

with susceptibility to bladder cancer and adverse drugreactions (Hein et al. 2000; Hein 2002). This “evolu-tionary conflict” could be widespread among genes in-volved in carcinogen metabolism, since two major eventsin human history—the Neolithic transition from for-aging to agriculture and the more recent Industrial Rev-olution—may have dramatically changed our exposureto the damaging effects of environmental carcinogens.Consequently, dissecting the evolutionary processes thathave shaped patterns of diversity of the genes involvedin drug metabolism may represent a major analyticaltool not only to identify those having played a crucialrole in the past adaptation of Homo sapiens but alsoto better understand our present-day susceptibility todisease.

Acknowledgments

We warmly acknowledge M. Slatkin for supplying sourcecodes for growth-rate estimations, R. R. Hudson for the mod-ifications of the MS program, E. Sim for invaluable advice inthe beginning of this project, and two anonymous reviewersfor constructive criticisms on the early version of the manu-

Page 14: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

434 The American Journal of Human Genetics Volume 78 March 2006 www.ajhg.org

script. We also thank A. Novelletto, A. Froment, J. M. Hom-bert, H. Rouba, and S. Santachiara-Benerecetti, for kindly pro-viding us with DNA samples. This work was supported byCNRS and Institut Pasteur research funding. E.P. was sup-ported by the French Ministry of Research Ph.D. program.

Web ResourcesAccession numbers and URLs for data presented herein are asfollows:

Arlequin v.2.001, http://lgb.unige.ch/arlequin/BEST v.1.0, http://genomethods.org/best/Bottleneck software, http://www.montpellier.inra.fr/CBGP/softwares/

bottleneck/bottleneck.htmldbSNP, http://www.ncbi.nlm.nih.gov/projects/SNP/DMLE� v.2.2, http://www.dmle.org/DnaSP v.4.0, http://www.ub.es/dnasp/GENALYS software, http://software.cng.fr/GenBank, http://www.ncbi.nlm.nih.gov/Genbank/ (for NAT1, NAT2,

and NATP [accession numbers DQ305496–DQ305975])GENETREE software, http://www.stats.ox.ac.uk/˜griff/software.htmlInternational HapMap Project, http://www.hapmap.org/MS program, http://home.uchicago.edu/˜rhudson1/NAT Nomenclature, http://www.louisville.edu/medschool/pharmacology/

NAT.htmlOnline Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm

.nih.gov/Omim/ (for NAT1 and NAT2)PHASE v.2.1.1, http://www.stat.washington.edu/stephens/phase.htmlUCSC Genome Bioinformatics, http://genome.ucsc.edu/

ReferencesAsprodini EK, Zifa E, Papageorgiou I, Benakis A (1998) Determination

of N-acetylation phenotyping in a Greek population using caffeineas a metabolic probe. Eur J Drug Metab Pharmacokinet 23:501–506

Austerlitz F, Kalaydjieva L, Heyer E (2003) Detecting populationgrowth, selection and inherited fertility from haplotypic data in hu-mans. Genetics 165:1579–1586

Barreiro LB, Patin E, Neyrolles O, Cann HM, Gicquel B, Quintana-Murci L (2005) The heritage of pathogen pressures and ancientdemography in the human innate-immunity CD209/CD209L re-gion. Am J Hum Genet 77:869–886

Behar DM, Hammer MF, Garrigan D, Villems R, Bonne-Tamir B,Richards M, Gurwitz D, Rosengarten D, Kaplan M, Pergola SD,Quintana-Murci L, Skorecki K (2004) MtDNA evidence for a ge-netic bottleneck in the early history of the Ashkenazi Jewish pop-ulation. Eur J Hum Genet 12:355–364

Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF,Drake JA, Rhodes M, Reich DE, Hirschhorn JN (2004) Geneticsignatures of strong recent positive selection at the lactase gene. AmJ Hum Genet 74:1111–1120

Blum M, Grant DM, McBride W, Heim M, Meyer UA (1990) Humanarylamine N-acetyltransferase genes: isolation, chromosomal local-ization, and functional expression. DNA Cell Biol 9:193–203

Bozkurt A, Basci NE, Kalan S, Tuncer M, Kayaalp SO (1990) N-acetylation phenotyping with sulphadimidine in a Turkish popula-tion. Eur J Clin Pharmacol 38:53–56

Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT,Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, HernandezRD, Civello D, Adams MD, Cargill M, Clark AG (2005) Naturalselection on protein-coding genes in the human genome. Nature 437:1153–1157

Cartwright RA, Glashan RW, Rogers HJ, Ahmad RA, Barham-Hall

D, Higgins E, Kahn M (1982) A role of N-acetyltransferase phe-notypes in bladder carcinogenesis: a pharmacogenetic epidemiolog-ical approach to bladder cancer. Lancet 2:842–846

Cascorbi I, Drakoulis N, Brockmoller J, Maurer A, Sperling K, RootsI (1995) Arylamine N-acetyltransferase (NAT2) mutations and theirallelic linkage in unrelated Caucasian individuals: correlation withphenotypic activity. Am J Hum Genet 57:581–592

Cordain L, Eaton SB, Sebastian A, Mann N, Lindeberg S, WatkinsBA, O’Keefe JH, Brand-Miller J (2005) Origins and evolution ofthe Western diet: health implications for the 21st century. Am J ClinNutr 81:341–354

Cornuet JM, Luikart G (1996) Description and power analysis of twotests for detecting recent population bottlenecks from allele fre-quency data. Genetics 144:2001–2014

de Leon JH, Vatsis KP, Weber WW (2000) Characterization of nat-urally occurring and recombinant human N-acetyltransferase vari-ants encoded by NAT1. Mol Pharmacol 58:288–299

Eidus L, Hodgkin MM, Schaefer O, Jessamine AG (1974) Distributionof isoniazid inactivators determined in Eskimos and Canadian col-lege students by a urine test. Rev Can Biol 33:117–123

El-Yazigi A, Johansen K, Raines DA, Dossing M (1992) N-acetylationpolymorphism and diabetes mellitus among Saudi Arabians. J ClinPharmacol 32:905–910

Excoffier L (2002) Human demographic history: refining the recentAfrican origin model. Curr Opin Genet Dev 12:675–682

Fay JC, Wu CI (2000) Hitchhiking under positive Darwinian selection.Genetics 155:1405–1413

Ferguson LR (2002) Natural and human-made mutagens and carcin-ogens in the human diet. Toxicology 181–182:79–82

Fu YX, Li WH (1993) Statistical tests of neutrality of mutations. Ge-netics 133:693–709

Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, BlumenstielB, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN,Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ,Altshuler D (2002) The structure of haplotype blocks in the humangenome. Science 296:2225–2229

Garcia-Closas M, Malats N, Silverman D, Dosemeci M, KogevinasM, Hein DW, Tardon A, Serra C, Carrato A, Garcia-Closas R,Lloreta J, Castano-Vinyals G, Yeager M, Welch R, Chanock S, Chat-terjee N, Wacholder S, Samanic C, Tora M, Fernandez F, Real FX,Rothman N (2005) NAT2 slow acetylation, GSTM1 null genotype,and risk of bladder cancer: results from the Spanish Bladder CancerStudy and meta-analyses. Lancet 366:649–659

Garrigan D, Mobasher Z, Severson T, Wilder JA, Hammer MF (2005)Evidence for archaic Asian ancestry on the human X chromosome.Mol Biol Evol 22:189–192

Griffiths RC, Tavare S (1994) Sampling theory for neutral alleles in avarying environment. Philos Trans R Soc Lond B Biol Sci 344:403–410

Harding RM, McVean G (2004) A structured ancestral population forthe evolution of modern humans. Curr Opin Genet Dev 14:667–674

Harris DR (ed) (1996) The origins and spread of agriculture and pas-toralism in Eurasia. Smithsonian Institution Press, Washington, DC

Harris EE, Hey J (1999) X chromosome evidence for ancient humanhistories. Proc Natl Acad Sci USA 96:3320–3324

Hashiguchi M, Ebihara A (1992) Acetylation polymorphism of caf-feine in a Japanese population. Clin Pharmacol Ther 52:274–276

Hayakawa T, Aki I, Varki A, Satta Y, Takahata N (2005) Fixation ofthe human-specific CMP-N-acetylneuraminic acid hydroxylasepseudogene and implications of haplotype diversity for human evo-lution. Genetics (http://www.genetics.org/cgi/rapidpdf/genetics.105.046995v1) (electronically published November 4, 2005; accessedJanuary 12, 2006)

Hein DW (2002) Molecular genetics and function of NAT1 and NAT2:

Page 15: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

www.ajhg.org Patin et al.: Evolutionary Patterns in the NATs Region 435

role in aromatic amine metabolism and carcinogenesis. Mutat Res506–507:65–77

Hein DW, Doll MA, Fretland AJ, Leff MA, Webb SJ, Xiao GH, De-vanaboyina US, Nangju NA, Feng Y (2000) Molecular genetics andepidemiology of the NAT1 and NAT2 acetylation polymorphisms.Cancer Epidemiol Biomarkers Prev 9:29–42

Hill WG, Robertson A (1968) The effects of inbreeding at loci withheterozygote advantage. Genetics 60:615–628

Hodgkin MM, Eidus L, Bailey WC (1979) Isoniazid phenotyping ofblack as well as white patients. Can J Physiol Pharmacol 57:760–763

Hombhanje F (1990) An assessment of acetylator polymorphism andits relevance in Papua New Guinea. P N G Med J 33:107–110

Huang YS, Chern HD, Su WJ, Wu JC, Lai SL, Yang SY, Chang FY,Lee SD (2002) Polymorphism of the N-acetyltransferase 2 gene asa susceptibility risk factor for antituberculosis drug-induced hepa-titis. Hepatology 35:883–889

Hudson RR (2002) Generating samples under a Wright-Fisher neutralmodel. Bioinformatics 18:337–338

Hudson RR, Kreitman M, Aguade M (1987) A test of neutral molec-ular evolution based on nucleotide data. Genetics 116:153–159

Hughes NC, Janezic SA, McQueen KL, Jewett MA, Castranio T, BellDA, Grant DM (1998) Identification and characterization of variantalleles of human acetyltransferase NAT1 with defective functionusing p-aminosalicylate as an in-vivo and in-vitro probe. Pharma-cogenetics 8:55–66

Ilett KF, Chiswell GM, Spargo RM, Platt E, Minchin RF (1993) Acet-ylation phenotype and genotype in Aboriginal leprosy patients fromthe north-west region of western Australia. Pharmacogenetics 3:264–269

Innan H (2003) A two-locus gene conversion model with selection andits application to the human RHCE and RHD genes. Proc NatlAcad Sci USA 100:8793–8798

Irshaid Y, al-Hadidi H, Abuirjeie M, Latif A, Sartawi O, RawashdehN (1992) Acetylator phenotypes of Jordanian diabetics. Eur J ClinPharmacol 43:621–623

Jeffreys AJ, May CA (2004) Intense and highly localized gene con-version activity in human meiotic crossover hot spots. Nat Genet36:151–156

Jeyakumar LH, French MR (1981) Polymorphic acetylation of sul-famethazine in a Nigerian (Yoruba) population. Xenobiotica 11:319–321

Jorge-Nebert LF, Eichelbaum M, Griese EU, Inaba T, Arias TD (2002)Analysis of six SNPs of NAT2 in Ngawbe and Embera Amerindiansof Panama and determination of the Embera acetylation phenotypeusing caffeine. Pharmacogenetics 12:39–48

Karim AK, Elfellah MS, Evans DA (1981) Human acetylator poly-morphism: estimate of allele frequency in Libya and details of globaldistribution. J Med Genet 18:325–330

Kimura M (1968) Evolutionary rate at the molecular level. Nature217:624–626

Kukongviriyapan V, Lulitanond V, Areejitranusorn C, KongyingyoseB, Laupattarakasem P (1984) N-acetyltransferase polymorphism inThailand. Hum Hered 34:246–249

Labuda D, Zietkiewicz E, Labuda M (1997) The genetic clock andthe age of the founder effect in growing populations: a lesson fromFrench Canadians and Ashkenazim. Am J Hum Genet 61:768–771

Lee SY, Lee KA, Ki CS, Kwon OJ, Kim HJ, Chung MP, Suh GY, KimJW (2002) Complete sequencing of a genetic polymorphism inNAT2 in the Korean population. Clin Chem 48:775–777

Lewin R (1987) Africa: cradle of modern humans. Science 237:1292–1295

Lewontin RC (1964) The interaction of selection and linkage. II. Op-timum models. Genetics 50:757–782

Lil’in ET, Korsunskaia MP, Meksin VA, Drozdov ES, Nazarov VV

(1984) Distribution of acetylator phenotypes in the normal Moscowcity population and in chronic alcoholism. Genetika 20:1557–1559

Marth G, Schuler G, Yeh R, Davenport R, Agarwala R, Church D,Wheelan S, Baker J, Ward M, Kholodov M, Phan L, Czabarka E,Murvai J, Cutler D, Wooding S, Rogers A, Chakravarti A, Har-pending HC, Kwok PY, Sherry ST (2003) Sequence variations in thepublic human genome data reflect a bottlenecked population history.Proc Natl Acad Sci USA 100:376–831

McDonald JH, Kreitman M (1991) Adaptive protein evolution at theAdh locus in Drosophila. Nature 351:652–654

Neel JV (1962) Diabetes mellitus: a “thrifty” genotype rendered det-rimental by “progress”? Am J Hum Genet 14:353–362

Nei M (ed) (1987) Molecular evolutionary genetics. Columbia Uni-versity Press, New York

Nhachi CF (1988) Polymorphic acetylation of sulphamethazine in aZimbabwe population. J Med Genet 25:29–31

Ong ML, Mant TG, Veerapen K, Fitzgerald D, Wang F, ManivasagarM, Bosco JJ (1990) The lack of relationship between acetylatorphenotype and idiopathic systemic lupus erythematosus in a South-east Asian population: a study of Indians, Malays and MalaysianChinese. Br J Rheumatol 29:462–464

Paabo S (2003) The mosaic that is our genome. Nature 421:409–412Przeworski M, Hudson RR, Di Rienzo A (2000) Adjusting the focus

on human variation. Trends Genet 16:296–302Reeve JP, Rannala B (2002) DMLE�: Bayesian linkage disequilibrium

gene mapping. Bioinformatics 18:894–895Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R (2003) DnaSP,

DNA polymorphism analyses by the coalescent and other methods.Bioinformatics 19:2496–2497

Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, SchaffnerSF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, AckermanHC, Campbell SJ, Altshuler D, Cooper R, Kwiatkowski D, WardR, Lander ES (2002) Detecting recent positive selection in the humangenome from haplotype structure. Nature 419:832–837

Saleem M, Malik SA, Ahmed M, Saleem N (1989) Isoniazid acetylationand polymorphism in humans. J Pak Med Assoc 39:285–286

Sardas S, Lahijany B, Cok I, Karakaya AE (1993) N-acetylation phe-notyping with sulfamethazine in an Iranian population. Pharma-cogenetics 3:131–134

Saunders MA, Slatkin M, Garner C, Hammer MF, Nachman MW(2005) The span of linkage disequilibrium caused by selection onG6PD in humans. Genetics 171:1219–1229

Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D (2005)Calibrating a coalescent simulation of human genome sequence var-iation. Genome Res 15:1576–1583

Schneider S, Roessli D, Excoffier L (2000) Arlequin version 2.000: asoftware for population genetic data analysis. Genetics and BiometryLaboratory, University of Geneva, Geneva

Sebastiani P, Lazarus R, Weiss ST, Kunkel LM, Kohane IS, RamoniMF (2003) Minimal haplotype tagging. Proc Natl Acad Sci USA100:9900–9905

Sim E, Payton M, Noble M, Minchin R (2000) An update on genetic,structural and functional studies of arylamine N-acetyltransferasesin eukaryotes and prokaryotes. Hum Mol Genet 9:2435–2441

Slatkin M, Bertorelle G (2001) The use of intraallelic variability fortesting neutrality and estimating population growth rate. Genetics158:865–874

Slatkin M, Rannala B (1997) Estimating the age of alleles by use ofintraallelic variability. Am J Hum Genet 60:447–458

Soranzo N, Bufe B, Sabeti PC, Wilson JF, Weale ME, Marguerie R,Meyerhof W, Goldstein DB (2005) Positive selection on a high-sensitivity allele of the human bitter-taste receptor TAS2R16. CurrBiol 15:1257–1265

Stephens M, Donnelly P (2003) A comparison of Bayesian methodsfor haplotype reconstruction from population genotype data. Am JHum Genet 73:1162–1169

Page 16: Deciphering the ancient and complex evolutionary history ...Francesca Luca, Antti Sajantila, Doron M Behar, Ornella Semino, Anavaj Sakuntabhai, Nicole Guiso, et al. To cite this version:

436 The American Journal of Human Genetics Volume 78 March 2006 www.ajhg.org

Tajima F (1989) Statistical method for testing the neutral mutationhypothesis by DNA polymorphism. Genetics 123:585–595

Takahashi M, Matsuda F, Margetic N, Lathrop M (2003) Automatedidentification of single nucleotide polymorphisms from sequencingdata. J Bioinform Comput Biol 1:253–265

Takahata N (1993) Allelic genealogy and human evolution. Mol BiolEvol 10:2–22

Takahata N, Lee SH, Satta Y (2001) Testing multiregionality of mod-ern human origins. Mol Biol Evol 18:172–183

Tang K, Wong LP, Lee EJ, Chong SS, Lee CG (2004) Genomic evidencefor recent positive selection at the human MDR1 gene locus. HumMol Genet 13:783–797

Thompson EE, Kuttab-Boulos H, Witonsky D, Yang L, Roe BA, DiRienzo A (2004) CYP3A variation and the evolution of salt-sensi-tivity variants. Am J Hum Genet 75:1059–1069

Toomajian C, Ajioka RS, Jorde LB, Kushner JP, Kreitman M (2003)A method for detecting recent selection in the human genome fromallele age estimates. Genetics 165:287–297

Toomajian C, Kreitman M (2002) Sequence variation and haplotypestructure at the human HFE locus. Genetics 161:1609–1623

Upton A, Johnson N, Sandy J, Sim E (2001) Arylamine N-acetyltrans-ferases: of mice, men and microorganisms. Trends Pharmacol Sci22:140–146

Wang Z, Wang B, Tang K, Lee EJ, Chong SS, Lee CG (2005) A func-tional polymorphism within the MRP1 gene locus identified throughits genomic signature of positive selection. Hum Mol Genet 14:2075–2087

Watterson GA (1975) On the number of segregating sites in geneticalmodels without recombination. Theor Popul Biol 7:256–276

Weber WW (1987) The acetylator genes and drug response. OxfordUniversity Press, New York

Webster MT, Clegg JB, Harding RM (2003) Common 5′ b-globin RFLPhaplotypes harbour a surprising level of ancestral sequence mosai-cism. Hum Genet 113:123–139

Williamson S, Orive ME (2002) The genealogy of a sequence subjectto purifying selection at multiple sites. Mol Biol Evol 19:1376–1384

Wooding S, Kim UK, Bamshad MJ, Larsen J, Jorde LB, Drayna D(2004) Natural selection and molecular evolution in PTC, a bitter-taste receptor gene. Am J Hum Genet 74:637–646

Wooding SP, Watkins WS, Bamshad MJ, Dunn DM, Weiss RB, JordeLB (2002) DNA sequence variation in a 3.7-kb noncoding sequence5′ of the CYP1A2 gene: implications for human population historyand natural selection. Am J Hum Genet 71:528–542

Woolhouse NM, Qureshi MM, Bastaki SM, Patel M, Abdulrazzaq Y,Bayoumi RA (1997) Polymorphic N-acetyltransferase (NAT2) ge-notyping of Emiratis. Pharmacogenetics 7:73–82

Wright S (ed) (1969) Evolution and the genetics of population. Uni-versity of Chicago Press, Chicago, p 33

Zaid RB, Nargis M, Neelotpol S, Hannan JM, Islam S, Akhter R, AliL, Azad-Khan AK (2004) Acetylation phenotype status in a Ban-gladeshi population and its comparison with that of other Asianpopulation data. Biopharm Drug Dispos 25:237–241

Zang Y, Zhao S, Doll MA, States JC, Hein DW (2004) The T341C(Ile114Thr) polymorphism of N-acetyltransferase 2 yields slow ace-tylator phenotype by enhanced protein degradation. Pharmacoge-netics 14:717–723

Zhao B, Seow A, Lee EJ, Lee HP (2000) Correlation between acety-lation phenotype and genotype in Chinese women. Eur J Clin Phar-macol 56:689–692