

Copyright © 2007 by the Genetics Society of America

DOI: 10.1534/genetics.107.075838

Molecular Evolution of Glutathione S-Transferases in the Genus Drosophila

Wai Yee Low,*,† Hooi Ling Ng,‡ Craig J. Morton,‡ Michael W. Parker,†,‡ Philip Batterham*,† and Charles Robin*,†,1

*Department of Genetics, University of Melbourne, Victoria 3010, Australia, †Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Victoria 3010, Australia and ‡Biota Structural Biology Laboratory and the ACRF Rational Drug Discovery Facility, St. Vincent’s Institute of Medical Research, Melbourne, Victoria 3065, Australia

Manuscript received May 10, 2007

Accepted for publication June 19, 2007

ABSTRACT

As classical phase II detoxification enzymes, glutathione S-transferases (GSTs) have been implicated in insecticide resistance and may have evolved in response to toxins in the niche-defining feeding substrates of Drosophila species. We have annotated the GST genes of the 12 Drosophila species with recently sequenced genomes and analyzed their molecular evolution. Gene copy number variation is attributable mainly to unequal crossing-over events in the large δ and ε clusters. Within these gene clusters there are also GST genes with slowly diverging orthologs. This implies that they have their own unique functions or have spatial/temporal expression patterns that impose significant selective constraints. Searches for positively selected sites within the GSTs identified G171K in GSTD1, a protein that has previously been shown to be capable of metabolizing the insecticide DDT. We find that the same radical substitution (G171K) in the substrate-binding domain has occurred at least three times in the Drosophila radiation. Homology modeling places site 171 distant from the active site but adjacent to an alternative DDT-binding site. We propose that the parallel evolution observed at this site is an adaptive response to an environmental toxin and that sequencing of historical alleles suggests that this toxin was not a synthetic insecticide.

The sequencing of the genomes of 12 Drosophila species provides a powerful new evolutionary context by which to understand insect biology. It bridges the gap between previous comparative studies of insect genomes that focus on gene composition and recent studies in Drosophila that address the extent to which adaptation has shaped molecular evolution of genes. The latter studies have used tests of neutrality that partition within- and between-species variation into categories such as synonymous and nonsynonymous sites and have been used to estimate that ~50% of the amino acid substitutions in Drosophila melanogaster and its sibling species have been adaptive (Smith and Eyre-Walker 2002; Andolfatto 2005; Welch 2006). While these studies quantify the amount of adaptation, comparative analyses of the 12 Drosophila genomes have the power to identify specific loci that are the targets of positive selection by their patterns of sequence divergence (Yang and Nielsen 2002).

Here we examine the molecular evolution of the glutathione S-transferase (GST) gene family, whose members fulfill a number of functional roles, including detoxification. In the context of adaptation in insects, detoxification is of interest for two reasons. First, the ecological niche of Drosophila species is largely defined by the larval feeding substrate, and therefore the genes enabling Drosophila to identify, detoxify, and utilize these substrates as nutritional resources are obvious candidates for genes that have been the targets of natural selection. Indeed, in at least 2 of the 12 species with sequenced genomes, detoxification is believed to be important in the adaptation to niche-defining larval substrates: D. sechellia to toxic levels of octanoic acid in Morinda fruit (Legal et al. 1994) and D. mojavensis to necrotic tissues of various cactus species (Ruiz and Heed 1988). Second, any adaptive evolution involving the detoxification of natural compounds over the past 50 million years of evolution may inform studies of insecticide resistances that have evolved over the past 60 years. The GST multigene family is one of several repeatedly implicated in insecticide resistance and we focus exclusively on it here to integrate molecular evolutionary analyses with the excellent structural models available for insect GSTs.

GST genes have been the subject of analysis with each new insect genome sequenced, largely because they are candidate insecticide resistance genes (Ranson et al. 2001; Tu and Akgul 2005; Claudianos et al. 2006). GSTs act by conjugating the thiol group from glutathione (GSH; γ-glutamyl-cysteinyl-glycine) to compounds that possess an electrophilic center. In doing this, they can eliminate toxins from a cell by rendering them more water soluble or by targeting them to specific GSH multidrug transporters. GSTs are reported to bio-transform organochlorine (Clark and Shamaan 1984; Tang and Tu 1994) and organophosphorous insecticides (Lewis and Sawicki 1971; Oppenoorth et al. 1979; Huang et al. 1998), and they confer resistance to pyrethroid insecticides by reducing the oxidative damage that the insecticides cause to lipids (Vontas et al. 2001). Resistant strains of various species have been shown to have either increased expression (Grant and Hammock 1992; Syvanen et al. 1994) or increased GST activity (Fournier et al. 1992). In Anopheles gambiae, an ε GST has colocalized with resistance on a genetic map (Ranson et al. 2000).

1 Corresponding author: Department of Genetics, University of Melbourne, Gate 12 Royal Parade, Parkville, Melbourne, VIC 3010, Australia. E-mail: [email protected]

Genetics 177: 1363–1375 (November 2007)

Comparative analysis of the D. melanogaster and An. gambiae genomes revealed 36 and 28 cytosolic GSTs, respectively, and these have been classified into six classes (δ, ε, σ, θ, ω, and ζ; Ranson et al. 2001). Of particular interest to researchers are the δ and ε classes because they are insect specific, exist in some of the largest gene clusters in insect genomes, and, to date, are the only GST classes that have been implicated in insecticide resistance (Tang and Tu 1994; Ranson et al. 2001; Ding et al. 2003). The molecular complexity of dipteran GSTs may extend beyond the size of the gene clusters, as there is a suggestion that diversity may be generated within individuals by alternative splicing and within populations by gene fusions (Zhou and Syvanen 1997) and possibly by gene amplifications (Vontas et al. 2002). A striking feature of the phylogenetic analyses (Ranson et al. 2002) is that, while there is a conservation of GST classes between D. melanogaster and An. gambiae, genes within a class are often more closely related to genes within the same species. Thus there are few cases where true orthologous relationships are observed. Instead, Ranson et al. (2002) describe the situation as “orthologous sets of paralogous genes.” Thus multiple gene duplication and/or intergene recombination events have happened since the divergence of the D. melanogaster and An. gambiae lineages.

Here we analyze the GST genes of the 12 Drosophila species with sequenced genomes to obtain a clear understanding of the molecular evolutionary events affecting these genes. We separate the cytosolic GST genes present in the Drosophila genomes that show the hallmarks of adaptive evolution from those that show patterns of divergence more consistent with purifying selection. We reason that those that evolve adaptively are more likely to have roles in detoxifying environmental compounds and are more likely to be preadapted to insecticide detoxification. Finally, we test a hypothesis that a candidate adaptive substitution is associated with insecticide resistance by sequencing alleles that predate the use of insecticides, and we interpret the substitution in a structural context by docking a known substrate on a homology-based protein structure model.

MATERIALS AND METHODS

D. melanogaster GSTs as reference set: Glutathione S-transferases encoded in the D. melanogaster genome are divided into two nonhomologous families: the microsomal GSTs, for which there are three genes (Toba and Aigaki 2000), and the canonical GSTs, which are cytosolic (36 genes). We limit our study here to the cytosolic GSTs so that we can combine molecular evolutionary analyses with protein structure analyses using known structures of cytosolic GSTs. Ranson et al. (2001, 2002) previously identified 36 D. melanogaster cytosolic GSTs and classified them into six classes (δ, ε, σ, θ, ω, and ζ). Strictly speaking, only a subset of the 36 has been biochemically tested for GST activity (specifically, GSTD1, GSTD2, GSTD3, GSTD7, GSTD9, GSTD10, GSTE1, GSTS1, CG6673, CG6776, CG6662, and CG6781; Sawicki et al. 2003; Kim et al. 2006). However, homology with other GSTs suggests that the others may have this activity. As is common in the literature, we will refer to these as GSTs even though a more accurate description would be proteins homologous to known insect GSTs (Ranson et al. 2001). To ensure that we had a complete set of GST sequences from D. melanogaster, we performed a motif profile-based search. Members of the known D. melanogaster δ, ε, σ, θ, ω, and ζ GST classes were used as training sequences for MEME (Bailey and Elkan 1994) to build a position-specific probability matrix. Not surprisingly, the highest scoring matrix is part of the SNAIL/TRAIL motif that is thought to function as the glutathione-binding motif (Koonin et al. 1994). This highest scoring matrix was then used as input for MAST (Bailey and Gribskov 1998) to search the FlyBase release 4.3 D. melanogaster translated sequences for matching sequences. An arbitrary e-value of 0.1 was assigned as the cutoff score and all sequences that passed this criterion were gathered. The training and searching processes were repeated using newly gathered sequences as inputs until no new sequences were found. A total of 38 cytosolic GST genes were found and we refer to these GSTs as our reference set of GSTs. This reference set differs from the set reported in Ranson et al. (2001, 2002) and in Claudianos et al. (2006) in the following ways:

1. We included CG4623, whereas Ranson et al. (2001) excluded it because it was too long and too diverged from other GSTs. We included it because a multiple alignment of this gene with other D. melanogaster GSTs showed that it has the conserved serine in the putative SNAIL/TRAIL motif, which is suggestive of GST catalytic activity. It is also classified as a member of the GST multigene family by the INTERPRO database.

2. We kept the two δ cluster pseudogenes discarded by Ranson et al. (2001) so that we could identify potentially functional orthologs in other Drosophila species.

3. Our gene list counts CG6673 only once, while acknowledging that two proteins appear to be produced from this locus by splicing of alternate exons.
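
The repeat-until-no-new-sequences logic of the MEME/MAST search described above can be illustrated with a minimal sketch. It does not call MEME or MAST; the `search` callable is a hypothetical stand-in for one training-plus-searching round, and the toy gene relationships below are invented purely to show the loop terminating.

```python
from typing import Callable, Iterable, Set


def iterate_until_stable(seeds: Iterable[str],
                         search: Callable[[Set[str]], Set[str]],
                         max_rounds: int = 50) -> Set[str]:
    """Repeat train-and-search rounds until a round adds no new sequence IDs."""
    found = set(seeds)
    for _ in range(max_rounds):
        hits = search(found)      # hypothetical stand-in for one MEME + MAST round
        new = hits - found
        if not new:               # nothing new: the set is stable
            return found
        found |= new
    return found                  # safety stop if max_rounds is reached


# Toy data (assumption, not real search output): each gene "finds" these others.
TOY_NEIGHBOURS = {"GstD1": {"GstD2"}, "GstD2": {"GstE1"}, "GstE1": set()}


def toy_search(current: Set[str]) -> Set[str]:
    hits: Set[str] = set()
    for gene in current:
        hits |= TOY_NEIGHBOURS.get(gene, set())
    return hits


result = iterate_until_stable({"GstD1"}, toy_search)
print(sorted(result))
```

The `max_rounds` guard is an addition for safety in the sketch; the procedure as described simply stops when no new sequences are found.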

Manual annotation: Each of the 38 D. melanogaster reference GSTs was used as input for tblastn searches against the other 11 Drosophila species assemblies (Drosophila 12 Genomes Consortium 2007). To ensure that we identified all GST genes, we used a low-stringency e-value of 0.01 as a cutoff. This cutoff was permissive enough to detect orthologs, paralogs within the same class, and some paralogs in different classes. However, it was not permissive enough for a δ GST to identify a σ GST and vice versa. The coordinates of each tblastn hit were parsed into the corresponding Drosophila species scaffolds using our customized Perl scripts, which utilized some BioPerl modules (Stajich et al. 2002). The tblastn hits were merged if their coordinates overlapped and were used as inputs for blastx against translated sequences from D. melanogaster (FlyBase release 4.3). If the top blastx hit was not a GST of our reference set, then the sequence was discarded. The remaining sequences were manually annotated using Artemis v6 (Rutherford et al. 2000).
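
The merging of overlapping tblastn hits can be sketched as a standard interval merge. This is an illustration in Python rather than the Perl/BioPerl scripts actually used, and the tuple layout (scaffold, strand, start, end) and the choice to merge only hits on the same scaffold and strand are assumptions made for the example.

```python
from collections import defaultdict


def merge_hits(hits):
    """Merge overlapping hits given as (scaffold, strand, start, end) tuples.

    Coordinates are treated as inclusive; start and end may be given in
    either order. Hits are merged only within the same scaffold and strand.
    """
    grouped = defaultdict(list)
    for scaffold, strand, start, end in hits:
        grouped[(scaffold, strand)].append((min(start, end), max(start, end)))

    merged = []
    for (scaffold, strand), spans in sorted(grouped.items()):
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:                 # overlaps the current region
                cur_end = max(cur_end, end)
            else:                                # gap: close the current region
                merged.append((scaffold, strand, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((scaffold, strand, cur_start, cur_end))
    return merged


# Hypothetical hit coordinates for illustration only.
example_hits = [
    ("scaffold_1", "+", 100, 300),
    ("scaffold_1", "+", 250, 500),
    ("scaffold_1", "+", 900, 1200),
    ("scaffold_2", "-", 80, 50),
]
merged_hits = merge_hits(example_hits)
print(merged_hits)
```

Each merged region would then be extracted from the scaffold and passed to blastx, as described in the text.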

We identified several potential pseudogenes by their disablement features such as premature stop codons, frameshifts, and gene truncations (relative to the canonical GST gene). There were also sequences with low-quality sequence trace files and/or gaps in the genome sequence assembly and these we termed “partial.” A second round of annotation was performed after examining the information on gene orders, multiple alignments, and neighbor-joining distance trees. All annotated genes were checked for consistency of gene length, intron/exon structure, suspicious insertion/deletion, tree branch length, and, if necessary, the actual sequence trace files. A total of 418 full-length GST genes and 24 pseudogenes from the 12 Drosophila species were found (our annotations are available at http://128.250.105.100/~charlesrobin/ROBI_MAN/ALL_GST).

Phylogenetic tree: Deduced amino acid sequences were aligned using ClustalW. SEQBOOT, PROTDIST, NEIGHBOR, and CONSENSE from the PHYLIP v3.66 package were used to compute a bootstrapped neighbor-joining tree containing 407 GST sequences (Felsenstein 2005). The set of 11 orthologous CG10065 proteins was not included in this tree because the GST domain is only part of the polypeptide encoded by this gene. An independent 100-replicate bootstrapped tree was computed for this set. Nodes with <75% bootstrap support were collapsed and the remaining clades formed the basis of assignments of orthologous relationships and of the assignment to gene sets for PAML analysis and for the dating of gene duplication events relative to the species phylogeny.
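
Collapsing poorly supported nodes turns them into polytomies under their parent. A minimal sketch follows, assuming a toy tree encoding that is not PHYLIP's format: a leaf is a string, and an internal node is a tuple `(support, [children])`, with `None` as the support of the root.

```python
def collapse(node, threshold=75):
    """Merge internal nodes with support below threshold into their parent.

    A leaf is a str; an internal node is (support, [children]).
    The root's support may be None and is never collapsed.
    """
    if isinstance(node, str):
        return node
    support, children = node
    new_children = []
    for child in children:
        child = collapse(child, threshold)
        weak = (not isinstance(child, str)
                and child[0] is not None
                and child[0] < threshold)
        if weak:
            new_children.extend(child[1])   # promote grandchildren (polytomy)
        else:
            new_children.append(child)
    return (support, new_children)


# Hypothetical tree: the clade grouping C with (D, E) has only 40% support.
toy_tree = (None, [(90, ["A", "B"]), (40, ["C", (80, ["D", "E"])])])
collapsed = collapse(toy_tree)
print(collapsed)
```

After collapsing, only the well-supported clades (A, B) and (D, E) remain, which is the property used when assigning orthologous sets.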

PAML: models for heterogeneous selective constraint on codons: Twenty-two sets of GST orthologs were analyzed with CODEML of PAML v3.14 (supplemental Table 1 at http://www.genetics.org/supplemental/) (Yang and Nielsen 2002). CODEML uses a maximum-likelihood method introduced by Goldman and Yang (1994), which accounts for multiple hits and differentially weights evolutionary changes between different codons. The amino acid sequences within the sets were aligned using ClustalW and then the nucleotide coordinates were mapped to the corresponding amino acid alignment using the program MRTRANS (Pearson 1990). Following this, all alignments were manually inspected prior to CODEML. The tree topology supplied for CODEML followed the species tree in Figure 5. For all CODEML analyses, we used the F3×4 codon model of Yang and Nielsen (2000), which calculates codon frequencies from the nucleotide frequencies at the three codon positions. The key parameter that CODEML estimates is the ratio of rates of nonsynonymous to synonymous substitution (ω) and this can be estimated for each codon in a multiple alignment. If ω > 1, then positive (or, to be specific, diversifying) selection has favored amino acid substitutions. If ω < 1, then negative (or purifying) selection has prevented amino acid substitution, and if ω = 1, then the sequence is evolving as if it is neutral. To detect site-specific positive selection, we ran models M0, M1a, M2a, M3 (K = 3), M7, and M8 for each of the 22 GST data sets. Model 1a assumes that codons fall into two types, one where 0 < ω0 < 1 and the other where ω1 = 1. This is consistent with Kimura’s neutral theory of evolution where some sites can evolve under selective constraint and others are free to change as mutations arise in them. Model 1a is thus referred to as a neutral model. Model 2a is an extension of model 1a that allows a third proportion of sites with ω2 > 1 to be estimated. This is referred to as a positive selection model. Model 3 uses a general discrete distribution with three site classes having proportions p0, p1, and p2 with the corresponding ω0, ω1, and ω2, respectively. Model 7 assumes a β distribution for 10 site classes of ω between 0 and 1, whereas model 8 extends model 7 with an additional class of sites that has ω > 1. To avoid being trapped at local optima, three different initial ω-values of 0.5, 1.1, and 2.0 were used in the estimation of the log likelihood for model 7 and model 8. The highest value among the three runs was used in significance calculations. The significance calculations used the likelihood ratio test (LRT) of Yang and Nielsen (2002), in which the test statistic was two times the difference of the log likelihood of a positive selection model and the log likelihood of the neutral model. The specific comparisons were models M2a vs. M1a and M8 vs. M7. The range of the total tree length, S, of model M0 outputs for all GST sets studied should allow relatively high power and accuracy in detection of positive selection (1.8 < S < 8.1; Anisimova et al. 2002). We used a Bonferroni correction for multiple tests and an empirical Bayes calculation to assess confidence in the positively selected sites (posterior Bayesian probability >95%).
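
As a worked illustration of the likelihood ratio test, the sketch below computes the statistic 2Δℓ and a P-value. Two things are assumptions rather than statements from the text: the statistic is compared to a χ² distribution with 2 d.f. (the comparison commonly used for M1a vs. M2a and M7 vs. M8), and the Bonferroni correction divides α by the 22 gene sets. The log-likelihood values are invented.

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """Return (2 * delta lnL, P-value) assuming a chi-square with 2 d.f.

    For 2 d.f. the chi-square survival function has the closed form
    exp(-x / 2), so no statistics library is needed.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, math.exp(-stat / 2.0)


def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """True if p_value passes a Bonferroni-corrected threshold."""
    return p_value < alpha / n_tests


# Hypothetical log likelihoods for one gene set (M1a = neutral, M2a = selection).
stat, p = lrt_pvalue_df2(lnl_null=-2500.0, lnl_alt=-2490.0)
significant = bonferroni_significant(p, n_tests=22)
print(stat, p, significant)
```

With these invented values the statistic is 20 and the P-value is exp(−10), which remains significant after dividing α = 0.05 by 22.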

Polymorphism analysis: Sequences of GstD1 from the y; cn bw sp strain of D. melanogaster and two D. simulans strains (New Caledonia, wcl29f11.g1; w501, uya58f03.b1) were obtained from FlyBase and the NCBI trace archives, respectively. We also sequenced one GstD1 allele from each of 36 D. melanogaster lines (see Drosophila strains below) using direct sequencing and/or sequencing from plasmids. The cloning strategy was used if any of the direct sequencing of PCR products displayed heterozygosity, in which case we randomly chose one plasmid to sequence. The cloning system used was pGEM T-Easy (Promega, Madison, WI). We also sequenced cloned PCR products of GstD1 from six Australian D. simulans lines. PCR reactions were performed in 50 µl containing 10 mM Tris–HCl (pH 8.3), 1.5 mM MgCl2, 50 mM KCl, 200 µM of each dNTP, 1 µM of each primer, and 5 units of Taq polymerase (Promega). The primers dmel5′GstD1f: 5′-AAGAAGTTGGAGATTTGTTCACTC-3′ and dmel3′GstD1r: 5′-CTTCTTGAACTCCAGGCATC-3′ were used to amplify D. melanogaster genomic DNA, and the primers dsGstD1f: 5′-CTCCTTCAGTCATATGGCTGACTTCTACTAC-3′ and dsGstD1r: 5′-AAAACGTGAATTCCAGGCTTATTC-3′ were used to amplify D. simulans genomic DNA. The PCR temperature cycling conditions were 1 cycle of 95° for 2 min, followed by 35 cycles of 95° for 30 sec, 55° for 30 sec, and 72° for 30 sec. The program DnaSP, version 4, was used to calculate θ and Tajima’s D and to perform the McDonald and Kreitman test (Rozas and Rozas 1999).

Drosophila strains: GstD1 was sequenced from the following D. melanogaster lines: 11 Australian lines [5 provided by A. Hoffmann and 6 from the Drosophila Genomic Resource Center (DGRC) (103414, 103415, 103416, 103417, 103418, 103407)], 8 United States lines [7 inbred lines from 2003 in Raleigh, NC, provided by C. H. Langley, and 1 from Bloomington (stock 5)], 5 African lines (Malawian lines from C. H. Langley), 2 Papua New Guinean lines from Ehime (E-10043, E10044), 4 Japanese lines [3 from Ehime (E-10005, E-10010, E-10012) and 1 from DGRC (103408)], 1 Kazakhstan line from DGRC (107659), 1 Swedish line from DGRC (107660), 1 Ukrainian line from Bloomington (4266), 1 Columbian line from Bloomington (3843), and 1 Spanish line from Bloomington (3844). The lines from Kazakhstan, Sweden, Ukraine, and 1 of the United States lines were collected before 1940 and hence represent pre-DDT lines. GstD1 was also sequenced from six D. simulans lines that were collected from the east coast of Australia and provided by A. Hoffmann.
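
For readers unfamiliar with the summary statistics computed in the polymorphism analysis, the following sketch shows how two common estimators of θ are obtained from an alignment. It is not DnaSP's implementation; it assumes that θ refers to Watterson's estimator (the text does not say which estimator was used), it reports per-locus rather than per-site values, and it ignores gaps and missing data. The sequences are invented.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns in which more than one nucleotide occurs."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator: S divided by a_n = sum_{i=1}^{n-1} 1/i (per locus)."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences between sequences (per locus)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)


# Hypothetical aligned alleles, equal length, no gaps.
alleles = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]
S = segregating_sites(alleles)
theta_w = watterson_theta(alleles)
pi = nucleotide_diversity(alleles)
print(S, theta_w, pi)
```

Tajima's D is built from the difference between these two quantities, scaled by an estimate of its standard deviation; that normalization is omitted here for brevity.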

Homology modeling of GSTD1 and molecular docking: Individual domains of D. melanogaster GSTD1 and GSTD2 were modeled on the crystal structure of a GST from Lucilia cuprina (Wilce et al. 1994) using COMPOSER as contained in Sybyl 7.2 (Tripos). The amino acid sequence identity of D. melanogaster GSTD1 and GSTD2 when compared to the L. cuprina θ GST is 83 and 68%, respectively. The individual models of the N- and C-terminal domains were then assembled into a domain pair and the GST dimer was constructed in Swiss-PDB viewer followed by energy minimization. Verify3D was used to evaluate the quality of the models (Eisenberg et al. 1997).

AutoDock 3.0.5 was used to explore the binding of DDT to the GSTD1 model (Morris et al. 1996). The docking region was centered at the binding site of glutathione. The Lamarckian genetic algorithm was used to produce 100 conformations, which were clustered on a root-mean-squared deviation of 1.8 Å, and the docked conformations were inspected with VMD 1.8.2. The docking simulations were carried out in the presence of glutathione as well as without glutathione. Another docking simulation was carried out centering at the positively selected site Lys171.
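
Clustering docked conformations on a root-mean-squared deviation threshold can be illustrated as follows. This is a simplified greedy scheme written for illustration, not AutoDock's own code: it assumes that conformations are supplied best-energy first, that atoms are listed in the same order in every conformation, and that no superposition is applied because all ligand coordinates share the receptor's frame. The coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-squared deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformations must have the same, nonzero atom count")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def cluster_by_rmsd(conformations, tolerance=1.8):
    """Greedy clustering: join the first cluster whose seed is within tolerance."""
    clusters = []
    for conf in conformations:
        for cluster in clusters:
            if rmsd(conf, cluster[0]) <= tolerance:
                cluster.append(conf)
                break
        else:
            clusters.append([conf])     # no cluster close enough: start a new one
    return clusters


# Hypothetical two-atom "ligands" (coordinates in angstroms).
pose_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]   # shifted by 1 along z
pose_c = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0)]   # shifted by 5 along z
clusters = cluster_by_rmsd([pose_a, pose_b, pose_c])
print([len(c) for c in clusters])
```

In this toy case the first two poses fall into one cluster and the third forms its own, mirroring how the 100 docked conformations would be grouped before visual inspection.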

RESULTS

Patterns of GST gene gain and loss: Our search of the genomic sequences of 12 Drosophila species identified 429 genes and 24 pseudogenes in the GST multigene family. Included in the “active” genes are 11 genes for which only partial sequence is available (because of sequence gaps), and in the “pseudogene” category there are 8 genes with only single inactivating mutations and these may be only null alleles. All six classes of GSTs (δ, ε, σ, θ, ω, and ζ) are represented throughout the radiation of the 12 Drosophila species (Figure 1). The total number of GSTs within a species ranges from 29 to 45, with most of the variation attributable to the two large δ and ε gene clusters (Figures 1 and 2). Apart from the variation in these large clusters, which is described in more detail below, there are three other copy number variations observed:

i. The Drosophila subgenus species (D. virilis, D. mojavensis, and D. grimshawi) have three rather than four ω GSTs.

ii. D. grimshawi has two co-orthologs of CG9362 (a ζ GST). The level of divergence between these genes (~40% at silent sites) indicates that they have not arisen from an assembly artifact and, since they are adjacent, they appear to have arisen by unequal crossover.

iii. The copy number of genes with high sequence similarity to D. melanogaster CG30000 and CG30005 differs across the phylogeny, as detailed further below.

To determine orthologous/paralogous relationships, we constructed a neighbor-joining tree based on the amino acid sequences (supplemental Figure 1 at http://www.genetics.org/supplemental/). To be conservative with our assignments, we considered microsynteny conservation and relative position and orientation of the genes within clusters and considered only nodes with >75% bootstrap support. This left us with several types of phylogenetic relationships. There were clades that clearly identified sets of orthologs, and some of these contained a single gene from all 12 species (e.g., GstE10). The genes in these clades were named according to the D. melanogaster gene. Other “orthologous clades” did not have a representative from all species (e.g., GstD4) and, if these were without a D. melanogaster gene, they were given a new name as shown in Figure 2 (e.g., GstE11). There were 20 genes, including 7 in the D. willistoni δ GST cluster, that were not grouped into any clade (these have an “x” in their names, e.g., DwilDx1). There were also clades that contain paralogs and these genes were named to match the orthologous D. melanogaster gene and given an added letter if required (e.g., GstD12 and GstD12a are paralogs).

Figure 3 illustrates a reconstruction of gene duplication and loss over the Drosophila radiation. This was constructed using a “tree reconciliation approach” in which the nodes with >75% bootstrap support on the gene tree were placed on the best-supported species tree (Pollard et al. 2006) following the principle of parsimony (Nam and Nei 2005). The analysis identifies 22 of the 38 GSTs in our original reference set as having orthologs in both the Drosophila and Sophophora subgenera, and these therefore existed in the ancestor of the 12 species studied. It also identifies 24 gene duplication events. Of those 24, 19 have occurred in either the δ or ε gene clusters, 18 (of the 24) fall on the terminal branches, and another 2 occur on the long branch shared by the sibling species D. persimilis and D. pseudoobscura. By considering only the nodes with >75% bootstrap support, many genes would not be accounted for and so Figure 3 also indicates the youngest point at which the duplication may have occurred. Many, if not most, of these genes may have occurred in the ancestral Drosophila; it is just that the phylogenetic signal is not strong enough to assert this, or there has been gene loss. A corollary is that some true orthologs may not be identified and may be given different names (e.g., GSTD2 and GSTD11 could be orthologs). In such cases, the lack of bootstrap support may be a result of homoplasies introduced by high levels of amino acid divergence or may reflect homoplasy caused by intergene recombination.

Figure 1. The total number of GST genes and pseudogenes in the 12 Drosophila species. For each species, the number of GST genes (left) and pseudogenes (right) are shown with the different classes represented by different shading (see key). The tally of pseudogenes includes only those identified with our tblastn filtering criteria and does not account for gene losses inferred from the phylogenetic reconstruction in Figure 3.

Figure 2. The δ and ε GST clusters of the 12 Drosophila species are shown in A and B, respectively. Full-length GSTs are shown as arrowed boxes. The direction of the arrows represents the direction of transcription. The boxes are colored to indicate orthologs. Paralogous clades supported by bootstrap score ≥75 that lack orthologs are also colored (e.g., GstE15, GstE15a, GstE15b). The white boxes represent genes that do not occur in clades with a bootstrap score ≥75. The “ψ” and circled P signs below each box indicate GST pseudogenes and partial sequences, respectively. The numbers above the vertical bars represent the number of bases excluded to keep the rest of the figure to scale. The “| |” symbol represents sequence gaps in contigs.

The reconciliations of the gene tree with the species trees indicate that there have been at least four duplications of CG30000/CG30005 within the Drosophila lineages studied here. The most ancient of these led to CG30000 and CG30005 and this most likely happened after the Drosophila subgenus separated from the other lineages (although a loss in the Drosophila subgenus is possible). The gene trees suggest a much more recent duplication of CG30000/30005 in the D. willistoni lineage and a duplication resulting in a third gene in D. ananassae. We also detected two CG30000-like sequences in the obscura group species, which are separated by ~7 kb and oriented in opposing directions. While repeats and gaps in the sequence coverage make the genome assemblies suspect, these sequences do not have long open reading frames, suggesting that they are pseudogenes. One of the fragments is missing an intron that is observed in functional paralogs, indicating that it may have arisen by retrotransposition.

Rates of GST ortholog evolution: An inspection of amino acid alignments and phylogenetic trees reveals that GSTs evolved at different rates across the Drosophila radiation. This is unsurprising as GSTs from different classes have different functions, expression levels, and sites of expression and are therefore expected to exhibit different levels of selective constraint. To quantify the variation in selective constraint, maximum-likelihood estimates of ω, the dN/dS ratio, were used as implemented in PAML (Yang and Nielsen 2002). This standardizes the amount of amino acid change observed relative to the synonymous site divergence and was performed on the 22 orthologous gene sets that contained sequences from both subgenera. LRTs indicated that a model allowing three different ω per orthologous gene set (model 3 with K = 3) is significantly better than model 0, which allowed only a single ω per orthologous gene set (P < 0.05). Figure 4 represents the three most likely ω-values estimated for each orthologous set, and the relative proportions of sites with those values. This enables the GSTs to be partitioned by evolutionary rate, with some relatively uncharacterized genes such as CG1681, CG1702, and CG16936 among the slowest evolving, and others such as CG4623, CG4688, and CG11784 among the quickly evolving.
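As an illustration of how model 3 output can be partitioned in this way, the following minimal Python sketch bins ω-values into the four ranges named in the Figure 4 legend and sums the proportion of sites in each. The function names are hypothetical, and the handling of values that fall exactly on a boundary is an assumption, because the legend uses inclusive bounds on both sides. The example input is the model 3 estimate for GstD1 reported in Table 1.

```python
def omega_class(omega):
    """Assign a dN/dS value (omega) to one of four constraint ranges.

    Boundary values are placed in the lower range (an assumption).
    """
    if omega < 0:
        raise ValueError("omega must be non-negative")
    if omega <= 0.1:
        return "highly constrained"
    if omega <= 0.5:
        return "moderately constrained"
    if omega <= 1.0:
        return "relaxed"
    return "positively selected"


def summarize(site_classes):
    """Sum the proportion of sites that falls in each constraint range.

    site_classes is a list of (omega, proportion) pairs for one gene set.
    """
    totals = {}
    for omega, proportion in site_classes:
        label = omega_class(omega)
        totals[label] = totals.get(label, 0.0) + proportion
    return totals


# Model 3 estimates for GstD1 as listed in Table 1: (omega, proportion of sites)
gstd1_m3 = [(0.01, 0.84), (0.25, 0.15), (3.38, 0.01)]
gstd1_summary = summarize(gstd1_m3)
```

Under these assumptions, most GstD1 sites land in the highly constrained range and a small fraction lands in the positively selected range, which is the pattern the text describes for this gene.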

Figure 3. GST gene gain and loss is represented on the species tree. Gene duplications that have 75% or greater bootstrap support are shown in boldface type above the branches of the species tree. Inferred gene losses are denoted with “ψ” or, if in a terminal branch of the species tree, are listed next to the species name. To account for all genes, duplications not supported by >75% bootstrap support are indicated in italics at the youngest possible branch where those duplications could have occurred. The number of δ and ε genes that are not found in clades with >75% support are listed under “<75% bootstrap.” “Dψ” refers to the genes that have duplicated and become pseudogenes within a single lineage. The prefix “GST” has been removed from GST δ and ε genes for clarity.

Evidence for adaptive evolution: One of the hallmarks of adaptive evolution is a greater rate of substitution at nonsynonymous sites than at synonymous sites. With sets of closely related species it is possible to identify specific codons that exhibit an increased rate of nonsynonymous substitution. Figure 4 shows that GSTD1 and CG6776 have some sites with ω > 1; however, to formally test whether adaptive models fit the patterns of sequence divergence better than neutral models, we used two standard PAML comparisons (model M2a vs. M1a and M8 vs. M7; Yang and Nielsen 2002; Lynn et al. 2005; Balakirev et al. 2006). Of 22 sets of GSTs subjected to site-specific CODEML analyses, only GstD1 was found significant after Bonferroni correction for multiple tests (P-value = 0.0014, which is <0.05/22; Table 1). The amino acid state of the positively selected site in GSTD1 across the 12 species is shown in Figure 5. This shows that a glycine → lysine substitution at site 171 must have occurred at least three times in the Drosophila radiation. This requires at least two sites in the codon to change (GGN → AAR) and is a biochemically radical change.
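The likelihood-ratio comparisons and the Bonferroni threshold can be checked with a short calculation. The sketch below is not the CODEML implementation. It assumes that each comparison (M2a vs. M1a, M8 vs. M7) has two degrees of freedom, which matches the parameter counts in Table 1, and it uses the closed form of the chi-square survival function for two degrees of freedom, exp(-x/2). The function name is hypothetical. The log likelihoods are the rounded values from Table 1, so the recomputed statistics and P-values agree with the table only to within rounding.

```python
import math


def lrt_p_value_df2(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models that differ by two parameters.

    Returns (statistic, p_value). With 2 degrees of freedom the chi-square
    survival function reduces to exp(-x / 2).
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    p_value = math.exp(-statistic / 2.0) if statistic > 0 else 1.0
    return statistic, p_value


# Bonferroni-corrected significance threshold for the 22 gene sets tested
N_TESTS = 22
BONFERRONI_ALPHA = 0.05 / N_TESTS

# Rounded log likelihoods for GstD1 taken from Table 1
m1a_vs_m2a = lrt_p_value_df2(-2353.24, -2351.08)
m7_vs_m8 = lrt_p_value_df2(-2341.55, -2334.94)

for name, (stat, p) in (("M2a vs. M1a", m1a_vs_m2a), ("M8 vs. M7", m7_vs_m8)):
    verdict = "significant" if p < BONFERRONI_ALPHA else "not significant"
    print(f"{name}: 2*delta(lnL) = {stat:.2f}, P = {p:.4f} ({verdict})")
```

With these inputs only the M8 vs. M7 comparison falls below the corrected threshold of 0.05/22, consistent with the result reported for GstD1.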

Could DDT be the selective agent driving the G171K substitution in GSTD1? Tang and Tu (1994) have previously shown that D. melanogaster GSTD1 possesses DDTase activity and so it has been implicated in DDT metabolism. This activity and the strong pattern of parallel substitution that we observe in the Drosophila radiation bring to mind multiple instances where the strong selective pressure imposed by insecticides has resulted in parallel molecular evolution. These include examples of parallel amino acid changes at target molecules of insecticides (e.g., Kdr and Rdl; Thompson et al. 1993; Miyazaki et al. 1996) and in detoxification proteins (e.g., α-esterase-7; Claudianos et al. 1999) as well as regulatory mutations (e.g., mosquito Gste2 and Drosophila Cyp6g1; Schlenke and Begun 2004; Lumjuan et al. 2005).

To test whether the G171K replacement is associated with recent adaptive evolution to DDT, we surveyed polymorphism in 36 D. melanogaster and 8 D. simulans alleles. According to the sequence data, all D. melanogaster lines have a lysine at residue 171 of GSTD1 and all D. simulans lines have a glycine. This shows that K171 in D. melanogaster is likely to be fixed in worldwide populations. Among these alleles, 4 were obtained from lines collected before the use of DDT and so it is unlikely that the G171K change occurred in response to DDT selection. Furthermore, the four pre-DDT alleles have the same amino acid sequence as the GstD1 allele found to have DDTase activity, suggesting that the DDTase activity predates DDT.

Figure 4. Outputs of model 3 of PAML are represented graphically, allowing comparison of selective constraint among GST genes. Model 3 uses a maximum-likelihood method to assign three ω-values to each gene set where ω0, ω1, and ω2 have the proportions p0, p1, and p2, respectively (we represent the proportions as percentages along the x-axis). While each gene set has only three ω-values, we have divided the ω-values observed across the whole 22 gene sets into four different ranges: highly constrained sites (0 ≤ ω ≤ 0.1), moderately constrained sites (0.1 ≤ ω ≤ 0.5), relaxed sites (0.5 ≤ ω ≤ 1.0), and positively selected sites (ω > 1). Genes with many sites evolving slowly have a high proportion of sites in the 0 ≤ ω ≤ 0.1 range and are mostly solid in the figure. These values were calculated for genes inferred to be present in the ancestral Drosophila (see Figure 3) and were calculated with all available orthologs. The CG10065 here refers only to the GST domain of this chimeric gene.

In theory, the recent selective fixation of a beneficial mutation is expected to produce patterns of reduced nucleotide diversity (Smith and Haigh 1974) and an excess of rare alleles as subsequent mutations occur (Braverman et al. 1995). As more time elapses after a selective sweep, a surplus of high-frequency-derived alleles occurs at sites near the target of selection (Fay and Wu 2000). The nucleotide diversity (θ) for our global collection of 36 D. melanogaster alleles was 0.0035 whereas it was 0.0072 for the 8 D. simulans alleles (Table 2). This difference in θ (= 4Neμ) could, at least partially, be attributed to differences in effective population size of the two species (Kliman et al. 2000). Tajima’s (1989) D test does not detect an enrichment of rare alleles among the D. melanogaster GstD1 alleles (P > 0.10). Furthermore, the McDonald and Kreitman (1991) G-test, which compares the ratio of replacement to silent-site polymorphisms within species with the ratio of replacement to silent-site divergence between species, shows no significant deviation from neutral expectations (G-test value with Williams’ correction = 2.34, P-value = 0.13). These analyses support the interpretation made from the four “historical” D. melanogaster lines: that positive selection for the G171K site has not happened over the past 60 years in the D. melanogaster lineage.
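The θ values quoted above can be reproduced from the counts in Table 2 with the standard Watterson estimator, θ = S / (a_n L), where S is the number of segregating sites, L is the number of sites surveyed, and a_n is the sum of 1/i for i from 1 to n - 1 for a sample of n alleles. The Python sketch below is a minimal illustration. The function name is hypothetical, and the sketch assumes that the "Sites" column of Table 2 is the denominator used for the per-site estimate.

```python
def watterson_theta_per_site(n_alleles, segregating_sites, n_sites):
    """Watterson's estimator of theta per site.

    theta = S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i.
    """
    if n_alleles < 2 or n_sites <= 0:
        raise ValueError("need at least two alleles and a positive site count")
    a_n = sum(1.0 / i for i in range(1, n_alleles))
    return segregating_sites / (a_n * n_sites)


# Counts taken from Table 2: (alleles, segregating sites, sites surveyed)
theta_mel = watterson_theta_per_site(36, 10, 681)  # D. melanogaster
theta_sim = watterson_theta_per_site(8, 11, 586)   # D. simulans

print(f"D. melanogaster theta = {theta_mel:.4f}")
print(f"D. simulans theta     = {theta_sim:.4f}")
```

Because a_n grows with sample size, the larger D. melanogaster sample yields a lower per-site θ even though the two samples have similar numbers of segregating sites.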

Homology modeling and molecular docking of DDT to GSTD1: To surmise the potential structural and biochemical role of the G171K replacement in GSTD1, homology models of GSTD1 and GSTD2 were constructed using the crystal structure of an orthologous protein from the sheep blowfly, L. cuprina (Figure 6A; Wilce et al. 1994). This indicates that the G171K replacement occurs at a site remote from the typical GST active site. In silico docking studies of a known GSTD1 substrate, the insecticide DDT, onto this homology model suggest that this typical active site is used in GSTD1. Visual inspection shows that DDT docks best in the presence of glutathione with the trichloro group of DDT orientated toward the sulfur atom on glutathione. This conformation agrees with the GSH-dependent dehydrochlorinase activity proposed by Matsumura (1985). DDT docks less well into the active site of GSTD2, which does not have DDTase activity in vitro (Tang and Tu 1994).

Despite the G171K site being distant from the active site, we observed a crevice in the homology model adjacent to site 171 in which we were able to dock DDT. This raises the possibility that the crevice may act as a secondary binding site that may have an allosteric effect on the active site. Interestingly, this crevice is absent in the L. cuprina structural model because of a phenylalanine rather than an isoleucine at position 174, suggesting that a substitution here, in an ancestor of Drosophila, may be a prerequisite for the G-K substitutions.

DISCUSSION

Gene duplication and loss: Recent comparative analyses of insect genomes have revealed stark differences in the number of GST genes. There are 38 and 28 cytosolic GST genes identified in the D. melanogaster and An. gambiae genomes, respectively, while only 8 have been found in the Apis mellifera genome (Claudianos et al. 2006). Hahn et al. (2005) cautioned against making adaptive interpretations of such differences because variation may simply be attributed to stochastic changes in gene number over long periods of evolutionary time. On the other hand, genes that lack a function are rapidly lost from Drosophila genomes (Petrov et al. 1996) and so it may be presumed that genes that have been maintained over millions of years would have some function. Such functions may make a GST distinct from all others encoded in the genome or may contribute to a threshold of general GST activity. Duplicate GST genes may have arisen as adaptations to new environments or could have nonadaptive origins and have been maintained through, for example, complementary degenerative mutations accumulating in duplicate copies (Force et al. 1999).

TABLE 1

Maximum-likelihood parameter estimates for GstD1 and their significance

Model                     No. of parameters   Estimates of parameters (a)          ln L        2Δ(ln L) (c)   P-value
M0: one ratio             1                   ω0 = 0.04                            −2393.88
M1a: neutral              1                   p0 = 0.94, p1 = 0.06                 −2353.24
                                              ω0 = 0.02, ω1 = 1.00
M2a: positive selection   3                   p0 = 0.93, p1 = 0.06, p2 = 0.01      −2351.08    4.31           0.116
                                              ω0 = 0.02, ω1 = 1.00, ω2 = 4.70
M3: discrete              5                   p0 = 0.84, p1 = 0.15, p2 = 0.01      −2335.11
                                              ω0 = 0.01, ω1 = 0.25, ω2 = 3.38
M7: β                     2                   ω from β(0.10, 1.44) (b)             −2341.55
M8: β and ω               4                   p0 = 0.99, ω from β(0.11, 2.07)      −2334.94    13.22          0.0014
                                              (p1 = 0.01) ω1 = 3.60

(a) The proportion of sites (p0, p1, . . .) estimated to have ω0, ω1, . . . .
(b) In M7 and M8 10 ω-values that are <1 are estimated from a β-distribution defined by β(x, y).
(c) Significance was assessed as two times the difference in log likelihoods of corresponding positive selection and neutral models.

Figure 5. The state of the amino acid residue 171 of GSTD1 in the 12 Drosophila species. This site was estimated to be positively selected using empirical Bayes with probability >95% under model M8. The species tree is adapted from FlyBase.

The molecular evolution of GSTs over the 50 million years represented in the divergence of the 12 sequenced Drosophila genomes allows us to make some inferences about the degree of redundancy in GST genes by their flux in gene numbers over evolutionary time and by the relative levels of selective constraint exhibited among orthologous comparisons. The species with the most GSTs, D. ananassae and D. willistoni, also seem to have a greater number of genes from other multigene families (R. T. Good and C. Robin, unpublished results) and also have larger genomes (http://rana.lbl.gov/drosophila/wiki/index.php/Assembly_Summaries). So rather than a GST-specific phenomenon, the relative abundance of GSTs may indicate lineage effects such as greater population sizes allowing duplications with smaller selective coefficients to become fixed and then be maintained. The species with the fewest GSTs are the three species of the Drosophila subgenus and this correlates with the ε and δ clusters being dispersed over multiple contigs. Rather than being an artifact of missing data, the smaller number of genes in these species may reflect a reduced rate of gene duplication in smaller gene clusters.

The presence of pseudogenes that are orthologs of functional genes is interesting because they can be considered “natural knockouts” and suggests that the orthologous genes are not vital. Four of the 10 δ GST genes in the D. melanogaster genome have orthologs with premature stop codons or frameshifts or are truncated in one or more members of the D. melanogaster subgroup (Dmel_GstD3, Dere_GstD4, Dmel_GstD5, Dsec_GstD6). A fifth δ GST gene, GstD7 from the P2 strain of D. melanogaster, was also originally annotated as a pseudogene because of a two-nucleotide insertion disrupting the open reading frame (Toung et al. 1993; accession no. M97702); however, this indel is not present in the y; cn bw sp strain of the D. melanogaster genome sequence (release 5 FlyBase). Three other δ genes (GstD2, GstD10, and GstD9) are pseudogenes in more distant species (D. ananassae, D. virilis, and D. mojavensis, respectively; Figure 2). Similarly, two D. melanogaster ε GST genes (GstE1 and GstE6) have orthologs with inactivating mutations and D. melanogaster appears to have lost GstE11 and GstE3a. Although we refer to these as pseudogenes, many have only one detectable inactivating mutation and these may be polymorphic, so that they may be better described as null alleles. Petrov et al. (1996) estimated that it would take 11 million years to lose half a pseudogene through deletions. If we consider only the melanogaster group, which is thought to have diverged within the last 10 million years, there are 12 putative “pseudogenes,” but 7 of these have only one inactivating mutation and may actually be null alleles. This age spectrum of nonfunctional copies is consistent with the idea that GSTs may exhibit a high load of null alleles because their detoxification functions are required only sporadically.
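To give a sense of scale for the 11-million-year figure, the sketch below computes how much of a pseudogene would be expected to remain after a given time. It assumes, purely for illustration, that loss through deletion follows simple exponential decay with that half-life; this decay model is an assumption of the sketch and is not stated in the text. The function name is hypothetical.

```python
def fraction_remaining(age_my, half_life_my=11.0):
    """Expected fraction of a pseudogene still present after age_my million years,
    assuming exponential decay with the given half-life (an illustrative assumption)."""
    if age_my < 0 or half_life_my <= 0:
        raise ValueError("age must be non-negative and half-life positive")
    return 0.5 ** (age_my / half_life_my)


# A pseudogene as old as the melanogaster group (about 10 million years)
frac_10my = fraction_remaining(10.0)
print(f"fraction remaining after 10 My: {frac_10my:.2f}")
```

Under this assumption, a pseudogene that arose at the base of the melanogaster group would still retain roughly half of its sequence, so inactivated copies of that age should remain detectable.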

Difference in selective constraint: At the other extreme are the genes that have weathered the tests of evolutionary time. This includes genes within the large δ and ε gene clusters, which would have been represented by at least four δ and three ε GST genes in the ancestral Drosophila. The conserved genes in these large clusters occur at the edge of the gene clusters in extant species, complementing the observation that the pseudogenes tend to be located in the centers of the clusters. Four of the ancestral GSTs (GstD1, GstD9, GstE9, and GstE4) are among the slowest evolving GSTs, suggesting that these either have their own unique functions or have spatial/temporal expression patterns that impose significantly greater selective constraint on them than on the other genes in these clusters (Figure 4). Therefore, the patterns of evolution of these specific genes indicate that they are not redundant in natural ecological settings.

TABLE 2

Polymorphism statistics for GstD1 in D. melanogaster and D. simulans populations

Species           N (a)   Sites   S (b)   θ (c)    π (d)    Tajima’s D
D. melanogaster   36      681     10      0.0035   0.0029   −0.582
D. simulans       8       586     11      0.0072   0.0077   −0.139

(a) Number of alleles studied.
(b) Number of segregating sites.
(c) Watterson’s θ (per site) estimated from the number of segregating sites.
(d) Nucleotide diversity.

Figure 6. Molecular docking of DDT and glutathione to the homology modeled D. melanogaster GSTD1 monomer showing the best-docked conformations when the site of docking was centered on (A) the GSH-binding site and (B) the positively selected residue K171.

Among the GST genes exhibiting the greatest selective constraint is CG9363, which is one of two ζ GST genes in the Drosophila genome. The other ζ GST, CG9362, is evolving at a much faster rate. We therefore suggest that CG9363 is more likely to perform the maleylacetoacetate isomerase activity, highly conserved across eukaryotes, that functions in the catabolism of phenylalanine and tyrosine (Fernandez-Canon and Penalva 1998). CG9362 is likely to have a derived function, requiring less selective constraint. The duplication of these genes would have arisen before the divergence of Drosophila species but, on the basis of comparisons to the An. gambiae genome, probably after the split between higher and lower Diptera.

Each of the ω GSTs is evolving relatively quickly, perhaps because ω GSTs have several features that make them structurally distinct from other GSTs, including the fact that they bind glutathione through a cysteine residue at position 32 rather than through a tyrosine or a serine at about position nine (Board et al. 2000). The slowest evolving ω, CG6781, has recently been shown to be the structural gene of the sepia locus and so it has a role in the biosynthesis of red eye pigments (Kim et al. 2006). This adds 6-pyruvoyl tetrahydropterin to a diverse set of proposed substrates for ω GSTs, including proteins damaged by S-thiol adducts and dehydroascorbic acid (Ishikawa et al. 1998; Board et al. 2000). While the in vivo substrates for three of the four D. melanogaster ω GSTs are unknown, it is noteworthy that the A. mellifera and An. gambiae genomes have only a single ω GST, and phylogenetically this is most similar to CG6662, which has the lowest specific activities of all ω GSTs for the tested substrates (Kim et al. 2006).

Adaptive evolution: One of the motivations of this research was to identify GSTs that are evolving adaptively as these may help us to identify the genes that are involved in detoxification. Our screen using site-specific models of PAML recovered only a single site in a single gene (GstD1). This test can give false positives (Suzuki and Nei 2004), particularly if the silent sites are saturated. This appears not to be the case in GstD1 because the Nei and Gojobori divergence at silent sites (Ds) is greatest in the comparisons spanning the longest evolutionary time (Sophophora vs. Drosophila subgenus) and the maximum Ds value is 0.8 (i.e., less than one substitution per site). Furthermore, the prior literature on GSTD1 makes it an excellent candidate for a gene undergoing positive selection. Three aspects of GSTD1 have caught our attention. First, GSTD1 has been shown to have DDTase activity (Tang and Tu 1994). In addition, the lab strain PSU-R, which has been selected from Oregon-R with increasing DDT concentrations over 2 years, showed approximately twofold higher expression of GSTD1 by Western blot analysis when compared to Oregon-R (Tang and Tu 1994). Second, GstD1 has been used as an example of a gene with high codon bias and our analysis of codon bias suggests that this is the most biased GST among all the Drosophila species (Bielawski and Yang 2005). Third, a recent microarray study on D. mojavensis, comparing the transcriptional response to different cactus-derived food substrates, highlighted an ortholog of GstD1 as a noteworthy responder (Matzkin et al. 2006).

Here we found that the same radical amino acid substitution has occurred at the same site three times in the Drosophila radiation. GstD1 is one of the slowest evolving GST genes in the Drosophila genome and a lysine is not seen in the homologous site in any other Drosophila GST δ paralogs. All this suggests that this substitution is not neutral and has a functional effect. Homology modeling indicates that this site is on the surface, remote from the active site. However, the model also shows a crevice adjacent to site 171 that may be a secondary substrate-binding site and either may have an allosteric effect on active site catalysis or may act as a site of sequestration as is seen in other GSTs (Kostaropoulos et al. 2001). Another possibility, given that this site is positioned away from the dimerization site, is that it could affect protein/protein interactions. Yeast two-hybrid analyses suggest that GSTD1 interacts with the products of the Sala and CG10200 genes, neither of which is well characterized (Giot et al. 2003).

Parallel evolution is frequently seen in instances of insecticide resistance. In two genera of mosquitoes, Anopheles and Aedes, an ε GST is associated with DDT resistance (Ranson et al. 2004; Lumjuan et al. 2005). The enzymes have been shown to have DDTase activity in vitro, and the genes are constitutively upregulated in resistant strains. In An. gambiae, resistance maps to the ε GST gene cluster. Another example of parallel evolution is the detoxification enzyme, α-esterase 7, of flies where an amino acid substitution provides organophosphate resistance in several species, including L. cuprina and Musca domestica (Claudianos et al. 1999; Oakeshott et al. 2005). The substitution of an aspartic acid in place of a glycine in the catalytically important oxyanion hole converts the enzyme from an esterase to a phosphatase (Newcomb et al. 1997).

There are two levels of evidence to suggest that the G171K mutation in GSTD1 has not been a response to insecticide usage. First, the amount and distribution of nucleotide variation in current populations of D. melanogaster are inconsistent with a recent selective sweep at this locus. Second, stocks collected before the use of the first insecticide, DDT, have the derived lysine state at 171. We can be confident that these historical stocks are not contaminated because they contain rare susceptible alleles at the Cyp6g1 locus, a verified DDT resistance locus (Daborn et al. 2002). Nevertheless, the G171K substitution has arisen independently three times in GSTD1, an enzyme that is an excellent candidate for a detoxification gene. The four species with a lysine at position 171 are all found in the western United States and this raises the possibility that the parallel G171K substitutions may have been responses to a common selective agent.

One of the species with a lysine at position 171 of GSTD1 is D. mojavensis. Matzkin et al. (2006) have found that this gene has 1.4 times higher transcript abundance in flies reared on food containing organ pipe cactus relative to flies reared on food containing agria. These authors also cited an unpublished nucleotide sequence survey suggesting that GstD1 has been evolving adaptively in D. mojavensis. It will be interesting to know whether G171K exists as a polymorphism in D. mojavensis or as a site of divergence with its sibling species.

Although we can conclude from the above data that the G171K substitution has not evolved in response to DDT selection, the testable hypothesis that the G171K substitution may affect the kinetics of its DDTase activity (or other activities) still stands. In this context, we note that Ivarsson et al. (2003) used a PAML approach to identify a serine/threonine substitution that was potentially adaptive in mammalian μ class GSTs. Site-directed mutagenesis of the human ortholog at this site changed the specific activity of the enzyme toward a known substrate, trans-stilbene oxide, 1000-fold. The ease by which GSTs are expressed in heterologous systems, purified, and crystallized makes them excellent models for studying the evolutionary significance of amino acid substitutions to protein structure and function relationships and vice versa.

We thank Robert Trygve Good for helpful discussions and help in aspects of the bioinformatics and Lisa Bardsley for cross-checking our pseudogene analyses. We also thank the Drosophila community, in particular, the Assembly, Alignment, Annotation coordinating committee and the Washington University, Broad, Agencourt, and J. Craig Venter Sequencing Centres, for providing access to the genome sequences of the 12 species of Drosophila. This work is supported by the Australian Research Council through a Discovery Project (DP0557497) and the Centre for Environmental Stress and Adaptation Research. M.W.P. is an Association pour la Recherche Contre le Cancer Federation Fellow and National Health and Medical Research Council Honorary Fellow.

LITERATURE CITED

Andolfatto, P., 2005 Adaptive evolution of non-coding DNA in Drosophila. Nature 437: 1149–1152.

Anisimova, M., J. P. Bielawski and Z. Yang, 2002 Accuracy and power of Bayes prediction of amino acid sites under positive selection. Mol. Biol. Evol. 19: 950–958.

Bailey, T. L., and C. Elkan, 1994 Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2: 28–36.

Bailey, T. L., and M. Gribskov, 1998 Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14: 48–54.

Balakirev, E. S., M. Anisimova and F. J. Ayala, 2006 Positive and negative selection in the beta-esterase gene cluster of the Drosophila melanogaster subgroup. J. Mol. Evol. 62: 496–510.

Bielawski, J. P., and Z. Yang, 2005 Maximum likelihood methods for detecting adaptive protein evolution, pp. 103–124 in Statistical Methods in Molecular Evolution, edited by R. Nielsen. Springer Sciences, New York.

Board, P. G., M. Coggan, G. Chelvanayagam, S. Easteal, L. S. Jermiin et al., 2000 Identification, characterization, and crystal structure of the Omega class glutathione transferases. J. Biol. Chem. 275: 24798–24806.

Braverman, J. M., R. R. Hudson, N. L. Kaplan, C. H. Langley and W. Stephan, 1995 The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics 140: 783–796.

Clark, A. G., and N. A. Shamaan, 1984 Evidence that DDT-dehydrochlorinase from the house fly is a glutathione S-transferase. Pestic. Biochem. Physiol. 22: 249–261.

Claudianos, C., R. J. Russell and J. G. Oakeshott, 1999 The same amino acid substitution in orthologous esterases confers organophosphate resistance on the house fly and a blowfly. Insect Biochem. Mol. Biol. 29: 675–686.

Claudianos, C., H. Ranson, R. M. Johnson, S. Biswas, M. A. Schuler et al., 2006 A deficit of detoxification enzymes: pesticide sensitivity and environmental response in the honeybee. Insect Mol. Biol. 15: 615–636.

Daborn, P. J., J. L. Yen, M. R. Bogwitz, G. Le Goff, E. Feil et al., 2002 A single p450 allele associated with insecticide resistance in Drosophila. Science 297: 2253–2256.

Ding, Y., F. Ortelli, L. C. Rossiter, J. Hemingway and H. Ranson, 2003 The Anopheles gambiae glutathione transferase supergene family: annotation, phylogeny and expression profiles. BMC Genomics 4: 35.

Drosophila 12 Genomes Consortium, 2007 Evolution of genes and genomes on the Drosophila phylogeny. Nature 450: 203–218.

Eisenberg, D., R. Luthy and J. U. Bowie, 1997 VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol. 277: 396–404.

Fay, J. C., and C. I. Wu, 2000 Hitchhiking under positive Darwinian selection. Genetics 155: 1405–1413.

Felsenstein, J., 2005 PHYLIP (Phylogeny Inference Package) (http://evolution.genetics.washington.edu/phylip/faq.html).

Fernandez-Canon, J. M., and M. A. Penalva, 1998 Characterization of a fungal maleylacetoacetate isomerase gene and identification of its human homologue. J. Biol. Chem. 273: 329–337.

Force, A., M. Lynch, F. B. Pickett, A. Amores, Y. L. Yan et al., 1999 Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151: 1531–1545.

Fournier, D., J. M. Bride, M. Poirie, J. B. Berge and F. W. Plapp, Jr., 1992 Insect glutathione S-transferases. Biochemical characteristics of the major forms from houseflies susceptible and resistant to insecticides. J. Biol. Chem. 267: 1840–1845.

Giot, L., J. S. Bader, C. Brouwer, A. Chaudhuri, B. Kuang et al., 2003 A protein interaction map of Drosophila melanogaster. Science 302: 1727–1736.

Goldman, N., and Z. Yang, 1994 A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11: 725–736.

Grant, D. F., and B. D. Hammock, 1992 Genetic and molecular evidence for a trans-acting regulatory locus controlling glutathione S-transferase-2 expression in Aedes aegypti. Mol. Gen. Genet. 234: 169–176.

Hahn, M. W., T. De Bie, J. E. Stajich, C. Nguyen and N. Cristianini, 2005 Estimating the tempo and mode of gene family evolution from comparative genomic data. Genome Res. 15: 1153–1160.


Huang, H. S., N. T. Hu, Y. E. Yao, C. Y. Wu, S. W. Chiang et al.,1998 Molecular cloning and heterologous expression of a glu-tathione S-transferase involved in insecticide resistance from thediamondback moth, Plutella xylostella. Insect Biochem. Mol.Biol. 28: 651–658.

Ishikawa, T., A. F. Casini and M. Nishikimi, 1998 Molecularcloning and functional expression of rat liver glutathione-dependent dehydroascorbate reductase. J. Biol. Chem. 273:28708–28712.

Ivarsson, Y., A. J. Mackey, M. Edalat, W. R. Pearson andB. Mannervik, 2003 Identification of residues in glutathionetransferase capable of driving functional diversification in evolu-tion. A novel approach to protein redesign. J. Biol. Chem. 278:8733–8738.

Kim, J., H. Suh, S. Kim, K. Kim, C. Ahn et al., 2006 Identification andcharacteristics of the structural gene for the Drosophila eye col-our mutant sepia, encoding PDA synthase, a member of theOmega class glutathione S-transferases. Biochem. J. 398: 451–460.

Kliman, R. M., P. Andolfatto, J. A. Coyne, F. Depaulis, M. Kreitman

et al., 2000 The population genetics of the origin and divergenceof the Drosophila simulans complex species. Genetics 156: 1913–1931.

Koonin, E. V., A. R. Mushegian, R. L. Tatusov, S. F. Altschul, S. H.Bryant et al., 1994 Eukaryotic translation elongation factor 1gamma contains a glutathione transferase domain: study of a di-verse, ancient protein superfamily using motif search and struc-tural modeling. Protein Sci. 3: 2045–2054.

Kostaropoulos, I., A. I. Papadopoulos, A. Metaxakis, E. Boukouvala

and E. Papadopoulou-Mourkidou, 2001 The role of glutathi-one S-transferases in the detoxification of some organophospho-rus insecticides in larvae and pupae of the yellow mealworm,Tenebrio molitor (Coleoptera: Tenebrionidae). Pest. Manag. Sci.57: 501–508.

Legal, L., B. Chappe and J. M. Jallon, 1994 Molecular basis ofMorinda citrifolia, L.: toxicity on Drosophila. J. Chem. Ecol. 20:1931–1943.

Lewis, J. B., and R. M. Sawicki, 1971 Characterization of the resistant mechanism to diazinon, parathion and diazoxon in the organophosphorous resistant SKA strain of house flies (Musca domestica L.). Pestic. Biochem. Physiol. 1: 275–285.

Lumjuan, N., L. McCarroll, L. A. Prapanthadara, J. Hemingway and H. Ranson, 2005 Elevated activity of an Epsilon class glutathione transferase confers DDT resistance in the dengue vector, Aedes aegypti. Insect Biochem. Mol. Biol. 35: 861–871.

Lynn, D. J., A. R. Freeman, C. Murray and D. G. Bradley, 2005 A genomics approach to the detection of positive selection in cattle: adaptive evolution of the T-cell and natural killer cell-surface protein CD2. Genetics 170: 1189–1196.

Matsumura, F., 1985 Toxicology of Insecticides. Plenum Press, New York.

Matzkin, L. M., T. D. Watts, B. G. Bitler, C. A. Machado and T. A. Markow, 2006 Functional genomics of cactus host shifts in Drosophila mojavensis. Mol. Ecol. 15: 4635–4643.

McDonald, J. H., and M. Kreitman, 1991 Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652–654.

Miyazaki, M., K. Ohyama, D. Y. Dunlap and F. Matsumura, 1996 Cloning and sequencing of the para-type sodium channel gene from susceptible and kdr-resistant German cockroaches (Blattella germanica) and house fly (Musca domestica). Mol. Gen. Genet. 252: 61–68.

Morris, G. M., D. S. Goodsell, R. Huey and A. J. Olson, 1996 Distributed automated docking of flexible ligands to proteins: parallel applications of AutoDock 2.4. J. Comput. Aided Mol. Des. 10: 293–304.

Nam, J., and M. Nei, 2005 Evolutionary change of the numbers of homeobox genes in bilateral animals. Mol. Biol. Evol. 22: 2386–2394.

Newcomb, R. D., P. M. Campbell, D. L. Ollis, E. Cheah, R. J. Russell et al., 1997 A single amino acid substitution converts a carboxylesterase to an organophosphorus hydrolase and confers insecticide resistance on a blowfly. Proc. Natl. Acad. Sci. USA 94: 7464–7468.

Oakeshott, J. G., A. L. Devonshire, C. Claudianos, T. D. Sutherland, I. Horne et al., 2005 Comparing the organophosphorus and carbamate insecticide resistance mutations in cholin- and carboxyl-esterases. Chem. Biol. Interact. 157–158: 269–275.

Oppenoorth, F. J., L. J. T. Van der Pas and N. W. H. Houx, 1979 Glutathione S-transferase and hydrolytic activity in a tetrachlorvinphos-resistant strain of housefly and their influence on resistance. Pestic. Biochem. Physiol. 11: 176–188.

Pearson, W. R., 1990 Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183: 63–98.

Petrov, D. A., E. R. Lozovskaya and D. L. Hartl, 1996 High intrinsic rate of DNA loss in Drosophila. Nature 384: 346–349.

Pollard, D. A., V. N. Iyer, A. M. Moses and M. B. Eisen, 2006 Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting. PLoS Genet. 2: e173.

Ranson, H., B. Jensen, X. Wang, L. Prapanthadara, J. Hemingway et al., 2000 Genetic mapping of two loci affecting DDT resistance in the malaria vector Anopheles gambiae. Insect Mol. Biol. 9: 499–507.

Ranson, H., L. Rossiter, F. Ortelli, B. Jensen, X. Wang et al., 2001 Identification of a novel class of insect glutathione S-transferases involved in resistance to DDT in the malaria vector Anopheles gambiae. Biochem. J. 359: 295–304.

Ranson, H., C. Claudianos, F. Ortelli, C. Abgrall, J. Hemingway et al., 2002 Evolution of supergene families associated with insecticide resistance. Science 298: 179–181.

Ranson, H., M. G. Paton, B. Jensen, L. McCarroll, A. Vaughan et al., 2004 Genetic mapping of genes conferring permethrin resistance in the malaria vector, Anopheles gambiae. Insect Mol. Biol. 13: 379–386.

Rozas, J., and R. Rozas, 1999 DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15: 174–175.

Ruiz, A., and W. B. Heed, 1988 Host-plant specificity in the cactophilic Drosophila mulleri species complex. J. Anim. Ecol. 57: 237–249.

Rutherford, K., J. Parkhill, J. Crook, T. Horsnell, P. Rice et al., 2000 Artemis: sequence visualization and annotation. Bioinformatics 16: 944–945.

Sawicki, R., S. P. Singh, A. K. Mondal, H. Benes and P. Zimniak, 2003 Cloning, expression and biochemical characterization of one epsilon and ten delta class GSTs from D. melanogaster and identification of nine extra of the epsilon class. Biochem. J. 370: 661–669.

Schlenke, T. A., and D. J. Begun, 2004 Strong selective sweep associated with a transposon insertion in Drosophila simulans. Proc. Natl. Acad. Sci. USA 101: 1626–1631.

Smith, J. M., and J. Haigh, 1974 The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23–35.

Smith, N. G., and A. Eyre-Walker, 2002 Adaptive protein evolution in Drosophila. Nature 415: 1022–1024.

Stajich, J. E., D. Block, K. Boulez, S. E. Brenner, S. A. Chervitz et al., 2002 The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 12: 1611–1618.

Suzuki, Y., and M. Nei, 2004 False-positive selection identified by ML-based methods: examples from the Sig1 gene of the diatom Thalassiosira weissflogii and the tax gene of a human T-cell lymphotropic virus. Mol. Biol. Evol. 21: 914–921.

Syvanen, M., Z. H. Zhou and J. Y. Wang, 1994 Glutathione transferase gene family from the housefly Musca domestica. Mol. Gen. Genet. 245: 25–31.

Tajima, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.

Tang, A. H., and C. P. Tu, 1994 Biochemical characterization of Drosophila glutathione S-transferases D1 and D21. J. Biol. Chem. 269: 27876–27884.

Thompson, M., J. C. Steichen and R. H. Ffrench-Constant, 1993 Conservation of cyclodiene insecticide resistance-associated mutations in insects. Insect Mol. Biol. 2: 149–154.

Toba, G., and T. Aigaki, 2000 Disruption of the microsomal glutathione S-transferase-like gene reduces life span of Drosophila melanogaster. Gene 253: 179–187.

Toung, Y. P., T. S. Hsieh and C. P. Tu, 1993 The glutathione S-transferase D genes. J. Biol. Chem. 268(13): 9737–9746.

Tu, C. P., and B. Akgul, 2005 Drosophila glutathione S-transferases. Methods Enzymol. 401: 204–226.

Vontas, J. G., G. J. Small and J. Hemingway, 2001 Glutathione S-transferases as antioxidant defence agents confer pyrethroid resistance in Nilaparvata lugens. Biochem. J. 357: 65–72.

Vontas, J. G., G. J. Small, D. C. Nikou, H. Ranson and J. Hemingway, 2002 Purification, molecular cloning and heterologous expression of a glutathione S-transferase involved in insecticide resistance from the rice brown planthopper, Nilaparvata lugens. Biochem. J. 362: 329–337.

Welch, J. J., 2006 Estimating the genomewide rate of adaptive protein evolution in Drosophila. Genetics 173: 821–837.

Wilce, M. C., S. C. Feil, P. G. Board and M. W. Parker, 1994 Crystallization and preliminary X-ray diffraction studies of a glutathione S-transferase from the Australian sheep blowfly, Lucilia cuprina. J. Mol. Biol. 236: 1407–1409.

Yang, Z., and R. Nielsen, 2000 Estimating synonymous and nonsynonymous rates under realistic evolutionary models. Mol. Biol. Evol. 17(1): 32–43.

Yang, Z., and R. Nielsen, 2002 Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19: 908–917.

Zhou, Z. H., and M. Syvanen, 1997 A complex glutathione transferase gene family in the housefly Musca domestica. Mol. Gen. Genet. 256: 187–194.

Communicating editor: D. Begun
