Article

Massive Habitat-Specific Genomic Response in D. melanogaster Populations during Experimental Evolution in Hot and Cold Environments

Ray Tobler,1 Susanne U. Franssen,1 Robert Kofler,1 Pablo Orozco-terWengel,‡,1 Viola Nolte,1 Joachim Hermisson,2,3 and Christian Schlötterer*,1

1 Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
2 Mathematics and Biosciences Group, Department of Mathematics, University of Vienna, Vienna, Austria
3 Max F. Perutz Laboratories, Vienna, Austria
‡ Present address: Cardiff University, Biomedical Science Building, Museum Avenue, Cardiff, Wales, United Kingdom
*Corresponding author: E-mail: christian.schloetterer@vetmeduni.ac.at.
Associate editor: Rasmus Nielsen

The BAM files for all populations analyzed in the study are available from the European Sequence Read Archive (accession no. PRJEB4689).

© The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Open Access
Mol. Biol. Evol. 31(2):364–375 doi:10.1093/molbev/mst205 Advance Access publication October 22, 2013
Downloaded from https://academic.oup.com/mbe/article-abstract/31/2/364/997936 by Cardiff University user on 19 February 2019

Abstract

Experimental evolution in combination with whole-genome sequencing (evolve and resequence [E&R]) is a promising approach to define the genotype–phenotype map and to understand adaptation in evolving populations. Many previous studies have identified a large number of putative selected sites (i.e., candidate loci), but it remains unclear to what extent these loci are genuine targets of selection or experimental noise. To address this question, we exposed the same founder population to two different selection regimes (a hot environment and a cold environment) and quantified the genomic response in each. We detected large numbers of putative selected loci in both environments, albeit with little overlap between the two sets of candidates, indicating that most resulted from habitat-specific selection. By quantifying changes across multiple independent biological replicates, we demonstrate that most of the candidate SNPs were false positives that were linked to selected sites over distances much larger than the typical linkage disequilibrium range of Drosophila melanogaster. We show that many of these mid- to long-range associations were attributable to large segregating inversions and confirm by computer simulations that such patterns could be readily replicated when strong selection acts on rare haplotypes. In light of our findings, we outline recommendations to improve the performance of future Drosophila E&R studies, which include using species with negligible inversion loads, such as D. mauritiana and D. simulans, instead of D. melanogaster.

Key words: experimental evolution, D. melanogaster, adaptation, standing genetic variation, next-generation sequencing, evolutionary genomics.

Introduction

Understanding the process of adaptation at the genomic level remains one of the fundamental questions for 21st century biology (Stapley et al. 2010). The arrival of next-generation sequencing (NGS) technology has ushered in a new era of adaptation research: by allowing essentially complete genome sequence information to be assayed at the population level, the prospect of identifying the majority of causal loci underlying adaptation is now foreseeable. To realize this objective, a series of recent studies have leveraged the power of NGS in conjunction with experimental evolution (Bergelson and Roux 2010; Parts et al. 2011; Dettman et al. 2012; Orozco-terWengel et al. 2012). The experimental evolution framework allows the study of phenotypic and genetic changes in experimental populations under environmental and demographic parameters of the experimenters’ choice (Kawecki et al. 2012). Thus, the combination of NGS and experimental evolution, a system dubbed evolve and resequence (E&R; Turner et al. 2011), provides a flexible and powerful approach to elucidate population genomic responses to adaptation.

Many recent E&R studies have utilized Drosophila melanogaster to elucidate the genomic architecture of specific traits (Burke et al. 2010; Turner et al. 2011; Zhou et al. 2011; Orozco-terWengel et al. 2012; Remolina et al. 2012; Turner and Miller 2012; Turner et al. 2013). In addition to being a model organism for genetic research, D. melanogaster also has the advantage of having a well-annotated reference genome and a large collection of mutant and RNAi lines. Furthermore, natural populations of D. melanogaster maintain abundant polymorphism with relatively low levels of linkage disequilibrium (LD) (Mackay et al. 2012), which is expected to result in high-resolution genomic maps. A common feature of published D. melanogaster E&R studies is that a very large number of SNPs (at least hundreds) are identified as targets of selection (i.e., candidate SNPs). Theoretical arguments suggest that, when taking into account the large frequency changes that
are reported for candidate SNPs, it is unlikely that all of these sites could be independent targets of selection (Haldane 1957; Nunney 2003). Rather, it appears that current methods are confounded by large numbers of false positives. The reasons why, however, remain elusive.

To shed light on why such large numbers of candidate SNPs are typically found in D. melanogaster E&R studies, we subjected multiple replicates of the same starting population to two different selection regimes: a hot and a cold environment. In accordance with previous studies, we detected thousands of putative selected sites for both environments, although there was relatively little overlap between the two sets of candidates. By leveraging information from additional independent replicates and the starting allele frequencies of candidate SNPs, we demonstrate that our populations carry signatures of environmentally dependent thermal selection. Moreover, we demonstrate that most of the candidate SNPs were false positives that were linked to selected sites over distances much larger than those typically observed for D. melanogaster, with many being coincident with large inversions that underwent sizeable frequency shifts during the experiment. Such long-range associations were readily reproduced in simulations that replicated common features of published Drosophila E&R studies, indicating that this phenomenon may be a general cause of false positives in these studies. Based on our results, we make several suggestions intended to improve E&R performance, which include replacing D. melanogaster with sibling species that have few reported inversions, such as D. simulans or D. mauritiana.

Results

Ten replicated D. melanogaster populations were established from 113 recently collected isofemale lines and maintained at a census size of 1,000 individuals with nonoverlapping generations. These were evenly split between two different temperature regimes (the "hot" and "cold" treatments) that cycled between 18 and 28 °C and between 10 and 20 °C, respectively. Genomic DNA was sequenced from pools of ~500 adult females (Pool-Seq) for three different replicates in each evolved treatment (~generation 15) and the starting population (i.e., the "base" population). After quality filtering, ~1.45 million SNPs were identified in our populations, a number that is consistent with previous reports for D. melanogaster (see Materials and Methods for more details on the experimental regime, sequencing protocols, and SNP calling).

Candidate SNPs

We identified putative selected sites in each treatment (that is, candidate SNPs) by comparing the P values from the Cochran–Mantel–Haenszel (CMH) test (Agresti 2002) for each observed SNP against those for sites simulated under forward Wright–Fisher evolution (see Materials and Methods). Using an empirical false discovery rate (FDR) of 0.001 and a simulated Ne of 250, we identified ~17,000 candidate SNPs in the hot treatment and ~3,500 candidate SNPs in the cold treatment (hereafter referred to as hot candidate SNPs and cold candidate SNPs, respectively; supplementary table S1, Supplementary Material online). Because our simulations do not account for linkage between sites and therefore may result in biased estimates, we performed additional simulations of neutral evolution among segregating haplotypes. By allowing for recombination between the different haplotypes, these simulations explicitly model correlated allele frequency dynamics caused by linkage between sites (see Materials and Methods). The haplotype-based simulations did not reduce the number of inferred candidates, nor did removal of regions with low recombination rates (<2 cM/Mb) produce notable differences (supplementary table S1, Supplementary Material online). This suggests that our assumption of independence among SNPs did not upwardly bias the inferred number of candidates. Thus, despite using a conservative significance threshold and being robust to false positives caused by drift in low recombining regions, the detection of such a large number of candidate SNPs suggests that our estimates for both treatments were probably inflated by a large number of false positives.
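The empirical FDR step described above can be illustrated with a minimal Python sketch. It assumes that P values are already available both for the observed SNPs and for sites simulated under neutrality; the function name `empirical_fdr_cutoff` and the toy P values are hypothetical, and the procedure actually used in the study (see Materials and Methods) may differ in detail.

```python
def empirical_fdr_cutoff(observed_p, simulated_p, fdr=0.001):
    """Return the largest observed P value usable as a cutoff such that the
    estimated FDR (neutral hits / observed hits) is <= fdr, or None."""
    obs = sorted(observed_p)
    sim = sorted(simulated_p)
    # Rescale neutral counts if the simulated set differs in size (assumption).
    scale = len(obs) / len(sim)
    best = None
    n_sim = 0  # simulated (neutral) P values at or below the current cutoff
    for n_obs, cutoff in enumerate(obs, start=1):
        while n_sim < len(sim) and sim[n_sim] <= cutoff:
            n_sim += 1
        # Expected false positives divided by observed positives at this cutoff.
        if (n_sim * scale) / n_obs <= fdr:
            best = cutoff
    return best


# Toy data (hypothetical): three strong outliers among the observed P values.
observed = [1e-9, 1e-8, 1e-7, 0.01, 0.2, 0.5]
neutral = [0.005, 0.1, 0.3, 0.6, 0.8, 0.9]
cutoff = empirical_fdr_cutoff(observed, neutral, fdr=0.1)
candidates = [p for p in observed if cutoff is not None and p <= cutoff]
```

Every observed SNP with a P value at or below the returned cutoff would be called a candidate. Note that this sketch treats sites as independent, which is the same simplification the text above goes on to test with haplotype-based simulations.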

Different Selection Signatures in the Two Thermal Regimes

Reasoning that differences in the efficacy of selection would be discernible through differences in effective population size (Ne), we estimated the Ne for each chromosome arm in both treatments (see Materials and Methods). Indeed, Ne was consistently larger for the cold treatment relative to the hot treatment (fig. 1). This suggests that the effects of selection (either acting directly on each SNP or via linkage between selected and neutral sites) were more pervasive in the hot treatment overall. Moreover, the estimated Ne of the X-chromosome was higher than that for most autosomes in both treatments. Because the Ne of the X-chromosome is expected to be 25% smaller than that of the autosomes in neutrally evolving populations, our results suggest that selection was more influential on the autosomes in both treatments. Although we cannot rule out the possibility that the intertreatment difference in Ne was caused by environmentally induced nongenetic fitness variation, it is unlikely that this mechanism was responsible for the Ne variation that was observed across chromosome arms within each treatment. Thus, our results suggest that selection affected a large number of SNPs during the experiment, potentially in a habitat-specific manner.
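The link between Ne and the spread of allele frequency changes can be made concrete with a short Python sketch. It does not reproduce the simulation-based fit used in the study (fig. 1); instead it uses the standard Wright–Fisher drift approximation, in which the variance of an allele frequency after t generations is p0(1 − p0)[1 − (1 − 1/(2Ne))^t]. The function names are hypothetical, and sampling noise from pooled sequencing is ignored.

```python
def expected_drift_variance(p0, ne, generations):
    """Variance of the allele frequency after neutral drift in a diploid
    Wright-Fisher population of effective size ne, starting from p0."""
    return p0 * (1 - p0) * (1 - (1 - 1 / (2 * ne)) ** generations)


def ne_from_standardized_variance(f, generations):
    """Invert F = 1 - (1 - 1/(2 Ne))^t for Ne, where F is the variance of the
    frequency change divided by p0 * (1 - p0)."""
    return 1 / (2 * (1 - (1 - f) ** (1 / generations)))


def neutral_x_ne(autosomal_ne):
    """Neutral expectation with an equal sex ratio: the X chromosome has
    three-quarters of the autosomal Ne (i.e., 25% smaller)."""
    return 0.75 * autosomal_ne


# Round trip with the simulated Ne of 250 and 15 generations from the text;
# the starting frequency of 0.3 is an arbitrary illustrative value.
p0 = 0.3
f = expected_drift_variance(p0, ne=250, generations=15) / (p0 * (1 - p0))
recovered_ne = ne_from_standardized_variance(f, generations=15)
x_ne = neutral_x_ne(250)
```

Under this approximation, a treatment in which many sites change more than drift alone predicts will yield a smaller apparent Ne, which is the logic behind comparing Ne between treatments and chromosome arms.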

To evaluate the hypothesis that different genes were targeted in the two selection regimes, we tested whether outlier SNPs were enriched among a carefully curated data set of genes associated with either heat or cold stress (see Materials and Methods). To avoid biasing results by making an a priori assumption about the best number of candidate SNPs in each treatment, enrichment was tested over successively larger sets of candidates, ranging from the top 2,000 to 100,000 "candidate" SNPs. The results show that hot candidate SNPs were highly enriched in heat tolerance genes and that cold candidate SNPs were moderately enriched within cold tolerance genes (fig. 2). Notably, there was an order of magnitude more genes associated with heat tolerance than cold tolerance in our curated data sets (~320 vs. ~20; supplementary table S2, Supplementary Material online), such that the latter may have suffered from reduced power caused by having a relatively poorly characterized pathway. In contrast, there was no significant enrichment of hot candidate SNPs in the cold tolerance genes or of cold candidate SNPs in heat tolerance genes. Thus, although these results confirmed treatment-specific selection, we still observed associations for sets of candidates that were even larger than those estimated using our conservative FDR cutoff. This result reaffirms that we have a large number of false positives that need to be accounted for.
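A gene-set enrichment test of this kind can be sketched as follows. This is a simplified stand-in, assuming a plain hypergeometric model in which every gene is equally likely to contain a candidate SNP; it ignores gene length and SNP density, and the test used in the study (see Materials and Methods) may well differ. The function name and the toy gene counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(total_genes, tolerance_genes, genes_hit, overlap):
    """Probability of seeing at least `overlap` tolerance genes among
    `genes_hit` genes drawn without replacement from `total_genes`, of which
    `tolerance_genes` belong to the thermal tolerance set."""
    denominator = comb(total_genes, genes_hit)
    upper = min(tolerance_genes, genes_hit)
    numerator = sum(
        comb(tolerance_genes, k) * comb(total_genes - tolerance_genes, genes_hit - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator


# Toy example (hypothetical counts): 10 genes, 3 in the tolerance set,
# 4 genes carry at least one candidate SNP, and 2 of those are tolerance genes.
p_value = hypergeom_upper_tail(total_genes=10, tolerance_genes=3, genes_hit=4, overlap=2)
```

Repeating such a test for successively larger candidate sets, as described above, gives one P value per set size, which is what the enrichment curves in fig. 2 summarize.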

Natural D. melanogaster populations have been reported to vary seasonally. This pattern was first reported for morphological traits (Tantawy 1964) and was more recently extended to inversions (Stalker 1980) and SNPs (Bergland et al. 2013) that exhibited cyclic frequency changes across seasons. Molecular evidence suggests that alleles that had been selected over the winter remained at relatively high frequencies in Drosophila populations collected in the spring before decreasing again by the fall (Bergland et al. 2013). Because the founder population of our experiments was collected in the early summer, it probably more closely resembles a winter/spring population than a fall population (Bergland et al. 2013). Therefore, we reasoned that alleles involved in adaptation to cold or hot environments may start at high or low frequencies, respectively, whereas false positives are expected to follow the background frequency distribution of neutral SNPs. When conditioning the allele frequency on the rising allele (the allele expected to be coming under selection), we observed an excess of low-frequency alleles in the hot treatment and, conversely, an enrichment of intermediate frequency alleles in the cold treatment (fig. 3). Given that the high-frequency alleles experience very little allele frequency change before fixation, the lack of predicted high-frequency alleles among the cold candidates is likely to be a direct consequence of insufficient power to detect such SNPs. Notably, this pattern was not restricted to the top 2,000 candidate SNPs but extends to a much larger fraction of loci (supplementary fig. S1, Supplementary Material online). This further reinforces the idea that a large number of SNPs are either directly selected or behave similarly to selected loci in each treatment.

[Figure 1. Panels: X, 2L, 2R, 3L, 3R, and Genome-wide. x axis: Effective population size (100–500). y axis: −log10(Σ(x_obs − x_exp)^2). Series: Hot, Cold.]

FIG. 1. Estimates of effective population size in the evolving populations. The sum of the mean squared differences (negative log10 transformed) between the allele frequency change distributions for the experimental and simulated data for a range of Ne values. Larger values on the y axis indicate a better fit between the observed and simulated values, and the diamond indicates the single best fit overall, that is, the best estimate of Ne for a given chromosome arm or whole genome. Error bars are the standard errors for the three replicates.

[Figure 2. x axis: Number of candidates (10^4 to 10^5). y axis: −log10(p). Series: Heat, Cold. Dashed line: 5% significance.]

FIG. 2. Enrichment of candidate SNPs within genes associated with heat and cold tolerance in Drosophila. Successively larger sets of candidate SNPs from the hot (left panel) or cold (right panel) treatment were tested for enrichment in gene sets either associated with cold or heat tolerance. All points above the dashed line indicate that the probability of obtaining the same number of thermal tolerance genes with at least one candidate SNP by chance was less than 5%. All genes were taken from the CESAR database and categorized according to whether they were uniquely associated with either heat or cold tolerance. The enrichment tests indicate that selection operated in the expected direction, although the large number of candidate SNPs for which a significant enrichment was found implies that these were confounded by false positives.

Reproducibility of Ranked SNPs

A testable consequence of directional selection is that targeted alleles are expected to increase in frequency across all replicates, whereas false positives should not replicate. Hence, to further distinguish between true positives and noise among our top candidates, we sequenced two additional F15 replicates in each treatment and quantified the level of reproducibility between these and the original replicates. Reproducibility was measured using a modified receiver operating characteristic (ROC) curve, which visualizes the excess of shared SNPs of similar rank (rank being assigned by P value) between our original and new data sets (see Materials and Methods). For both temperature regimes, we noted a substantial concordance between the independent replicates, as indicated by the ROC curve being clearly above the diagonal (fig. 4). In contrast, neutrally evolving sites that were simulated to have parameters similar to our treatment data, and which included recombination, did not differ from expectation (supplementary fig. S2, Supplementary Material online; see Materials and Methods). Nonetheless, because the concordance between the independent sets of replicates peaked among the top 50,000–100,000 ranked SNPs in each treatment, this analysis could not further distinguish truly selected SNPs from false positives.

The high concordance that we observed could have resulted from strong LD between selected and linked neutral SNPs, which is expected to produce correlated allele frequency changes across contiguous sites. We tested this hypothesis by reanalyzing the data after removing low recombining regions and genomic regions known to contain segregating cosmopolitan inversions (see Materials and Methods). The resulting ROC curve exhibited a substantially reduced signal of concordance between the two independent sets of replicates but was still greater than expected by chance (fig. 4). Next, we repeated the analysis using only short intronic sites (without splicing signals) within the putative low LD regions. Evidence suggests that short introns are unlikely to carry selectively favored alleles in D. melanogaster (Halligan and Keightley 2006; Parsch et al. 2010; Clemente and Vogl 2012), making these regions good proxies for neutral loci (but see Farlow et al. 2012). Although SNPs in short introns were not expected to show much consistency across the groups, the signal of concordance was similar to that of the putative low LD regions (fig. 4). Consequently, this pattern indicates that much of the selection signal could be generated from linkage between neutral sites and the true targets of selection.
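The concordance measure described in this section can be sketched in a few lines of Python. This is one plausible reading of the modified ROC curve, assuming that for each top-k cutoff the y value is the fraction of top-k SNPs shared by both replicate sets and the x value is k divided by the total number of SNPs, so that the diagonal is the chance expectation. The function name and the toy rankings are hypothetical; the exact construction is given in the Materials and Methods.

```python
def concordance_curve(ranked_a, ranked_b, steps=10):
    """For cumulatively larger top-k sets, return (proportion of SNPs,
    proportion of top-k SNPs shared between the two rankings).

    ranked_a and ranked_b list the same SNP identifiers, each sorted from
    the smallest to the largest P value in one independent set of replicates.
    """
    n = len(ranked_a)
    points = []
    for i in range(1, steps + 1):
        k = max(1, round(n * i / steps))
        shared = len(set(ranked_a[:k]) & set(ranked_b[:k]))
        # By chance, about k * k / n SNPs are shared, so shared / k is about k / n.
        points.append((k / n, shared / k))
    return points


# Toy rankings of 10 SNPs (hypothetical): near-identical order versus reversed order.
original = list(range(10))
similar = [1, 0, 3, 2, 5, 4, 7, 6, 9, 8]
reversed_order = original[::-1]
concordant = concordance_curve(original, similar, steps=5)
discordant = concordance_curve(original, reversed_order, steps=5)
```

A curve lying above the diagonal, as in the concordant toy case, indicates more shared top-ranked SNPs than expected by chance; filtering the SNP list to low LD regions or short introns before ranking corresponds to the additional lines in fig. 4.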

[Figure 3. Panels: Start, End, and Increase. y axis: Frequency. x axis range: 0.0–1.0. Top row: hot treatment; bottom row: cold treatment.]

FIG. 3. Allele frequency distribution of candidate SNPs. Distribution of the starting (left panels), end (middle panels), and increase (right panels) in allele frequencies for the top 2,000 candidate SNPs in the hot (top row) and cold (bottom row) treatment. Alleles are polarized by the rising allele in the derived treatment, such that plots display the frequencies of the putative selected alleles. The dashed gray line indicates the distribution of all ~1.45 million SNPs, and the diamonds on the x axis indicate the mean frequency for candidate (red/blue) and all (gray) SNPs.


Effects of Long-Range LD on False Positives

In genome-wide association studies (GWAS), linkage between selected and neighboring neutral sites results in a characteristic pattern observed in Manhattan plots, namely, a series of adjacent sites with significant P values that are arranged into a peak. Such patterns were not readily recognizable in either treatment in our study, however. Rather, the top candidates occurred as either isolated SNPs or broad, flattened humps (fig. 5). Furthermore, analyses of genomic regions flanking candidate SNPs showed that LD had a limited range beyond the putative selected site (fig. 6). This pattern is consistent with the low levels of LD observed in D. melanogaster populations (Miyashita and Langley 1988; Andolfatto and Wall 2003; Mackay et al. 2012) and has been previously interpreted as evidence against linkage as a possible cause for the large number of candidate SNPs in Drosophila E&R studies (Orozco-terWengel et al. 2012). To reconcile the anomaly of thousands of putative selected SNPs being detected despite an apparent lack of LD in our study, we propose that associations beyond 200 bp (which is widely regarded as the limit of LD in D. melanogaster; Mackay et al. 2012) may be prevalent within our experimental populations.

Support for this idea comes from two separate findings. First, the excess concordance observed for short introns outlined above is still evident after exclusion of intronic sites located in the same genes that contained one or more candidate SNPs. Thus, although excluding these intronic sites did reduce concordance to some extent, the excess concordance was still visible even after removing genes containing one or more of the top 100,000 candidate SNPs (supplementary fig. S3, Supplementary Material online). Given that short introns are not expected to contain selected sites (Parsch et al. 2010; Clemente and Vogl 2012), we suggest that linkage between sites in short introns and selected sites in regions other than the enclosing gene is responsible for producing this pattern.

Second, a good, albeit potentially extreme, example of long-range LD is provided by a 1 Mb region located on an inversion-free part of chromosome 3R that was first described by Orozco-terWengel et al. (2012). Among our genome-wide top 2,000 hot candidates, 281 were located in this region, which represents a massive enrichment relative to its size. These sites increased in frequency by ~27% over the first 15 generations on average, and 33% of the 24K SNPs in this region had the same rising allele across all five replicates in the hot treatment (cf. 23% across the rest of the genome; P value < 2.2e-16 using Fisher's exact test). Furthermore, a large number of putative selected alleles in this region had starting frequencies <1% (118 SNPs), and these sites displayed highly homogeneous increases in frequency during the study (supplementary fig. S4, Supplementary Material online). Thus, although it is evident from the data that this is a low-frequency haplotype that increased across all replicates in the hot treatment, 83% of the putative candidates in this region were separated by distances greater than 200 bp, with an average distance of 5,570 bp between sites. Because such a broad spacing between candidate sites is well beyond the typical range of LD in D. melanogaster, we suggest that long-range LD is responsible for these patterns and the majority of false positives more generally.

To evaluate to what extent long-range LD could be observed in a typical E&R study, we simulated strong selection on a single allele private to a rare haplotype using allele frequency changes and parameters similar to those in the hot treatment (see Materials and Methods). We found that neighboring candidate SNPs were often separated by several thousand base pairs or more and could be millions of base pairs from the selected site (fig. 7). These results clearly indicate that long-range LD can be readily generated under a simple but plausible scenario for E&R studies.
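The intuition behind this scenario can be illustrated with a deliberately simplified Python sketch. It is not the simulation used in the study: it assumes a deterministic haploid selection model with no drift and no recombination, in which a haplotype with relative fitness 1 + s changes from frequency p to p(1 + s)/(1 + ps) each generation. The selection coefficient, the SNP positions, and the function names are hypothetical.

```python
def haplotype_trajectory(p0, s, generations):
    """Deterministic frequency of a haplotype with relative fitness 1 + s."""
    freqs = [p0]
    for _ in range(generations):
        p = freqs[-1]
        freqs.append(p * (1 + s) / (1 + p * s))
    return freqs


def hitchhiking_changes(private_snp_positions, p0, s, generations):
    """Frequency change of every SNP private to the selected haplotype.
    Without recombination, each one mirrors the haplotype, however far it
    lies from the selected site."""
    change = haplotype_trajectory(p0, s, generations)[-1] - p0
    return {position: change for position in private_snp_positions}


# A haplotype starting at 1% with an assumed s of 0.3, followed for 15 generations.
trajectory = haplotype_trajectory(p0=0.01, s=0.3, generations=15)
# Private SNPs at hypothetical positions spread over two megabases.
changes = hitchhiking_changes([1_000, 250_000, 2_000_000], p0=0.01, s=0.3, generations=15)
```

In this toy setting every private SNP shows the same rise in frequency regardless of its distance from the selected site, which is the kind of homogeneous, widely spaced signal described for the region on 3R. Recombination would erode the association for distant sites over time, which the sketch ignores.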

DiscussionExposure of D melanogaster to novel hot or cold environ-ments resulted in selection of alleles in thermal tolerancegenes specific to each environment Moreover the startingfrequencies of the putative selected SNPs differed mark-edly between the two environments suggesting that theyhad experienced selection within the natural progenitorpopulation prior to the start of the experiment Henceour study reveals that a potentially rich source of

00 02 04 06 08 10

00

02

04

06

08

10

Proportion SNPs

Pro

port

ion

com

mon

can

dida

tes

Treatment amp region

Hot AllHot low LDHot SICold AllCold low LDCold SI

FIG 4 Concordance between two independent groups of replicatesModified ROC curve showing the proportion of common candidateSNPs shared between two independent sets of derived populationsrelative to cumulatively larger sets of ranked SNPs (see Materials andMethods) for both hot (red lines) and cold (blue lines) treatmentsSeparate lines are plotted for full data set (~145 million SNPs All) aswell as for those SNPs only falling in regions of putatively low LD (ieregions outside of inversions and low recombining regions) or for shortintrons in these regions (SI) Lines above the black diagonal line indicatethat there are more common candidate SNPs than expected by chancesuch that our top candidates are more reproducible than those neu-trally evolving populations Because sites with low LD or in short intronswith low LD show a similar reduction in concordance relative to all sitesmuch of the excess concordance is probably due to changes in inver-sions and long-range LD between selected and neutral sites

368

Tobler et al doi101093molbevmst205 MBED

ownloaded from

httpsacademicoupcom

mbearticle-abstract312364997936 by C

ardiff University user on 19 February 2019

FIG 5 Manhattan plots showing the genomic distribution of candidate SNPs The negative log10-transformed P values of all SNPs relative to genomicposition The top 2000 candidate SNPs for the hot (top panel) and cold (bottom panel) treatment are highlighted (see key) Candidates are categorizedas being unique to the hot (red) or cold (blue) treatments or common to both (green) Point size scales with significance except for common SNPswhich are shown at a standard size for emphasis Inset boxes indicate the position of inversions and a putative selected haplotype block on 3R (largedashes most proximal to centromere on 3R) The inversions are as follows In(2L)t In(2R)NS In(3L)P In(3R)Payne (short dashes) In(3R)C (dots) andIn(3R)Mo (solid line) The candidate SNPs typically appear as either isolated points or arranged in broad flattened humps and tend to be clustered on3R where three large inversions overlap These patterns further imply that inversions and long-range LD have generated many false positives in thestudy

02

46

810

minus4000 minus2000 0 2000 4000

02

46

810

Distance from candidate (bp)

minuslo

g 10(p

)

HotColdRandom200bp

FIG 6 Correlation between the top 2000 candidate and flanking SNPs The decay in LD (correlation) around candidates is measured as the rate ofdecline in neighboring (negative log10 transformed) P values to background levels Data are averaged over successive 50-bp bins flanking focal SNPs Solidblack lines indicate the bin means The red blue and green polygons around these lines indicate 95 confidence intervals for hot treatment coldtreatment and random (ie background) SNPs respectively In accordance to reported results from D melanogaster most of the LD is lost within200 bp either side of each focal SNP (indicated by dashed lines)


information lies in using the base population information, which has not been considered in most published E&R studies to date.

Despite strong evidence that differential thermal selection contributed to our top candidates in each treatment, the sheer number of putatively selected sites indicates that our results were confounded by a large number of false positives. Given that short-range LD and strong drift in low recombining regions were unlikely to be major contributors to the excess of false positives in our study, we hypothesized that LD over longer distances was responsible for much of this excess. Long-range LD in our experimental populations could have arisen as a byproduct of the founders being a small sample of the much larger natural population. In this case, chance correlations between sites separated by vast distances may result from the finite sampling of alleles. Long-range LD may also originate within natural populations. For instance, strong LD between loci separated by large distances has been reported for Japanese D. melanogaster populations (Takano-Shimizu et al. 2004; Itoh et al. 2010). Because the LD was much stronger in the spring than in the autumn, this finding was attributed to exaggerated sampling effects during the previous winter bottleneck, although there was evidence that recurrent epistatic selection was also responsible for some linked loci. Finally, the introgression of new haplotypes from migrant individuals is another natural process that could produce long-range LD. Recent and ongoing admixture has been reported for African (Pool and Aquadro 2006; Pool et al. 2012) and North American (Caracristi and Schlötterer 2003) D. melanogaster populations, suggesting that, in principle, this may also contribute to long-range LD in natural populations.

The systematic increase in frequency of alleles in the hot treatment that were broadly distributed across a 1 Mb region of 3R demonstrates the effects of long-range LD in our experiment. In combination with differences in the distribution of the starting allele frequencies, long-range LD may also help to explain the differences in the number of candidates and the level of concordance observed between the two treatments. Given that most candidates were rare in the hot treatment and common in the cold treatment, presumably resulting from prior selection against (for) heat (cold) tolerance alleles, long-range LD is expected to be more extensive among the hot candidates. This is because rare haplotypes are expected to harbor many more private SNPs than common ones, and all such sites will experience the same change in allele frequency until these associations are broken down by recombination. The difference in Ne between the two treatments is also consistent with more extensive long-range LD producing more widespread changes in allele frequencies in the hot treatment.

Segregating inversions are another probable source of false positives in our study. The large decrease in concordance that was observed when inversions and regions with low recombination were removed from the data set supports this idea (fig. 4). Additional confirmation comes from a recent study, which found that several large inversions were segregating in our populations over the course of the experiment, including three partially overlapping inversions on chromosome arm 3R (Kapun et al. 2013). By tracing the trajectory of alleles that were private to each inversion, a single inversion was found to increase across all three replicates in each treatment (In(3R)C by 25% in the hot and In(3R)Mo by 15% in the cold). Indeed, a large share of the top 2000 candidate SNPs was observed on chromosome arm 3R for both treatments (~80% for the hot and ~40% for the cold; supplementary table S3, Supplementary Material online), with many candidate SNPs lying within these inversions. Furthermore, the greatest concentration of cold candidates was situated within the region where all three inversions overlap. Because recombination is likely to be suppressed in this region, the local effects of LD and drift are expected to be exacerbated relative to the rest of the genome. Thus, changes in inversion frequencies are also likely to have substantially inflated the number of false positives in our study.

FIG. 7. Simulated long-range LD generated by a singleton under strong selection. Manhattan plots showing the distribution of the top 1000 candidate SNPs from three separate sets of simulated data (top to bottom panels) in which a singleton carried by a single chromosome-length haplotype is under strong selection (see Materials and Methods). The full chromosome is shown in the panels on the right, with a zoomed-in view shown in the panels on the left. Significant SNPs occur vast distances from the selected site (dashed line) and tend to be several thousands of base pairs apart. Thus, it appears likely that long-range LD between selected and neutral sites can readily inflate the number of false positives in an E&R study.

Implications for Future E&R Research

Our results indicate that E&R studies with small founder populations, strong selection, and large segregating inversions are likely to suffer from large numbers of false positives. These factors are likely to be present in other D. melanogaster E&R studies (indeed, several large inversions segregate at intermediate frequencies in natural D. melanogaster populations; Corbett-Detig and Hartl 2012; Kapun et al. 2013), meaning that our findings have important implications for future E&R research in Drosophila.

First, if seasonal selection is common in D. melanogaster, which accumulating evidence suggests may be the case (Stalker 1980; Takano-Shimizu et al. 2004; Itoh et al. 2010; Bergland et al. 2013), then the genetic composition of the experimental populations will be dependent on their time of collection. Evidence suggests that genes associated with stress resistance are highly pleiotropic in Drosophila (Rose et al. 1992; Hoffman et al. 2001; Bubliy and Loeschcke 2005; Kolss and Kawecki 2008), implying that a broad spectrum of traits may be affected in this way.

Second, large numbers of false positives are expected in E&R studies that use small founder populations, as a direct result of long-range LD between selected and neutral sites. The effect of founder population size on the performance of experimental design has been explicitly addressed in a recent haplotype-based simulation study (Kofler and Schlötterer 2013). The authors demonstrate that both the power and specificity of E&R studies are increased when larger numbers of founders are used, with the gains in performance being particularly noticeable when the strength of selection per SNP is strong. Another plausible way to reduce the impact of long-range LD is to create independent replicates from different sets of founders. Although this does not remove long-range LD per se, it results in different associations across replicates, such that analyses that combine replicates should remove the majority of such effects.

Finally, our results indicate that inversions are likely to be a major hindrance in detecting causal loci. Large segregating inversions appear to be ubiquitous in natural D. melanogaster populations, and recent work suggests that newly introduced inversions may influence genetic diversity across entire chromosome arms, presumably by suppressing recombination well beyond inversion breakpoints (Corbett-Detig and Hartl 2012; Pool et al. 2012; Kapun et al. 2013). Consequently, any improvements in performance that might be achieved through the experimental design modifications suggested above are unlikely to be realized in chromosome arms carrying large inversions, and inversion-free populations may be necessary to ensure that such improvements are realized over the entire genome. Although inversion-free D. melanogaster populations can be obtained via karyotyping individuals prior to the study, the time and effort expended in such screens may become prohibitive if large founding populations are desired. Another possibility is to use an alternative Drosophila species for which segregating inversions are rare, such as D. simulans or D. mauritiana, instead of D. melanogaster. Although these species currently lack the detailed genetic and genomic resources available for D. melanogaster, the ongoing development of portable molecular tools has made functional validation of candidate loci in these species tenable (Carroll 2011; Liu et al. 2012; Yu et al. 2013).

Conclusion

Previous D. melanogaster E&R studies have not directly addressed the role that inversions or long-range LD may have had in generating the large numbers of reported candidate SNPs. Our results imply that such estimates were probably inflated by false positives resulting from these phenomena. Consequently, Drosophila E&R study designs will need to be modified before the true causal loci that contribute to adaptation and/or complex traits can be reliably identified. While we have outlined several possibilities here, we also point out that the study by Kofler and Schlötterer (2013) is intended as a general guideline for E&R study design. There, the authors use haplotype-based simulations to investigate E&R performance for a wide range of experimental and evolutionary variables and thereby provide an invaluable reference for future Drosophila E&R research.

Materials and Methods

Experimental Populations and Evolution Regimes

Full details on the experimental setup are outlined in Orozco-terWengel et al. (2012). Briefly, wild D. melanogaster were collected in Northern Portugal in the summer of 2008. The population was maintained as 113 isofemale lines for five generations in the laboratory prior to the creation of mass-bred populations. Ten replicated mass-bred populations were created, with each replicate combining five females from each isofemale line. This initial population of merged isofemale lines is referred to as the base population, or F0. Five replicate populations were maintained in each of two novel environments: one hot and one cold treatment. Each treatment cycled between two fixed temperature and light settings every 12 h (between 10 °C and 20 °C in the cold treatment and 18 °C and 28 °C in the hot treatment), with light and dark periods occurring with the hot and cold phase of each treatment, respectively. The heating/cooling period between the two fixed temperatures was relatively rapid, taking ~1 h before the new temperature was reached, such that most time was spent at either of the modal temperatures. Each replicate was maintained at ~1000 individuals with ~50:50 sex ratios that were evenly distributed across five ~225 ml bottles. Each bottle contained ~70 ml of standard Drosophila medium. Generations were nonoverlapping.

Genome Sequencing and Mapping

Illumina full genome sequencing was performed on three replicates for each of the base population and evolved populations. For the hot treatment, a single replicate of the 23rd generation and two from the 15th generation were sequenced (Orozco-terWengel et al. 2012), whereas for the cold treatment, all three sequenced replicates came from the 15th generation. Sequencing was performed using paired-end reads, 75 bp in length in the hot treatment (GAIIx) and 100 bp long in the cold treatment (HiSeq). For each replicate and time point, ~500 flies were pooled and sequenced as one sample (i.e., Pool-Seq; Futschik and Schlötterer 2010). Two additional replicates were also sequenced from the 15th generation of the hot and cold treatments. These were used to check the reproducibility of candidate detection, as described below. Average genome-wide coverage ranged from ~35- to ~90-fold for each of the sequenced replicates (supplementary table S4, Supplementary Material online).

Mapping was carried out using a combination of purpose-built and third-party software, which is described in detail in Kofler, Orozco-terWengel, et al. (2011). Briefly, reads were trimmed to remove low-quality bases and then mapped with bwa (version 0.5.8c) (Li and Durbin 2009) against the D. melanogaster reference genome (version 5.18) and Wolbachia (NC_002978.6). The alignment files were converted to the SAM format (Li et al. 2009), with reads not mapped in proper pairs or with a mapping quality of less than 20 being removed. SAM files were converted into the pileup format, and indels and repeat sequences identified by RepeatMasker 3.2.9 (www.repeatmasker.org, last accessed November 5, 2013) were masked. The aligned pileup files were converted to a synchronized file (Kofler, Pandey, et al. 2011), from which SNPs and their accompanying allele frequencies were derived. All allele frequency changes presented herein were averaged across all three replicates (weighted by coverage). We further removed all SNPs whose coverage was among the top 2% of a given replicate and time point, or whose minor allele count was less than 30 across the 23 populations in our extended data set, which comprised additional replicates and time points that were not analyzed here. Because the average coverage per SNP across the extended data set was ~1500, the minimum count of 30 equates to an average 2% minor allele frequency across all replicates and time points (although allele frequencies could be lower in the time points that were specifically analyzed in this study). Annotation was based on version 5.40 of the D. melanogaster reference genome. Approximately 1.45 million SNPs were called overall.
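The coverage weighting and minimum-count filter described above can be illustrated with a short Python sketch. It assumes that per-replicate allele counts and coverages are already available as plain lists; the function names are hypothetical and are not part of PoPoolation2 or of the pipeline used in the study.

```python
def weighted_allele_frequency(allele_counts, coverages):
    """Coverage-weighted mean allele frequency across replicates.

    Weighting each replicate's frequency (count / coverage) by its coverage
    is the same as dividing the summed counts by the summed coverage.
    """
    total_coverage = sum(coverages)
    if total_coverage == 0:
        raise ValueError("total coverage must be positive")
    return sum(allele_counts) / total_coverage


def passes_min_count(minor_counts, min_count=30):
    """Keep a SNP only if its minor allele count, summed over populations,
    reaches the minimum count (30 in the text)."""
    return sum(minor_counts) >= min_count


# With an average summed coverage of ~1500, a count of 30 corresponds to 2%.
print(30 / 1500)
```

Summing counts before dividing means that replicates with deeper coverage contribute proportionally more to the estimate, which is the sense in which the average is weighted.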

Effective Population Size, FDR, and Candidate Estimation

To estimate Ne and FDR in both treatments, forward Wright–Fisher simulations were performed to generate expected neutral changes during the experiment. Three sets of neutral allele frequency changes were simulated for each treatment, matching the number of replicates in the study. For each simulation, 1.45 million SNPs were independently sampled (i.e., approximating the number of SNPs in our real data) from the distribution of allele frequencies expected after 15 generations of neutral evolution, conditional on the starting frequency for each SNP. Starting frequencies for each SNP were randomly drawn from the empirical distribution of start frequencies in the experiment, with the frequencies being the coverage-weighted average across all replicates. The expected distribution of allele frequencies after 15 generations was created by generating an Ni × Nj transition matrix, where each cell is the binomial probability of seeing j alleles given a starting frequency i/N (i.e., N is the effective population size), and then raising this matrix to the 15th power. To simulate the effects of DNA pooling in our study, an additional round of binomial sampling was conducted on the start and end allele frequencies for each SNP. To do this, each SNP was assigned a random coverage value that was drawn from the empirical distribution of coverages from a specific generation and treatment. This was done separately for each replicate, thereby capturing the heterogeneity in coverage both within and between replicates in the study.
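As a rough illustration of the transition-matrix approach, the following Python sketch builds a binomial Wright–Fisher matrix, raises it to a power by repeated multiplication, and adds one round of binomial read sampling to mimic pooling. It is a toy version under stated assumptions: `n` is taken to be the number of allele copies, a small `n` is used so that pure-Python matrix products stay fast, and all function names are hypothetical rather than taken from the authors' code.

```python
import random
from math import comb


def wf_transition_matrix(n):
    """Row i, column j: binomial probability of j copies in the next
    generation given i copies (frequency i/n) in the current one."""
    matrix = []
    for i in range(n + 1):
        p = i / n
        matrix.append([comb(n, j) * p**j * (1 - p) ** (n - j)
                       for j in range(n + 1)])
    return matrix


def mat_mul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]


def mat_pow(matrix, power):
    """Multiply the matrix by itself `power` times (identity for power 0)."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(power):
        result = mat_mul(result, matrix)
    return result


def pool_sample(frequency, coverage, rng):
    """One round of binomial sampling of reads at a given (positive) coverage."""
    reads = sum(1 for _ in range(coverage) if rng.random() < frequency)
    return reads / coverage


# Distribution of allele counts after 15 generations, starting from 5 of 20.
after_15 = mat_pow(wf_transition_matrix(20), 15)
end_distribution = after_15[5]
```

Each row of the powered matrix is the distribution of end counts conditional on a start count, so sampling an end frequency for a SNP amounts to drawing from the row that matches its start frequency.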

The effective population size of the observed data was estimated as follows. First, multiple simulations were performed as outlined above, each with a differing Ne. Each simulation produced a set of expected start and end allele frequencies for a given Ne, from which an allele frequency change (AFC) distribution was obtained (i.e., the end frequency minus start frequency for all 1.45 million SNPs). The expected AFC distribution was then compared for goodness of fit with the observed AFC distribution by summing the squared difference of the AFC values evaluated in 1% frequency change bins. Because individual replicates were simulated, we averaged the goodness of fit across the three replicates to get a point estimate for each Ne. Additionally, the error around this mean was sufficiently large for both treatments that the best-fitting Ne was not significantly different from many neighboring Ne values. Consequently, we took the lowest Ne from amongst this range (~250; supplementary table S1, Supplementary Material online), as it is expected to provide the most conservative estimate of the FDR (because the strength of drift increases as Ne decreases). The Ne for individual chromosomes was also inferred by limiting the observed data to that specific to each chromosome and then repeating the steps above.
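The goodness-of-fit comparison can be sketched in Python as follows. This is an illustration only: it assumes the AFC values are available as lists of floats, compares the proportion of SNPs per 1% bin, and simply returns the Ne with the smallest mean score, whereas the study took the lowest Ne among those not significantly different from the best fit. All names are hypothetical.

```python
def afc_histogram(changes, bin_width=0.01):
    """Proportion of SNPs in each allele-frequency-change bin over [-1, 1]."""
    n_bins = round(2 / bin_width)
    counts = [0] * n_bins
    for change in changes:
        index = min(int((change + 1) / bin_width), n_bins - 1)
        counts[index] += 1
    return [count / len(changes) for count in counts]


def sum_squared_difference(observed, expected, bin_width=0.01):
    """Sum over bins of the squared difference between two AFC histograms."""
    obs = afc_histogram(observed, bin_width)
    exp = afc_histogram(expected, bin_width)
    return sum((o - e) ** 2 for o, e in zip(obs, exp))


def best_fitting_ne(observed, simulated_by_ne):
    """simulated_by_ne maps each candidate Ne to a list of simulated AFC
    lists, one per replicate; the score is averaged across replicates."""
    scores = {}
    for ne, replicates in simulated_by_ne.items():
        fits = [sum_squared_difference(observed, rep) for rep in replicates]
        scores[ne] = sum(fits) / len(fits)
    return min(scores, key=scores.get), scores
```

Returning the full score table alongside the minimum makes it possible to inspect which neighboring Ne values fit almost as well before choosing a conservative one.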

Following Ne estimation, the number of putative selected SNPs was inferred as follows. First, all ~1.45 million SNPs that remained after filtering were subjected to the CMH test (Agresti 2002). The CMH test was shown to outperform several other statistics that were used for candidate inference in recent Drosophila E&R studies (Kofler and Schlötterer 2013). Testing was conducted separately for the hot and cold treatments, with each being contrasted against the base population. Second, the CMH test was applied to simulated SNP data that were derived using the conservative Ne estimate of 250 (discussed earlier). Finally, the number of candidate SNPs was estimated by setting an empirical FDR of 0.001 for each treatment. Thus, the number of candidates for a given treatment was quantified as the number of observed SNPs that had CMH P values that were lower than the least significant CMH P value from among the 0.1% most extreme simulated SNPs. The number of candidate SNPs was also inferred for a range of different FDR values, and the results are shown in supplementary table S1, Supplementary Material online.
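The thresholding step lends itself to a small Python sketch. It assumes that CMH P values for the observed and simulated SNPs have already been computed and are held in plain lists; the function names are hypothetical, and the CMH test itself is not implemented here.

```python
def empirical_threshold(simulated_p_values, fdr=0.001):
    """Least significant P value among the `fdr` fraction of most extreme
    (smallest) simulated P values."""
    ranked = sorted(simulated_p_values)
    n_extreme = max(1, round(len(ranked) * fdr))
    return ranked[n_extreme - 1]


def count_candidates(observed_p_values, simulated_p_values, fdr=0.001):
    """Number of observed SNPs with a P value below the simulated threshold."""
    threshold = empirical_threshold(simulated_p_values, fdr)
    return sum(1 for p in observed_p_values if p < threshold)
```

Because the threshold comes from neutral simulations rather than from a theoretical null distribution, it absorbs the extra variance introduced by drift and pooling in the simulated data.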

To examine the effects that linkage between loci had on the candidate inference, we also performed 30 forward neutral simulations starting from a population with 250 unique haplotypes (i.e., 250 homozygous diploid individuals) for 15 generations. Thus, the number of founding haplotypes and the effective population size during the simulations was 250. Assuming that the isofemale founders maintained two segregating chromosomes per line on average (which is not unreasonable considering that they were five generations old when making the mass-bred populations), these numbers roughly accord with those expected within our experimental populations. The simulations were performed using MimicrEE (Kofler and Schlötterer 2013). This software enabled the modeling of recombination at rates that were derived from empirical estimates (Fiston-Lavier et al. 2010). Thirty simulations were performed and divided into 10 equal-sized subsets. The three simulations from each of the 10 subsets were then downsampled to match the coverage distributions of the three empirical replicates from the relevant generation and treatment. Next, we applied the CMH test to all sites in each subset and estimated the number of candidates thereof via the FDR method outlined in the preceding paragraph. The haplotypes used in the simulations were created by neutral coalescent simulations and are described in Bastide et al. (2013).

Enrichment in Thermal Tolerance Genes

We tested for treatment-specific evolution by looking for candidate enrichment in genes associated with heat and cold tolerance. Thermal tolerance gene sets were taken from the Candidate Stress Genes in Drosophila database, which was obtained from the Centre for Environmental Stress and Adaptation Research website (http://cesar.org.au, last accessed November 5, 2013). Only genes with a significant and unique association with either heat or cold tolerance were included in the analyses (supplementary table S2, Supplementary Material online). Separate enrichment analyses were performed on the 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000, 18000, 20000, 25000, 30000, 35000, 40000, 45000, 50000, 60000, 70000, 80000, 90000, and 100000 most significant SNPs in each treatment. Note that each successive set of SNPs was an aggregation of previous SNP sets and the subsequent set of most significant SNPs. For instance, the set that contained the top 4000 SNPs comprised the set containing the top 2000 SNPs along with the next 2000 most significant SNPs. All enrichment analyses were performed using Gowinda software (Kofler and Schlötterer 2012). Gowinda generates P values by conducting multiple simulations in which the positions of candidate SNPs are randomly chosen from the ~1.45 million SNPs scattered throughout the genome. The P value was then quantified as the proportion of simulations that had more genes specific to a given category (in our study, genes associated with either hot or cold thermotolerance) containing at least one candidate SNP than the observed data set (Kofler and Schlötterer 2012). In total, we performed 100000 permutations for each set of candidate SNPs in each treatment and for each set of thermal tolerance genes. In each case, a gene containing multiple candidate SNPs was only counted once (i.e., the gene option of the mode argument). Gene annotations were based on version 5.40 of the D. melanogaster reference genome, and only the full coding region of each gene, and not the upstream or downstream flanking regions (i.e., the gene option of the gene-definition argument), was considered. Because the thermal genes were specifically tested for enrichment, with the expectation that there would be enrichment for hot alleles in heat-tolerance genes and cold alleles in cold-tolerance genes, we report P values rather than q values adjusted for testing multiple categories. This software was specifically designed to deal with differences in average gene size among categories and therefore does not result in the overrepresentation of categories with larger than average gene sizes (Kofler and Schlötterer 2012).
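The permutation logic described above can be sketched in Python. This is not Gowinda's implementation or interface; it is a simplified illustration that assumes a hypothetical dictionary `snp_to_gene` mapping each SNP identifier to at most one gene, and it counts each gene once regardless of how many candidate SNPs it contains.

```python
import random


def genes_hit(snps, snp_to_gene, category_genes):
    """Number of distinct category genes containing at least one of the SNPs."""
    return len({snp_to_gene[snp] for snp in snps
                if snp_to_gene.get(snp) in category_genes})


def permutation_p_value(candidates, all_snps, snp_to_gene, category_genes,
                        n_permutations, rng):
    """Proportion of random SNP sets (same size as the candidate set) that
    hit more category genes than the observed candidate set does."""
    observed = genes_hit(candidates, snp_to_gene, category_genes)
    more_extreme = 0
    for _ in range(n_permutations):
        random_set = rng.sample(all_snps, len(candidates))
        if genes_hit(random_set, snp_to_gene, category_genes) > observed:
            more_extreme += 1
    return more_extreme / n_permutations
```

Drawing random SNPs from the full SNP set, rather than drawing random genes, is what lets longer genes with more SNPs be hit more often in the null distribution as well, so gene length does not by itself produce enrichment.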

Identification of Putative High LD Regions and Short Introns

LD can result in large numbers of false positives, for instance, via allele frequency correlations between selected and neutral linked sites. Consequently, the following regions were designated as having putatively elevated LD and were removed from specific analyses, as indicated in the main text. First, regions covered by inversions or within 500 kb either side of the inversion breakpoints. Segregating inversions were identified in our experimental evolution populations by Kapun et al. (2013). Second, a 1 Mb region located within the inversion-free part of 3R. This region was found to harbor a large number of low-frequency alleles that showed a similar allele frequency change, a pattern consistent with selection upon a single haplotype block (Orozco-terWengel et al. 2012). Finally, regions with a local recombination rate less than 2 cM/Mb. The recombination rates were taken from Comeron et al. (2012). Approximately 300000 of the original 1.45 million SNPs remained following the removal of these regions. Because 3R contained three segregating inversions and the putative selected haplotype block, only 15000 SNPs remained on this arm following removal of these regions, whereas other chromosomes maintained at least 50000 SNPs.


Short intron coordinates were taken from Lawrie et al. (2013). To further ensure that short intron sites were behaving neutrally, we only took those that were located within the putative low LD regions identified above.

ROC Curves and Reproducibility

To test the reproducibility of our top candidates, two additional replicates were sequenced from generation 15 in each treatment, with all sites being processed through the same pipeline as the original candidates. Because we did not have additional replicates from the base population, two new base replicates were created. This was done by combining the reads from the separate base replicates into a single file, from which two-thirds of all reads were randomly sampled (without replacement) and split in half to generate two new files, each having one-third of the combined base population reads (~50-fold coverage). All SNPs from the two new sets of replicates were then subjected to CMH testing and compared for consistency with the original data set using ROC curves. For the sake of consistency, the ~1.45 million SNPs from the original data set were used in the comparisons.
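The read resampling used to create the two new base replicates can be sketched as follows. The example assumes the combined reads fit in a Python list and uses a hypothetical function name; a real pipeline would operate on read files rather than on in-memory lists.

```python
import random


def make_base_pseudoreplicates(reads, rng):
    """Sample two-thirds of the reads without replacement and split the
    sample in half, giving two disjoint sets of about one-third each."""
    n_sample = (2 * len(reads)) // 3
    n_sample -= n_sample % 2  # keep the two halves the same size
    sampled = rng.sample(reads, n_sample)
    half = n_sample // 2
    return sampled[:half], sampled[half:]


reads = [f"read_{i}" for i in range(300)]
first, second = make_base_pseudoreplicates(reads, random.Random(42))
print(len(first), len(second))  # prints "100 100"
```

Sampling without replacement guarantees that no read appears in both new files, so the two pseudoreplicates do not share sequencing noise.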

We used a modified ROC curve analysis to provide a graphical representation of the consistency between the original and new data sets, that is, those based on three replicates and two replicates, respectively. The ROC curves quantify the overlap between the two data sets across successively larger groups of SNPs, with SNPs having been ranked by their CMH P value. Points falling on the 1:1 diagonal line indicate that overlap between the two sets of SNPs matches chance expectations of reproducibility, whereas points lying above the diagonal indicate excess reproducibility. Because we are interested in changes that were most likely to be caused by the deterministic process of selection, as opposed to the stochastic effects of genetic drift, only SNPs where the same allele was increasing across all five replicates were considered as being reproducible.
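One way to read the modified ROC curve is sketched below in Python. The axes are an assumption made for this illustration: the x value is the fraction of all SNPs included in the ranked set, and the y value is the fraction of that set shared between the two data sets, so that two unrelated rankings are expected to fall near the 1:1 diagonal. The additional requirement that the same allele increases in all five replicates is omitted, and the function name is hypothetical.

```python
def concordance_curve(p_values_a, p_values_b, set_sizes):
    """p_values_a and p_values_b map SNP identifiers to P values from two
    independent data sets. Returns (fraction ranked, fraction shared) pairs."""
    shared_ids = set(p_values_a) & set(p_values_b)
    # Ties are broken by SNP identifier so that the ranking is deterministic.
    ranked_a = sorted(shared_ids, key=lambda snp: (p_values_a[snp], snp))
    ranked_b = sorted(shared_ids, key=lambda snp: (p_values_b[snp], snp))
    total = len(shared_ids)
    curve = []
    for k in set_sizes:
        overlap = len(set(ranked_a[:k]) & set(ranked_b[:k]))
        curve.append((k / total, overlap / k))
    return curve


three_reps = {"s1": 0.001, "s2": 0.01, "s3": 0.2, "s4": 0.9}
two_reps = {"s1": 0.002, "s2": 0.5, "s3": 0.03, "s4": 0.8}
print(concordance_curve(three_reps, two_reps, [1, 2, 4]))
# prints "[(0.25, 1.0), (0.5, 0.5), (1.0, 1.0)]"
```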

The expected degree of reproducibility under neutrality was quantified using neutral MimicrEE simulations (Kofler and Schlötterer 2013). All parameters used were identical to the simulations used for the FDR estimates outlined above, except that a total of 50 data sets were simulated in this case. The simulated data sets were divided into 10 equally sized subsets, with each subset being further split into a group of three replicates and a group of two replicates. Following downsampling of the data sets to match our empirical coverage distributions (discussed earlier), for each set of simulations, CMH tests were applied to each group of three and two replicates. The resulting data were compared for reproducibility using ROC curves, as above.

LD Analyses

The degree of LD around the top 2000 candidates was estimated for both treatments following the same methods outlined in Orozco-terWengel et al. (2012). Briefly, we estimated the median P value for all SNPs flanking the top 2000 candidates in nonoverlapping 50-bp windows either side of the focal candidates. The decay in the P value around the focal candidates provides a proxy for the strength and extent of LD in regions flanking putative selected sites. To ascertain the background level of LD among similar SNPs, 2000 more SNPs were randomly drawn from the genome after excluding all regions within 200 bp of the top 2000 candidates. The numbers of random SNPs drawn from each chromosome arm matched the distribution of the top 2000 candidates in each treatment.
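The windowed summary around a focal SNP can be illustrated with the following Python sketch. It assumes the flanking SNPs are given as (position, P value) pairs on the same chromosome as the focal SNP, takes the 4000-bp maximum distance from the axis range of figure 6, and summarizes each window by the median of the negative log10-transformed P values. The function name and window indexing are choices made for this example.

```python
from math import log10
from statistics import median


def flanking_profile(focal_position, snps, window=50, max_distance=4000):
    """Median -log10(P) of flanking SNPs in nonoverlapping windows.

    Window 1 covers 1-50 bp downstream of the focal SNP, window 2 covers
    51-100 bp, and so on; negative indices denote upstream windows.
    """
    bins = {}
    for position, p_value in snps:
        distance = position - focal_position
        if distance == 0 or abs(distance) > max_distance:
            continue  # skip the focal SNP itself and distant SNPs
        index = (abs(distance) - 1) // window + 1
        if distance < 0:
            index = -index
        bins.setdefault(index, []).append(-log10(p_value))
    return {index: median(values) for index, values in sorted(bins.items())}
```

Averaging such profiles over many focal SNPs, and comparing them with profiles around randomly drawn SNPs, gives the kind of decay curve shown in figure 6.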

We used MimicrEE simulations (Kofler and Schlötterer 2013) to estimate the degree of long-range LD achievable given a simple scenario where strong selection acted on a singleton located in a single haplotype. The selection strength (0.5) was based on increasing a single allele from 1% to 30% over 15 generations (i.e., the mean allele frequency change observed for the top 2000 candidate SNPs in the hot treatment). This was derived using $s = 2\ln\left(\frac{p_0 q_1}{p_1 q_0}\right)/t$, where s is the selection coefficient, p and q are the biallelic frequencies at the start (subscript 0) and end (subscript 1) generations, and t is the change in generations (eq. 3.3 in Charlesworth and Charlesworth [2010]). Other than the selected site, all other parameters were the same as the neutral simulations outlined above. Nine simulations in which the singleton increased in frequency by 25–35% were included in the analyses. These were evenly split into three subsets, with each subset being downsampled and submitted to CMH testing (in the same manner as the neutral simulations discussed earlier). The resulting data were then used to produce Manhattan plots that enabled visual assessment of the degree of long-range LD (fig. 7).
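As a quick numerical check of the selection coefficient, the formula can be evaluated in Python. The sketch assumes that q denotes the frequency of the selected allele and p = 1 − q, which is the reading under which an increase from 1% to 30% over 15 generations yields a value close to the stated 0.5; the function name is hypothetical.

```python
from math import log


def selection_coefficient(q0, q1, generations):
    """s = 2 * ln(p0 * q1 / (p1 * q0)) / t, with q the frequency of the
    selected allele (an assumption of this sketch) and p = 1 - q."""
    p0, p1 = 1 - q0, 1 - q1
    return 2 * log((p0 * q1) / (p1 * q0)) / generations


s = selection_coefficient(0.01, 0.30, 15)
print(round(s, 1))  # prints "0.5"
```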

Supplementary Material

Supplementary figures S1–S4 and tables S1–S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to A. Bergland and D. Lawrie for sharing the coordinates for the short introns and Martin Kapun for sharing unpublished results. R.T. wishes to thank his colleagues at the Institut für Populationsgenetik and the Petrov laboratory at Stanford University for helpful discussions on data interpretation. This work was supported by a PhD fellowship from the Vetmeduni Vienna, the Austrian Science Fund (FWF P19467-B11, W1225-B20), and the European Research Council (ERC) grant “ArchAdapt” awarded to C.S.

References

Agresti A. 2002. Categorical data analysis. New York: John Wiley & Sons.
Andolfatto P, Wall JD. 2003. Linkage disequilibrium patterns across a recombination gradient in African Drosophila melanogaster. Genetics 165:1289–1305.
Bastide H, Betancourt A, Nolte V, Tobler R, Stöbe P, Futschik A, Schlötterer C. 2013. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9:e1003534.
Bergelson J, Roux F. 2010. Towards identifying genes underlying ecologically relevant traits in Arabidopsis thaliana. Nat Rev Genet. 11:867–879.
Bergland AO, Behrman EL, O'Brien KR, Schmidt PS, Petrov DA. 2013. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1303.5044


Bubliy O Loeschcke V 2005 Correlated responses to selection for stressresistance and longevity in a laboratory population of Drosophilamelanogaster J Evol Biol 18789ndash803

Burke MK Dunham JP Shahrestani P Thornton KR Rose MR Long AD2010 Genome-wide analysis of a long-term evolution experimentwith Drosophila Nature 467587ndash590

Caracristi G Schlotterer C 2003 Genetic differentiation betweenAmerican and European Drosophila melanogaster populationscould be attributed to admixture of African alleles Mol Biol Evol20792ndash799

Carroll D 2011 Genome engineering with zinc-finger nucleases Genetics188773ndash782

Charlesworth B Charlesworth D 2010 Elements of evolutionary genet-ics Greenwood Village (CO) Roberts and Company Publishers

Clemente F Vogl C 2012 Unconstrained evolution in short introns ndash Ananalysis of genome-wide polymorphism and divergence data fromDrosophila J Evol Biol 251975ndash1990

Comeron JM Ratnappan R Bailin S 2012 The many landscapes ofrecombination in Drosophila melanogaster PLoS Genet 8e1002905

Corbett-Detig RB Hartl DL 2012 Population genomics of inversionpolymorphisms in Drosophila melanogaster PLoS Genet 8e1003056

Dettman JR Rodrigue N Melnyk AH Wong A Bailey SF Kassen R 2012Evolutionary insight from whole-genome sequencing of experimen-tally evolved microbes Mol Ecol 212058ndash2077

Farlow A Dolezal M Hua L Schlotterer C 2012 The genomic signatureof splicing-coupled selection differs between long and short intronsMol Biol Evol 2921ndash24

Fiston-Lavier A-S Singh ND Lipatov M Petrov DA 2010 Drosophilamelanogaster recombination rate calculator Gene 46318ndash20

Futschik A Schlotterer C 2010 The next generation of molecularmarkers from massively parallel sequencing of pooled DNA samplesGenetics 186207ndash218

Haldane J 1957 The cost of natural selection J Genet 55511ndash524Halligan DL Keightley PD 2006 Ubiquitous selective constraints in

the Drosophila genome revealed by a genome-wide interspeciescomparison Genome Res 16875ndash884

Hoffman A, Hallas R, Sinclair C, Partridge L. 2001. Rapid loss of stress resistance in Drosophila melanogaster under adaptation to laboratory culture. Evolution 55:436–438.

Itoh M, Nanba N, Hasegawa M, Inomata N, Kondo R, Oshima M, Takano-Shimizu T. 2010. Seasonal changes in the long-distance linkage disequilibrium in Drosophila melanogaster. J Hered. 101:26–32.

Kapun M, van Schalkwyk H, McAllister B, Flatt T, Schlötterer C. 2013. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.2461

Kawecki TJ, Lenski RE, Ebert D, Hollis B, Olivieri I, Whitlock MC. 2012. Experimental evolution. Trends Ecol Evol. 27:547–560.

Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V, Futschik A, Kosiol C, Schlötterer C. 2011. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 6:e15925.

Kofler R, Pandey RV, Schlötterer C. 2011. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27:3435–3436.

Kofler R, Schlötterer C. 2012. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies. Bioinformatics 28:2084–2085.

Kofler R, Schlötterer C. 2013. Guidelines for the design of evolve and resequencing studies. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.4954

Kolss M, Kawecki TJ. 2008. Reduced learning ability as a consequence of evolutionary adaptation to nutritional stress in Drosophila melanogaster. Ecol Entomol. 33:583–588.

Lawrie DS, Messer PW, Hershberg R, Petrov DA. 2013. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 9:e1003527.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079.

Liu J, Li C, Yu Z, et al. (12 co-authors). 2012. Efficient and specific modifications of the Drosophila genome by means of an easy TALEN strategy. J Genet Genomics. 39:209–215.

Mackay TFC, Richards S, Stone EA, et al. (52 co-authors). 2012. The Drosophila melanogaster Genetic Reference Panel. Nature 482:173–178.

Miyashita N, Langley H. 1988. Molecular and phenotypic variation of the white locus region in Drosophila melanogaster. Genetics 120:199–212.

Nunney L. 2003. The cost of natural selection revisited. Ann Zool Fennici. 40:185–194.

Orozco-terWengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P. 2010. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol. 27:1226–1234.

Parts L, Cubillos FA, Warringer J, et al. (14 co-authors). 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Pool JE, Aquadro CF. 2006. History and structure of sub-Saharan populations of Drosophila melanogaster. Genetics 174:915–929.

Pool JE, Corbett-Detig RB, Sugino RP, et al. (11 co-authors). 2012. Population Genomics of sub-saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet. 8:e1003080.

Remolina S, Chang P, Leips J, Nuzhdin SV, Hughes K. 2012. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66:3390–3403.

Rose MR, Vu LN, Park SU, Graves JL Jr. 1992. Selection on stress resistance increases longevity in Drosophila melanogaster. Exp Gerontol. 27:241–250.

Stalker H. 1980. Chromosome inversion polymorphism in Drosophila melanogaster. II. Relationship of inversion frequencies to latitude, season, wing-loading and flight activity. Genetics 95:211–223.

Stapley J, Reger J, Feulner PGD, Smadja C, Galindo J, Ekblom R, Bennison C, Ball A, Beckerman AP, Slate J. 2010. Adaptation genomics: the next generation. Trends Ecol Evol. 25:705–712.

Takano-Shimizu T, Kawabe A, Inomata N, Nanba N, Kondo R, Inoue Y, Itoh M. 2004. Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci U S A. 101:14156–14161.

Tantawy A. 1964. Studies on natural populations of Drosophila. III. Morphological and genetic differences of wing length in Drosophila melanogaster and D. simulans in relation to. Evolution 18:560–570.

Turner TL, Miller PM, Cochrane VA. 2013. Combining genome-wide methods to investigate the genetic complexity of courtship song variation in Drosophila melanogaster. Mol Biol Evol. 30:2113–2020.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Yu Z, Ren M, Wang Z, Zhang B, Rong YS, Jiao R, Gao G. 2013. Highly efficient genome modifications mediated by CRISPR/Cas9 in Drosophila. Genetics 195:289–291.

Zhou D, Udpa N, Gersten M, et al. (11 co-authors). 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.

Adaptation Genomics of D. melanogaster. doi:10.1093/molbev/mst205 MBE

are reported for candidate SNPs, it is unlikely that all of these sites could be independent targets of selection (Haldane 1957; Nunney 2003). Rather, it appears that current methods are confounded by large numbers of false positives. The reasons why, however, remain elusive.

To shed light on why such large numbers of candidate SNPs are typically found in D. melanogaster E&R studies, we subjected multiple replicates of the same starting population to two different selection regimes: a hot and a cold environment. In accordance with previous studies, we detected thousands of putative selected sites for both environments, although there was relatively little overlap between the two sets of candidates. By leveraging information from additional independent replicates and the starting allele frequencies of candidate SNPs, we demonstrate that our populations carry signatures of environmentally dependent thermal selection. Moreover, we demonstrate that most of the candidate SNPs were false positives that were linked to selected sites over distances much larger than those typically observed for D. melanogaster, with many being coincident with large inversions that underwent sizeable frequency shifts during the experiment. Such long-range associations were readily reproduced in simulations that replicated common features of published Drosophila E&R studies, indicating that this phenomenon may be a general cause of false positives in these studies. Based on our results, we make several suggestions intended to improve E&R performance, which include replacing D. melanogaster with sibling species that have few reported inversions, such as D. simulans or D. mauritiana.

Results

Ten replicated D. melanogaster populations were established from 113 recently collected isofemale lines and maintained at a census size of 1,000 individuals with nonoverlapping generations. These were evenly split between two different temperature regimes (the “hot” and “cold” treatments) that cycled between 18–28 °C and 10–20 °C, respectively. Genomic DNA was sequenced from pools of ~500 adult females (Pool-Seq) for three different replicates in each evolved treatment (~generation 15) and the starting population (i.e., the “base” population). After quality filtering, ~1.45 million SNPs were identified in our populations, a number that is consistent with previous reports for D. melanogaster (see Materials and Methods for more details on the experimental regime, sequencing protocols, and SNP calling).

Candidate SNPs

We identified putative selected sites in each treatment (that is, candidate SNPs) by comparing the P values from the Cochran–Mantel–Haenszel (CMH) test (Agresti 2002) for each observed SNP against those for sites simulated under forward Wright–Fisher evolution (see Materials and Methods). Using an empirical false discovery rate (FDR) of 0.001 and a simulated Ne of 250, we identified ~17,000 candidate SNPs in the hot treatment and ~3,500 candidate SNPs in the cold treatment (hereafter referred to as hot candidate SNPs and cold candidate SNPs, respectively; supplementary table S1, Supplementary Material online). Because our simulations do not account for linkage between sites and therefore may result in biased estimates, we performed additional simulations of neutral evolution among segregating haplotypes. By allowing for recombination between the different haplotypes, these simulations explicitly model correlated allele frequency dynamics caused by linkage between sites (see Materials and Methods). The haplotype-based simulations did not reduce the number of inferred candidates, nor did removal of regions with low recombination rates (<2 cM/Mb) produce notable differences (supplementary table S1, Supplementary Material online). This suggests that our assumption of independence among SNPs did not upwardly bias the inferred number of candidates. Thus, despite using a conservative significance threshold and being robust to false positives caused by drift in low recombining regions, the detection of such a large number of candidate SNPs suggests that our estimates for both treatments were probably inflated by a large number of false positives.
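As a rough illustration of the test statistic named above, the following sketch computes a CMH statistic for a single SNP from one 2×2 table of allele counts per replicate (base vs. evolved population). The function name `cmh_test` and the table layout are assumptions made for this example, no continuity correction is applied, and the Wright–Fisher simulations used to calibrate the empirical FDR are not reproduced here.

```python
import math


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test for a list of 2x2 tables.

    Each table is (a, b, c, d): counts of allele 1 and allele 2 in the base
    population (a, b) and in the evolved population (c, d) of one replicate.
    Returns (chi-square statistic with 1 df, P value). Hypothetical helper,
    written without a continuity correction.
    """
    diff = 0.0  # sum over replicates of observed minus expected count in cell a
    var = 0.0   # sum over replicates of the hypergeometric variance of cell a
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    stat = diff * diff / var
    # Upper tail of a chi-square distribution with one degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Because the per-replicate deviations are summed before squaring, frequency changes in opposite directions cancel, so only changes that are consistent across replicates produce small P values.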

Different Selection Signatures in the Two Thermal Regimes

Reasoning that differences in the efficacy of selection would be discernible through differences in effective population size (Ne), we estimated the Ne for each chromosome arm in both treatments (see Materials and Methods). Indeed, Ne was consistently larger for the cold treatment relative to the hot treatment (fig. 1). This suggests that the effects of selection, either acting directly on each SNP or via linkage between selected and neutral sites, were more pervasive in the hot treatment overall. Moreover, the estimated Ne of the X-chromosome was higher than that for most autosomes in both treatments. Because the Ne of the X-chromosome is expected to be 25% smaller than that of the autosomes in neutrally evolving populations, our results suggest that selection was more influential on the autosomes in both treatments. Although we cannot rule out the possibility that the intertreatment difference in Ne was caused by environmentally induced nongenetic fitness variation, it is unlikely that this mechanism was responsible for the Ne variation that was observed across chromosome arms within each treatment. Thus, our results suggest that selection affected a large number of SNPs during the experiment, potentially in a habitat-specific manner.
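The fitting idea behind such Ne estimates can be sketched as a grid search: simulate neutral drift for several candidate Ne values and keep the value whose distribution of allele frequency changes lies closest to the observed one. The sketch below is only an illustration under stated assumptions. It uses a Gaussian approximation to binomial sampling, a fixed histogram binning, and hypothetical function names, and it minimizes the raw sum of squared differences (equivalent to maximizing its negative log10). The procedure actually used is described in Materials and Methods and may differ.

```python
import random


def simulate_changes(start_freqs, ne, generations, rng):
    """Allele frequency change per site after neutral drift.

    Assumption: each generation is a Gaussian approximation to binomial
    sampling of 2 * ne chromosomes, clipped to the interval [0, 1].
    """
    changes = []
    for p0 in start_freqs:
        p = p0
        for _ in range(generations):
            sd = (p * (1.0 - p) / (2.0 * ne)) ** 0.5
            p = min(1.0, max(0.0, rng.gauss(p, sd)))
        changes.append(p - p0)
    return changes


def histogram(values, n_bins=40, lo=-1.0, hi=1.0):
    """Proportion of values falling into each of n_bins equal-width bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(n_bins - 1, int((v - lo) / width))
        counts[idx] += 1
    total = float(len(values))
    return [c / total for c in counts]


def best_ne(observed_changes, start_freqs, ne_grid, generations, rng):
    """Return the grid value of Ne with the smallest sum of squared
    differences between observed and simulated histograms, plus all scores."""
    obs = histogram(observed_changes)
    scores = {}
    for ne in ne_grid:
        sim = histogram(simulate_changes(start_freqs, ne, generations, rng))
        scores[ne] = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return min(scores, key=scores.get), scores
```

Selection inflates the variance of allele frequency change beyond what drift alone produces, so data shaped by widespread selection are fitted best by a smaller Ne, which is the logic of the comparison between treatments.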

To evaluate the hypothesis that different genes were targeted in the two selection regimes, we tested whether outlier SNPs were enriched among a carefully curated data set of genes associated with either heat or cold stress (see Materials and Methods). To avoid biasing results by making an a priori assumption about the best number of candidate SNPs in each treatment, enrichment was tested over successively larger sets of candidates, ranging from the top 2,000 to 100,000 “candidate” SNPs. The results show that hot candidate SNPs were highly enriched in heat tolerance genes and that cold candidate SNPs were moderately enriched within cold tolerance genes (fig. 2). Notably, there was an order of magnitude more genes associated with heat tolerance than cold tolerance in our curated data sets (~320 vs. ~20; supplementary table S2, Supplementary Material online), such that the latter may have suffered from reduced power caused by having a relatively poorly characterized pathway. In contrast, there was no significant enrichment of hot candidate SNPs in the cold tolerance genes or of cold candidate SNPs in heat tolerance genes. Thus, although these results confirmed treatment-specific selection, we still observed associations for sets of candidates that were even larger than those estimated using our conservative FDR cutoff. This result reaffirms that we have a large number of false positives that need to be accounted for.
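Testing enrichment over successively larger candidate sets can be pictured with a simple hypergeometric tail probability, evaluated once per cutoff. This is a simplified stand-in rather than the enrichment procedure used in the study: it ignores gene length, and the function names and the per-SNP gene annotation format are assumptions made for illustration.

```python
from math import comb


def hypergeom_tail(total_genes, set_genes, hit_genes, overlap):
    """P(X >= overlap) when hit_genes genes are drawn without replacement
    from total_genes genes, of which set_genes belong to the gene set."""
    upper = min(set_genes, hit_genes)
    favourable = sum(
        comb(set_genes, k) * comb(total_genes - set_genes, hit_genes - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, hit_genes)


def enrichment_curve(ranked_snp_genes, gene_set, total_genes, cutoffs):
    """Enrichment P value for each top-k set of ranked SNPs.

    ranked_snp_genes lists, in rank order (most significant first), the gene
    containing each SNP, or None for SNPs outside annotated genes.
    """
    curve = []
    for k in cutoffs:
        # Genes carrying at least one of the top-k candidate SNPs
        hit = {g for g in ranked_snp_genes[:k] if g is not None}
        overlap = len(hit & gene_set)
        curve.append((k, hypergeom_tail(total_genes, len(gene_set), len(hit), overlap)))
    return curve
```

Scanning the cutoffs avoids committing to a single candidate-set size in advance, which mirrors the reasoning given above for testing the top 2,000 through 100,000 SNPs.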

Natural D. melanogaster populations have been described to vary seasonally. This pattern was first reported for morphological traits (Tantawy 1964) and was more recently extended to inversions (Stalker 1980) and SNPs (Bergland et al. 2013) that exhibited cyclic frequency changes across seasons. Molecular evidence suggests that alleles that had been selected over the winter remained at relatively high frequencies in Drosophila populations collected in the spring before decreasing again by the fall (Bergland et al. 2013). Because the founder population of our experiments was collected in the early summer, it probably more closely resembles a winter/spring population than a fall population (Bergland et al. 2013). Therefore, we reasoned that alleles involved in adaptation to cold or hot environments may start at high or low frequencies, respectively, whereas false positives are expected to follow the background frequency distribution of neutral SNPs. When conditioning the allele frequency on the rising allele (the allele expected to be coming under selection), we observed an excess of low-frequency alleles in the hot treatment and, conversely, an enrichment of intermediate frequency alleles in the cold treatment (fig. 3). Given that the high-frequency alleles experience very little allele frequency change before fixation, the lack of predicted high-frequency alleles among the cold candidates is likely to be a direct consequence of insufficient power to detect such SNPs. Notably, this pattern was not restricted to the top 2,000 candidate SNPs but extends to a much larger fraction of loci (supplementary fig. S1, Supplementary Material online). This further reinforces the idea that a large number of SNPs are either directly selected or behave similarly to selected loci in each treatment.

FIG. 1. Estimates of effective population size in the evolving populations. The sum of the mean squared differences (negative log10 transformed) between the allele frequency change distributions for the experimental and simulated data for a range of Ne values. Panels show chromosome arms X, 2L, 2R, 3L, and 3R and the genome-wide estimate for the hot and cold treatments (x axis: effective population size; y axis: −log10(Σ(x_obs − x_exp)²)). Larger values on the y axis indicate a better fit between the observed and simulated values, and the diamond indicates the single best fit overall, that is, the best estimate of Ne for a given chromosome arm or whole genome. Error bars are the standard errors for the three replicates.

FIG. 2. Enrichment of candidate SNPs within genes associated with heat and cold tolerance in Drosophila. Successively larger sets of candidate SNPs from the hot (left panel) or cold (right panel) treatment were tested for enrichment in gene sets either associated with cold or heat tolerance (x axis: number of candidates; y axis: −log10(p)). All points above the dashed line indicate that the probability of obtaining the same number of thermal tolerance genes with at least one candidate SNP by chance was less than 5%. All genes were taken from the CESAR database and categorized according to whether they were uniquely associated with either heat or cold tolerance. The enrichment tests indicate that selection operated in the expected direction, although the large number of candidate SNPs for which a significant enrichment was found implies that these were confounded by false positives.

Reproducibility of Ranked SNPs

A testable consequence of directional selection is that targeted alleles are expected to increase in frequency across all replicates, whereas false positives should not replicate. Hence, to further distinguish between true positives and noise among our top candidates, we sequenced two additional F15 replicates in each treatment and quantified the level of reproducibility between these and the original replicates. Reproducibility was measured using a modified receiver operating characteristic (ROC) curve, which visualizes the excess of shared SNPs of similar rank (rank being assigned by P value) between our original and new data sets (see Materials and Methods). For both temperature regimes, we noted a substantial concordance between the independent replicates, as indicated by the ROC curve being clearly above the diagonal (fig. 4). In contrast, neutrally evolving sites that were simulated to have parameters similar to our treatment data, and which included recombination, did not differ from expectation (supplementary fig. S2, Supplementary Material online; see Materials and Methods). Nonetheless, because the concordance between the independent sets of replicates peaked among the top 50,000–100,000 ranked SNPs in each treatment, this analysis could not further distinguish truly selected SNPs from false positives.
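One plausible reading of the modified ROC curve is sketched below: for each cutoff k, the SNPs are ranked by P value in both data sets, and the share of the top k that appears in both lists is plotted against k divided by the total number of SNPs. Under independent rankings the expected shared proportion equals k/n, which gives the diagonal. The exact definition is in Materials and Methods, and the function name and inputs here are assumptions.

```python
def concordance_curve(pvals_a, pvals_b, cutoffs):
    """Proportion of top-k SNPs shared between two rankings.

    pvals_a and pvals_b hold P values for the same SNPs, in the same order,
    from two independent sets of replicates. Returns a list of
    (k / n, shared / k) points; independent rankings fall on the diagonal
    in expectation.
    """
    n = len(pvals_a)
    rank_a = sorted(range(n), key=lambda i: pvals_a[i])
    rank_b = sorted(range(n), key=lambda i: pvals_b[i])
    curve = []
    for k in cutoffs:
        shared = len(set(rank_a[:k]) & set(rank_b[:k]))
        curve.append((k / n, shared / k))
    return curve
```

Points above the diagonal indicate that more top-ranked SNPs recur in both data sets than chance would predict, whether the cause is selection on those sites or linkage to selected sites.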

The high concordance that we observed could have resulted from strong LD between selected and linked neutral SNPs, which is expected to produce correlated allele frequency changes across contiguous sites. We tested this hypothesis by reanalyzing the data after removing low recombining regions and genomic regions known to contain segregating cosmopolitan inversions (see Materials and Methods). The resulting ROC curve exhibited a substantially reduced signal of concordance between the two independent sets of replicates but was still greater than expected by chance (fig. 4). Next, we repeated the analysis using only short intronic sites (without splicing signals) within the putative low LD regions. Evidence suggests that short introns are unlikely to carry selectively favored alleles in D. melanogaster (Halligan and Keightley 2006; Parsch et al. 2010; Clemente and Vogl 2012), which makes these regions good proxies for neutral loci (but see Farlow et al. 2012). Although SNPs in short introns were not expected to show much consistency across the groups, the signal of concordance was similar to that of the putative low LD regions (fig. 4). Consequently, this pattern indicates that much of the selection signal could be generated from linkage between neutral sites and the true targets of selection.

FIG. 3. Allele frequency distribution of candidate SNPs. Distribution of the starting (left panels), end (middle panels), and increase (right panels) in allele frequencies for the top 2,000 candidate SNPs in the hot (top row) and cold (bottom row) treatment (panel titles: Start, End, Increase; y axis: frequency). Alleles are polarized by the rising allele in the derived treatment, such that plots display the frequencies of the putative selected alleles. The dashed gray line indicates the distribution of all ~1.45 million SNPs, and the diamonds on the x axis indicate the mean frequency for candidate (red/blue) and all (gray) SNPs.


Effects of Long-Range LD on False Positives

In genome-wide association studies (GWAS), linkage between selected and neighboring neutral sites results in a characteristic pattern observed in Manhattan plots, namely, a series of adjacent sites with significant P values that are arranged into a peak. Such patterns were not readily recognizable in either treatment in our study, however. Rather, the top candidates occurred as either isolated SNPs or broad, flattened humps (fig. 5). Furthermore, analyses of genomic regions flanking candidate SNPs showed that LD had a limited range beyond the putative selected site (fig. 6). This pattern is consistent with the low levels of LD observed in D. melanogaster populations (Miyashita and Langley 1988; Andolfatto and Wall 2003; Mackay et al. 2012) and has been previously interpreted as evidence against linkage as a possible cause for the large number of candidate SNPs in Drosophila E&R studies (Orozco-terWengel et al. 2012). To reconcile the anomaly of thousands of putative selected SNPs being detected despite an apparent lack of LD in our study, we propose that associations beyond 200 bp, which is widely regarded as the limit of LD in D. melanogaster (Mackay et al. 2012), may be prevalent within our experimental populations.
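The flanking-region analysis can be pictured as averaging −log10(P) in fixed-width distance bins around each focal candidate. The sketch below is a minimal, unoptimized version (it compares every SNP with every focal site) for one chromosome. The 50-bp bin size follows the figure legend, whereas the function name, the input format, and the ±4,000-bp window are assumptions.

```python
import math
from collections import defaultdict


def binned_decay(snps, focal_positions, bin_size=50, window=4000):
    """Mean -log10(P) of flanking SNPs per distance bin around focal SNPs.

    snps is a list of (position, p_value) tuples on one chromosome. Returns
    a dict mapping the left edge of each distance bin (in bp, negative for
    upstream positions) to the mean -log10(P) of the SNPs in that bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for focal in focal_positions:
        for pos, p_value in snps:
            dist = pos - focal
            if dist == 0 or abs(dist) > window:
                continue  # skip the focal SNP itself and distant sites
            b = dist // bin_size  # floor division keeps upstream bins negative
            sums[b] += -math.log10(p_value)
            counts[b] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}
```

A rapid return of the bin means to the background level within a few bins of the focal site is what a short range of LD would look like in this representation.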

Support for this idea comes from two separate findings. First, the excess concordance observed for short introns outlined above is still evident after exclusion of intronic sites located in the same genes that contained one or more candidate SNPs. Thus, although excluding these intronic sites did reduce concordance to some extent, the excess concordance was still visible even after removing genes containing one or more of the top 100,000 candidate SNPs (supplementary fig. S3, Supplementary Material online). Given that short introns are not expected to contain selected sites (Parsch et al. 2010; Clemente and Vogl 2012), we suggest that linkage between sites in short introns and selected sites in regions other than the enclosing gene is responsible for producing this pattern.

Second, a good, albeit potentially extreme, example of long-range LD is provided by a 1 Mb region located on an inversion-free part of chromosome 3R that was first described by Orozco-terWengel et al. (2012). Among our genome-wide top 2,000 hot candidates, 281 were located in this region, which represents a massive enrichment relative to its size. These sites increased in frequency by ~27% over the first 15 generations on average, and 33% of the 24K SNPs in this region had the same rising allele across all five replicates in the hot treatment (cf. 23% across the rest of the genome; P value < 2.2e−16 using Fisher’s exact test). Furthermore, a large number of putative selected alleles in this region had starting frequencies <1% (118 SNPs), and these sites displayed highly homogeneous increases in frequency during the study (supplementary fig. S4, Supplementary Material online). Thus, although it is evident from the data that this is a low-frequency haplotype that increased across all replicates in the hot treatment, 83% of the putative candidates in this region were separated by distances greater than 200 bp, with an average distance of 5,570 bp between sites. Because such a broad spacing between candidate sites is well beyond the typical range of LD in D. melanogaster, we suggest that long-range LD is responsible for these patterns and the majority of false positives more generally.

To evaluate to what extent long-range LD could be observed in a typical E&R study, we simulated strong selection on a single allele private to a rare haplotype using allele frequency changes and parameters similar to those in the hot treatment (see Materials and Methods). We found that neighboring candidate SNPs were often separated by several thousand base pairs or more and could be millions of base pairs from the selected site (fig. 7). These results clearly indicate that long-range LD can be readily generated under a simple but plausible scenario for E&R studies.
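Why a private marker far from the selected site can still change in frequency over a short experiment can be illustrated with a deterministic two-locus recursion. This is not the haplotype-based simulation used in the study: the haploid selection model, the selection coefficient, the uniform rate of 2 cM/Mb, and all names below are assumptions chosen for illustration, and drift is ignored.

```python
def recomb_fraction(distance_bp, rate_per_bp=2e-8):
    """Per-generation recombination fraction for a given distance, assuming
    a uniform rate (2e-8 per bp, i.e., 2 cM/Mb) and capped at 0.5."""
    return min(0.5, distance_bp * rate_per_bp)


def hitchhike(x0, s, r, generations):
    """Deterministic haploid two-locus model of genetic hitchhiking.

    Haplotypes are named by marker allele (M/m) and selected allele (S/s).
    The marker allele M starts private to the selected haplotype, which has
    frequency x0 and relative fitness 1 + s. Returns the final frequencies
    (selected allele, marker allele).
    """
    freq = {"MS": x0, "Ms": 0.0, "mS": 0.0, "ms": 1.0 - x0}
    for _ in range(generations):
        # Selection: haplotypes carrying S are favoured
        fit = {h: (1.0 + s if h[1] == "S" else 1.0) for h in freq}
        mean_fit = sum(freq[h] * fit[h] for h in freq)
        sel = {h: freq[h] * fit[h] / mean_fit for h in freq}
        # Recombination: linkage disequilibrium D shrinks by a factor (1 - r)
        d = sel["MS"] * sel["ms"] - sel["Ms"] * sel["mS"]
        freq = {
            "MS": sel["MS"] - r * d,
            "Ms": sel["Ms"] + r * d,
            "mS": sel["mS"] + r * d,
            "ms": sel["ms"] - r * d,
        }
    return freq["MS"] + freq["mS"], freq["MS"] + freq["Ms"]
```

Under these assumptions, a marker 1 Mb from the selected site has r = 0.02, so a fraction (1 − 0.02)^15 ≈ 0.74 of marker copies is expected to have escaped recombination after 15 generations, and the marker tracks much of the selected allele's rise, for example via `hitchhike(0.01, 0.5, recomb_fraction(1_000_000), 15)`.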

Discussion

Exposure of D. melanogaster to novel hot or cold environments resulted in selection of alleles in thermal tolerance genes specific to each environment. Moreover, the starting frequencies of the putative selected SNPs differed markedly between the two environments, suggesting that they had experienced selection within the natural progenitor population prior to the start of the experiment. Hence, our study reveals that a potentially rich source of information lies in using the base population information, which has not been considered in most published E&R studies to date.

FIG. 4. Concordance between two independent groups of replicates. Modified ROC curve showing the proportion of common candidate SNPs shared between two independent sets of derived populations relative to cumulatively larger sets of ranked SNPs (see Materials and Methods) for both hot (red lines) and cold (blue lines) treatments (x axis: proportion of SNPs; y axis: proportion of common candidates). Separate lines are plotted for the full data set (~1.45 million SNPs; All), as well as for those SNPs only falling in regions of putatively low LD (i.e., regions outside of inversions and low recombining regions) or for short introns in these regions (SI). Lines above the black diagonal line indicate that there are more common candidate SNPs than expected by chance, such that our top candidates are more reproducible than those in neutrally evolving populations. Because sites with low LD or in short introns with low LD show a similar reduction in concordance relative to all sites, much of the excess concordance is probably due to changes in inversions and long-range LD between selected and neutral sites.

FIG. 5. Manhattan plots showing the genomic distribution of candidate SNPs. The negative log10-transformed P values of all SNPs relative to genomic position. The top 2,000 candidate SNPs for the hot (top panel) and cold (bottom panel) treatment are highlighted (see key). Candidates are categorized as being unique to the hot (red) or cold (blue) treatments or common to both (green). Point size scales with significance, except for common SNPs, which are shown at a standard size for emphasis. Inset boxes indicate the position of inversions and a putative selected haplotype block on 3R (large dashes, most proximal to centromere on 3R). The inversions are as follows: In(2L)t, In(2R)NS, In(3L)P, In(3R)Payne (short dashes), In(3R)C (dots), and In(3R)Mo (solid line). The candidate SNPs typically appear as either isolated points or arranged in broad flattened humps and tend to be clustered on 3R, where three large inversions overlap. These patterns further imply that inversions and long-range LD have generated many false positives in the study.

FIG. 6. Correlation between the top 2,000 candidate and flanking SNPs. The decay in LD (correlation) around candidates is measured as the rate of decline in neighboring (negative log10 transformed) P values to background levels (x axis: distance from candidate in bp; y axis: −log10(p)). Data are averaged over successive 50-bp bins flanking focal SNPs. Solid black lines indicate the bin means. The red, blue, and green polygons around these lines indicate 95% confidence intervals for hot treatment, cold treatment, and random (i.e., background) SNPs, respectively. In accordance with reported results from D. melanogaster, most of the LD is lost within 200 bp either side of each focal SNP (indicated by dashed lines).

Despite strong evidence that differential thermal selection contributed to our top candidates in each treatment, the sheer number of putatively selected sites indicates that our results were confounded by a large number of false positives. Given that short-range LD and strong drift in low recombining regions were unlikely to be major contributors to the excess of false positives in our study, we hypothesized that LD over longer distances was responsible for much of this excess. Long-range LD in our experimental populations could have arisen as a byproduct of the founders being a small sample of the much larger natural population. In this case, chance correlations between sites separated by vast distances may result from the finite sampling of alleles. Long-range LD may also originate within natural populations. For instance, strong LD between loci separated by large distances has been reported for Japanese D. melanogaster populations (Takano-Shimizu et al. 2004; Itoh et al. 2010). Because the LD was much stronger in the spring than in the autumn, this finding was attributed to exaggerated sampling effects during the previous winter bottleneck, although there was evidence that recurrent epistatic selection was also responsible for some linked loci. Finally, the introgression of new haplotypes from migrant individuals is another natural process that could produce long-range LD. Recent and ongoing admixture has been reported for African (Pool and Aquadro 2006; Pool et al. 2012) and North American (Caracristi and Schlötterer 2003) D. melanogaster populations, suggesting that, in principle, this may also contribute to long-range LD in natural populations.

The systematic increase in frequency of alleles in the hot treatment that were broadly distributed across a 1 Mb region of 3R demonstrates the effects of long-range LD in our experiment. In combination with differences in the distribution of the starting allele frequencies, long-range LD may also help to explain the differences in the number of candidates and the level of concordance observed between the two treatments. Given that most candidates were rare in the hot treatment and common in the cold treatment, presumably resulting from prior selection against (for) heat (cold) tolerance alleles, long-range LD is expected to be more extensive among the hot candidates. This is because rare haplotypes are expected to harbor many more private SNPs than common ones, and all such sites will experience the same change in allele frequency until these associations are broken down by recombination. The difference in Ne between the two treatments is also consistent with more extensive long-range LD producing more widespread changes in allele frequencies in the hot treatment.

Segregating inversions are another probable source of false positives in our study. The large decrease in concordance that was observed when inversions and regions with low recombination were removed from the data set supports this idea (fig. 4). Additional confirmation comes from a recent study, which found that several large inversions were segregating in our populations over the course of the experiment, including three partially overlapping inversions on chromosome arm 3R (Kapun et al. 2013). By tracing the trajectory of alleles that were private to each inversion, a single inversion was found to increase across all three replicates in each treatment (In(3R)C by 25% in the hot and In(3R)Mo by 15% in the cold). Indeed, the majority of the top 2,000 candidate SNPs were observed on chromosome arm 3R for both treatments (~80% for the hot and ~40% for the cold; supplementary table S3, Supplementary Material online), with many candidate SNPs lying within these inversions. Furthermore, the greatest concentration of cold candidates was situated within the region where all three inversions overlap. Because recombination is likely to be suppressed in this region, the local effects of LD and drift are expected to be exacerbated relative to the rest of the genome. Thus, changes in inversion frequencies are also likely to have substantially inflated the number of false positives in our study.

FIG. 7. Simulated long-range LD generated by a singleton under strong selection. Manhattan plots showing the distribution of the top 1,000 candidate SNPs from three separate sets of simulated data (top to bottom panels), in which a singleton carried by a single chromosome-length haplotype is under strong selection (see Materials and Methods). The full chromosome is shown in the panels on the right, with a zoomed-in view shown in the panels on the left. Significant SNPs occur vast distances from the selected site (dashed line) and tend to be several thousands of base pairs apart. Thus, it appears likely that long-range LD between selected and neutral sites can readily inflate the number of false positives in an E&R study.

Implications for Future E&R Research

Our results indicate that E&R studies with small founder populations, strong selection, and large segregating inversions are likely to suffer from large numbers of false positives. These factors are likely to be present in other D. melanogaster E&R studies (indeed, several large inversions segregate at intermediate frequencies in natural D. melanogaster populations; Corbett-Detig and Hartl 2012; Kapun et al. 2013), meaning that our findings have important implications for future E&R research in Drosophila.

First, if seasonal selection is common in D. melanogaster, which accumulating evidence suggests may be the case (Stalker 1980; Takano-Shimizu et al. 2004; Itoh et al. 2010; Bergland et al. 2013), then the genetic composition of the experimental populations will be dependent on their time of collection. Evidence suggests that genes associated with stress resistance are highly pleiotropic in Drosophila (Rose et al. 1992; Hoffman et al. 2001; Bubliy and Loeschcke 2005; Kolss and Kawecki 2008), implying that a broad spectrum of traits may be affected in this way.

Second, large numbers of false positives are expected in E&R studies that use small founder populations as a direct result of long-range LD between selected and neutral sites. The effect of founder population size on the performance of experimental design has been explicitly addressed in a recent haplotype-based simulation study (Kofler and Schlötterer 2013). The authors demonstrate that both the power and specificity of E&R studies are increased when larger numbers of founders are used, with the gains in performance being particularly noticeable when selection per SNP is strong. Another plausible way to reduce the impact of long-range LD is to create independent replicates from different sets of founders. Although this does not remove long-range LD per se, it results in different associations across replicates, such that analyses that combine replicates should remove the majority of such effects.

Finally, our results indicate that inversions are likely to be a major hindrance in detecting causal loci. Large segregating inversions appear to be ubiquitous in natural D. melanogaster populations, and recent work suggests that newly introduced inversions may influence genetic diversity across entire chromosome arms, presumably by suppressing recombination well beyond inversion breakpoints (Corbett-Detig and Hartl 2012; Pool et al. 2012; Kapun et al. 2013). Consequently, any improvements in performance that might be achieved through the experimental design modifications suggested above are unlikely to be realized in chromosome arms carrying large inversions, so inversion-free populations may be necessary to ensure that such improvements are realized over the entire genome. Although inversion-free D. melanogaster populations can be obtained by karyotyping individuals prior to the study, the time and effort expended in such screens may become prohibitive if large founding populations are desired. Another possibility is to use an alternative Drosophila species for which segregating inversions are rare, such as D. simulans or D. mauritiana, instead of D. melanogaster. Although these species currently lack the detailed genetic and genomic resources available for D. melanogaster, the ongoing development of portable molecular tools has made functional validation of candidate loci in these species tenable (Carroll 2011; Liu et al. 2012; Yu et al. 2013).

Conclusion

Previous D. melanogaster E&R studies have not directly addressed the role that inversions or long-range LD may have had in generating the large numbers of reported candidate SNPs. Our results imply that such estimates were probably inflated by false positives resulting from these phenomena. Consequently, Drosophila E&R study designs will need to be modified before the true causal loci that contribute to adaptation and/or complex traits can be reliably identified. While we have outlined several possibilities here, we also point out that the study by Kofler and Schlötterer (2013) is intended as a general guideline for E&R study design. There, the authors use haplotype-based simulations to investigate E&R performance for a wide range of experimental and evolutionary variables and thereby provide an invaluable reference for future Drosophila E&R research.

Materials and Methods

Experimental Populations and Evolution Regimes

Full details on the experimental setup are outlined in Orozco-terWengel et al. (2012). Briefly, wild D. melanogaster were collected in Northern Portugal in the summer of 2008. The population was maintained as 113 isofemale lines for five generations in the laboratory prior to the creation of mass-bred populations. Ten replicated mass-bred populations were created, with each replicate combining five females from each isofemale line. This initial population of merged isofemale lines is referred to as the base population or F0. Five replicate populations were maintained in each of two novel environments, one hot and one cold treatment. Each treatment cycled between two fixed temperature and light settings every 12 h (between 10 °C and 20 °C in the cold treatment and 18 °C and 28 °C in the hot treatment), with light and dark periods occurring with the hot and cold phase of each treatment, respectively. The heating/cooling period between the two fixed temperatures was relatively rapid, taking ~1 h before the new temperature was reached, such that most time was spent at either of the modal temperatures. Each replicate was maintained at ~1,000 individuals, with ~50:50 sex ratios, that were evenly distributed across five ~225 ml bottles. Each bottle contained ~70 ml of standard Drosophila medium. Generations were nonoverlapping.

Genome Sequencing and Mapping

Illumina full genome sequencing was performed on three replicates for each of the base population and evolved populations. For the hot treatment, a single replicate of the 23rd generation and two from the 15th generation were sequenced (Orozco-terWengel et al. 2012), whereas for the cold treatment, all three sequenced replicates came from the 15th generation. Sequencing was performed using paired-end reads, 75 bp in length in the hot treatment (GAIIx) and 100 bp long in the cold treatment (HiSeq). For each replicate and time point, ~500 flies were pooled and sequenced as one sample (i.e., Pool-Seq; Futschik and Schlötterer 2010). Two additional replicates were also sequenced from the 15th generation of the hot and cold treatments. These were used to check the reproducibility of candidate detection, as described below. Average genome-wide coverage ranged from ~35- to ~90-fold for each of the sequenced replicates (supplementary table S4, Supplementary Material online).

Mapping was carried out using a combination of purpose-built and third-party software, which is described in detail in Kofler, Orozco-terWengel, et al. (2011). Briefly, reads were trimmed to remove low-quality bases and then mapped with bwa (version 0.5.8c) (Li and Durbin 2009) against the D. melanogaster reference genome (version 5.18) and Wolbachia (NC_002978.6). The alignment files were converted to the SAM format (Li et al. 2009), with reads not mapped in proper pairs or with a mapping quality of less than 20 being removed. SAM files were converted into the pileup format, and indels and repeat sequences identified by RepeatMasker 3.2.9 (www.repeatmasker.org, last accessed November 5, 2013) were masked. The aligned pileup files were converted to a synchronized file (Kofler, Pandey, et al. 2011), from which SNPs and their accompanying allele frequencies were derived. All allele frequency changes presented herein were averaged across all three replicates (weighted by coverage). We further removed all SNPs whose coverage was among the top 2% of a given replicate and time point or that had a minimum count of less than 30 across the 23 populations in our extended data set, which comprised additional replicates and time points that were not analyzed here. Because the average coverage per SNP across the extended data set was ~1,500, the minimum count of 30 equates to an average 2% minor allele frequency across all replicates and time points (although allele frequencies could be lower in the time points that were specifically analyzed in this study). Annotation was based on version 5.40 of the D. melanogaster reference genome. Approximately 1.45 million SNPs were called overall.

Effective Population Size, FDR, and Candidate Estimation

In order to estimate Ne and FDR in both treatments, forward Wright-Fisher simulations were performed to generate expected neutral changes during the experiment. Three sets of neutral allele frequency changes were simulated for each treatment, matching the number of replicates in the study. For each simulation, 1.45 million SNPs were independently sampled (i.e., approximating the number of SNPs in our real data) from the distribution of allele frequencies expected after 15 generations of neutral evolution, conditional on the starting frequency for each SNP. Starting frequencies for each SNP were randomly drawn from the empirical distribution of start frequencies in the experiment, with the frequencies being the coverage-weighted average across all replicates. The expected distribution of allele frequencies after 15 generations was created by generating an Ni × Nj transition matrix, where each cell is the binomial probability of seeing j alleles given a starting frequency i/N (i.e., N is the effective population size), and then raising this matrix to the 15th power. To simulate the effects of DNA pooling in our study, an additional round of binomial sampling was conducted on the start and end allele frequencies for each SNP. To do this, each SNP was assigned a random coverage value that was drawn from the empirical distribution of coverages from a specific generation and treatment. This was done separately for each replicate, thereby capturing the heterogeneity in coverage both within and between replicates in the study.
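The transition-matrix step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`wf_transition_matrix`, `mat_pow`, `pooled_frequency`), the toy size `N = 20`, and the coverage of 60 are all hypothetical, and a real analysis at N of about 250 would use a numerical library rather than nested lists.

```python
import math
import random


def binom_pmf(n, k, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def wf_transition_matrix(n):
    """Row i holds P(j copies next generation | i copies now), for i, j in 0..n."""
    return [[binom_pmf(n, j, i / n) for j in range(n + 1)] for i in range(n + 1)]


def mat_mul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(m, power):
    """Repeated multiplication; adequate for small matrices and small powers."""
    result = m
    for _ in range(power - 1):
        result = mat_mul(result, m)
    return result


def pooled_frequency(freq, coverage, rng):
    """One round of binomial read sampling, mimicking Pool-Seq noise at a SNP."""
    hits = sum(1 for _ in range(coverage) if rng.random() < freq)
    return hits / coverage


N = 20            # toy population size (assumption); the study's estimate was ~250
GENERATIONS = 15
drift = mat_pow(wf_transition_matrix(N), GENERATIONS)

rng = random.Random(1)
start_copies = 4  # start frequency 4/20 = 0.2 (hypothetical SNP)
# Draw the end state from the 15-generation distribution for this start state.
end_copies = rng.choices(range(N + 1), weights=drift[start_copies])[0]
# Add pooling noise with a hypothetical coverage of 60 reads.
observed_end = pooled_frequency(end_copies / N, coverage=60, rng=rng)
```

Each row of `drift` is the distribution of allele counts after 15 neutral generations given the starting count, so sampling from a row reproduces the "conditional on the starting frequency" step described above.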

The effective population size of the observed data was estimated as follows. First, multiple simulations were performed as outlined above, each with a differing Ne. Each simulation produced a set of expected start and end allele frequencies for a given Ne, from which an allele frequency change (AFC) distribution was obtained (i.e., the end frequency minus start frequency for all 1.45 million SNPs). The expected AFC distribution was then compared for goodness of fit with the observed AFC distribution by summing the squared difference of the AFC values evaluated in 1% frequency change bins. Because individual replicates were simulated, we averaged the goodness of fit across the three replicates to get a point estimate for each Ne. Additionally, the error around this mean was sufficiently large for both treatments that the best-fitting Ne was not significantly different from many neighboring Ne values. Consequently, we took the lowest Ne from amongst this range (~250; supplementary table S1, Supplementary Material online), as it is expected to provide the most conservative estimate of the FDR (because the strength of drift increases as Ne decreases). The Ne for individual chromosomes was also inferred by limiting the observed data to that specific to each chromosome and then repeating the steps above.
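The goodness-of-fit comparison might look roughly like the following sketch. It rests on several assumptions that are not in the text: the neutral AFCs are generated with a Gaussian approximation to drift from a single start frequency of 0.3 (instead of the transition-matrix and pooling steps), the fit is the summed squared difference between binned proportions, and the sketch simply returns the best-fitting Ne rather than the lowest Ne within the error range. All function names are hypothetical.

```python
import random


def simulate_afc(ne, n_snps, generations, rng, start=0.3):
    """Toy neutral AFCs using a Gaussian approximation to drift (assumption)."""
    var = start * (1 - start) * (1 - (1 - 1 / ne) ** generations)
    sd = var ** 0.5
    out = []
    for _ in range(n_snps):
        end = min(1.0, max(0.0, start + rng.gauss(0.0, sd)))
        out.append(end - start)
    return out


def afc_histogram(afcs, bin_width=0.01):
    """Proportion of SNPs per 1% AFC bin, covering changes from -1 to +1."""
    n_bins = int(round(2 / bin_width))
    counts = [0] * n_bins
    for x in afcs:
        idx = min(n_bins - 1, int((x + 1.0) / bin_width))
        counts[idx] += 1
    return [c / len(afcs) for c in counts]


def goodness_of_fit(observed, expected):
    """Sum of squared differences between binned AFC proportions."""
    obs_h, exp_h = afc_histogram(observed), afc_histogram(expected)
    return sum((o - e) ** 2 for o, e in zip(obs_h, exp_h))


def best_ne(observed_reps, candidates, generations, rng):
    """Average the fit across replicates for each candidate Ne; smaller is better."""
    fits = {}
    for ne in candidates:
        per_rep = [
            goodness_of_fit(obs, simulate_afc(ne, len(obs), generations, rng))
            for obs in observed_reps
        ]
        fits[ne] = sum(per_rep) / len(per_rep)
    return min(fits, key=fits.get), fits


# Stand-in "observed" data: three replicates simulated at Ne = 250.
observed_reps = [simulate_afc(250, 5000, 15, random.Random(seed)) for seed in (1, 2, 3)]
best, fits = best_ne(observed_reps, [100, 250, 1000], 15, random.Random(99))
```

In this toy setting the candidate Ne that generated the stand-in data gives the smallest averaged fit value, which is the behavior the procedure relies on.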

Following Ne estimation, the number of putative selected SNPs was inferred as follows. First, all ~1.45 million SNPs that remained after filtering were subjected to the CMH test (Agresti 2002). The CMH test was shown to outperform several other statistics that were used for candidate inference in recent Drosophila E&R studies (Kofler and Schlötterer 2013). Testing was conducted separately for the hot and cold treatments, with each being contrasted against the base population. Second, the CMH test was applied to simulated SNP data that were derived using the conservative Ne estimate of 250 (discussed earlier). Finally, the number of candidate SNPs was estimated by setting an empirical FDR of 0.001 for each treatment. Thus, the number of candidates for a given treatment was quantified as the number of observed SNPs that had CMH P values lower than the least significant CMH P value from among the 0.1% most extreme simulated SNPs. The number of candidate SNPs was also inferred for a range of different FDR values, and the results are shown in supplementary table S1, Supplementary Material online.
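A compact sketch of the two ingredients, a CMH test on per-replicate 2 × 2 allele-count tables and an empirical cutoff taken from neutral simulations, is given here for orientation. It is an illustration under assumptions: the CMH statistic is computed without a continuity correction, the table layout `(a, b, c, d)` is a hypothetical convention, and the function names are not those of any published tool.

```python
import math


def cmh_pvalue(tables):
    """CMH test across replicates, without continuity correction (assumption).

    Each table is (a, b, c, d): base allele-1 and allele-2 counts (a, b),
    then evolved allele-1 and allele-2 counts (c, d).
    """
    num = 0.0
    var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        num += a - row1 * col1 / n                      # observed minus expected
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var == 0:
        return 1.0
    chi2 = num * num / var
    return math.erfc(math.sqrt(chi2 / 2.0))             # chi-square survival, 1 df


def empirical_threshold(simulated_p, fdr=0.001):
    """Least significant P value among the `fdr` most extreme simulated SNPs."""
    ranked = sorted(simulated_p)
    k = max(1, int(round(len(ranked) * fdr)))
    return ranked[k - 1]


def call_candidates(observed_p, simulated_p, fdr=0.001):
    """Indices of observed SNPs whose P value is below the simulated cutoff."""
    cutoff = empirical_threshold(simulated_p, fdr)
    return [i for i, p in enumerate(observed_p) if p < cutoff]


# Toy inputs: 10,000 evenly spaced "neutral" P values and four observed SNPs.
simulated = [i / 10000 for i in range(1, 10001)]
observed = [0.0001, 0.0005, 0.002, 0.5]
candidates = call_candidates(observed, simulated)
```

With these toy inputs the cutoff is the tenth smallest simulated P value, so only the first two observed SNPs are called as candidates.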

To examine the effects that linkage between loci had on the candidate inference, we also performed 30 forward neutral simulations starting from a population with 250 unique haplotypes (i.e., 250 homozygous diploid individuals) for 15 generations. Thus, the number of founding haplotypes and the effective population size during the simulations was 250. Assuming that the isofemale founders maintained two segregating chromosomes per line on average (which is not unreasonable considering that they were five generations old when making the mass-bred populations), these numbers roughly accord with those expected within our experimental populations. The simulations were performed using MimicrEE (Kofler and Schlötterer 2013). This software enabled the modeling of recombination at rates that were derived from empirical estimates (Fiston-Lavier et al. 2010). The 30 simulations were divided into 10 equal-sized subsets. The three simulations from each of the 10 subsets were then downsampled to match the coverage distributions of the three empirical replicates from the relevant generation and treatment. Next, we applied the CMH test to all sites in each subset and estimated the number of candidates thereof via the FDR method outlined in the preceding paragraph. The haplotypes used in the simulations were created by neutral coalescent simulations and are described in Bastide et al. (2013).

Enrichment in Thermal Tolerance Genes

We tested for treatment-specific evolution by looking for candidate enrichment in genes associated with heat and cold tolerance. Thermal tolerance gene sets were taken from the Candidate Stress Genes in Drosophila database, which was obtained from the Centre for Environmental Stress and Adaptation Research website (http://cesar.org.au, last accessed November 5, 2013). Only genes with a significant and unique association with either heat or cold tolerance were included in the analyses (supplementary table S2, Supplementary Material online). Separate enrichment analyses were performed on the 2,000; 4,000; 6,000; 8,000; 10,000; 12,000; 14,000; 16,000; 18,000; 20,000; 25,000; 30,000; 35,000; 40,000; 45,000; 50,000; 60,000; 70,000; 80,000; 90,000; and 100,000 most significant SNPs in each treatment. Note that each successive set of SNPs was an aggregation of previous SNP sets and the subsequent set of most significant SNPs. For instance, the set that contained the top 4,000 SNPs comprised the set containing the top 2,000 SNPs along with the next 2,000 most significant SNPs. All enrichment analyses were performed using Gowinda software (Kofler and Schlötterer 2012). Gowinda generates P values by conducting multiple simulations in which the positions of candidate SNPs are randomly chosen from the ~1.45 million SNPs scattered throughout the genome. The P value was then quantified as the proportion of simulations that had more genes specific to a given category (in our study, genes associated either with hot or cold thermotolerance) containing at least one candidate SNP than the observed data set (Kofler and Schlötterer 2012). In total, we performed 100,000 permutations for each set of candidate SNPs in each treatment and for each set of thermal tolerance genes. In each case, a gene containing multiple candidate SNPs was only counted once (i.e., the gene option of the mode argument). Gene annotations were based on version 5.40 of the D. melanogaster reference genome, and only the full coding region of each gene, and not the upstream or downstream flanking regions (i.e., the gene option of the gene-definition argument), was considered. Because the thermal genes were specifically tested for enrichment, with the expectation that there would be enrichment for hot alleles in heat tolerance genes and cold alleles in cold tolerance genes, we report P values rather than q values adjusted for testing multiple categories. This software was specifically designed to deal with differences in average gene size among categories and therefore does not result in the overrepresentation of categories with larger than average gene sizes (Kofler and Schlötterer 2012).
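The permutation logic described for Gowinda can be illustrated with a toy sketch. This is not Gowinda's implementation or interface: the function names, the SNP-to-gene dictionary, and the toy genome of 1,000 SNPs in 100 genes are all hypothetical. The sketch only mirrors the idea of redrawing candidate positions from the full SNP set and counting each gene once.

```python
import random


def genes_hit(snps, snp_to_gene, category):
    """Number of category genes containing at least one of the given SNPs."""
    return len({snp_to_gene[s] for s in snps if snp_to_gene.get(s) in category})


def enrichment_pvalue(candidates, all_snps, snp_to_gene, category, n_perm, rng):
    """Proportion of random SNP draws that hit more category genes than observed."""
    observed = genes_hit(candidates, snp_to_gene, category)
    more = 0
    for _ in range(n_perm):
        draw = rng.sample(all_snps, len(candidates))
        if genes_hit(draw, snp_to_gene, category) > observed:
            more += 1
    return more / n_perm


# Toy genome: 1,000 SNPs, 10 per gene, with genes g0..g4 as the category.
all_snps = list(range(1000))
snp_to_gene = {s: f"g{s // 10}" for s in all_snps}
category = {f"g{i}" for i in range(5)}

enriched = [0, 10, 20, 30, 40, 500, 510, 520, 530, 540]   # hits all five genes
unenriched = list(range(500, 510))                        # hits none of them

p_enriched = enrichment_pvalue(enriched, all_snps, snp_to_gene, category, 200, random.Random(7))
p_unenriched = enrichment_pvalue(unenriched, all_snps, snp_to_gene, category, 200, random.Random(7))
```

Because the random draws are taken at the SNP level, long genes collect more random hits in the permutations as well as in the observed data, which is the property that guards against a gene-size bias.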

Identification of Putative High LD Regions and Short Introns

LD can result in a large number of false positives, for instance, via allele frequency correlations between selected and neutral linked sites. Consequently, the following regions were designated as having putatively elevated LD and were removed from specific analyses, as indicated in the main text. First, regions covered by inversions or within 500 kb either side of the inversion breakpoints. Segregating inversions were identified in our experimental evolution populations by Kapun et al. (2013). Second, a 1 Mb region located within the inversion-free part of 3R. This region was found to harbor a large number of low-frequency alleles that showed a similar allele frequency change, a pattern consistent with selection upon a single haplotype block (Orozco-terWengel et al. 2012). Finally, regions with a local recombination rate less than 2 cM/Mb. The recombination rates were taken from Comeron et al. (2012). Approximately 300,000 of the original 1.45 million SNPs remained following the removal of these regions. Because 3R contained three segregating inversions and the putative selected haplotype block, only 15,000 SNPs remained on this arm following removal of these regions, whereas other chromosomes maintained at least 50,000 SNPs.

Short intron coordinates were taken from Lawrie et al. (2013). To further ensure that short intron sites were behaving neutrally, we only took those that were located within the putative low LD regions identified above.

ROC Curves and Reproducibility

To test the reproducibility of our top candidates, two additional replicates were sequenced from generation 15 in each treatment, with all sites being processed through the same pipeline as the original candidates. Because we did not have additional replicates from the base population, two new base replicates were created. This was done by combining the reads from the separate base replicates into a single file, from which two-thirds of all reads were randomly sampled (without replacement) and split in half to generate two new files, each having one-third of the combined base population reads (~50-fold coverage). All SNPs from the two new sets of replicates were then subjected to CMH testing and compared for consistency with the original data set using ROC curves. For the sake of consistency, the ~1.45 million SNPs from the original data set were used in the comparisons.
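The construction of the two new base replicates amounts to sampling without replacement and splitting. A minimal sketch follows, with reads represented as strings and the function name `make_pseudo_replicates` chosen purely for illustration; the actual files and tools used are not specified in the text.

```python
import random


def make_pseudo_replicates(replicate_reads, rng):
    """Pool reads, sample two-thirds without replacement, and split in half."""
    pooled = [read for rep in replicate_reads for read in rep]
    n_each = len(pooled) // 3                  # one-third of the pooled reads
    sampled = rng.sample(pooled, 2 * n_each)   # random order, no duplicates
    return sampled[:n_each], sampled[n_each:]


# Toy input: three base replicates with 300 uniquely named reads each.
base_reads = [[f"rep{r}_read{i}" for i in range(300)] for r in range(3)]
new_a, new_b = make_pseudo_replicates(base_reads, random.Random(42))
```

Because the sample is drawn without replacement, the two new files share no reads, so they behave as independent draws from the combined base population.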

We used a modified ROC curve analysis to provide a graphical representation of the consistency between the original and new data sets, that is, those based on three replicates and two replicates, respectively. The ROC curves quantify the overlap between the two data sets across successively larger groups of SNPs, with SNPs having been ranked by their CMH P value. Points falling on the 1:1 diagonal line indicate that overlap between the two sets of SNPs matches chance expectations of reproducibility, whereas points lying above the diagonal indicate excess reproducibility. Because we are interested in changes that were most likely to be caused by the deterministic process of selection, as opposed to the stochastic effects of genetic drift, only SNPs where the same allele was increasing across all five replicates were considered as being reproducible.
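One plausible reading of this overlap curve is sketched here; the exact construction used in the study is not given in this section, so the details are assumptions. For each group size k, the x value is the fraction of all SNPs considered (k/n) and the y value is the fraction of the top k SNPs of the original data set that also rank in the top k of the new data set and change in the same direction. Under chance, the expected overlap fraction is k/n, which places random agreement on the 1:1 diagonal.

```python
def overlap_curve(p_orig, p_new, same_direction, ks):
    """Return (k/n, reproducible fraction of the top k) for each k in ks."""
    n = len(p_orig)
    rank_orig = sorted(range(n), key=lambda i: p_orig[i])
    rank_new = sorted(range(n), key=lambda i: p_new[i])
    points = []
    for k in ks:
        top_new = set(rank_new[:k])
        shared = [i for i in rank_orig[:k] if i in top_new and same_direction[i]]
        points.append((k / n, len(shared) / k))
    return points


# Toy data: ten SNPs whose two most significant entries agree across data sets.
p_orig = [0.001, 0.002, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 1.0]
p_new = [0.002, 0.001, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
all_same = [True] * 10
curve = overlap_curve(p_orig, p_new, all_same, ks=[2, 10])

# Marking SNP 0 as changing direction between data sets halves the top-2 overlap.
flipped = [False] + [True] * 9
curve_flipped = overlap_curve(p_orig, p_new, flipped, ks=[2])
```

In the toy data the first point lies well above the diagonal (x = 0.2, y = 1.0), which is the pattern interpreted as excess reproducibility.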

The expected degree of reproducibility under neutrality was quantified using neutral MimicrEE simulations (Kofler and Schlötterer 2013). All parameters used were identical to the simulations used for the FDR estimates outlined above, except that a total of 50 data sets were simulated in this case. The simulated data sets were divided into 10 equally sized subsets, with each subset being further split into a group of three replicates and a group of two replicates. Following downsampling of the data sets to match our empirical coverage distributions (discussed earlier), for each set of simulations, CMH tests were applied to each group of three and two replicates. The resulting data were compared for reproducibility using ROC curves, as above.

LD Analyses

The degree of LD around the top 2,000 candidates was estimated for both treatments following the same methods outlined in Orozco-terWengel et al. (2012). Briefly, we estimated the median P value for all SNPs flanking the top 2,000 candidates in nonoverlapping 50-bp windows either side of the focal candidates. The decay in the P value around the focal candidates provides a proxy for the strength and extent of LD in regions flanking putative selected sites. To ascertain the background level of LD among similar SNPs, 2,000 more SNPs were randomly drawn from the genome after excluding all regions within 200 bp of the top 2,000 candidates. The numbers of random SNPs drawn from each chromosome arm matched the distribution of the top 2,000 candidates in each treatment.
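The windowed median can be sketched as follows. The assumptions are loud ones: SNPs are given as `(position, P value)` pairs on a single chromosome, both flanks are folded together by absolute distance, the `max_dist` limit of 500 bp is arbitrary, and the name `ld_decay` is hypothetical. The original method is described in Orozco-terWengel et al. (2012) and may differ in detail.

```python
import statistics


def ld_decay(focal_positions, snps, window=50, max_dist=500):
    """Median P value of flanking SNPs per nonoverlapping distance window.

    Window 0 covers distances 1..50 bp from a focal SNP, window 1 covers
    51..100 bp, and so on; SNPs at distance 0 (the focal SNP) are skipped.
    """
    bins = {}
    for focal in focal_positions:
        for pos, p in snps:
            dist = abs(pos - focal)
            if 0 < dist <= max_dist:
                bins.setdefault((dist - 1) // window, []).append(p)
    return {w: statistics.median(ps) for w, ps in sorted(bins.items())}


# Toy data: one focal candidate at position 1000 and four flanking SNPs.
snps = [(1000, 1e-9), (1010, 0.01), (1040, 0.03), (1090, 0.2), (940, 0.4)]
decay = ld_decay([1000], snps)
```

A rise in the median P value with increasing window index is what would be read as decaying LD around the focal candidates; running the same function on randomly drawn focal SNPs gives the background level.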

We used MimicrEE simulations (Kofler and Schlötterer 2013) to estimate the degree of long-range LD achievable given a simple scenario where strong selection acted on a singleton located in a single haplotype. The selection strength (0.5) was based on increasing a single allele from 1% to 30% over 15 generations (i.e., the mean allele frequency change observed for the top 2,000 candidate SNPs in the hot treatment). This was derived using $s = 2\ln(p_0 q_1 / p_1 q_0)/t$, where s is the selection coefficient, p and q the biallelic frequencies at the start (subscript 0) and end (subscript 1) generations, and t is the change in generations (eq. 3.3 in Charlesworth and Charlesworth [2010]). Other than the selected site, all other parameters were the same as the neutral simulations outlined above. Nine simulations in which the singleton increased in frequency by 25-35% were included in the analyses. These were evenly split into three subsets, with each subset being downsampled and submitted to CMH testing (in the same manner as the neutral simulations discussed earlier). The resulting data were then used to produce Manhattan plots that enabled visual assessment of the degree of long-range LD (fig. 7).
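As a quick arithmetic check of the quoted selection strength, the formula can be evaluated directly. The sketch assumes that q denotes the frequency of the selected allele (so that the logarithm is positive for a rising allele); the function name is hypothetical.

```python
import math


def selection_coefficient(start, end, generations):
    """s = 2 ln(p0 q1 / (p1 q0)) / t, with q the selected allele's frequency."""
    q0, q1 = start, end
    p0, p1 = 1.0 - q0, 1.0 - q1
    return 2.0 * math.log((p0 * q1) / (p1 * q0)) / generations


# A rise from 1% to 30% over 15 generations, as described in the text.
s = selection_coefficient(0.01, 0.30, 15)
```

Here the ratio inside the logarithm is (0.99 × 0.30)/(0.70 × 0.01) ≈ 42.4, so s ≈ 2 × 3.75/15, which is very close to the value of 0.5 given in the text.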

Supplementary Material

Supplementary figures S1-S4 and tables S1-S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to A. Bergland and D. Lawrie for sharing the coordinates for the short introns and Martin Kapun for sharing unpublished results. R.T. wishes to thank his colleagues at the Institut für Populationsgenetik and the Petrov laboratory at Stanford University for helpful discussions on data interpretation. This work was supported by a PhD fellowship from the Vetmeduni Vienna, the Austrian Science Fund (FWF P19467-B11, W1225-B20), and the European Research Council (ERC) grant "ArchAdapt" awarded to C.S.

References

Agresti A. 2002. Categorical data analysis. New York: John Wiley & Sons.

Andolfatto P, Wall JD. 2003. Linkage disequilibrium patterns across a recombination gradient in African Drosophila melanogaster. Genetics 165:1289-1305.

Bastide H, Betancourt A, Nolte V, Tobler R, Stobe P, Futschik A, Schlötterer C. 2013. A genome-wide fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9:e1003534.

Bergelson J, Roux F. 2010. Towards identifying genes underlying ecologically relevant traits in Arabidopsis thaliana. Nat Rev Genet. 11:867-879.

Bergland AO, Behrman EL, O'Brien KR, Schmidt PS, Petrov DA. 2013. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1303.5044

Bubliy O, Loeschcke V. 2005. Correlated responses to selection for stress resistance and longevity in a laboratory population of Drosophila melanogaster. J Evol Biol. 18:789-803.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587-590.

Caracristi G, Schlötterer C. 2003. Genetic differentiation between American and European Drosophila melanogaster populations could be attributed to admixture of African alleles. Mol Biol Evol. 20:792-799.

Carroll D. 2011. Genome engineering with zinc-finger nucleases. Genetics 188:773-782.

Charlesworth B, Charlesworth D. 2010. Elements of evolutionary genetics. Greenwood Village (CO): Roberts and Company Publishers.

Clemente F, Vogl C. 2012. Unconstrained evolution in short introns - An analysis of genome-wide polymorphism and divergence data from Drosophila. J Evol Biol. 25:1975-1990.

Comeron JM, Ratnappan R, Bailin S. 2012. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 8:e1002905.

Corbett-Detig RB, Hartl DL. 2012. Population genomics of inversion polymorphisms in Drosophila melanogaster. PLoS Genet. 8:e1003056.

Dettman JR, Rodrigue N, Melnyk AH, Wong A, Bailey SF, Kassen R. 2012. Evolutionary insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol. 21:2058-2077.

Farlow A, Dolezal M, Hua L, Schlötterer C. 2012. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol. 29:21-24.

Fiston-Lavier A-S, Singh ND, Lipatov M, Petrov DA. 2010. Drosophila melanogaster recombination rate calculator. Gene 463:18-20.

Futschik A, Schlötterer C. 2010. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186:207-218.

Haldane J. 1957. The cost of natural selection. J Genet. 55:511-524.

Halligan DL, Keightley PD. 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16:875-884.

Hoffman A, Hallas R, Sinclair C, Partridge L. 2001. Rapid loss of stress resistance in Drosophila melanogaster under adaptation to laboratory culture. Evolution 55:436-438.

Itoh M, Nanba N, Hasegawa M, Inomata N, Kondo R, Oshima M, Takano-Shimizu T. 2010. Seasonal changes in the long-distance linkage disequilibrium in Drosophila melanogaster. J Hered. 101:26-32.

Kapun M, van Schalkwyk H, McAllister B, Flatt T, Schlötterer C. 2013. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.2461

Kawecki TJ, Lenski RE, Ebert D, Hollis B, Olivieri I, Whitlock MC. 2012. Experimental evolution. Trends Ecol Evol. 27:547-560.

Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V, Futschik A, Kosiol C, Schlötterer C. 2011. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 6:e15925.

Kofler R, Pandey RV, Schlötterer C. 2011. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27:3435-3436.

Kofler R, Schlötterer C. 2012. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies. Bioinformatics 28:2084-2085.

Kofler R, Schlötterer C. 2013. Guidelines for the design of evolve and resequencing studies [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.4954

Kolss M, Kawecki TJ. 2008. Reduced learning ability as a consequence of evolutionary adaptation to nutritional stress in Drosophila melanogaster. Ecol Entomol. 33:583-588.

Lawrie DS, Messer PW, Hershberg R, Petrov DA. 2013. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 9:e1003527.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754-1760.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078-2079.

Liu J, Li C, Yu Z, et al. (12 co-authors). 2012. Efficient and specific modifications of the Drosophila genome by means of an easy TALEN strategy. J Genet Genomics. 39:209-215.

Mackay TFC, Richards S, Stone EA, et al. (52 co-authors). 2012. The Drosophila melanogaster Genetic Reference Panel. Nature 482:173-178.

Miyashita N, Langley H. 1988. Molecular and phenotypic variation of the white locus region in Drosophila melanogaster. Genetics 120:199-212.

Nunney L. 2003. The cost of natural selection revisited. Ann Zool Fennici. 40:185-194.

Orozco-terWengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931-4941.

Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P. 2010. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol. 27:1226-1234.

Parts L, Cubillos FA, Warringer J, et al. (14 co-authors). 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131-1138.

Pool JE, Aquadro CF. 2006. History and structure of sub-Saharan populations of Drosophila melanogaster. Genetics 174:915-929.

Pool JE, Corbett-Detig RB, Sugino RP, et al. (11 co-authors). 2012. Population Genomics of sub-saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet. 8:e1003080.

Remolina S, Chang P, Leips J, Nuzhdin SV, Hughes K. 2012. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66:3390-3403.

Rose MR, Vu LN, Park SU, Graves JL Jr. 1992. Selection on stress resistance increases longevity in Drosophila melanogaster. Exp Gerontol. 27:241-250.

Stalker H. 1980. Chromosome inversion polymorphism in Drosophila melanogaster. II. Relationship of inversion frequencies to latitude, season, wing-loading and flight activity. Genetics 95:211-223.

Stapley J, Reger J, Feulner PGD, Smadja C, Galindo J, Ekblom R, Bennison C, Ball A, Beckerman AP, Slate J. 2010. Adaptation genomics: the next generation. Trends Ecol Evol. 25:705-712.

Takano-Shimizu T, Kawabe A, Inomata N, Nanba N, Kondo R, Inoue Y, Itoh M. 2004. Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci U S A. 101:14156-14161.

Tantawy A. 1964. Studies on natural populations of Drosophila. III. Morphological and genetic differences of wing length in Drosophila melanogaster and D. simulans in relation to. Evolution 18:560-570.

Turner TL, Miller PM, Cochrane VA. 2013. Combining genome-wide methods to investigate the genetic complexity of courtship song variation in Drosophila melanogaster. Mol Biol Evol. 30:2113-2020.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633-642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Yu Z, Ren M, Wang Z, Zhang B, Rong YS, Jiao R, Gao G. 2013. Highly efficient genome modifications mediated by CRISPR/Cas9 in Drosophila. Genetics 195:289-291.

Zhou D, Udpa N, Gersten M, et al. (11 co-authors). 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349-2354.

our curated data sets (~320 vs. ~20; supplementary table S2, Supplementary Material online), such that the latter may have suffered from reduced power caused by having a relatively poorly characterized pathway. In contrast, there was no significant enrichment of hot candidate SNPs in the cold tolerance genes or of cold candidate SNPs in heat tolerance genes. Thus, although these results confirmed treatment-specific selection, we still observed associations for sets of candidates that were even larger than those estimated using our conservative FDR cutoff. This result reaffirms that we have a large number of false positives that need to be accounted for.

Natural D. melanogaster populations have been described to vary seasonally. This pattern was first reported for morphological traits (Tantawy 1964) and was more recently extended to inversions (Stalker 1980) and SNPs (Bergland et al. 2013) that exhibited cyclic frequency changes across seasons. Molecular evidence suggests that alleles that had been selected over the winter remained at relatively high frequencies in Drosophila populations collected in the spring before decreasing again by the fall (Bergland et al. 2013). Because the founder population of our experiments was collected in the early summer, it probably more closely resembles a winter/spring population than a fall population (Bergland et al. 2013). Therefore, we reasoned that alleles involved in adaptation to cold or hot environments may start at high or low frequencies, respectively, whereas false positives are expected to follow the background frequency distribution of neutral SNPs. When conditioning the allele frequency on the rising allele (the allele expected to be coming under selection), we observed an excess of low-frequency alleles in the hot treatment and, conversely, an enrichment of intermediate frequency alleles in the cold

[Figure 1. Panels: X, 2L, 2R, 3L, 3R, Genome-wide. Series: Hot, Cold. x axis: Effective population size (100–500). y axis: −log10(Σ(x_obs − x_exp)^2).]

FIG. 1. Estimates of effective population size in the evolving populations. The sum of the mean squared differences (negative log10 transformed) between the allele frequency change distributions for the experimental and simulated data for a range of Ne values. Larger values on the y axis indicate a better fit between the observed and simulated values, and the diamond indicates the single best fit overall, that is, the best estimate of Ne for a given chromosome arm or whole genome. Error bars are the standard errors for the three replicates.

[Figure 2. Panels: hot candidates (left), cold candidates (right). Series: Heat, Cold; dashed line: 5% significance. x axis: Number of candidates (10^4 to 10^5). y axis: −log10(p).]

FIG. 2. Enrichment of candidate SNPs within genes associated with heat and cold tolerance in Drosophila. Successively larger sets of candidate SNPs from the hot (left panel) or cold (right panel) treatment were tested for enrichment in gene sets either associated with cold or heat tolerance. All points above the dashed line indicate that the probability of obtaining the same number of thermal tolerance genes with at least one candidate SNP by chance was less than 5%. All genes were taken from the CESAR database and categorized according to whether they were uniquely associated with either heat or cold tolerance. The enrichment tests indicate that selection operated in the expected direction, although the large number of candidate SNPs for which a significant enrichment was found implies that these were confounded by false positives.


treatment (fig. 3). Given that the high-frequency alleles experience very little allele frequency change before fixation, the lack of predicted high-frequency alleles among the cold candidates is likely to be a direct consequence of insufficient power to detect such SNPs. Notably, this pattern was not restricted to the top 2,000 candidate SNPs but extends to a much larger fraction of loci (supplementary fig. S1, Supplementary Material online). This further reinforces the idea that a large number of SNPs are either directly selected or behave similarly to selected loci in each treatment.
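The polarization by rising allele that underlies these frequency distributions can be illustrated with a minimal sketch. The function name and the example frequencies are hypothetical and are not taken from the study; the only assumption is that each biallelic SNP is summarized by the start and end frequency of one arbitrarily chosen allele.

```python
def polarize_by_rising_allele(start, end):
    """Return (start, end, increase) for whichever allele rose in frequency.

    `start` and `end` are the frequencies of one arbitrarily chosen allele
    at a biallelic SNP; if that allele decreased, the complementary allele
    is reported instead.
    """
    if end < start:
        start, end = 1.0 - start, 1.0 - end
    return start, end, end - start


# Hypothetical SNP: the tracked allele falls from 0.90 to 0.60, so the
# rising allele starts at about 0.10 and ends at about 0.40.
rising_start, rising_end, increase = polarize_by_rising_allele(0.90, 0.60)
```

Under this convention a rising allele that starts near fixation can only show a small increase, which is consistent with the reduced power for high-frequency alleles noted above.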

Reproducibility of Ranked SNPs

A testable consequence of directional selection is that targeted alleles are expected to increase in frequency across all replicates, whereas false positives should not replicate. Hence, to further distinguish between true positives and noise among our top candidates, we sequenced two additional F15 replicates in each treatment and quantified the level of reproducibility between these and the original replicates. Reproducibility was measured using a modified receiver operating characteristic (ROC) curve, which visualizes the excess of shared SNPs of similar rank (rank being assigned by P value) between our original and new data sets (see Materials and Methods). For both temperature regimes, we noted a substantial concordance between the independent replicates, as indicated by the ROC curve being clearly above the diagonal (fig. 4). In contrast, neutrally evolving sites that were simulated to have parameters similar to our treatment data, and which included recombination, did not differ from expectation (supplementary fig. S2, Supplementary Material online; see Materials and Methods). Nonetheless, because the concordance between the independent sets of replicates peaked among the top 50,000–100,000 ranked SNPs in each treatment, this analysis could not further distinguish truly selected SNPs from false positives.
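The rank-based concordance measure described above can be sketched as follows. This is an illustrative reading of the modified ROC curve, not the authors' implementation: for each fraction x of the N SNPs, it reports the share of the top x·N SNPs (ranked by P value) that the two data sets have in common, so that unrelated rankings would fall near the diagonal. The function name and the P values are hypothetical.

```python
def concordance_curve(pvals_a, pvals_b, fractions):
    """Share of top-ranked SNPs common to two data sets, per cumulative fraction.

    `pvals_a` and `pvals_b` hold P values for the same SNPs in the same order.
    Returns a list of (fraction_of_snps, proportion_shared) points.
    """
    n = len(pvals_a)
    rank_a = sorted(range(n), key=lambda i: pvals_a[i])
    rank_b = sorted(range(n), key=lambda i: pvals_b[i])
    curve = []
    for x in fractions:
        k = max(1, round(x * n))  # number of top-ranked SNPs considered
        shared = len(set(rank_a[:k]) & set(rank_b[:k]))
        curve.append((k / n, shared / k))
    return curve


# Hypothetical P values for 10 SNPs in two independent sets of replicates.
set_a = [0.001, 0.2, 0.03, 0.5, 0.9, 0.04, 0.7, 0.6, 0.8, 0.3]
set_b = [0.002, 0.4, 0.01, 0.6, 0.7, 0.5, 0.9, 0.8, 0.3, 0.2]
curve = concordance_curve(set_a, set_b, [0.2, 0.5, 1.0])
```

Points above the line y = x indicate more shared top-ranked SNPs than expected by chance; the curve necessarily reaches 1 when all SNPs are included.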

The high concordance that we observed could have resulted from strong LD between selected and linked neutral SNPs, which is expected to produce correlated allele frequency changes across contiguous sites. We tested this hypothesis by reanalyzing the data after removing low recombining regions and genomic regions known to contain segregating cosmopolitan inversions (see Materials and Methods). The resulting ROC curve exhibited a substantially reduced signal of concordance between the two independent sets of replicates but was still greater than expected by chance (fig. 4). Next, we repeated the analysis using only short intronic sites (without splicing signals) within the putative low LD regions. Evidence suggests that short introns are unlikely to carry selectively favored alleles in D. melanogaster (Halligan and Keightley 2006; Parsch et al. 2010; Clemente and Vogl 2012), so these regions are good proxies for neutral loci (but see Farlow et al. 2012). Although SNPs in short introns were not expected to show much consistency across the groups, the signal of concordance was similar to that of the putative low LD regions (fig. 4). Consequently, this pattern indicates that much of the selection signal could be generated from linkage between neutral sites and the true targets of selection.

[Figure 3. Columns: Start, End, Increase. Rows: hot (top), cold (bottom). x axis: allele frequency (0.0–1.0). y axis: Frequency.]

FIG. 3. Allele frequency distribution of candidate SNPs. Distribution of the starting (left panels), end (middle panels), and increase (right panels) in allele frequencies for the top 2,000 candidate SNPs in the hot (top row) and cold (bottom row) treatment. Alleles are polarized by the rising allele in the derived treatment, such that plots display the frequencies of the putative selected alleles. The dashed gray line indicates the distribution of all ~1.45 million SNPs, and the diamonds on the x axis indicate the mean frequency for candidate (red/blue) and all (gray) SNPs.


Effects of Long-Range LD on False Positives

In genome-wide association studies (GWAS), linkage between selected and neighboring neutral sites results in a characteristic pattern observed in Manhattan plots, namely, a series of adjacent sites with significant P values that are arranged into a peak. Such patterns were not readily recognizable in either treatment in our study, however. Rather, the top candidates occurred as either isolated SNPs or broad flattened humps (fig. 5). Furthermore, analyses of genomic regions flanking candidate SNPs showed that LD had a limited range beyond the putative selected site (fig. 6). This pattern is consistent with the low levels of LD observed in D. melanogaster populations (Miyashita and Langley 1988; Andolfatto and Wall 2003; Mackay et al. 2012) and has been previously interpreted as evidence against linkage as a possible cause for the large number of candidate SNPs in Drosophila E&R studies (Orozco-terWengel et al. 2012). To reconcile the anomaly of thousands of putative selected SNPs being detected despite an apparent lack of LD in our study, we propose that associations beyond 200 bp, which is widely regarded as the limit of LD in D. melanogaster (Mackay et al. 2012), may be prevalent within our experimental populations.

Support for this idea comes from two separate findings. First, the excess concordance observed for short introns outlined above is still evident after exclusion of intronic sites located in the same genes that contained one or more candidate SNPs. Thus, although excluding these intronic sites did reduce concordance to some extent, the excess concordance was still visible even after removing genes containing one or more of the top 100,000 candidate SNPs (supplementary fig. S3, Supplementary Material online). Given that short introns are not expected to contain selected sites (Parsch et al. 2010; Clemente and Vogl 2012), we suggest that linkage between sites in short introns and selected sites in regions other than the enclosing gene is responsible for producing this pattern.

Second, a good, albeit potentially extreme, example of long-range LD is provided by a 1 Mb region located on an inversion-free part of chromosome 3R that was first described by Orozco-terWengel et al. (2012). Among our genome-wide top 2,000 hot candidates, 281 were located in this region, which represents a massive enrichment relative to its size. These sites increased in frequency by ~27% over the first 15 generations on average, and 33% of the 24K SNPs in this region had the same rising allele across all five replicates in the hot treatment (cf. 23% across the rest of the genome; P value < 2.2e−16 using Fisher's exact test). Furthermore, a large number of putative selected alleles in this region had starting frequencies <1% (118 SNPs), and these sites displayed highly homogeneous increases in frequency during the study (supplementary fig. S4, Supplementary Material online). Thus, although it is evident from the data that this is a low-frequency haplotype that increased across all replicates in the hot treatment, 83% of the putative candidates in this region were separated by distances greater than 200 bp, with an average distance of 5,570 bp between sites. Because such a broad spacing between candidate sites is well beyond the typical range of LD in D. melanogaster, we suggest that long-range LD is responsible for these patterns and the majority of false positives more generally.
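The spacing statistics quoted for this region can be reproduced in outline as follows. This sketch assumes that "separated by distances greater than 200 bp" refers to gaps between adjacent candidate SNPs sorted by position, which is one plausible reading of the text; the function name and the positions are hypothetical.

```python
def spacing_summary(positions, ld_limit=200):
    """Fraction of adjacent-candidate gaps above `ld_limit` and the mean gap (bp)."""
    ordered = sorted(positions)
    gaps = [right - left for left, right in zip(ordered, ordered[1:])]
    frac_beyond = sum(gap > ld_limit for gap in gaps) / len(gaps)
    mean_gap = sum(gaps) / len(gaps)
    return frac_beyond, mean_gap


# Hypothetical candidate positions (bp) within a single region.
candidate_positions = [1000, 1150, 4000, 9500, 9600, 20000]
frac_beyond, mean_gap = spacing_summary(candidate_positions)
```

A high fraction of gaps beyond the short-range LD limit, as reported above, is what distinguishes a long-range haplotype signal from a conventional association peak.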

To evaluate to what extent long-range LD could be observed in a typical E&R study, we simulated strong selection on a single allele private to a rare haplotype, using allele frequency changes and parameters similar to those in the hot treatment (see Materials and Methods). We found that neighboring candidate SNPs were often separated by several thousand base pairs or more and could be millions of base pairs from the selected site (fig. 7). These results clearly indicate that long-range LD can be readily generated under a simple but plausible scenario for E&R studies.

Discussion

Exposure of D. melanogaster to novel hot or cold environments resulted in selection of alleles in thermal tolerance genes specific to each environment. Moreover, the starting frequencies of the putative selected SNPs differed markedly between the two environments, suggesting that they had experienced selection within the natural progenitor population prior to the start of the experiment. Hence, our study reveals that a potentially rich source of

[Figure 4. x axis: Proportion SNPs (0.0–1.0). y axis: Proportion common candidates (0.0–1.0). Key (treatment & region): Hot All, Hot low LD, Hot SI, Cold All, Cold low LD, Cold SI.]

FIG. 4. Concordance between two independent groups of replicates. Modified ROC curve showing the proportion of common candidate SNPs shared between two independent sets of derived populations relative to cumulatively larger sets of ranked SNPs (see Materials and Methods) for both hot (red lines) and cold (blue lines) treatments. Separate lines are plotted for the full data set (~1.45 million SNPs; All), as well as for those SNPs only falling in regions of putatively low LD (i.e., regions outside of inversions and low recombining regions) or for short introns in these regions (SI). Lines above the black diagonal line indicate that there are more common candidate SNPs than expected by chance, such that our top candidates are more reproducible than those of neutrally evolving populations. Because sites with low LD or in short introns with low LD show a similar reduction in concordance relative to all sites, much of the excess concordance is probably due to changes in inversions and long-range LD between selected and neutral sites.


FIG. 5. Manhattan plots showing the genomic distribution of candidate SNPs. The negative log10-transformed P values of all SNPs relative to genomic position. The top 2,000 candidate SNPs for the hot (top panel) and cold (bottom panel) treatment are highlighted (see key). Candidates are categorized as being unique to the hot (red) or cold (blue) treatments or common to both (green). Point size scales with significance, except for common SNPs, which are shown at a standard size for emphasis. Inset boxes indicate the position of inversions and a putative selected haplotype block on 3R (large dashes, most proximal to centromere on 3R). The inversions are as follows: In(2L)t, In(2R)NS, In(3L)P, In(3R)Payne (short dashes), In(3R)C (dots), and In(3R)Mo (solid line). The candidate SNPs typically appear as either isolated points or arranged in broad flattened humps and tend to be clustered on 3R, where three large inversions overlap. These patterns further imply that inversions and long-range LD have generated many false positives in the study.

[Figure 6. x axis: Distance from candidate (bp), −4,000 to 4,000. y axis: −log10(p). Key: Hot, Cold, Random, 200 bp.]

FIG. 6. Correlation between the top 2,000 candidate and flanking SNPs. The decay in LD (correlation) around candidates is measured as the rate of decline in neighboring (negative log10 transformed) P values to background levels. Data are averaged over successive 50-bp bins flanking focal SNPs. Solid black lines indicate the bin means. The red, blue, and green polygons around these lines indicate 95% confidence intervals for hot treatment, cold treatment, and random (i.e., background) SNPs, respectively. In accordance with reported results from D. melanogaster, most of the LD is lost within 200 bp either side of each focal SNP (indicated by dashed lines).
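The binning behind this figure can be sketched as below. This is a hedged illustration rather than the authors' code: it pools flanking SNPs from all focal candidates into 50-bp distance bins and averages their −log10(P) values. The function name, the ±4,000 bp window taken from the plotted range, and the example data are assumptions, and the nested loop is written for clarity rather than speed.

```python
from collections import defaultdict


def binned_mean_scores(focal_positions, snps, bin_size=50, max_dist=4000):
    """Mean -log10(P) of flanking SNPs per distance bin around focal SNPs.

    `snps` is a list of (position, neg_log10_p) tuples on one chromosome.
    Returns {bin_left_edge_in_bp: mean score}, pooled over all focal SNPs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for focal in focal_positions:
        for pos, score in snps:
            offset = pos - focal
            if offset == 0 or abs(offset) > max_dist:
                continue  # skip the focal SNP itself and distant sites
            left_edge = (offset // bin_size) * bin_size  # floor to bin edge
            sums[left_edge] += score
            counts[left_edge] += 1
    return {edge: sums[edge] / counts[edge] for edge in sorted(sums)}


# Hypothetical data: one focal candidate at 1,000 bp and four flanking SNPs.
example_snps = [(1000, 9.0), (1010, 6.0), (1040, 4.0), (970, 5.0), (1300, 1.0)]
profile = binned_mean_scores([1000], example_snps)
```

A rapid return of the binned means to the background level within a few bins of the focal site corresponds to the short-range decay described in the caption.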


information lies in using the base population information, which has not been considered in most published E&R studies to date.

Despite strong evidence that differential thermal selection contributed to our top candidates in each treatment, the sheer number of putatively selected sites indicates that our results were confounded by a large number of false positives. Given that short-range LD and strong drift in low recombining regions were unlikely to be major contributors to the excess of false positives in our study, we hypothesized that LD over longer distances was responsible for much of this excess. Long-range LD in our experimental populations could have arisen as a byproduct of the founders being a small sample of the much larger natural population. In this case, chance correlations between sites separated by vast distances may result from the finite sampling of alleles. Long-range LD may also originate within natural populations. For instance, strong LD between loci separated by large distances has been reported for Japanese D. melanogaster populations (Takano-Shimizu et al. 2004; Itoh et al. 2010). Because the LD was much stronger in the spring than in the autumn, this finding was attributed to exaggerated sampling effects during the previous winter bottleneck, although there was evidence that recurrent epistatic selection was also responsible for some linked loci. Finally, the introgression of new haplotypes from migrant individuals is another natural process that could produce long-range LD. Recent and ongoing admixture has been reported for African (Pool and Aquadro 2006; Pool et al. 2012) and North American (Caracristi and Schlötterer 2003) D. melanogaster populations, suggesting that, in principle, this may also contribute to long-range LD in natural populations.

The systematic increase in frequency of alleles in the hot treatment that were broadly distributed across a 1 Mb region of 3R demonstrates the effects of long-range LD in our experiment. In combination with differences in the distribution of the starting allele frequencies, long-range LD may also help to explain the differences in the number of candidates and the level of concordance observed between the two treatments. Given that most candidates were rare in the hot treatment and common in the cold treatment (presumably resulting from prior selection against heat tolerance alleles and for cold tolerance alleles), long-range LD is expected to be more extensive among the hot candidates. This is because rare haplotypes are expected to harbor many more private SNPs than common ones, and all such sites will experience the same change in allele frequency until these associations are broken down by recombination. The difference in Ne between the two treatments is also consistent with more extensive long-range LD producing more widespread changes in allele frequencies in the hot treatment.

Segregating inversions are another probable source of false positives in our study. The large decrease in concordance that was observed when inversions and regions with low

FIG. 7. Simulated long-range LD generated by a singleton under strong selection. Manhattan plots showing the distribution of the top 1,000 candidate SNPs from three separate sets of simulated data (top to bottom panels) in which a singleton carried by a single chromosome-length haplotype is under strong selection (see Materials and Methods). The full chromosome is shown in the panels on the right, with a zoomed-in view shown in the panels on the left. Significant SNPs occur vast distances from the selected site (dashed line) and tend to be several thousands of base pairs apart. Thus, it appears likely that long-range LD between selected and neutral sites can readily inflate the number of false positives in an E&R study.


recombination were removed from the data set supports this idea (fig. 4). Additional confirmation comes from a recent study, which found that several large inversions were segregating in our populations over the course of the experiment, including three partially overlapping inversions on chromosome arm 3R (Kapun et al. 2013). By tracing the trajectory of alleles that were private to each inversion, a single inversion was found to increase across all three replicates in each treatment (In(3R)C by 25% in the hot and In(3R)Mo by 15% in the cold). Indeed, the majority of the top 2,000 candidate SNPs was observed on chromosome arm 3R for both treatments (~80% for the hot and ~40% for the cold; supplementary table S3, Supplementary Material online), with many candidate SNPs lying within these inversions. Furthermore, the greatest concentration of cold candidates was situated within the region where all three inversions overlap. Because recombination is likely to be suppressed in this region, the local effects of LD and drift are expected to be exacerbated relative to the rest of the genome. Thus, changes in inversion frequencies are also likely to have substantially inflated the number of false positives in our study.

Implications for Future E&R Research

Our results indicate that E&R studies with small founder populations, strong selection, and large segregating inversions are likely to suffer from large numbers of false positives. These factors are likely to be present in other D. melanogaster E&R studies (indeed, several large inversions segregate at intermediate frequencies in natural D. melanogaster populations; Corbett-Detig and Hartl 2012; Kapun et al. 2013), meaning that our findings have important implications for future E&R research in Drosophila.

First, if seasonal selection is common in D. melanogaster, which accumulating evidence suggests may be the case (Stalker 1980; Takano-Shimizu et al. 2004; Itoh et al. 2010; Bergland et al. 2013), then the genetic composition of the experimental populations will be dependent on their time of collection. Evidence suggests that genes associated with stress resistance are highly pleiotropic in Drosophila (Rose et al. 1992; Hoffman et al. 2001; Bubliy and Loeschcke 2005; Kolss and Kawecki 2008), implying that a broad spectrum of traits may be affected in this way.

Second, large numbers of false positives are expected in E&R studies that use small founder populations, as a direct result of long-range LD between selected and neutral sites. The effect of founder population size on the performance of experimental design has been explicitly addressed in a recent haplotype-based simulation study (Kofler and Schlötterer 2013). The authors demonstrate that both the power and specificity of E&R studies are increased when larger numbers of founders are used, with the gains in performance being particularly noticeable when the strength of selection per SNP is strong. Another plausible way to reduce the impact of long-range LD is to create independent replicates from different sets of founders. Although this does not remove long-range LD per se, it results in different associations across replicates, such that analyses that combine replicates should remove the majority of such effects.

Finally, our results indicate that inversions are likely to be a major hindrance in detecting causal loci. Large segregating inversions appear to be ubiquitous in natural D. melanogaster populations, and recent work suggests that newly introduced inversions may influence genetic diversity across entire chromosome arms, presumably by suppressing recombination well beyond inversion breakpoints (Corbett-Detig and Hartl 2012; Pool et al. 2012; Kapun et al. 2013). Consequently, any improvements in performance that might be achieved through the experimental design modifications suggested above are unlikely to be realized in chromosome arms carrying large inversions, so inversion-free populations may be necessary to ensure that such improvements are realized over the entire genome. Although inversion-free D. melanogaster populations can be obtained via karyotyping individuals prior to the study, the time and effort expended in such screens may become prohibitive if large founding populations are desired. Another possibility is to use an alternative Drosophila species for which segregating inversions are rare, such as D. simulans or D. mauritiana, instead of D. melanogaster. Although these species currently lack the detailed genetic and genomic resources available for D. melanogaster, the ongoing development of portable molecular tools has made functional validation of candidate loci in these species tenable (Carroll 2011; Liu et al. 2012; Yu et al. 2013).

Conclusion

Previous D. melanogaster E&R studies have not directly addressed the role that inversions or long-range LD may have had in generating the large numbers of reported candidate SNPs. Our results imply that such estimates were probably inflated by false positives resulting from these phenomena. Consequently, Drosophila E&R study designs will need to be modified before the true causal loci that contribute to adaptation and/or complex traits can be reliably identified. While we have outlined several possibilities here, we also point out that the study by Kofler and Schlötterer (2013) is intended as a general guideline for E&R study design. There, the authors use haplotype-based simulations to investigate E&R performance for a wide range of experimental and evolutionary variables and thereby provide an invaluable reference for future Drosophila E&R research.

Materials and Methods

Experimental Populations and Evolution Regimes

Full details on the experimental setup are outlined in Orozco-terWengel et al. (2012). Briefly, wild D. melanogaster were collected in Northern Portugal in the summer of 2008. The population was maintained as 113 isofemale lines for five generations in the laboratory prior to the creation of mass-bred populations. Ten replicated mass-bred populations were created, with each replicate combining five females from each isofemale line. This initial population of merged isofemale lines is referred to as the base population, or F0. Five replicate populations were maintained in each of two novel environments: one hot and one cold treatment. Each treatment cycled between two fixed temperature and light settings every 12 h (between 10 °C and 20 °C in the cold treatment and 18 °C and 28 °C in the hot treatment), with light and dark periods occurring with the hot and cold phase of each treatment, respectively. The heating/cooling period between the two fixed temperatures was relatively rapid, taking ~1 h before the new temperature was reached, such that most time was spent at either of the modal temperatures. Each replicate was maintained at ~1,000 individuals with ~50:50 sex ratios that were evenly distributed across five ~225 ml bottles. Each bottle contained ~70 ml of standard Drosophila medium. Generations were nonoverlapping.

Genome Sequencing and Mapping

Illumina full genome sequencing was performed on three replicates for each of the base population and evolved populations. For the hot treatment, a single replicate of the 23rd generation and two from the 15th generation were sequenced (Orozco-terWengel et al. 2012), whereas for the cold treatment, all three sequenced replicates came from the 15th generation. Sequencing was performed using paired-end reads, 75 bp in length in the hot treatment (GAIIx) and 100 bp long in the cold treatment (HiSeq). For each replicate and time point, ~500 flies were pooled and sequenced as one sample (i.e., Pool-Seq; Futschik and Schlötterer 2010). Two additional replicates were also sequenced from the 15th generation of the hot and cold treatments. These were used to check the reproducibility of candidate detection, as described below. Average genome-wide coverage ranged from ~35- to ~90-fold for each of the sequenced replicates (supplementary table S4, Supplementary Material online).

Mapping was carried out using a combination of purpose-built and third-party software, which is described in detail in Kofler, Orozco-terWengel, et al. (2011). Briefly, reads were trimmed to remove low-quality bases and then mapped with bwa (version 0.5.8c) (Li and Durbin 2009) against the D. melanogaster reference genome (version 5.18) and Wolbachia (NC_002978.6). The alignment files were converted to the SAM format (Li et al. 2009), with reads not mapped in proper pairs or with a mapping quality of less than 20 being removed. SAM files were converted into the pileup format, and indels and repeat sequences identified by RepeatMasker 3.2.9 (www.repeatmasker.org, last accessed November 5, 2013) were masked. The aligned pileup files were converted to a synchronized file (Kofler, Pandey, et al. 2011), from which SNPs and their accompanying allele frequencies were derived. All allele frequency changes presented herein were averaged across all three replicates (weighted by coverage). We further removed all SNPs that either had a coverage among the top 2% of a given replicate and time point or had a minimum count of less than 30 across the 23 populations in our extended data set, which comprised additional replicates and time points that were not analyzed here. Because the average coverage per SNP across the extended data set was ~1,500, the minimum count of 30 equates to an average 2% minor allele frequency across all replicates and time points (although allele frequencies could be lower in the time points that were specifically analyzed in this study). Annotation was based on version 5.40 of the D. melanogaster reference genome. Approximately 1.45 million SNPs were called overall.
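The coverage weighting and the minimum count filter described above amount to simple arithmetic, sketched here. The function names and the example counts are hypothetical, and the filter is assumed to apply to the minor allele count summed over populations, which is how the 2% figure in the text is obtained (30 / ~1,500).

```python
def coverage_weighted_frequency(allele_counts, coverages):
    """Coverage-weighted mean frequency across replicates.

    Weighting each replicate's frequency (count / coverage) by its coverage
    reduces to total allele count divided by total coverage.
    """
    return sum(allele_counts) / sum(coverages)


def passes_min_count(minor_counts, min_count=30):
    """True if the minor allele count summed over populations is >= min_count."""
    return sum(minor_counts) >= min_count


# Threshold implied by the text: 30 reads out of ~1,500 total coverage.
implied_maf = 30 / 1500

# Hypothetical SNP in three replicates (allele counts and coverages invented).
freq = coverage_weighted_frequency([10, 30, 20], [40, 100, 60])
keep = passes_min_count([10, 30, 20])
```

Because the weights are the coverages themselves, replicates sequenced more deeply contribute proportionally more reads to the pooled estimate.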

Effective Population Size, FDR, and Candidate Estimation

In order to estimate Ne and FDR in both treatments, forward Wright–Fisher simulations were performed to generate expected neutral changes during the experiment. Three sets of neutral allele frequency changes were simulated for each treatment, matching the number of replicates in the study. For each simulation, 1.45 million SNPs were independently sampled (i.e., approximating the number of SNPs in our real data) from the distribution of allele frequencies expected after 15 generations of neutral evolution, conditional on the starting frequency for each SNP. Starting frequencies for each SNP were randomly drawn from the empirical distribution of start frequencies in the experiment, with the frequencies being the coverage-weighted average across all replicates. The expected distribution of allele frequencies after 15 generations was created by generating an Ni × Nj transition matrix, where each cell is the binomial probability of seeing j alleles with a starting frequency i/N (i.e., N is the effective population size), and then raising this matrix to the 15th power. To simulate the effects of DNA pooling in our study, an additional round of binomial sampling was conducted on the start and end allele frequencies for each SNP. To do this, each SNP was assigned a random coverage value that was drawn from the empirical distribution of coverages from a specific generation and treatment. This was done separately for each replicate, thereby capturing the heterogeneity in coverage both within and between replicates in the study.
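The transition-matrix approach described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the study: N = 50 and a read coverage of 60 are arbitrary small values chosen so that the example runs quickly, the starting count is invented, and all function names are hypothetical.

```python
from math import comb
import random


def wf_transition_matrix(n):
    """(n+1) x (n+1) matrix; entry [i][j] = P(j copies next generation | i now)."""
    rows = []
    for i in range(n + 1):
        p = i / n
        rows.append([comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)])
    return rows


def mat_mul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def mat_pow(m, power):
    """Raise a square matrix to a positive integer power by repeated products."""
    result = m
    for _ in range(power - 1):
        result = mat_mul(result, m)
    return result


def pooled_frequency(true_freq, coverage, rng):
    """Binomial sampling of reads to mimic the noise added by pooled sequencing."""
    hits = sum(rng.random() < true_freq for _ in range(coverage))
    return hits / coverage


N = 50            # assumed value for illustration only
GENERATIONS = 15  # as in the text
p15 = mat_pow(wf_transition_matrix(N), GENERATIONS)

rng = random.Random(1)
start_count = 10  # hypothetical starting count, i.e., frequency 10/50
end_count = rng.choices(range(N + 1), weights=p15[start_count])[0]
observed_start = pooled_frequency(start_count / N, coverage=60, rng=rng)
observed_end = pooled_frequency(end_count / N, coverage=60, rng=rng)
```

Row i of the powered matrix gives the distribution of allele counts after 15 generations for a SNP starting at i copies, so each simulated SNP requires only one draw from that row plus the two pooling draws.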

The effective population size of the observed data was estimated as follows. First, multiple simulations were performed as outlined above, each with a differing Ne. Each simulation produced a set of expected start and end allele frequencies for a given Ne, from which an allele frequency change (AFC) distribution was obtained (i.e., the end frequency minus start frequency for all 1.45 million SNPs). The expected AFC distribution was then compared for goodness of fit with the observed AFC distribution by summing the squared difference of the AFC values evaluated in 1% frequency change bins. Because individual replicates were simulated, we averaged the goodness of fit across the three replicates to get a point estimate for each Ne. Additionally, the error around this mean was sufficiently large for both treatments that the best-fitting Ne was not significantly different from many neighboring Ne values. Consequently, we took the lowest Ne from amongst this range (~250; supplementary table S1, Supplementary Material online), as it is expected to provide the most conservative estimate of the FDR (because the strength of drift increases as Ne decreases). The Ne for individual chromosomes was also inferred by limiting the observed data to that specific to each chromosome and then repeating the steps above.
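The goodness-of-fit comparison can be outlined as follows. The binning and the −log10 transform follow the description above and the axis label of figure 1; the rule used in `lowest_compatible_ne` to decide which Ne values are "not significantly different" from the best fit is not specified in the text, so the standard-error overlap criterion here is purely an assumption for illustration. All names and example values are hypothetical.

```python
from math import log10


def afc_histogram(afcs, bin_width=0.01):
    """Proportion of SNPs in each allele frequency change bin over [-1, 1]."""
    n_bins = round(2 / bin_width)
    hist = [0.0] * n_bins
    for x in afcs:
        idx = min(n_bins - 1, int((x + 1) / bin_width))
        hist[idx] += 1 / len(afcs)
    return hist


def fit_score(observed_afcs, expected_afcs, bin_width=0.01):
    """-log10 of the summed squared difference between binned AFC distributions."""
    obs = afc_histogram(observed_afcs, bin_width)
    exp = afc_histogram(expected_afcs, bin_width)
    ssd = sum((o - e) ** 2 for o, e in zip(obs, exp))
    return float("inf") if ssd == 0 else -log10(ssd)


def lowest_compatible_ne(mean_scores, std_errors):
    """Smallest Ne whose mean score + SE reaches the best score - its SE.

    This overlap rule is an assumption made for illustration only.
    """
    best = max(mean_scores, key=mean_scores.get)
    floor = mean_scores[best] - std_errors[best]
    return min(ne for ne in mean_scores
               if mean_scores[ne] + std_errors[ne] >= floor)


# Hypothetical AFC values and per-Ne mean scores with standard errors.
score = fit_score([0.005, 0.005], [0.005, 0.015])
means = {200: 3.9, 250: 4.1, 300: 4.2, 350: 4.0}
errors = {200: 0.05, 250: 0.1, 300: 0.1, 350: 0.05}
conservative_ne = lowest_compatible_ne(means, errors)
```

Choosing the smallest compatible Ne implies stronger simulated drift, hence broader neutral AFC distributions and a stricter threshold for calling candidates.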

Following Ne estimation, the number of putative selected SNPs was inferred as follows. First, all ~1.45 million SNPs that remained after filtering were subjected to the CMH test (Agresti 2002). The CMH test was shown to outperform several other statistics that were used for candidate inference in recent Drosophila E&R studies (Kofler and Schlötterer 2013). Testing was conducted separately for the hot and cold treatments, with each being contrasted against the base population. Second, the CMH test was applied to simulated SNP data that were derived using the conservative Ne estimate of 250 (discussed earlier). Finally, the number of candidate SNPs was estimated by setting an empirical FDR of 0.001 for each treatment. Thus, the number of candidates for a given treatment was quantified as the number of observed SNPs that had CMH P values lower than the least significant CMH P value from among the 0.1% most extreme simulated SNPs. The number of candidate SNPs was also inferred for a range of different FDR values, and the results are shown in supplementary table S1 (Supplementary Material online).
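The empirical thresholding step can be sketched in a few lines. This is a minimal illustration with hypothetical function names; it assumes the CMH P values for the observed and simulated SNPs have already been computed elsewhere.

```python
def empirical_threshold(simulated_pvalues, fdr=0.001):
    """Least significant P value among the `fdr` fraction of most extreme
    (smallest P value) simulated neutral SNPs."""
    ranked = sorted(simulated_pvalues)
    n_extreme = max(1, round(len(ranked) * fdr))
    return ranked[n_extreme - 1]

def count_candidates(observed_pvalues, simulated_pvalues, fdr=0.001):
    """Number of observed SNPs more significant than the neutral threshold."""
    threshold = empirical_threshold(simulated_pvalues, fdr)
    return sum(p < threshold for p in observed_pvalues)
```

Calling `count_candidates` with several values of `fdr` corresponds to the range of FDR values mentioned in the text.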

To examine the effects that linkage between loci had on the candidate inference, we also performed 30 forward neutral simulations starting from a population with 250 unique haplotypes (i.e., 250 homozygous diploid individuals) for 15 generations. Thus, the number of founding haplotypes and the effective population size during the simulations was 250. Assuming that the isofemale founders maintained two segregating chromosomes per line on average, which is not unreasonable considering that they were five generations old when making the mass-bred populations, these numbers roughly accord with those expected within our experimental populations. The simulations were performed using MimicrEE (Kofler and Schlötterer 2013). This software enabled the modeling of recombination at rates that were derived from empirical estimates (Fiston-Lavier et al. 2010). The 30 simulations were divided into 10 equal-sized subsets. The three simulations from each of the 10 subsets were then downsampled to match the coverage distributions of the three empirical replicates from the relevant generation and treatment. Next, we applied the CMH test to all sites in each subset and estimated the number of candidates via the FDR method outlined in the preceding paragraph. The haplotypes used in the simulations were created by neutral coalescent simulations and are described in Bastide et al. (2013).

Enrichment in Thermal Tolerance Genes

We tested for treatment-specific evolution by looking for candidate enrichment in genes associated with heat and cold tolerance. Thermal tolerance gene sets were taken from the Candidate Stress Genes in Drosophila database, which was obtained from the Centre for Environmental Stress and Adaptation Research website (http://cesar.org.au, last accessed November 5, 2013). Only genes with a significant and unique association with either heat or cold tolerance were included in the analyses (supplementary table S2, Supplementary Material online). Separate enrichment analyses were performed on the 2,000; 4,000; 6,000; 8,000; 10,000; 12,000; 14,000; 16,000; 18,000; 20,000; 25,000; 30,000; 35,000; 40,000; 45,000; 50,000; 60,000; 70,000; 80,000; 90,000; and 100,000 most significant SNPs in each treatment. Note that each successive set of SNPs was an aggregation of the previous SNP sets and the subsequent set of most significant SNPs. For instance, the set that contained the top 4,000 SNPs comprised the set containing the top 2,000 SNPs along with the next 2,000 most significant SNPs. All enrichment analyses were performed using Gowinda software (Kofler and Schlötterer 2012). Gowinda generates P values by conducting multiple simulations in which the positions of candidate SNPs are randomly chosen from the ~1.45 million SNPs scattered throughout the genome. The P value was then quantified as the proportion of simulations that had more genes specific to a given category (in our study, genes associated with either hot or cold thermotolerance) containing at least one candidate SNP than the observed data set (Kofler and Schlötterer 2012). In total, we performed 100,000 permutations for each set of candidate SNPs in each treatment and for each set of thermal tolerance genes. In each case, a gene containing multiple candidate SNPs was only counted once (i.e., the "gene" option of the mode argument). Gene annotations were based on version 5.40 of the D. melanogaster reference genome, and only the full coding region of each gene, and not the upstream or downstream flanking regions, was considered (i.e., the "gene" option of the gene-definition argument). Because the thermal genes were specifically tested for enrichment, with the expectation that there would be enrichment for hot alleles in heat tolerance genes and cold alleles in cold tolerance genes, we report P values rather than q values adjusted for testing multiple categories. This software was specifically designed to deal with differences in average gene size among categories and therefore does not result in the overrepresentation of categories with larger than average gene sizes (Kofler and Schlötterer 2012).
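The permutation logic described above can be illustrated with a small sketch. It is not Gowinda itself: the function names are hypothetical, SNPs and genes are placed on a single coordinate axis for simplicity, and the P value follows the wording of the text (the proportion of permutations with more category genes hit than in the observed data).

```python
import random

def genes_hit(snp_positions, gene_intervals):
    """Set of genes containing at least one of the given SNP positions;
    a gene with several SNPs is counted once."""
    hit = set()
    for gene, (start, end) in gene_intervals.items():
        if any(start <= position <= end for position in snp_positions):
            hit.add(gene)
    return hit

def enrichment_p_value(candidates, all_snps, gene_intervals, category,
                       n_permutations, seed=0):
    """Proportion of random SNP draws that hit more category genes than the
    observed candidate set does."""
    rng = random.Random(seed)
    observed = len(genes_hit(candidates, gene_intervals) & category)
    exceed = 0
    for _ in range(n_permutations):
        draw = rng.sample(all_snps, len(candidates))
        if len(genes_hit(draw, gene_intervals) & category) > observed:
            exceed += 1
    return exceed / n_permutations
```

Because the random draws are taken from the SNPs themselves, a long gene carrying many SNPs is hit more often in the permutations too, which is how this kind of test accounts for differences in gene size.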

Identification of Putative High LD Regions and Short Introns

LD can result in a large number of false positives, for instance, via allele frequency correlations between selected and neutral linked sites. Consequently, the following regions were designated as having putatively elevated LD and were removed from specific analyses, as indicated in the main text. First, regions covered by inversions or within 500 kb either side of the inversion breakpoints. Segregating inversions were identified in our experimental evolution populations by Kapun et al. (2013). Second, a 1 Mb region located within the inversion-free part of 3R. This region was found to harbor a large number of low-frequency alleles that showed a similar allele frequency change, a pattern consistent with selection upon a single haplotype block (Orozco-terWengel et al. 2012). Finally, regions with a local recombination rate of less than 2 cM/Mb. The recombination rates were taken from Comeron et al. (2012). Approximately 300,000 of the original 1.45 million SNPs remained following the removal of these regions. Because 3R contained three segregating inversions and the putative selected haplotype block, only 15,000 SNPs remained on this chromosome arm following removal of these regions, whereas other chromosomes maintained at least 50,000 SNPs.
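The three exclusion criteria above amount to a simple per-SNP filter, sketched here. The function name, argument layout, and all coordinates used with it are hypothetical; only the 500 kb flank and the 2 cM/Mb cutoff come from the text.

```python
def is_high_ld(position, rec_rate, inversions, haplotype_block=None,
               flank=500_000, min_rate=2.0):
    """True if a SNP lies in a region designated as putatively high LD.

    position        -- SNP coordinate on its chromosome arm (bp)
    rec_rate        -- local recombination rate at the SNP (cM/Mb)
    inversions      -- list of (start, end) breakpoints on the same arm
    haplotype_block -- optional (start, end) of a selected block on the arm
    """
    for start, end in inversions:
        if start - flank <= position <= end + flank:
            return True
    if haplotype_block is not None:
        block_start, block_end = haplotype_block
        if block_start <= position <= block_end:
            return True
    return rec_rate < min_rate
```

SNPs for which the function returns False make up the putative low LD set used in the downstream analyses.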

Short intron coordinates were taken from Lawrie et al. (2013). To further ensure that short intron sites were behaving neutrally, we only took those located within the putative low LD regions identified above.

ROC Curves and Reproducibility

To test the reproducibility of our top candidates, two additional replicates were sequenced from generation 15 in each treatment, with all sites being processed through the same pipeline as the original candidates. Because we did not have additional replicates from the base population, two new base replicates were created. This was done by combining the reads from the separate base replicates into a single file, from which two-thirds of all reads were randomly sampled (without replacement) and split in half to generate two new files, each having one-third of the combined base population reads (~50-fold coverage). All SNPs from the two new sets of replicates were then subjected to CMH testing and compared for consistency with the original data set using ROC curves. For the sake of consistency, the ~1.45 million SNPs from the original data set were used in the comparisons.
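The construction of the two new base replicates can be sketched as below. The function name is hypothetical and reads are represented as items of an in-memory list, whereas the actual procedure would operate on alignment files.

```python
import random

def make_pseudo_replicates(reads, seed=0):
    """Sample two-thirds of the pooled reads without replacement and split
    the sample into two non-overlapping halves of equal size."""
    rng = random.Random(seed)
    n_sampled = (2 * len(reads)) // 3
    n_sampled -= n_sampled % 2            # keep the two halves equal in size
    sampled = rng.sample(reads, n_sampled)  # sampling without replacement
    half = n_sampled // 2
    return sampled[:half], sampled[half:]
```

Because the sample is drawn without replacement and then split, the two pseudo-replicates share no reads, and each holds about one-third of the combined base population reads.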

We used a modified ROC curve analysis to provide a graphical representation of the consistency between the original and new data sets, that is, those based on three replicates and two replicates, respectively. The ROC curves quantify the overlap between the two data sets across successively larger groups of SNPs, with SNPs having been ranked by their CMH P value. Points falling on the 1:1 diagonal line indicate that the overlap between the two sets of SNPs matches chance expectations of reproducibility, whereas points lying above the diagonal indicate excess reproducibility. Because we are interested in changes that were most likely to be caused by the deterministic process of selection, as opposed to the stochastic effects of genetic drift, only SNPs where the same allele was increasing across all five replicates were considered as being reproducible.

The expected degree of reproducibility under neutrality was quantified using neutral MimicrEE simulations (Kofler and Schlötterer 2013). All parameters were identical to those of the simulations used for the FDR estimates outlined above, except that a total of 50 data sets were simulated in this case. The simulated data sets were divided into 10 equally sized subsets, with each subset being further split into a group of three replicates and a group of two replicates. Following downsampling of the data sets to match our empirical coverage distributions (discussed earlier), CMH tests were applied to each group of three and two replicates for each set of simulations. The resulting data were compared for reproducibility using ROC curves, as above.

LD Analyses

The degree of LD around the top 2,000 candidates was estimated for both treatments following the methods outlined in Orozco-terWengel et al. (2012). Briefly, we estimated the median P value for all SNPs flanking the top 2,000 candidates in nonoverlapping 50-bp windows either side of the focal candidates. The decay in the P value around the focal candidates provides a proxy for the strength and extent of LD in regions flanking putative selected sites. To ascertain the background level of LD among similar SNPs, 2,000 more SNPs were randomly drawn from the genome after excluding all regions within 200 bp of the top 2,000 candidates. The numbers of random SNPs drawn from each chromosome arm matched the distribution of the top 2,000 candidates in each treatment.
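The windowed summary around focal candidates can be sketched as follows. This is an illustration only: the function name and the 4,000-bp maximum distance are assumptions, and all SNPs are taken to lie on a single chromosome arm.

```python
from collections import defaultdict
from statistics import median

def flanking_median_p(focal_positions, snps, window=50, max_distance=4000):
    """Median P value of flanking SNPs in nonoverlapping windows.

    snps is a list of (position, p_value) pairs. Windows are numbered
    ..., -2, -1 upstream and 1, 2, ... downstream of the focal SNP, so
    window 1 covers offsets 1-50 bp and window -1 covers offsets -1 to -50 bp.
    """
    bins = defaultdict(list)
    for focal in focal_positions:
        for position, p_value in snps:
            offset = position - focal
            if offset == 0 or abs(offset) > max_distance:
                continue
            index = (abs(offset) - 1) // window + 1
            bins[index if offset > 0 else -index].append(p_value)
    return {window_id: median(values) for window_id, values in sorted(bins.items())}
```

Running the same function on the randomly drawn SNPs gives the background profile against which the decay around the candidates is compared.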

We used MimicrEE simulations (Kofler and Schlötterer 2013) to estimate the degree of long-range LD achievable given a simple scenario where strong selection acted on a singleton located in a single haplotype. The selection strength (0.5) was based on increasing a single allele from 1% to 30% over 15 generations (i.e., the mean allele frequency change observed for the top 2,000 candidate SNPs in the hot treatment). This was derived using $s = 2\ln\left(\frac{p_0 q_1}{p_1 q_0}\right)/t$, where s is the selection coefficient, p and q are the biallelic frequencies at the start (subscript 0) and end (subscript 1) generations, and t is the change in generations (eq. 3.3 in Charlesworth and Charlesworth [2010]). Other than the selected site, all other parameters were the same as in the neutral simulations outlined above. Nine simulations in which the singleton increased in frequency by 25–35% were included in the analyses. These were evenly split into three subsets, with each subset being downsampled and submitted to CMH testing (in the same manner as the neutral simulations discussed earlier). The resulting data were then used to produce Manhattan plots that enabled visual assessment of the degree of long-range LD (fig. 7).
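As a check on the stated selection strength, the formula can be evaluated directly. The sketch below assumes that q denotes the frequency of the selected allele and p = 1 − q, which is the assignment under which the formula returns a positive coefficient for a rising allele; the function name is hypothetical.

```python
from math import log

def selection_coefficient(start_freq, end_freq, generations):
    """s = 2 ln(p0 q1 / (p1 q0)) / t, with q the frequency of the selected
    allele (assumption) and p = 1 - q the frequency of the alternative."""
    q0, q1 = start_freq, end_freq
    p0, p1 = 1 - q0, 1 - q1
    return 2 * log((p0 * q1) / (p1 * q0)) / generations

# A rise from 1% to 30% over 15 generations, as described in the text.
s = selection_coefficient(0.01, 0.30, 15)
```

With these inputs the ratio inside the logarithm is 0.297/0.007 ≈ 42.4, so s ≈ 2 × 3.75 / 15 ≈ 0.5, in agreement with the selection strength quoted above.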

Supplementary Material

Supplementary figures S1–S4 and tables S1–S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to A. Bergland and D. Lawrie for sharing the coordinates for the short introns and Martin Kapun for sharing unpublished results. R.T. wishes to thank his colleagues at the Institut für Populationsgenetik and the Petrov laboratory at Stanford University for helpful discussions on data interpretation. This work was supported by a PhD fellowship from the Vetmeduni Vienna, the Austrian Science Fund (FWF, P19467-B11, W1225-B20), and the European Research Council (ERC) grant "ArchAdapt" awarded to C.S.

References

Agresti A. 2002. Categorical data analysis. New York: John Wiley & Sons.

Andolfatto P, Wall JD. 2003. Linkage disequilibrium patterns across a recombination gradient in African Drosophila melanogaster. Genetics 165:1289–1305.

Bastide H, Betancourt A, Nolte V, Tobler R, Stöbe P, Futschik A, Schlötterer C. 2013. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9:e1003534.

Bergelson J, Roux F. 2010. Towards identifying genes underlying ecologically relevant traits in Arabidopsis thaliana. Nat Rev Genet. 11:867–879.

Bergland AO, Behrman EL, O'Brien KR, Schmidt PS, Petrov DA. 2013. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1303.5044

Bubliy O, Loeschcke V. 2005. Correlated responses to selection for stress resistance and longevity in a laboratory population of Drosophila melanogaster. J Evol Biol. 18:789–803.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Caracristi G, Schlötterer C. 2003. Genetic differentiation between American and European Drosophila melanogaster populations could be attributed to admixture of African alleles. Mol Biol Evol. 20:792–799.

Carroll D. 2011. Genome engineering with zinc-finger nucleases. Genetics 188:773–782.

Charlesworth B, Charlesworth D. 2010. Elements of evolutionary genetics. Greenwood Village (CO): Roberts and Company Publishers.

Clemente F, Vogl C. 2012. Unconstrained evolution in short introns – An analysis of genome-wide polymorphism and divergence data from Drosophila. J Evol Biol. 25:1975–1990.

Comeron JM, Ratnappan R, Bailin S. 2012. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 8:e1002905.

Corbett-Detig RB, Hartl DL. 2012. Population genomics of inversion polymorphisms in Drosophila melanogaster. PLoS Genet. 8:e1003056.

Dettman JR, Rodrigue N, Melnyk AH, Wong A, Bailey SF, Kassen R. 2012. Evolutionary insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol. 21:2058–2077.

Farlow A, Dolezal M, Hua L, Schlötterer C. 2012. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol. 29:21–24.

Fiston-Lavier A-S, Singh ND, Lipatov M, Petrov DA. 2010. Drosophila melanogaster recombination rate calculator. Gene 463:18–20.

Futschik A, Schlötterer C. 2010. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186:207–218.

Haldane J. 1957. The cost of natural selection. J Genet. 55:511–524.

Halligan DL, Keightley PD. 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16:875–884.

Hoffman A, Hallas R, Sinclair C, Partridge L. 2001. Rapid loss of stress resistance in Drosophila melanogaster under adaptation to laboratory culture. Evolution 55:436–438.

Itoh M, Nanba N, Hasegawa M, Inomata N, Kondo R, Oshima M, Takano-Shimizu T. 2010. Seasonal changes in the long-distance linkage disequilibrium in Drosophila melanogaster. J Hered. 101:26–32.

Kapun M, van Schalkwyk H, McAllister B, Flatt T, Schlötterer C. 2013. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.2461

Kawecki TJ, Lenski RE, Ebert D, Hollis B, Olivieri I, Whitlock MC. 2012. Experimental evolution. Trends Ecol Evol. 27:547–560.

Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V, Futschik A, Kosiol C, Schlötterer C. 2011. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 6:e15925.

Kofler R, Pandey RV, Schlötterer C. 2011. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27:3435–3436.

Kofler R, Schlötterer C. 2012. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies. Bioinformatics 28:2084–2085.

Kofler R, Schlötterer C. 2013. Guidelines for the design of evolve and resequencing studies. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.4954

Kolss M, Kawecki TJ. 2008. Reduced learning ability as a consequence of evolutionary adaptation to nutritional stress in Drosophila melanogaster. Ecol Entomol. 33:583–588.

Lawrie DS, Messer PW, Hershberg R, Petrov DA. 2013. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 9:e1003527.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079.

Liu J, Li C, Yu Z, et al. (12 co-authors). 2012. Efficient and specific modifications of the Drosophila genome by means of an easy TALEN strategy. J Genet Genomics. 39:209–215.

Mackay TFC, Richards S, Stone EA, et al. (52 co-authors). 2012. The Drosophila melanogaster Genetic Reference Panel. Nature 482:173–178.

Miyashita N, Langley H. 1988. Molecular and phenotypic variation of the white locus region in Drosophila melanogaster. Genetics 120:199–212.

Nunney L. 2003. The cost of natural selection revisited. Ann Zool Fennici. 40:185–194.

Orozco-terWengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P. 2010. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol. 27:1226–1234.

Parts L, Cubillos FA, Warringer J, et al. (14 co-authors). 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Pool JE, Aquadro CF. 2006. History and structure of sub-Saharan populations of Drosophila melanogaster. Genetics 174:915–929.

Pool JE, Corbett-Detig RB, Sugino RP, et al. (11 co-authors). 2012. Population genomics of sub-Saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet. 8:e1003080.

Remolina S, Chang P, Leips J, Nuzhdin SV, Hughes K. 2012. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66:3390–3403.

Rose MR, Vu LN, Park SU, Graves JL Jr. 1992. Selection on stress resistance increases longevity in Drosophila melanogaster. Exp Gerontol. 27:241–250.

Stalker H. 1980. Chromosome inversion polymorphism in Drosophila melanogaster. II. Relationship of inversion frequencies to latitude, season, wing-loading and flight activity. Genetics 95:211–223.

Stapley J, Reger J, Feulner PGD, Smadja C, Galindo J, Ekblom R, Bennison C, Ball A, Beckerman AP, Slate J. 2010. Adaptation genomics: the next generation. Trends Ecol Evol. 25:705–712.

Takano-Shimizu T, Kawabe A, Inomata N, Nanba N, Kondo R, Inoue Y, Itoh M. 2004. Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci U S A. 101:14156–14161.

Tantawy A. 1964. Studies on natural populations of Drosophila. III. Morphological and genetic differences of wing length in Drosophila melanogaster and D. simulans in relation to. Evolution 18:560–570.

Turner TL, Miller PM, Cochrane VA. 2013. Combining genome-wide methods to investigate the genetic complexity of courtship song variation in Drosophila melanogaster. Mol Biol Evol. 30:2113–2020.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Yu Z, Ren M, Wang Z, Zhang B, Rong YS, Jiao R, Gao G. 2013. Highly efficient genome modifications mediated by CRISPR/Cas9 in Drosophila. Genetics 195:289–291.

Zhou D, Udpa N, Gersten M, et al. (11 co-authors). 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.

treatment (fig. 3). Given that high-frequency alleles experience very little allele frequency change before fixation, the lack of predicted high-frequency alleles among the cold candidates is likely to be a direct consequence of insufficient power to detect such SNPs. Notably, this pattern was not restricted to the top 2,000 candidate SNPs but extends to a much larger fraction of loci (supplementary fig. S1, Supplementary Material online). This further reinforces the idea that a large number of SNPs are either directly selected or behave similarly to selected loci in each treatment.

Reproducibility of Ranked SNPs

A testable consequence of directional selection is that targeted alleles are expected to increase in frequency across all replicates, whereas false positives should not replicate. Hence, to further distinguish between true positives and noise among our top candidates, we sequenced two additional F15 replicates in each treatment and quantified the level of reproducibility between these and the original replicates. Reproducibility was measured using a modified receiver operating characteristic (ROC) curve, which visualizes the excess of shared SNPs of similar rank (rank being assigned by P value) between our original and new data sets (see Materials and Methods). For both temperature regimes, we noted a substantial concordance between the independent replicates, as indicated by the ROC curve being clearly above the diagonal (fig. 4). In contrast, neutrally evolving sites that were simulated to have parameters similar to our treatment data, and which included recombination, did not differ from expectation (supplementary fig. S2, Supplementary Material online; see Materials and Methods). Nonetheless, because the concordance between the independent sets of replicates peaked among the top 50,000–100,000 ranked SNPs in each treatment, this analysis could not further distinguish truly selected SNPs from false positives.

The high concordance that we observed could have resulted from strong LD between selected and linked neutral SNPs, which is expected to produce correlated allele frequency changes across contiguous sites. We tested this hypothesis by reanalyzing the data after removing low recombining regions and genomic regions known to contain segregating cosmopolitan inversions (see Materials and Methods). The resulting ROC curve exhibited a substantially reduced signal of concordance between the two independent sets of replicates, but the concordance was still greater than expected by chance (fig. 4). Next, we repeated the analysis using only short intronic sites (without splicing signals) within the putative low LD regions. Evidence suggests that short introns are unlikely to carry selectively favored alleles in D. melanogaster (Halligan and Keightley 2006; Parsch et al. 2010; Clemente and Vogl 2012), making these regions good proxies for neutral loci (but see Farlow et al. 2012). Although SNPs in short introns were not expected to show much consistency across the groups, the signal of concordance was similar to that of the putative low LD regions (fig. 4). Consequently, this pattern indicates that much of the selection signal could be generated from linkage between neutral sites and the true targets of selection.

FIG. 3. Allele frequency distribution of candidate SNPs. Distribution of the starting (left panels, "Start"), end (middle panels, "End"), and increase (right panels, "Increase") in allele frequencies for the top 2,000 candidate SNPs in the hot (top row) and cold (bottom row) treatment; the y axis shows frequency. Alleles are polarized by the rising allele in the derived treatment, such that plots display the frequencies of the putative selected alleles. The dashed gray line indicates the distribution of all ~1.45 million SNPs, and the diamonds on the x axis indicate the mean frequency for candidate (red/blue) and all (gray) SNPs.

Effects of Long-Range LD on False Positives

In genome-wide association studies (GWAS), linkage between selected and neighboring neutral sites results in a characteristic pattern observed in Manhattan plots, namely, a series of adjacent sites with significant P values that are arranged into a peak. Such patterns were not readily recognizable in either treatment in our study, however. Rather, the top candidates occurred as either isolated SNPs or broad flattened humps (fig. 5). Furthermore, analyses of genomic regions flanking candidate SNPs showed that LD had a limited range beyond the putative selected site (fig. 6). This pattern is consistent with the low levels of LD observed in D. melanogaster populations (Miyashita and Langley 1988; Andolfatto and Wall 2003; Mackay et al. 2012) and has been previously interpreted as evidence against linkage as a possible cause for the large number of candidate SNPs in Drosophila E&R studies (Orozco-terWengel et al. 2012). To reconcile the anomaly of thousands of putative selected SNPs being detected despite an apparent lack of LD in our study, we propose that associations beyond 200 bp, which is widely regarded as the limit of LD in D. melanogaster (Mackay et al. 2012), may be prevalent within our experimental populations.

Support for this idea comes from two separate findings. First, the excess concordance observed for short introns outlined above is still evident after exclusion of intronic sites located in the same genes that contained one or more candidate SNPs. Thus, although excluding these intronic sites did reduce concordance to some extent, the excess concordance was still visible even after removing genes containing one or more of the top 100,000 candidate SNPs (supplementary fig. S3, Supplementary Material online). Given that short introns are not expected to contain selected sites (Parsch et al. 2010; Clemente and Vogl 2012), we suggest that linkage between sites in short introns and selected sites in regions other than the enclosing gene is responsible for producing this pattern.

Second, a good, albeit potentially extreme, example of long-range LD is provided by a 1 Mb region located on an inversion-free part of chromosome 3R that was first described by Orozco-terWengel et al. (2012). Among our genome-wide top 2,000 hot candidates, 281 were located in this region, which represents a massive enrichment relative to its size. These sites increased in frequency by ~27% over the first 15 generations on average, and 33% of the 24K SNPs in this region had the same rising allele across all five replicates in the hot treatment (cf. 23% across the rest of the genome; P value < 2.2e-16 using Fisher's exact test). Furthermore, a large number of putative selected alleles in this region had starting frequencies < 1% (118 SNPs), and these sites displayed highly homogeneous increases in frequency during the study (supplementary fig. S4, Supplementary Material online). Thus, although it is evident from the data that this is a low-frequency haplotype that increased across all replicates in the hot treatment, 83% of the putative candidates in this region were separated by distances greater than 200 bp, with an average distance of 5,570 bp between sites. Because such a broad spacing between candidate sites is well beyond the typical range of LD in D. melanogaster, we suggest that long-range LD is responsible for these patterns and the majority of false positives more generally.

To evaluate to what extent long-range LD could be observed in a typical E&R study, we simulated strong selection on a single allele private to a rare haplotype using allele frequency changes and parameters similar to those in the hot treatment (see Materials and Methods). We found that neighboring candidate SNPs were often separated by several thousand base pairs or more and could be millions of base pairs from the selected site (fig. 7). These results clearly indicate that long-range LD can be readily generated under a simple but plausible scenario for E&R studies.

DiscussionExposure of D melanogaster to novel hot or cold environ-ments resulted in selection of alleles in thermal tolerancegenes specific to each environment Moreover the startingfrequencies of the putative selected SNPs differed mark-edly between the two environments suggesting that theyhad experienced selection within the natural progenitorpopulation prior to the start of the experiment Henceour study reveals that a potentially rich source of

00 02 04 06 08 10

00

02

04

06

08

10

Proportion SNPs

Pro

port

ion

com

mon

can

dida

tes

Treatment amp region

Hot AllHot low LDHot SICold AllCold low LDCold SI

FIG 4 Concordance between two independent groups of replicatesModified ROC curve showing the proportion of common candidateSNPs shared between two independent sets of derived populationsrelative to cumulatively larger sets of ranked SNPs (see Materials andMethods) for both hot (red lines) and cold (blue lines) treatmentsSeparate lines are plotted for full data set (~145 million SNPs All) aswell as for those SNPs only falling in regions of putatively low LD (ieregions outside of inversions and low recombining regions) or for shortintrons in these regions (SI) Lines above the black diagonal line indicatethat there are more common candidate SNPs than expected by chancesuch that our top candidates are more reproducible than those neu-trally evolving populations Because sites with low LD or in short intronswith low LD show a similar reduction in concordance relative to all sitesmuch of the excess concordance is probably due to changes in inver-sions and long-range LD between selected and neutral sites

368

Tobler et al doi101093molbevmst205 MBED

ownloaded from

httpsacademicoupcom

mbearticle-abstract312364997936 by C

ardiff University user on 19 February 2019

FIG 5 Manhattan plots showing the genomic distribution of candidate SNPs The negative log10-transformed P values of all SNPs relative to genomicposition The top 2000 candidate SNPs for the hot (top panel) and cold (bottom panel) treatment are highlighted (see key) Candidates are categorizedas being unique to the hot (red) or cold (blue) treatments or common to both (green) Point size scales with significance except for common SNPswhich are shown at a standard size for emphasis Inset boxes indicate the position of inversions and a putative selected haplotype block on 3R (largedashes most proximal to centromere on 3R) The inversions are as follows In(2L)t In(2R)NS In(3L)P In(3R)Payne (short dashes) In(3R)C (dots) andIn(3R)Mo (solid line) The candidate SNPs typically appear as either isolated points or arranged in broad flattened humps and tend to be clustered on3R where three large inversions overlap These patterns further imply that inversions and long-range LD have generated many false positives in thestudy

02

46

810

minus4000 minus2000 0 2000 4000

02

46

810

Distance from candidate (bp)

minuslo

g 10(p

)

HotColdRandom200bp

FIG 6 Correlation between the top 2000 candidate and flanking SNPs The decay in LD (correlation) around candidates is measured as the rate ofdecline in neighboring (negative log10 transformed) P values to background levels Data are averaged over successive 50-bp bins flanking focal SNPs Solidblack lines indicate the bin means The red blue and green polygons around these lines indicate 95 confidence intervals for hot treatment coldtreatment and random (ie background) SNPs respectively In accordance to reported results from D melanogaster most of the LD is lost within200 bp either side of each focal SNP (indicated by dashed lines)


information lies in using the base population information, which has not been considered in most published E&R studies to date.

Despite strong evidence that differential thermal selection contributed to our top candidates in each treatment, the sheer number of putatively selected sites indicates that our results were confounded by a large number of false positives. Given that short-range LD and strong drift in low recombining regions were unlikely to be major contributors to the excess of false positives in our study, we hypothesized that LD over longer distances was responsible for much of this excess. Long-range LD in our experimental populations could have arisen as a byproduct of the founders being a small sample of the much larger natural population. In this case, chance correlations between sites separated by vast distances may result from the finite sampling of alleles. Long-range LD may also originate within natural populations. For instance, strong LD between loci separated by large distances has been reported for Japanese D. melanogaster populations (Takano-Shimizu et al. 2004; Itoh et al. 2010). Because the LD was much stronger in the spring than in the autumn, this finding was attributed to exaggerated sampling effects during the previous winter bottleneck, although there was evidence that recurrent epistatic selection was also responsible for some linked loci. Finally, the introgression of new haplotypes from migrant individuals is another natural process that could produce long-range LD. Recent and ongoing admixture has been reported for African (Pool and Aquadro 2006; Pool et al. 2012) and North American (Caracristi and Schlötterer 2003) D. melanogaster populations, suggesting that, in principle, this may also contribute to long-range LD in natural populations.

The systematic increase in frequency of alleles in the hot treatment that were broadly distributed across a 1 Mb region of 3R demonstrates the effects of long-range LD in our experiment. In combination with differences in the distribution of the starting allele frequencies, long-range LD may also help to explain the differences in the number of candidates and the level of concordance observed between the two treatments. Given that most candidates were rare in the hot treatment and common in the cold treatment (presumably resulting from prior selection against [for] heat [cold] tolerance alleles), long-range LD is expected to be more extensive among the hot candidates. This is because rare haplotypes are expected to harbor many more private SNPs than common ones, and all such sites will experience the same change in allele frequency until these associations are broken down by recombination. The difference in Ne between the two treatments is also consistent with more extensive long-range LD producing more widespread changes in allele frequencies in the hot treatment.

Segregating inversions are another probable source of false positives in our study. The large decrease in concordance that was observed when inversions and regions with low

FIG. 7. Simulated long-range LD generated by a singleton under strong selection. Manhattan plots showing the distribution of the top 1000 candidate SNPs from three separate sets of simulated data (top to bottom panels) in which a singleton carried by a single chromosome-length haplotype is under strong selection (see Materials and Methods). The full chromosome is shown in the panels on the right, with a zoomed-in view shown in the panels on the left. Significant SNPs occur vast distances from the selected site (dashed line) and tend to be several thousands of base pairs apart. Thus, it appears likely that long-range LD between selected and neutral sites can readily inflate the number of false positives in an E&R study.


recombination were removed from the data set supports this idea (fig. 4). Additional confirmation comes from a recent study, which found that several large inversions were segregating in our populations over the course of the experiment, including three partially overlapping inversions on chromosome arm 3R (Kapun et al. 2013). By tracing the trajectory of alleles that were private to each inversion, a single inversion was found to increase across all three replicates in each treatment (In(3R)C by 25% in the hot and In(3R)Mo by 15% in the cold). Indeed, the majority of the top 2000 candidate SNPs was observed on chromosome arm 3R for both treatments (~80% for the hot and ~40% for the cold; supplementary table S3, Supplementary Material online), with many candidate SNPs lying within these inversions. Furthermore, the greatest concentration of cold candidates was situated within the region where all three inversions overlap. Because recombination is likely to be suppressed in this region, the local effects of LD and drift are expected to be exacerbated relative to the rest of the genome. Thus, changes in inversion frequencies are also likely to have substantially inflated the number of false positives in our study.

Implications for Future E&R Research

Our results indicate that E&R studies with small founder populations, strong selection, and large segregating inversions are likely to suffer from large numbers of false positives. These factors are likely to be present in other D. melanogaster E&R studies (indeed, several large inversions segregate at intermediate frequencies in natural D. melanogaster populations; Corbett-Detig and Hartl 2012; Kapun et al. 2013), meaning that our findings have important implications for future E&R research in Drosophila.

First, if seasonal selection is common in D. melanogaster, which accumulating evidence suggests may be the case (Stalker 1980; Takano-Shimizu et al. 2004; Itoh et al. 2010; Bergland et al. 2013), then the genetic composition of the experimental populations will be dependent on their time of collection. Evidence suggests that genes associated with stress resistance are highly pleiotropic in Drosophila (Rose et al. 1992; Hoffman et al. 2001; Bubliy and Loeschcke 2005; Kolss and Kawecki 2008), implying that a broad spectrum of traits may be affected in this way.

Second, large numbers of false positives are expected in E&R studies that use small founder populations, as a direct result of long-range LD between selected and neutral sites. The effect of founder population size on the performance of experimental design has been explicitly addressed in a recent haplotype-based simulation study (Kofler and Schlötterer 2013). The authors demonstrate that both the power and specificity of E&R studies are increased when larger numbers of founders are used, with the gains in performance being particularly noticeable when selection per SNP is strong. Another plausible way to reduce the impact of long-range LD is to create independent replicates from different sets of founders. Although this does not remove long-range LD per se, it results in different associations across replicates, such that analyses that combine replicates should remove the majority of such effects.

Finally, our results indicate that inversions are likely to be a major hindrance in detecting causal loci. Large segregating inversions appear to be ubiquitous in natural D. melanogaster populations, and recent work suggests that newly introduced inversions may influence genetic diversity across entire chromosome arms, presumably by suppressing recombination well beyond inversion breakpoints (Corbett-Detig and Hartl 2012; Pool et al. 2012; Kapun et al. 2013). Consequently, any improvements in performance that might be achieved through the experimental design modifications suggested above are unlikely to be realized in chromosome arms carrying large inversions, so inversion-free populations may be necessary to ensure that such improvements are realized over the entire genome. Although inversion-free D. melanogaster populations can be obtained via karyotyping individuals prior to the study, the time and effort expended in such screens may become prohibitive if large founding populations are desired. Another possibility is to use an alternative Drosophila species for which segregating inversions are rare, such as D. simulans or D. mauritiana, instead of D. melanogaster. Although these species currently lack the detailed genetic and genomic resources available for D. melanogaster, the ongoing development of portable molecular tools has made functional validation of candidate loci in these species tenable (Carroll 2011; Liu et al. 2012; Yu et al. 2013).

Conclusion

Previous D. melanogaster E&R studies have not directly addressed the role that inversions or long-range LD may have had in generating the large numbers of reported candidate SNPs. Our results imply that such estimates were probably inflated by false positives resulting from these phenomena. Consequently, Drosophila E&R study designs will need to be modified before the true causal loci that contribute to adaptation and/or complex traits can be reliably identified. While we have outlined several possibilities here, we also point out that the study by Kofler and Schlötterer (2013) is intended as a general guideline for E&R study design. There, the authors use haplotype-based simulations to investigate E&R performance for a wide range of experimental and evolutionary variables and thereby provide an invaluable reference for future Drosophila E&R research.

Materials and Methods

Experimental Populations and Evolution Regimes

Full details on the experimental setup are outlined in Orozco-terWengel et al. (2012). Briefly, wild D. melanogaster were collected in Northern Portugal in the summer of 2008. The population was maintained as 113 isofemale lines for five generations in the laboratory prior to the creation of mass-bred populations. Ten replicated mass-bred populations were created, with each replicate combining five females from each isofemale line. This initial population of merged isofemale lines is referred to as the base population, or F0. Five replicate populations were maintained in each of two novel environments: one hot and one cold treatment. Each treatment cycled between two fixed temperature and light settings every 12 h (between 10 °C and 20 °C in the cold treatment and 18 °C and 28 °C in the hot treatment), with light and dark periods occurring with the hot and cold phase of each treatment, respectively. The heating/cooling period between the two fixed temperatures was relatively rapid, taking ~1 h before the new temperature was reached, such that most time was spent at either of the modal temperatures. Each replicate was maintained at ~1000 individuals with ~50:50 sex ratios that were evenly distributed across five ~225 ml bottles. Each bottle contained ~70 ml of standard Drosophila medium. Generations were nonoverlapping.

Genome Sequencing and Mapping

Illumina full genome sequencing was performed on three replicates for each of the base population and evolved populations. For the hot treatment, a single replicate of the 23rd generation and two from the 15th generation were sequenced (Orozco-terWengel et al. 2012), whereas for the cold treatment, all three sequenced replicates came from the 15th generation. Sequencing was performed using paired-end reads, 75 bp in length in the hot treatment (GAIIx) and 100 bp long in the cold treatment (HiSeq). For each replicate and time point, ~500 flies were pooled and sequenced as one sample (i.e., Pool-Seq; Futschik and Schlötterer 2010). Two additional replicates were also sequenced from the 15th generation of the hot and cold treatments. These were used to check the reproducibility of candidate detection, as described below. Average genome-wide coverage ranged from ~35- to ~90-fold for each of the sequenced replicates (supplementary table S4, Supplementary Material online).

Mapping was carried out using a combination of purpose-built and third-party software, which is described in detail in Kofler, Orozco-terWengel, et al. (2011). Briefly, reads were trimmed to remove low-quality bases and then mapped with bwa (version 0.5.8c) (Li and Durbin 2009) against the D. melanogaster reference genome (version 5.18) and Wolbachia (NC_002978.6). The alignment files were converted to the SAM format (Li et al. 2009), with reads not mapped in proper pairs or with a mapping quality of less than 20 being removed. SAM files were converted into the pileup format, and indels and repeat sequences identified by RepeatMasker 3.2.9 (www.repeatmasker.org, last accessed November 5, 2013) were masked. The aligned pileup files were converted to a synchronized file (Kofler, Pandey, et al. 2011), from which SNPs and their accompanying allele frequencies were derived. All allele frequency changes presented herein were averaged across all three replicates (weighted by coverage). We further removed all SNPs that either had coverage among the top 2% of a given replicate and time point or had a minimum count of less than 30 across the 23 populations in our extended data set, which comprised additional replicates and time points that were not analyzed here. Because the average coverage per SNP across the extended data set was ~1500, the minimum count of 30 equates to an average 2% minor allele frequency across all replicates and time points (although allele frequencies could be lower in the time points that were specifically analyzed in this study). Annotation was based on version 5.40 of the D. melanogaster reference genome. Approximately 1.45 million SNPs were called overall.
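The coverage weighting and the minimum-count filter described above can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of the PoPoolation tools; the sketch also assumes that the "minimum count" refers to the minor allele count summed over all pools.

```python
def weighted_allele_frequency(counts, coverages):
    """Coverage-weighted mean allele frequency across replicates.

    counts[i] is the number of reads carrying the allele in replicate i and
    coverages[i] is the read depth at the site in replicate i. Weighting each
    replicate's frequency by its coverage is algebraically the same as
    pooling the reads: sum(counts) / sum(coverages).
    """
    total_coverage = sum(coverages)
    if total_coverage == 0:
        raise ValueError("site has no coverage")
    return sum(counts) / total_coverage


def passes_min_count(minor_counts, min_count=30):
    """Assumed filter: keep a SNP only if the minor allele count summed
    over all pools reaches the threshold (30 in the text)."""
    return sum(minor_counts) >= min_count


# Toy site with three replicates: 60 allele reads out of 200 in total.
freq = weighted_allele_frequency([10, 30, 20], [50, 100, 50])

# The arithmetic behind the 2% figure in the text: 30 reads out of ~1500.
implied_minor_allele_frequency = 30 / 1500
```

With an average total coverage of ~1500, a threshold of 30 reads corresponds to the 2% average minor allele frequency quoted in the text.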

Effective Population Size, FDR, and Candidate Estimation

In order to estimate Ne and FDR in both treatments, forward Wright–Fisher simulations were performed to generate expected neutral changes during the experiment. Three sets of neutral allele frequency changes were simulated for each treatment, matching the number of replicates in the study. For each simulation, 1.45 million SNPs were independently sampled (i.e., approximating the number of SNPs in our real data) from the distribution of allele frequencies expected after 15 generations of neutral evolution, conditional on the starting frequency for each SNP. Starting frequencies for each SNP were randomly drawn from the empirical distribution of start frequencies in the experiment, with the frequencies being the coverage-weighted average across all replicates. The expected distribution of allele frequencies after 15 generations was created by generating an Ni × Nj transition matrix, where each cell is the binomial probability of seeing j alleles given a starting frequency i/N (i.e., N is the effective population size), and then raising this matrix to the 15th power. To simulate the effects of DNA pooling in our study, an additional round of binomial sampling was conducted on the start and end allele frequencies for each SNP. To do this, each SNP was assigned a random coverage value that was drawn from the empirical distribution of coverages from a specific generation and treatment. This was done separately for each replicate, thereby capturing the heterogeneity in coverage both within and between replicates in the study.
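The transition-matrix step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, a small matrix size (20 allele copies) is used so that the example runs quickly, and the start distribution is multiplied by the matrix 15 times, which yields the same row as raising the matrix to the 15th power.

```python
import math
import random


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def transition_matrix(n_alleles):
    """Wright-Fisher matrix: entry [i][j] = P(j copies next | i copies now)."""
    size = n_alleles + 1
    return [
        [binom_pmf(j, n_alleles, i / n_alleles) for j in range(size)]
        for i in range(size)
    ]


def evolve(start_count, n_alleles, generations):
    """Distribution of allele counts after neutral drift."""
    matrix = transition_matrix(n_alleles)
    size = n_alleles + 1
    dist = [0.0] * size
    dist[start_count] = 1.0
    for _ in range(generations):
        dist = [
            sum(dist[i] * matrix[i][j] for i in range(size))
            for j in range(size)
        ]
    return dist


def pooled_frequency(true_freq, coverage, rng):
    """Extra binomial sampling step that mimics Pool-Seq read sampling."""
    reads = sum(rng.random() < true_freq for _ in range(coverage))
    return reads / coverage


rng = random.Random(1)
dist = evolve(start_count=4, n_alleles=20, generations=15)
mean_freq = sum(j * prob for j, prob in enumerate(dist)) / 20
sampled = pooled_frequency(mean_freq, coverage=60, rng=rng)
```

Under neutrality the expected frequency does not change, so `mean_freq` stays at the starting value of 0.2, while the spread of `dist` widens with each generation. The coverage of 60 is an arbitrary illustrative value.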

The effective population size of the observed data was estimated as follows. First, multiple simulations were performed as outlined above, each with a differing Ne. Each simulation produced a set of expected start and end allele frequencies for a given Ne, from which an allele frequency change (AFC) distribution was obtained (i.e., the end frequency minus start frequency for all 1.45 million SNPs). The expected AFC distribution was then compared for goodness of fit with the observed AFC distribution by summing the squared difference of the AFC values evaluated in 1% frequency change bins. Because individual replicates were simulated, we averaged the goodness of fit across the three replicates to get a point estimate for each Ne. Additionally, the error around this mean was sufficiently large for both treatments that the best-fitting Ne was not significantly different from many neighboring Ne values. Consequently, we took the lowest Ne from amongst this range (~250; supplementary table S1, Supplementary Material online), as it is expected to provide the most conservative estimate of the FDR (because the strength of drift increases as Ne decreases). The Ne for individual chromosomes was also inferred by limiting the observed data to that specific to each chromosome and then repeating the steps above.
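The goodness-of-fit comparison can be sketched as follows. The function names are hypothetical, and the use of per-bin proportions rather than raw counts is an assumption, since the text does not specify how the binned AFC values were scaled.

```python
def afc_histogram(start_freqs, end_freqs, bin_width=0.01):
    """Proportion of SNPs in each allele-frequency-change bin on [-1, 1]."""
    n_bins = round(2 / bin_width)
    hist = [0.0] * n_bins
    for start, end in zip(start_freqs, end_freqs):
        index = min(int((end - start + 1) / bin_width), n_bins - 1)
        hist[index] += 1
    total = len(start_freqs)
    return [count / total for count in hist]


def sum_squared_diff(observed_hist, expected_hist):
    """Goodness of fit: summed squared difference across bins."""
    return sum((o - e) ** 2 for o, e in zip(observed_hist, expected_hist))


def best_fitting_ne(observed_hist, simulated_hists_by_ne):
    """Average the fit over simulated replicates and pick the best Ne.

    simulated_hists_by_ne maps each candidate Ne to a list of histograms,
    one per simulated replicate.
    """
    scores = {}
    for ne, replicate_hists in simulated_hists_by_ne.items():
        fits = [sum_squared_diff(observed_hist, h) for h in replicate_hists]
        scores[ne] = sum(fits) / len(fits)
    return min(scores, key=scores.get), scores


# Toy data: the "Ne = 250" simulation reproduces the observed changes.
observed = afc_histogram([0.5, 0.5, 0.5], [0.5, 0.6, 0.4])
strong_drift = afc_histogram([0.5, 0.5, 0.5], [0.9, 0.9, 0.1])
best_ne, scores = best_fitting_ne(observed, {100: [strong_drift], 250: [observed]})
```

The sketch returns only the single best-fitting value; choosing the lowest Ne within the range of statistically indistinguishable fits, as the text describes, would need an additional error estimate around each mean.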

Following Ne estimation, the number of putative selected SNPs was inferred as follows. First, all ~1.45 million SNPs that remained after filtering were subjected to the CMH test (Agresti 2002). The CMH test was shown to outperform several other statistics that were used for candidate inference in recent Drosophila E&R studies (Kofler and Schlötterer 2013). Testing was conducted separately for the hot and cold treatments, with each being contrasted against the base population. Second, the CMH test was applied to simulated SNP data that were derived using the conservative Ne estimate of 250 (discussed earlier). Finally, the number of candidate SNPs was estimated by setting an empirical FDR of 0.001 for each treatment. Thus, the number of candidates for a given treatment was quantified as the number of observed SNPs that had CMH P values that were lower than the least significant CMH P value from among the 0.1% most extreme simulated SNPs. The number of candidate SNPs was also inferred for a range of different FDR values, and the results are shown in supplementary table S1, Supplementary Material online.
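The two ingredients of this procedure, a CMH test across replicates and an empirical cutoff taken from neutral simulations, can be sketched as below. This is the textbook CMH statistic for stratified 2 × 2 tables without continuity correction; it is not claimed to match the implementation used in the study, and the function names are hypothetical.

```python
import math


def cmh_p_value(tables):
    """Cochran-Mantel-Haenszel test for K stratified 2 x 2 tables (1 df).

    Each table is (a, b, c, d) for one replicate: (a, b) are the read counts
    of allele 1 and allele 2 in the base population and (c, d) are the
    counts of the same two alleles in the evolved population.
    """
    deviation = 0.0
    variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        deviation += a - row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if variance == 0:
        return 1.0
    chi2 = deviation * deviation / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def empirical_threshold(simulated_p, fdr=0.001):
    """Least significant P value among the `fdr` most extreme neutral SNPs."""
    n_extreme = max(1, int(len(simulated_p) * fdr))
    return sorted(simulated_p)[n_extreme - 1]


def count_candidates(observed_p, simulated_p, fdr=0.001):
    """Observed SNPs with P values strictly below the empirical cutoff."""
    cutoff = empirical_threshold(simulated_p, fdr)
    return sum(p < cutoff for p in observed_p)


p_unchanged = cmh_p_value([(50, 50, 50, 50)] * 3)
p_shifted = cmh_p_value([(90, 10, 50, 50)] * 3)
```

A consistent shift in all three replicates, as in `p_shifted`, gives a much smaller P value than any single replicate would, which is why the test suits replicated E&R designs.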

To examine the effects that linkage between loci had on the candidate inference, we also performed 30 forward neutral simulations starting from a population with 250 unique haplotypes (i.e., 250 homozygous diploid individuals) for 15 generations. Thus, the number of founding haplotypes and the effective population size during the simulations was 250. Assuming that the isofemale founders maintained two segregating chromosomes per line on average (which is not unreasonable, considering that they were five generations old when making the mass-bred populations), these numbers roughly accord with those expected within our experimental populations. The simulations were performed using MimicrEE (Kofler and Schlötterer 2013). This software enabled the modeling of recombination at rates that were derived from empirical estimates (Fiston-Lavier et al. 2010). The 30 simulations were divided into 10 equal-sized subsets. The three simulations from each of the 10 subsets were then downsampled to match the coverage distributions of the three empirical replicates from the relevant generation and treatment. Next, we applied the CMH test to all sites in each subset and estimated the number of candidates via the FDR method outlined in the preceding paragraph. The haplotypes used in the simulations were created by neutral coalescent simulations and are described in Bastide et al. (2013).

Enrichment in Thermal Tolerance Genes

We tested for treatment-specific evolution by looking for candidate enrichment in genes associated with heat and cold tolerance. Thermal tolerance gene sets were taken from the Candidate Stress Genes in Drosophila database, which was obtained from the Centre for Environmental Stress and Adaptation Research website (http://cesar.org.au, last accessed November 5, 2013). Only genes with a significant and unique association with either heat or cold tolerance were included in the analyses (supplementary table S2, Supplementary Material online). Separate enrichment analyses were performed on the 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000, 18000, 20000, 25000, 30000, 35000, 40000, 45000, 50000, 60000, 70000, 80000, 90000, and 100000 most significant SNPs in each treatment. Note that each successive set of SNPs was an aggregation of previous SNP sets and the subsequent set of most significant SNPs. For instance, the set that contained the top 4000 SNPs comprised the set containing the top 2000 SNPs along with the next 2000 most significant SNPs. All enrichment analyses were performed using Gowinda software (Kofler and Schlötterer 2012). Gowinda generates P values by conducting multiple simulations in which the positions of candidate SNPs are randomly chosen from the ~1.45 million SNPs scattered throughout the genome. The P value was then quantified as the proportion of simulations that had more genes specific to a given category (in our study, genes associated with either hot or cold thermotolerance) containing at least one candidate SNP than the observed data set (Kofler and Schlötterer 2012). In total, we performed 100000 permutations for each set of candidate SNPs in each treatment and for each set of thermal tolerance genes. In each case, a gene containing multiple candidate SNPs was only counted once (i.e., the gene option of the mode argument). Gene annotations were based on version 5.40 of the D. melanogaster reference genome, and only the full coding region of each gene, and not the upstream or downstream flanking regions, was considered (i.e., the gene option of the gene-definition argument). Because the thermal genes were specifically tested for enrichment, with the expectation that there would be enrichment for hot alleles in heat-tolerance genes and cold alleles in cold-tolerance genes, we report P values rather than q values adjusted for testing multiple categories. This software was specifically designed to deal with differences in average gene size among categories and therefore does not result in the overrepresentation of categories with larger than average gene sizes (Kofler and Schlötterer 2012).
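The permutation logic can be illustrated with a simplified Python sketch. It is not Gowinda's implementation: the data structures and names are hypothetical, SNPs are identified by index rather than by genomic position, and the comparison uses "at least as many" genes, a common convention for permutation P values, whereas the text describes simulations with "more" genes than observed.

```python
import random


def genes_hit(snp_indices, snp_to_genes, category):
    """Category genes that contain at least one of the given SNPs.

    Returning a set means each gene is counted once, however many
    candidate SNPs it carries.
    """
    hit = set()
    for index in snp_indices:
        hit.update(g for g in snp_to_genes.get(index, ()) if g in category)
    return hit


def permutation_p_value(candidates, n_snps, snp_to_genes, category,
                        n_perm=1000, seed=0):
    """Share of random SNP draws that hit at least as many category genes
    as the observed candidate set."""
    rng = random.Random(seed)
    observed = len(genes_hit(candidates, snp_to_genes, category))
    as_extreme = 0
    for _ in range(n_perm):
        draw = rng.sample(range(n_snps), len(candidates))
        if len(genes_hit(draw, snp_to_genes, category)) >= observed:
            as_extreme += 1
    return as_extreme / n_perm


# Toy genome: 100 SNPs, three of which sit in (hypothetical) heat genes.
snp_to_genes = {0: ["heatA"], 1: ["heatB"], 2: ["heatC"], 50: ["other"]}
heat_genes = {"heatA", "heatB", "heatC"}
p_enriched = permutation_p_value([0, 1, 2], 100, snp_to_genes, heat_genes, n_perm=200)
p_none = permutation_p_value([10, 11, 12], 100, snp_to_genes, heat_genes, n_perm=200)
```

Because candidate positions are drawn from the full SNP set, long genes that harbor many SNPs are hit more often by chance in the permutations as well, which is how this kind of test accounts for gene size.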

Identification of Putative High LD Regions and Short Introns

LD can result in a large number of false positives, for instance, via allele frequency correlations between selected and neutral linked sites. Consequently, the following regions were designated as having putatively elevated LD and were removed from specific analyses, as indicated in the main text. First, regions covered by inversions or within 500 kb either side of the inversion breakpoints. Segregating inversions were identified in our experimental evolution populations by Kapun et al. (2013). Second, a 1 Mb region located within the inversion-free part of 3R. This region was found to harbor a large number of low-frequency alleles that showed a similar allele frequency change, a pattern consistent with selection upon a single haplotype block (Orozco-terWengel et al. 2012). Finally, regions with a local recombination rate less than 2 cM/Mb. The recombination rates were taken from Comeron et al. (2012). Approximately 300000 of the original 1.45 million SNPs remained following the removal of these regions. Because 3R contained three segregating inversions and the putative selected haplotype block, only 15000 SNPs remained on this arm following removal of these regions, whereas other chromosomes maintained at least 50000 SNPs.
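The three exclusion rules amount to an interval filter, sketched below. All coordinates and the recombination-rate function in the example are made up for illustration; they are not the inversion breakpoints or the Comeron et al. (2012) rates used in the study.

```python
def in_any_interval(pos, intervals):
    """True if pos lies inside any (start, end) interval, inclusive."""
    return any(start <= pos <= end for start, end in intervals)


def low_ld_snps(snps, inversions, haplotype_blocks, recomb_rate,
                flank=500_000, min_rate=2.0):
    """Return the SNPs that fall outside putatively high-LD regions.

    snps: list of (chromosome, position)
    inversions, haplotype_blocks: dict chromosome -> list of (start, end)
    recomb_rate: function (chromosome, position) -> local rate in cM/Mb
    """
    kept = []
    for chrom, pos in snps:
        padded = [(s - flank, e + flank) for s, e in inversions.get(chrom, [])]
        if in_any_interval(pos, padded):
            continue  # inside an inversion or within 500 kb of a breakpoint
        if in_any_interval(pos, haplotype_blocks.get(chrom, [])):
            continue  # inside the putative selected haplotype block
        if recomb_rate(chrom, pos) < min_rate:
            continue  # low-recombination region
        kept.append((chrom, pos))
    return kept


# Hypothetical coordinates, for illustration only.
inversions = {"3R": [(2_000_000, 3_000_000)]}
blocks = {"3R": [(4_000_000, 4_100_000)]}


def toy_rate(chrom, pos):
    return 3.0 if pos < 5_000_000 else 1.0


snps = [("3R", 1_000_000), ("3R", 1_600_000), ("3R", 4_050_000),
        ("3R", 6_000_000), ("2L", 100)]
kept = low_ld_snps(snps, inversions, blocks, toy_rate)
```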


Short intron coordinates were taken from Lawrie et al. (2013). To further ensure that short intron sites were behaving neutrally, we only took those that were located within the putative low LD regions identified above.

ROC Curves and Reproducibility

To test the reproducibility of our top candidates, two additional replicates were sequenced from generation 15 in each treatment, with all sites being processed through the same pipeline as the original candidates. Because we did not have additional replicates from the base population, two new base replicates were created. This was done by combining the reads from the separate base replicates into a single file, from which two-thirds of all reads were randomly sampled (without replacement) and split in half to generate two new files, each having one-third of the combined base population reads (~50-fold coverage). All SNPs from the two new sets of replicates were then subjected to CMH testing and compared for consistency with the original data set using ROC curves. For the sake of consistency, the ~1.45 million SNPs from the original data set were used in the comparisons.
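The construction of the two new base replicates can be sketched as a sampling step. The function name is hypothetical, and reads are represented by integers here rather than by records from an alignment file.

```python
import random


def split_base_reads(reads, seed=0):
    """Sample two-thirds of the reads without replacement and split the
    sample in half, giving two disjoint pseudo-replicates that each hold
    one-third of the combined reads."""
    rng = random.Random(seed)
    n_each = len(reads) // 3
    sampled = rng.sample(reads, 2 * n_each)
    return sampled[:n_each], sampled[n_each:]


combined_reads = list(range(300))  # stand-ins for pooled base-population reads
new_base_1, new_base_2 = split_base_reads(combined_reads)
```

Because the sampling is without replacement, no read appears in both files, so the two pseudo-replicates carry independent sampling noise even though they derive from the same pooled reads.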

We used a modified ROC curve analysis to provide a graphical representation of the consistency between the original and new data sets, that is, those based on three replicates and two replicates, respectively. The ROC curves quantify the overlap between the two data sets across successively larger groups of SNPs, with SNPs having been ranked by their CMH P value. Points falling on the 1:1 diagonal line indicate that overlap between the two sets of SNPs matches chance expectations of reproducibility, whereas points lying above the diagonal indicate excess reproducibility. Because we are interested in changes that were most likely to be caused by the deterministic process of selection, as opposed to the stochastic effects of genetic drift, only SNPs where the same allele was increasing across all five replicates were considered as being reproducible.
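One way to compute such a curve is sketched below. This is an interpretation of the description rather than the authors' script: the function and argument names are hypothetical, and the y value is taken to be the fraction of the top k SNPs that are shared between the two rankings and that change in the same direction in every replicate. For two unrelated rankings of n SNPs, the expected shared fraction of the top k is k/n, which is the 1:1 diagonal.

```python
def concordance_curve(p_first, p_second, same_direction, steps):
    """Points (fraction of SNPs considered, fraction of shared candidates).

    p_first, p_second: CMH P values for the same SNPs in the two data sets.
    same_direction[i]: True if the same allele rises in every replicate.
    steps: sizes k of the top-ranked sets to evaluate.
    """
    n = len(p_first)
    rank_first = sorted(range(n), key=lambda i: p_first[i])
    rank_second = sorted(range(n), key=lambda i: p_second[i])
    points = []
    for k in steps:
        shared = set(rank_first[:k]) & set(rank_second[:k])
        reproducible = sum(1 for i in shared if same_direction[i])
        points.append((k / n, reproducible / k))
    return points


p_a = [0.01 * i for i in range(1, 11)]
all_same = [True] * 10
identical = concordance_curve(p_a, p_a, all_same, steps=[2, 5, 10])
reversed_ranks = concordance_curve(p_a, p_a[::-1], all_same, steps=[5, 10])
```

Identical rankings give a y value of 1 at every k, well above the diagonal, whereas perfectly reversed rankings share nothing in their top halves.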

The expected degree of reproducibility under neutrality was quantified using neutral MimicrEE simulations (Kofler and Schlötterer 2013). All parameters used were identical to the simulations used for the FDR estimates outlined above, except that a total of 50 data sets were simulated in this case. The simulated data sets were divided into 10 equally sized subsets, with each subset being further split into a group of three replicates and a group of two replicates. Following downsampling of the data sets to match our empirical coverage distributions (discussed earlier) for each set of simulations, CMH tests were applied to each group of three and two replicates. The resulting data were compared for reproducibility using ROC curves, as above.

LD Analyses

The degree of LD around the top 2000 candidates was estimated for both treatments following the same methods outlined in Orozco-terWengel et al. (2012). Briefly, we estimated the median P value for all SNPs flanking the top 2000 candidates in nonoverlapping 50-bp windows either side of the focal candidates. The decay in the P value around the focal candidates provides a proxy for the strength and extent of LD in regions flanking putative selected sites. To ascertain the background level of LD among similar SNPs, 2000 more SNPs were randomly drawn from the genome after excluding all regions within 200 bp of the top 2000 candidates. The numbers of random SNPs drawn from each chromosome arm matched the distribution of the top 2000 candidates in each treatment.
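The windowed summary can be sketched as follows. The function name and the data layout are hypothetical, the loop is deliberately naive (every focal SNP is compared with every SNP), and a single chromosome arm is assumed.

```python
import math
import statistics


def binned_decay(focal_positions, snps, bin_size=50, max_dist=4000):
    """Median -log10(P) of flanking SNPs per distance bin around focal SNPs.

    snps: list of (position, p_value) on one chromosome arm.
    Returns a dict mapping the signed start of each bin (bp relative to the
    focal SNP) to the median -log10(P) of the SNPs that fall in it.
    """
    bins = {}
    for focal in focal_positions:
        for pos, p_value in snps:
            dist = pos - focal
            if dist == 0 or abs(dist) >= max_dist:
                continue  # skip the focal SNP itself and distant SNPs
            start = math.floor(dist / bin_size) * bin_size
            bins.setdefault(start, []).append(-math.log10(p_value))
    return {start: statistics.median(vals) for start, vals in sorted(bins.items())}


toy_snps = [(1000, 1e-8), (1010, 1e-6), (1020, 1e-4), (1300, 0.1), (990, 1e-2)]
decay = binned_decay([1000], toy_snps)
```

Running the same function on randomly drawn focal SNPs gives the background profile against which the decay around candidates can be compared.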

We used MimicrEE simulations (Kofler and Schlötterer 2013) to estimate the degree of long-range LD achievable given a simple scenario where strong selection acted on a singleton located in a single haplotype. The selection strength (0.5) was based on increasing a single allele from 1% to 30% over 15 generations (i.e., the mean allele frequency change observed for the top 2000 candidate SNPs in the hot treatment). This was derived using $s = 2\ln(p_0 q_1 / p_1 q_0)/t$, where s is the selection coefficient, p and q are the biallelic frequencies at the start (subscript 0) and end (subscript 1) generations, and t is the change in generations (eq. 3.3 in Charlesworth and Charlesworth [2010]). Other than the selected site, all other parameters were the same as the neutral simulations outlined above. Nine simulations in which the singleton increased in frequency by 25–35% were included in the analyses. These were evenly split into three subsets, with each subset being downsampled and submitted to CMH testing (in the same manner as the neutral simulations discussed earlier). The resulting data were then used to produce Manhattan plots that enabled visual assessment of the degree of long-range LD (fig. 7).
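The quoted selection strength can be checked numerically. The sketch below assumes that q denotes the frequency of the selected allele and p = 1 - q that of the alternative allele, which is the orientation under which the formula returns a positive coefficient for an allele rising from 1% to 30%; the function name is hypothetical.

```python
import math


def selection_coefficient(q_start, q_end, generations):
    """s = 2 ln(p0 q1 / (p1 q0)) / t.

    Assumption: q is the frequency of the selected allele and p = 1 - q is
    the frequency of the alternative allele.
    """
    p_start, p_end = 1 - q_start, 1 - q_end
    return 2 * math.log((p_start * q_end) / (p_end * q_start)) / generations


# An allele rising from 1% to 30% over 15 generations.
s = selection_coefficient(0.01, 0.30, 15)
```

The result is very close to 0.5, in agreement with the selection strength stated in the text.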

Supplementary Material

Supplementary figures S1–S4 and tables S1–S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to A. Bergland and D. Lawrie for sharing the coordinates for the short introns and Martin Kapun for sharing unpublished results. R.T. wishes to thank his colleagues at the Institut für Populationsgenetik and the Petrov laboratory at Stanford University for helpful discussions on data interpretation. This work was supported by a PhD fellowship from the Vetmeduni Vienna, the Austrian Science Fund (FWF P19467-B11, W1225-B20), and the European Research Council (ERC) grant "ArchAdapt" awarded to C.S.

References

Agresti A. 2002. Categorical data analysis. New York: John Wiley & Sons.

Andolfatto P, Wall JD. 2003. Linkage disequilibrium patterns across a recombination gradient in African Drosophila melanogaster. Genetics 165:1289–1305.

Bastide H, Betancourt A, Nolte V, Tobler R, Stöbe P, Futschik A, Schlötterer C. 2013. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9:e1003534.

Bergelson J, Roux F. 2010. Towards identifying genes underlying ecologically relevant traits in Arabidopsis thaliana. Nat Rev Genet. 11:867–879.

Bergland AO, Behrman EL, O'Brien KR, Schmidt PS, Petrov DA. 2013. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1303.5044


Bubliy O, Loeschcke V. 2005. Correlated responses to selection for stress resistance and longevity in a laboratory population of Drosophila melanogaster. J Evol Biol. 18:789–803.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Caracristi G, Schlötterer C. 2003. Genetic differentiation between American and European Drosophila melanogaster populations could be attributed to admixture of African alleles. Mol Biol Evol. 20:792–799.

Carroll D. 2011. Genome engineering with zinc-finger nucleases. Genetics 188:773–782.

Charlesworth B, Charlesworth D. 2010. Elements of evolutionary genetics. Greenwood Village (CO): Roberts and Company Publishers.

Clemente F, Vogl C. 2012. Unconstrained evolution in short introns – An analysis of genome-wide polymorphism and divergence data from Drosophila. J Evol Biol. 25:1975–1990.

Comeron JM, Ratnappan R, Bailin S. 2012. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 8:e1002905.

Corbett-Detig RB, Hartl DL. 2012. Population genomics of inversion polymorphisms in Drosophila melanogaster. PLoS Genet. 8:e1003056.

Dettman JR, Rodrigue N, Melnyk AH, Wong A, Bailey SF, Kassen R. 2012. Evolutionary insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol. 21:2058–2077.

Farlow A, Dolezal M, Hua L, Schlötterer C. 2012. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol. 29:21–24.

Fiston-Lavier A-S, Singh ND, Lipatov M, Petrov DA. 2010. Drosophila melanogaster recombination rate calculator. Gene 463:18–20.

Futschik A, Schlötterer C. 2010. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186:207–218.

Haldane J. 1957. The cost of natural selection. J Genet. 55:511–524.

Halligan DL, Keightley PD. 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16:875–884.

Hoffman A, Hallas R, Sinclair C, Partridge L. 2001. Rapid loss of stress resistance in Drosophila melanogaster under adaptation to laboratory culture. Evolution 55:436–438.

Itoh M, Nanba N, Hasegawa M, Inomata N, Kondo R, Oshima M, Takano-Shimizu T. 2010. Seasonal changes in the long-distance linkage disequilibrium in Drosophila melanogaster. J Hered. 101:26–32.

Kapun M van Schalkwyk H McAllister B Flatt T Schlotterer C 2013Inference of chromosomal inversion dynamics from Pool-Seq datain natural and laboratory populations of Drosophila melanogaster[cited 2013 Aug 26] Available from httparxivorgabs13072461

Kawecki TJ Lenski RE Ebert D Hollis B Olivieri I Whitlock MC 2012Experimental evolution Trends Ecol Evol 27547ndash560

Kofler R Orozco-terWengel P De Maio N Pandey RV Nolte V FutschikA Kosiol C Schlotterer C 2011 PoPoolation a toolbox for popula-tion genetic analysis of next generation sequencing data frompooled individuals PLoS One 6e15925

Kofler R Pandey RV Schlotterer C 2011 PoPoolation2 identifyingdifferentiation between populations using sequencing of pooledDNA samples (Pool-Seq) Bioinformatics 273435ndash3436

Kofler R Schlotterer C 2012 Gowinda unbiased analysis of gene setenrichment for genome-wide association studies Bioinformatics 282084ndash2085

Kofler R Schlotterer C 2013 Guidelines for the design of evolve andresequencing studies [cited 2013 Aug 26] Available from httparxivorgabs13074954

Kolss M Kawecki TJ 2008 Reduced learning ability as a consequence ofevolutionary adaptation to nutritional stress in Drosophila melano-gaster Ecol Entomol 33583ndash588

Lawrie DS Messer PW Hershberg R Petrov DA 2013 Strong purifyingselection at synonymous sites in D melanogaster PLoS Genet 9e1003527

Li H Durbin R 2009 Fast and accurate short read alignment withBurrows-Wheeler transform Bioinformatics 251754ndash1760

Li H Handsaker B Wysoker A Fennell T Ruan J Homer N Marth GAbecasis G Durbin R 2009 The Sequence AlignmentMap formatand SAMtools Bioinformatics 252078ndash2079

Liu J, Li C, Yu Z, et al. (12 co-authors). 2012. Efficient and specific modifications of the Drosophila genome by means of an easy TALEN strategy. J Genet Genomics. 39:209–215.

Mackay TFC, Richards S, Stone EA, et al. (52 co-authors). 2012. The Drosophila melanogaster Genetic Reference Panel. Nature 482:173–178.

Miyashita N, Langley H. 1988. Molecular and phenotypic variation of the white locus region in Drosophila melanogaster. Genetics 120:199–212.

Nunney L. 2003. The cost of natural selection revisited. Ann Zool Fennici. 40:185–194.

Orozco-terWengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P. 2010. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol. 27:1226–1234.

Parts L, Cubillos FA, Warringer J, et al. (14 co-authors). 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Pool JE, Aquadro CF. 2006. History and structure of sub-Saharan populations of Drosophila melanogaster. Genetics 174:915–929.

Pool JE, Corbett-Detig RB, Sugino RP, et al. (11 co-authors). 2012. Population genomics of sub-Saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet. 8:e1003080.

Remolina S, Chang P, Leips J, Nuzhdin SV, Hughes K. 2012. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66:3390–3403.

Rose MR, Vu LN, Park SU, Graves JL Jr. 1992. Selection on stress resistance increases longevity in Drosophila melanogaster. Exp Gerontol. 27:241–250.

Stalker H. 1980. Chromosome inversion polymorphism in Drosophila melanogaster. II. Relationship of inversion frequencies to latitude, season, wing-loading and flight activity. Genetics 95:211–223.

Stapley J, Reger J, Feulner PGD, Smadja C, Galindo J, Ekblom R, Bennison C, Ball A, Beckerman AP, Slate J. 2010. Adaptation genomics: the next generation. Trends Ecol Evol. 25:705–712.

Takano-Shimizu T, Kawabe A, Inomata N, Nanba N, Kondo R, Inoue Y, Itoh M. 2004. Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci U S A. 101:14156–14161.

Tantawy A. 1964. Studies on natural populations of Drosophila. III. Morphological and genetic differences of wing length in Drosophila melanogaster and D. simulans in relation to. Evolution 18:560–570.

Turner TL, Miller PM, Cochrane VA. 2013. Combining genome-wide methods to investigate the genetic complexity of courtship song variation in Drosophila melanogaster. Mol Biol Evol. 30:2113–2120.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Yu Z, Ren M, Wang Z, Zhang B, Rong YS, Jiao R, Gao G. 2013. Highly efficient genome modifications mediated by CRISPR/Cas9 in Drosophila. Genetics 195:289–291.

Zhou D, Udpa N, Gersten M, et al. (11 co-authors). 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.


Adaptation Genomics of D. melanogaster. doi:10.1093/molbev/mst205. MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/31/2/364/997936 by Cardiff University user on 19 February 2019


Effects of Long-Range LD on False Positives

In genome-wide association studies (GWAS), linkage between selected and neighboring neutral sites results in a characteristic pattern observed in Manhattan plots, namely a series of adjacent sites with significant P values that are arranged into a peak. Such patterns were not readily recognizable in either treatment in our study, however. Rather, the top candidates occurred as either isolated SNPs or broad, flattened humps (fig. 5). Furthermore, analyses of genomic regions flanking candidate SNPs showed that LD had a limited range beyond the putative selected site (fig. 6). This pattern is consistent with the low levels of LD observed in D. melanogaster populations (Miyashita and Langley 1988; Andolfatto and Wall 2003; Mackay et al. 2012) and has been previously interpreted as evidence against linkage as a possible cause for the large number of candidate SNPs in Drosophila E&R studies (Orozco-terWengel et al. 2012). To reconcile the anomaly of thousands of putative selected SNPs being detected despite an apparent lack of LD in our study, we propose that associations beyond 200 bp, which is widely regarded as the limit of LD in D. melanogaster (Mackay et al. 2012), may be prevalent within our experimental populations.

Support for this idea comes from two separate findings. First, the excess concordance observed for short introns outlined above is still evident after exclusion of intronic sites located in the same genes that contained one or more candidate SNPs. Thus, although excluding these intronic sites did reduce concordance to some extent, the excess concordance was still visible even after removing genes containing one or more of the top 100000 candidate SNPs (supplementary fig. S3, Supplementary Material online). Given that short introns are not expected to contain selected sites (Parsch et al. 2010; Clemente and Vogl 2012), we suggest that linkage between sites in short introns and selected sites in regions other than the enclosing gene is responsible for producing this pattern.

Second, a good, albeit potentially extreme, example of long-range LD is provided by a 1 Mb region located on an inversion-free part of chromosome 3R that was first described by Orozco-terWengel et al. (2012). Among our genome-wide top 2000 hot candidates, 281 were located in this region, which represents a massive enrichment relative to its size. These sites increased in frequency by ~27% over the first 15 generations on average, and 33% of the 24K SNPs in this region had the same rising allele across all five replicates in the hot treatment (cf. 23% across the rest of the genome; P value < 2.2e-16 using Fisher's exact test). Furthermore, a large number of putative selected alleles in this region had starting frequencies <1% (118 SNPs), and these sites displayed highly homogeneous increases in frequency during the study (supplementary fig. S4, Supplementary Material online). Thus, although it is evident from the data that this is a low-frequency haplotype that increased across all replicates in the hot treatment, 83% of the putative candidates in this region were separated by distances greater than 200 bp, with an average distance of 5570 bp between sites. Because such a broad spacing between candidate sites is well beyond the typical range of LD in D. melanogaster, we suggest that long-range LD is responsible for these patterns and for the majority of false positives more generally.

To evaluate to what extent long-range LD could be observed in a typical E&R study, we simulated strong selection on a single allele private to a rare haplotype using allele frequency changes and parameters similar to those in the hot treatment (see Materials and Methods). We found that neighboring candidate SNPs were often separated by several thousand base pairs or more and could be millions of base pairs from the selected site (fig. 7). These results clearly indicate that long-range LD can be readily generated under a simple but plausible scenario for E&R studies.
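The hitchhiking logic behind this result can be illustrated with a much simpler model than the haplotype-based simulations used in the study. The sketch below is a deterministic, haploid, two-locus recursion and is not the authors' method: a selected allele A starts on a single rare haplotype that also carries a private neutral allele B, and the function name `hitchhike`, the selection coefficient `s`, and the recombination fraction `r` are illustrative assumptions. Under this model, a neutral allele at r = 0.02 (roughly 1 Mb at 2 cM/Mb) still rises in frequency along with the selected allele, only less so than a fully linked one.

```python
import numpy as np

def hitchhike(p0, s, r, generations):
    """Deterministic haploid two-locus model (illustrative assumption).

    Haplotype order: AB, Ab, aB, ab. A is the selected allele and B is a
    neutral allele private to the rare starting haplotype AB.
    Returns (frequency of A, frequency of B) after `generations`.
    """
    x = np.array([p0, 0.0, 0.0, 1.0 - p0])
    w = np.array([1.0 + s, 1.0 + s, 1.0, 1.0])  # only allele A affects fitness
    sign = np.array([-1.0, 1.0, 1.0, -1.0])
    for _ in range(generations):
        x = x * w / np.dot(x, w)            # selection
        d = x[0] * x[3] - x[1] * x[2]       # LD coefficient after selection
        x = x + r * d * sign                # recombination erodes LD
    return x[0] + x[1], x[0] + x[2]

if __name__ == "__main__":
    for r in (0.0, 0.001, 0.02, 0.2):
        sel, neu = hitchhike(p0=0.005, s=0.3, r=r, generations=15)
        print(f"r={r:<6} selected={sel:.3f} linked neutral={neu:.3f}")
```

Because recombination only reshuffles alleles among haplotypes, the trajectory of the selected allele is the same for every `r`; only the neutral allele's gain shrinks as `r` grows.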

Discussion

Exposure of D. melanogaster to novel hot or cold environments resulted in selection of alleles in thermal tolerance genes specific to each environment. Moreover, the starting frequencies of the putative selected SNPs differed markedly between the two environments, suggesting that they had experienced selection within the natural progenitor population prior to the start of the experiment. Hence, our study reveals that a potentially rich source of information lies in using the base population information, which has not been considered in most published E&R studies to date.

[Figure 4 plot. x axis: Proportion SNPs (0.0 to 1.0); y axis: Proportion common candidates (0.0 to 1.0); legend (Treatment & region): Hot All, Hot low LD, Hot SI, Cold All, Cold low LD, Cold SI.]

FIG. 4. Concordance between two independent groups of replicates. Modified ROC curve showing the proportion of common candidate SNPs shared between two independent sets of derived populations relative to cumulatively larger sets of ranked SNPs (see Materials and Methods) for both hot (red lines) and cold (blue lines) treatments. Separate lines are plotted for the full data set (~1.45 million SNPs; All), as well as for those SNPs only falling in regions of putatively low LD (i.e., regions outside of inversions and low recombining regions) or for short introns in these regions (SI). Lines above the black diagonal line indicate that there are more common candidate SNPs than expected by chance, such that our top candidates are more reproducible than those in neutrally evolving populations. Because sites with low LD or in short introns with low LD show a similar reduction in concordance relative to all sites, much of the excess concordance is probably due to changes in inversions and long-range LD between selected and neutral sites.

FIG. 5. Manhattan plots showing the genomic distribution of candidate SNPs. The negative log10-transformed P values of all SNPs relative to genomic position. The top 2000 candidate SNPs for the hot (top panel) and cold (bottom panel) treatment are highlighted (see key). Candidates are categorized as being unique to the hot (red) or cold (blue) treatments or common to both (green). Point size scales with significance, except for common SNPs, which are shown at a standard size for emphasis. Inset boxes indicate the position of inversions and a putative selected haplotype block on 3R (large dashes, most proximal to centromere on 3R). The inversions are as follows: In(2L)t, In(2R)NS, In(3L)P, In(3R)Payne (short dashes), In(3R)C (dots), and In(3R)Mo (solid line). The candidate SNPs typically appear as either isolated points or arranged in broad, flattened humps and tend to be clustered on 3R, where three large inversions overlap. These patterns further imply that inversions and long-range LD have generated many false positives in the study.

[Figure 6 plot. x axis: Distance from candidate (bp), −4000 to 4000; y axis: −log10(p), 0 to 10; legend: Hot, Cold, Random, 200 bp.]

FIG. 6. Correlation between the top 2000 candidate and flanking SNPs. The decay in LD (correlation) around candidates is measured as the rate of decline in neighboring (negative log10-transformed) P values to background levels. Data are averaged over successive 50-bp bins flanking focal SNPs. Solid black lines indicate the bin means. The red, blue, and green polygons around these lines indicate 95% confidence intervals for hot treatment, cold treatment, and random (i.e., background) SNPs, respectively. In accordance with reported results from D. melanogaster, most of the LD is lost within 200 bp either side of each focal SNP (indicated by dashed lines).

Despite strong evidence that differential thermal selection contributed to our top candidates in each treatment, the sheer number of putatively selected sites indicates that our results were confounded by a large number of false positives. Given that short-range LD and strong drift in low recombining regions were unlikely to be major contributors to the excess of false positives in our study, we hypothesized that LD over longer distances was responsible for much of this excess. Long-range LD in our experimental populations could have arisen as a byproduct of the founders being a small sample of the much larger natural population. In this case, chance correlations between sites separated by vast distances may result from the finite sampling of alleles. Long-range LD may also originate within natural populations. For instance, strong LD between loci separated by large distances has been reported for Japanese D. melanogaster populations (Takano-Shimizu et al. 2004; Itoh et al. 2010). Because the LD was much stronger in the spring than in the autumn, this finding was attributed to exaggerated sampling effects during the previous winter bottleneck, although there was evidence that recurrent epistatic selection was also responsible for some linked loci. Finally, the introgression of new haplotypes from migrant individuals is another natural process that could produce long-range LD. Recent and ongoing admixture has been reported for African (Pool and Aquadro 2006; Pool et al. 2012) and North American (Caracristi and Schlötterer 2003) D. melanogaster populations, suggesting that, in principle, this may also contribute to long-range LD in natural populations.

The systematic increase in frequency of alleles in the hot treatment that were broadly distributed across a 1 Mb region of 3R demonstrates the effects of long-range LD in our experiment. In combination with differences in the distribution of the starting allele frequencies, long-range LD may also help to explain the differences in the number of candidates and the level of concordance observed between the two treatments. Given that most candidates were rare in the hot treatment and common in the cold treatment (presumably resulting from prior selection against [for] heat [cold] tolerance alleles), long-range LD is expected to be more extensive among the hot candidates. This is because rare haplotypes are expected to harbor many more private SNPs than common ones, and all such sites will experience the same change in allele frequency until these associations are broken down by recombination. The difference in Ne between the two treatments is also consistent with more extensive long-range LD producing more widespread changes in allele frequencies in the hot treatment.

Segregating inversions are another probable source of false positives in our study. The large decrease in concordance that was observed when inversions and regions with low recombination were removed from the data set supports this idea (fig. 4). Additional confirmation comes from a recent study, which found that several large inversions were segregating in our populations over the course of the experiment, including three partially overlapping inversions on chromosome arm 3R (Kapun et al. 2013). By tracing the trajectory of alleles that were private to each inversion, a single inversion was found to increase across all three replicates in each treatment (In(3R)C by 25% in the hot and In(3R)Mo by 15% in the cold). Indeed, a large share of the top 2000 candidate SNPs was observed on chromosome arm 3R for both treatments (~80% for the hot and ~40% for the cold; supplementary table S3, Supplementary Material online), with many candidate SNPs lying within these inversions. Furthermore, the greatest concentration of cold candidates was situated within the region where all three inversions overlap. Because recombination is likely to be suppressed in this region, the local effects of LD and drift are expected to be exacerbated relative to the rest of the genome. Thus, changes in inversion frequencies are also likely to have substantially inflated the number of false positives in our study.

FIG. 7. Simulated long-range LD generated by a singleton under strong selection. Manhattan plots showing the distribution of the top 1000 candidate SNPs from three separate sets of simulated data (top to bottom panels), in which a singleton carried by a single chromosome-length haplotype is under strong selection (see Materials and Methods). The full chromosome is shown in the panels on the right, with a zoomed-in view shown in the panels on the left. Significant SNPs occur vast distances from the selected site (dashed line) and tend to be several thousands of base pairs apart. Thus, it appears likely that long-range LD between selected and neutral sites can readily inflate the number of false positives in an E&R study.

Implications for Future E&R Research

Our results indicate that E&R studies with small founder populations, strong selection, and large segregating inversions are likely to suffer from large numbers of false positives. These factors are likely to be present in other D. melanogaster E&R studies (indeed, several large inversions segregate at intermediate frequencies in natural D. melanogaster populations; Corbett-Detig and Hartl 2012; Kapun et al. 2013), meaning that our findings have important implications for future E&R research in Drosophila.

First, if seasonal selection is common in D. melanogaster, which accumulating evidence suggests may be the case (Stalker 1980; Takano-Shimizu et al. 2004; Itoh et al. 2010; Bergland et al. 2013), then the genetic composition of the experimental populations will be dependent on their time of collection. Evidence suggests that genes associated with stress resistance are highly pleiotropic in Drosophila (Rose et al. 1992; Hoffman et al. 2001; Bubliy and Loeschcke 2005; Kolss and Kawecki 2008), implying that a broad spectrum of traits may be affected in this way.

Second, large numbers of false positives are expected in E&R studies that use small founder populations, as a direct result of long-range LD between selected and neutral sites. The effect of founder population size on the performance of experimental design has been explicitly addressed in a recent haplotype-based simulation study (Kofler and Schlötterer 2013). The authors demonstrate that both the power and specificity of E&R studies are increased when larger numbers of founders are used, with the gains in performance being particularly noticeable when selection per SNP is strong. Another plausible way to reduce the impact of long-range LD is to create independent replicates from different sets of founders. Although this does not remove long-range LD per se, it results in different associations across replicates, such that analyses that combine replicates should remove the majority of such effects.

Finally, our results indicate that inversions are likely to be a major hindrance in detecting causal loci. Large segregating inversions appear to be ubiquitous in natural D. melanogaster populations, and recent work suggests that newly introduced inversions may influence genetic diversity across entire chromosome arms, presumably by suppressing recombination well beyond inversion breakpoints (Corbett-Detig and Hartl 2012; Pool et al. 2012; Kapun et al. 2013). Consequently, any improvements in performance that might be achieved through the experimental design modifications suggested above are unlikely to be realized in chromosome arms carrying large inversions, so inversion-free populations may be necessary to ensure that such improvements are realized over the entire genome. Although inversion-free D. melanogaster populations can be obtained via karyotyping individuals prior to the study, the time and effort expended in such screens may become prohibitive if large founding populations are desired. Another possibility is to use an alternative Drosophila species for which segregating inversions are rare, such as D. simulans or D. mauritiana, instead of D. melanogaster. Although these species currently lack the detailed genetic and genomic resources available for D. melanogaster, the ongoing development of portable molecular tools has made functional validation of candidate loci in these species tenable (Carroll 2011; Liu et al. 2012; Yu et al. 2013).

Conclusion

Previous D. melanogaster E&R studies have not directly addressed the role that inversions or long-range LD may have had in generating the large numbers of reported candidate SNPs. Our results imply that such estimates were probably inflated by false positives resulting from these phenomena. Consequently, Drosophila E&R study designs will need to be modified before the true causal loci that contribute to adaptation and/or complex traits can be reliably identified. While we have outlined several possibilities here, we also point out that the study by Kofler and Schlötterer (2013) is intended as a general guideline for E&R study design. There, the authors use haplotype-based simulations to investigate E&R performance for a wide range of experimental and evolutionary variables and thereby provide an invaluable reference for future Drosophila E&R research.

Materials and Methods

Experimental Populations and Evolution Regimes

Full details on the experimental setup are outlined in Orozco-terWengel et al. (2012). Briefly, wild D. melanogaster were collected in Northern Portugal in the summer of 2008. The population was maintained as 113 isofemale lines for five generations in the laboratory prior to the creation of mass-bred populations. Ten replicated mass-bred populations were created, with each replicate combining five females from each isofemale line. This initial population of merged isofemale lines is referred to as the base population or F0. Five replicate populations were maintained in each of two novel environments, one hot and one cold treatment. Each treatment cycled between two fixed temperature and light settings every 12 h (between 10 °C and 20 °C in the cold treatment and 18 °C and 28 °C in the hot treatment), with light and dark periods occurring with the hot and cold phase of each treatment, respectively. The heating/cooling period between the two fixed temperatures was relatively rapid, taking ~1 h before the new temperature was reached, such that most time was spent at either of the modal temperatures. Each replicate was maintained at ~1000 individuals, with ~50:50 sex ratios, that were evenly distributed across five ~225 ml bottles. Each bottle contained ~70 ml of standard Drosophila medium. Generations were nonoverlapping.

Genome Sequencing and Mapping

Illumina full genome sequencing was performed on three replicates for each of the base population and evolved populations. For the hot treatment, a single replicate of the 23rd generation and two from the 15th generation were sequenced (Orozco-terWengel et al. 2012), whereas for the cold treatment, all three sequenced replicates came from the 15th generation. Sequencing was performed using paired-end reads, 75 bp in length in the hot treatment (GAIIx) and 100 bp long in the cold treatment (HiSeq). For each replicate and time point, ~500 flies were pooled and sequenced as one sample (i.e., Pool-Seq; Futschik and Schlötterer 2010). Two additional replicates were also sequenced from the 15th generation of the hot and cold treatments. These were used to check the reproducibility of candidate detection, as described below. Average genome-wide coverage ranged from ~35- to ~90-fold for each of the sequenced replicates (supplementary table S4, Supplementary Material online).

Mapping was carried out using a combination of purpose-built and third-party software, which is described in detail in Kofler, Orozco-terWengel, et al. (2011). Briefly, reads were trimmed to remove low-quality bases and then mapped with bwa (version 0.5.8c) (Li and Durbin 2009) against the D. melanogaster reference genome (version 5.18) and Wolbachia (NC_002978.6). The alignment files were converted to the SAM format (Li et al. 2009), with reads not mapped in proper pairs or with a mapping quality of less than 20 being removed. SAM files were converted into the pileup format, and indels and repeat sequences identified by RepeatMasker 3.2.9 (www.repeatmasker.org, last accessed November 5, 2013) were masked. The aligned pileup files were converted to a synchronized file (Kofler, Pandey, et al. 2011), from which SNPs and their accompanying allele frequencies were derived. All allele frequency changes presented herein were averaged across all three replicates (weighted by coverage). We further removed all SNPs whose coverage was among the top 2% of a given replicate and time point or that had a minimum count of less than 30 across the 23 populations in our extended data set, which comprised additional replicates and time points that were not analyzed here. Because the average coverage per SNP across the extended data set was ~1500, the minimum count of 30 equates to an average 2% minor allele frequency across all replicates and time points (although allele frequencies could be lower in the time points that were specifically analyzed in this study). Annotation was based on version 5.40 of the D. melanogaster reference genome. Approximately 1.45 million SNPs were called overall.
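The two SNP filters described above (excess coverage and minimum allele count) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline actually used: the function name `filter_snps` and the array layout (one row per SNP, one column per population) are hypothetical, and the minor-allele count is assumed to be summed over all populations, which is how a count of 30 at a total coverage of ~1500 corresponds to a frequency of 30/1500 = 2%.

```python
import numpy as np

def filter_snps(coverage, minor_counts, top_fraction=0.02, min_count=30):
    """Return a boolean mask of SNPs that pass both filters.

    coverage:     (n_snps, n_pops) read depth per SNP and population.
    minor_counts: (n_snps,) minor-allele count summed over all populations.
    """
    coverage = np.asarray(coverage, dtype=float)
    minor_counts = np.asarray(minor_counts)
    # Per-population coverage cutoff: the (1 - top_fraction) quantile.
    cutoffs = np.quantile(coverage, 1.0 - top_fraction, axis=0)
    # Drop a SNP if it is in the top coverage tail of any population.
    too_deep = (coverage > cutoffs).any(axis=1)
    too_rare = minor_counts < min_count
    return ~(too_deep | too_rare)

# Minimum count relative to the average total coverage quoted in the text.
implied_minor_allele_frequency = 30 / 1500  # 0.02
```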

Effective Population Size, FDR, and Candidate Estimation

In order to estimate Ne and FDR in both treatments, forward Wright–Fisher simulations were performed to generate expected neutral changes during the experiment. Three sets of neutral allele frequency changes were simulated for each treatment, matching the number of replicates in the study. For each simulation, 1.45 million SNPs were independently sampled (i.e., approximating the number of SNPs in our real data) from the distribution of allele frequencies expected after 15 generations of neutral evolution, conditional on the starting frequency for each SNP. Starting frequencies for each SNP were randomly drawn from the empirical distribution of start frequencies in the experiment, with the frequencies being the coverage-weighted average across all replicates. The expected distribution of allele frequencies after 15 generations was created by generating an Ni × Nj transition matrix, where each cell is the binomial probability of seeing j alleles given a starting frequency i/N (i.e., N is the effective population size), and then raising this matrix to the 15th power. To simulate the effects of DNA pooling in our study, an additional round of binomial sampling was conducted on the start and end allele frequencies for each SNP. To do this, each SNP was assigned a random coverage value that was drawn from the empirical distribution of coverages from a specific generation and treatment. This was done separately for each replicate, thereby capturing the heterogeneity in coverage both within and between replicates in the study.
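The transition-matrix procedure described above can be sketched in a few lines. This is an illustrative reading of the text, not the authors' code: the function names are hypothetical, allele counts run from 0 to N as in the description (rather than 0 to 2N), and the per-SNP loop is written for clarity rather than speed.

```python
import numpy as np
from scipy.stats import binom

def wf_transition_matrix(n_e):
    """Entry [i, j] is the binomial probability of j copies next generation
    given i copies now (start frequency i / n_e)."""
    counts = np.arange(n_e + 1)
    return binom.pmf(counts[None, :], n_e, (counts / n_e)[:, None])

def drift_distribution(n_e, generations):
    """Distribution of allele counts after several generations of pure drift."""
    return np.linalg.matrix_power(wf_transition_matrix(n_e), generations)

def simulate_pooled_afc(start_freqs, cov_start, cov_end, n_e, generations, rng):
    """Neutral allele frequency change per SNP, including pooled-read sampling."""
    trans = drift_distribution(n_e, generations)
    start_counts = np.rint(np.asarray(start_freqs) * n_e).astype(int)
    # Draw each end count from the row matching the SNP's start count.
    end_counts = np.array([
        rng.choice(n_e + 1, p=trans[i] / trans[i].sum()) for i in start_counts
    ])
    # Extra binomial sampling mimics reading a finite number of pooled reads.
    obs_start = rng.binomial(cov_start, start_counts / n_e) / cov_start
    obs_end = rng.binomial(cov_end, end_counts / n_e) / cov_end
    return obs_end - obs_start
```

In practice, `start_freqs`, `cov_start`, and `cov_end` would be drawn from the empirical distributions mentioned in the text; here they are simply passed in as arrays.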

The effective population size of the observed data was estimated as follows. First, multiple simulations were performed as outlined above, each with a differing Ne. Each simulation produced a set of expected start and end allele frequencies for a given Ne, from which an allele frequency change (AFC) distribution was obtained (i.e., the end frequency minus start frequency for all 1.45 million SNPs). The expected AFC distribution was then compared for goodness of fit with the observed AFC distribution by summing the squared difference of the AFC values evaluated in 1% frequency change bins. Because individual replicates were simulated, we averaged the goodness of fit across the three replicates to get a point estimate for each Ne. Additionally, the error around this mean was sufficiently large for both treatments that the best-fitting Ne was not significantly different from many neighboring Ne values. Consequently, we took the lowest Ne from amongst this range (~250; supplementary table S1, Supplementary Material online), as it is expected to provide the most conservative estimate of the FDR (because the strength of drift increases as Ne decreases). The Ne for individual chromosomes was also inferred by limiting the observed data to that specific to each chromosome and then repeating the steps above.
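The goodness-of-fit comparison can be sketched as below. The binned sum of squared differences follows the description above; the rule used in `conservative_ne` to decide which Ne values are "not significantly different" from the best fit (overlap of mean ± one standard error) is an assumption made purely for illustration, since the text does not spell out the criterion, and all function names are hypothetical.

```python
import numpy as np

def afc_histogram(afc, bin_width=0.01):
    """Proportion of SNPs per allele-frequency-change bin on [-1, 1]."""
    n_bins = int(round(2.0 / bin_width))
    counts, _ = np.histogram(afc, bins=n_bins, range=(-1.0, 1.0))
    return counts / counts.sum()

def goodness_of_fit(observed_afc, simulated_afc, bin_width=0.01):
    """Sum of squared differences between the two binned AFC distributions."""
    diff = afc_histogram(observed_afc, bin_width) - afc_histogram(simulated_afc, bin_width)
    return float(np.sum(diff ** 2))

def summarize_fit(observed_afc, simulated_replicates, bin_width=0.01):
    """Mean and standard error of the fit over simulated replicates for one Ne."""
    scores = np.array([goodness_of_fit(observed_afc, sim, bin_width)
                       for sim in simulated_replicates])
    return float(scores.mean()), float(scores.std(ddof=1) / np.sqrt(scores.size))

def conservative_ne(fit_by_ne):
    """fit_by_ne maps Ne -> (mean, se); returns the lowest Ne whose fit
    overlaps the best fit (assumed criterion)."""
    best_mean, best_se = min(fit_by_ne.values())
    limit = best_mean + best_se
    return min(ne for ne, (mean, se) in fit_by_ne.items() if mean - se <= limit)
```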

Following Ne estimation, the number of putative selected SNPs was inferred as follows. First, all ~1.45 million SNPs that remained after filtering were subjected to the CMH test (Agresti 2002). The CMH test was shown to outperform several other statistics that were used for candidate inference in recent Drosophila E&R studies (Kofler and Schlötterer 2013). Testing was conducted separately for the hot and cold treatments, with each being contrasted against the base population. Second, the CMH test was applied to simulated SNP data that were derived using the conservative Ne estimate of 250 (discussed earlier). Finally, the number of candidate SNPs was estimated by setting an empirical FDR of 0.001 for each treatment. Thus, the number of candidates for a given treatment was quantified as the number of observed SNPs that had CMH P values that were lower than the least significant CMH P value from among the 0.1% most extreme simulated SNPs. The number of candidate SNPs was also inferred for a range of different FDR values, and the results are shown in supplementary table S1, Supplementary Material online.
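The candidate-calling step can be sketched as a CMH test over replicate-specific 2 × 2 tables followed by an empirical cutoff taken from the simulated P values. This is a minimal illustration, not the software used in the study: the function names are hypothetical, each stratum is assumed to be one replicate with rows for base and evolved populations and columns for the two allele counts, and the optional continuity correction is an assumption about the test settings.

```python
import numpy as np
from scipy.stats import chi2

def cmh_pvalue(tables, correct=True):
    """Cochran-Mantel-Haenszel test for K stratified 2 x 2 tables.

    tables: array of shape (K, 2, 2); rows are base/evolved populations,
    columns are counts of the two alleles, one table per replicate.
    """
    t = np.asarray(tables, dtype=float)
    n = t.sum(axis=(1, 2))
    row1, row2 = t[:, 0, :].sum(axis=1), t[:, 1, :].sum(axis=1)
    col1, col2 = t[:, :, 0].sum(axis=1), t[:, :, 1].sum(axis=1)
    expected = row1 * col1 / n
    variance = row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    delta = abs(np.sum(t[:, 0, 0] - expected))
    yates = 0.5 if (correct and delta >= 0.5) else 0.0
    statistic = (delta - yates) ** 2 / variance.sum()
    return float(chi2.sf(statistic, df=1))

def empirical_threshold(simulated_p, fdr=0.001):
    """Least significant P value among the `fdr` most extreme simulated SNPs."""
    p = np.sort(np.asarray(simulated_p))
    k = max(int(np.floor(fdr * p.size + 1e-9)), 1)
    return float(p[k - 1])

def count_candidates(observed_p, simulated_p, fdr=0.001):
    """Number of observed SNPs more significant than the empirical threshold."""
    return int(np.sum(np.asarray(observed_p) < empirical_threshold(simulated_p, fdr)))
```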

To examine the effects that linkage between loci had on the candidate inference, we also performed 30 forward neutral simulations starting from a population with 250 unique haplotypes (i.e., 250 homozygous diploid individuals) for 15 generations. Thus, the number of founding haplotypes and the effective population size during the simulations was 250. Assuming that the isofemale founders maintained two segregating chromosomes per line on average (which is not unreasonable considering that they were five generations old when making the mass-bred populations), these numbers roughly accord with those expected within our experimental populations. The simulations were performed using MimicrEE (Kofler and Schlötterer 2013). This software enabled the modeling of recombination at rates that were derived from empirical estimates (Fiston-Lavier et al. 2010). Thirty simulations were performed and divided into 10 equal-sized subsets. The three simulations from each of the 10 subsets were then downsampled to match the coverage distributions of the three empirical replicates from the relevant generation and treatment. Next, we applied the CMH test to all sites in each subset and estimated the number of candidates thereof via the FDR method outlined in the preceding paragraph. The haplotypes used in the simulations were created by neutral coalescent simulations and are described in Bastide et al. (2013).

Enrichment in Thermal Tolerance Genes

We tested for treatment-specific evolution by looking for candidate enrichment in genes associated with heat and cold tolerance. Thermal tolerance gene sets were taken from the Candidate Stress Genes in Drosophila database, which was obtained from the Centre for Environmental Stress and Adaptation Research website (http://cesar.org.au/, last accessed November 5, 2013). Only genes with a significant and unique association with either heat or cold tolerance were included in the analyses (supplementary table S2, Supplementary Material online). Separate enrichment analyses were performed on the 2,000, 4,000, 6,000, 8,000, 10,000, 12,000, 14,000, 16,000, 18,000, 20,000, 25,000, 30,000, 35,000, 40,000, 45,000, 50,000, 60,000, 70,000, 80,000, 90,000, and 100,000 most significant SNPs in each treatment. Note that the SNP sets are nested: each successive set comprises all previous sets plus the next most significant SNPs. For instance, the set containing the top 4,000 SNPs comprises the top 2,000 SNPs along with the next 2,000 most significant SNPs. All enrichment analyses were performed using the Gowinda software (Kofler and Schlötterer 2012). Gowinda generates P values by conducting multiple simulations in which the positions of candidate SNPs are randomly chosen from the ~1.45 million SNPs scattered throughout the genome. The P value was then quantified as the proportion of simulations in which more genes of a given category (in our study, genes associated with either hot or cold thermotolerance) contained at least one candidate SNP than in the observed data set (Kofler and Schlötterer 2012). In total, we performed 100,000 permutations for each set of candidate SNPs in each treatment and for each set of thermal tolerance genes. In each case, a gene containing multiple candidate SNPs was only counted once (i.e., the gene option of the mode argument). Gene annotations were based on version 5.40 of the D. melanogaster reference genome, and only the full coding region of each gene, not the upstream or downstream flanking regions, was considered (i.e., the gene option of the gene-definition argument). Because the thermal genes were specifically tested with the expectation that hot alleles would be enriched in heat tolerance genes and cold alleles in cold tolerance genes, we report P values rather than q values adjusted for testing multiple categories. Gowinda was specifically designed to deal with differences in average gene size among categories and therefore does not overrepresent categories with larger than average gene sizes (Kofler and Schlötterer 2012).
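The permutation logic described above can be illustrated with a small sketch. This is not Gowinda's implementation; the function names, the toy SNP-to-gene map, and the choice to count ties toward the P value are all assumptions made for illustration.

```python
import random

def genes_hit(snps, snp_to_gene, gene_set):
    """Number of genes in gene_set containing at least one of the SNPs.

    A gene with several SNPs is counted once, because a set is used.
    """
    hit = {snp_to_gene[s] for s in snps if s in snp_to_gene}
    return len(hit & gene_set)

def permutation_p_value(candidates, all_snps, snp_to_gene, gene_set,
                        n_perm=1000, seed=0):
    """Proportion of random SNP draws hitting at least as many category
    genes as the observed candidates (ties counted: an assumption)."""
    rng = random.Random(seed)
    observed = genes_hit(candidates, snp_to_gene, gene_set)
    extreme = 0
    for _ in range(n_perm):
        draw = rng.sample(all_snps, len(candidates))
        if genes_hit(draw, snp_to_gene, gene_set) >= observed:
            extreme += 1
    return observed, extreme / n_perm

# Toy data (hypothetical): 1,000 SNPs, 10 per gene, 100 genes.
all_snps = list(range(1000))
snp_to_gene = {s: f"g{s // 10}" for s in all_snps}
heat_genes = {"g0", "g1", "g2", "g3", "g4"}
candidates = [0, 10, 20, 30, 40, 500, 510, 520, 530, 540]

observed, p_value = permutation_p_value(candidates, all_snps,
                                        snp_to_gene, heat_genes)
```

Because random SNPs are drawn from the full SNP set, a long gene with many SNPs is hit more often in the permutations as well, which is the intuition behind the correction for gene size.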

Identification of Putative High LD Regions and Short Introns

LD can result in a large number of false positives, for instance, via allele frequency correlations between selected sites and linked neutral sites. Consequently, the following regions were designated as having putatively elevated LD and were removed from specific analyses, as indicated in the main text. First, regions covered by inversions or within 500 kb either side of the inversion breakpoints. Segregating inversions were identified in our experimental evolution populations by Kapun et al. (2013). Second, a 1 Mb region located within the inversion-free part of 3R. This region was found to harbor a large number of low-frequency alleles that showed a similar allele frequency change, a pattern consistent with selection upon a single haplotype block (Orozco-terWengel et al. 2012). Finally, regions with a local recombination rate of less than 2 cM/Mb. The recombination rates were taken from Comeron et al. (2012). Approximately 300,000 of the original 1.45 million SNPs remained following the removal of these regions. Because 3R contained three segregating inversions and the putative selected haplotype block, only 15,000 SNPs remained on this arm following removal of these regions, whereas the other chromosome arms maintained at least 50,000 SNPs each.
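The three exclusion criteria can be expressed as a simple positional filter. The sketch below is illustrative only: the interval coordinates are made up, and the function name and data layout are assumptions rather than the pipeline used in the study.

```python
FLANK = 500_000   # 500 kb either side of inversion breakpoints
MIN_RATE = 2.0    # minimum local recombination rate in cM/Mb

def in_low_ld_region(chrom, pos, rate, excluded):
    """Return True if a SNP passes all three filters.

    excluded holds (chromosome, start, end, flank) tuples; a SNP is
    dropped when it falls inside an interval or its flanking region,
    or when its local recombination rate is below MIN_RATE.
    """
    if rate < MIN_RATE:
        return False
    for c, start, end, flank in excluded:
        if c == chrom and start - flank <= pos <= end + flank:
            return False
    return True

# Hypothetical coordinates, for illustration only.
excluded = [
    ("2L", 2_000_000, 13_000_000, FLANK),  # a made-up inversion
    ("3R", 20_000_000, 21_000_000, 0),     # a made-up 1 Mb haplotype block
]
```

For example, `in_low_ld_region("2L", 1_600_000, 3.0, excluded)` is rejected because the position lies within 500 kb of the made-up inversion breakpoint.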

Short intron coordinates were taken from Lawrie et al. (2013). To further ensure that short intron sites were behaving neutrally, we only took those located within the putative low LD regions identified above.

ROC Curves and Reproducibility

To test the reproducibility of our top candidates, two additional replicates were sequenced from generation 15 in each treatment, with all sites being processed through the same pipeline as the original candidates. Because we did not have additional replicates from the base population, two new base replicates were created. This was done by combining the reads from the separate base replicates into a single file, from which two-thirds of all reads were randomly sampled (without replacement) and split in half to generate two new files, each having one-third of the combined base population reads (~50-fold coverage). All SNPs from the two new sets of replicates were then subjected to CMH testing and compared for consistency with the original data set using ROC curves. For the sake of consistency, the ~1.45 million SNPs from the original data set were used in the comparisons.
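The construction of the two new base replicates amounts to sampling without replacement followed by an even split. A minimal sketch, assuming reads are held in a list of identifiers (the function name and toy read names are hypothetical):

```python
import random

def split_base_reads(reads, seed=0):
    """Sample two-thirds of the reads without replacement and split
    the sample into two equal, non-overlapping halves."""
    rng = random.Random(seed)
    n_keep = (2 * len(reads)) // 3
    n_keep -= n_keep % 2              # even count, so the halves match
    sampled = rng.sample(reads, n_keep)   # returned in random order
    half = n_keep // 2
    return sampled[:half], sampled[half:]

# Toy pool of combined base-population reads (hypothetical names).
reads = [f"read_{i}" for i in range(9000)]
rep_a, rep_b = split_base_reads(reads)
```

Each pseudo-replicate then holds one-third of the combined reads, and no read is shared between them.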

We used a modified ROC curve analysis to provide a graphical representation of the consistency between the original and new data sets, that is, those based on three replicates and two replicates, respectively. The ROC curves quantify the overlap between the two data sets across successively larger groups of SNPs, with SNPs having been ranked by their CMH P value. Points falling on the 1:1 diagonal line indicate that overlap between the two sets of SNPs matches chance expectations of reproducibility, whereas points lying above the diagonal indicate excess reproducibility. Because we are interested in changes that were most likely caused by the deterministic process of selection, as opposed to the stochastic effects of genetic drift, only SNPs where the same allele was increasing across all five replicates were considered reproducible.
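One way to compute such an overlap curve is sketched below. The exact axes used in the study are not specified here, so this sketch assumes the x value is the fraction of SNPs considered (k/n) and the y value is the fraction of the top k SNPs shared between the two rankings; under chance, two random sets of size k share k²/n SNPs on average, which places points on the 1:1 diagonal. The function name and toy P values are hypothetical, and the requirement that the same allele increases in all replicates is omitted for brevity.

```python
def overlap_curve(p_orig, p_new, step):
    """Return (k/n, overlap/k) pairs for successively larger top-k sets.

    p_orig and p_new map SNP identifiers to P values in the original
    and new data sets; only SNPs present in both are ranked.
    """
    snps = sorted(set(p_orig) & set(p_new))
    n = len(snps)
    rank_orig = sorted(snps, key=lambda s: p_orig[s])
    rank_new = sorted(snps, key=lambda s: p_new[s])
    points = []
    for k in range(step, n + 1, step):
        shared = len(set(rank_orig[:k]) & set(rank_new[:k]))
        points.append((k / n, shared / k))
    return points

# Toy P values: identical rankings versus fully reversed rankings.
p_a = {i: (i + 1) / 100 for i in range(100)}
p_rev = {i: (100 - i) / 100 for i in range(100)}
same = overlap_curve(p_a, p_a, 10)
reversed_curve = overlap_curve(p_a, p_rev, 10)
```

Identical rankings give y = 1 at every k, whereas reversed rankings give no overlap among the top 10 SNPs and reach y = 1 only once all SNPs are included.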

The expected degree of reproducibility under neutrality was quantified using neutral MimicrEE simulations (Kofler and Schlötterer 2013). All parameters were identical to those of the simulations used for the FDR estimates outlined above, except that a total of 50 data sets were simulated in this case. The simulated data sets were divided into 10 equally sized subsets, with each subset being further split into a group of three replicates and a group of two replicates. Following downsampling of the data sets to match our empirical coverage distributions (discussed earlier), CMH tests were applied to each group of three and two replicates for each set of simulations. The resulting data were compared for reproducibility using ROC curves, as above.

LD Analyses

The degree of LD around the top 2,000 candidates was estimated for both treatments following the methods outlined in Orozco-terWengel et al. (2012). Briefly, we estimated the median P value for all SNPs flanking the top 2,000 candidates in nonoverlapping 50-bp windows either side of the focal candidates. The decay in the P value around the focal candidates provides a proxy for the strength and extent of LD in regions flanking putative selected sites. To ascertain the background level of LD among similar SNPs, 2,000 more SNPs were randomly drawn from the genome after excluding all regions within 200 bp of the top 2,000 candidates. The numbers of random SNPs drawn from each chromosome arm matched the distribution of the top 2,000 candidates in each treatment.
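The windowed summary can be sketched as follows. This is a naive illustration for a single chromosome arm, not the code used in the study; the function name, the 4,000-bp maximum distance, and the toy positions are assumptions.

```python
from collections import defaultdict
from statistics import median

def flanking_medians(candidates, snps, window=50, max_dist=4000):
    """Median P value of flanking SNPs in nonoverlapping windows.

    candidates: focal SNP positions; snps: (position, p_value) pairs.
    Keys of the result are the lower edge of each window in bp
    relative to the focal SNP (negative values lie upstream).
    """
    bins = defaultdict(list)
    for focal in candidates:
        for pos, p in snps:
            d = pos - focal
            if d == 0 or abs(d) > max_dist:
                continue  # skip the focal SNP itself and distant sites
            # 1..50 bp -> offset 0, 51..100 bp -> offset 50, and so on
            offset = ((abs(d) - 1) // window) * window
            key = offset if d > 0 else -offset - window
            bins[key].append(p)
    return {k: median(v) for k, v in sorted(bins.items())}

# Toy example with one focal candidate at position 1000.
snps = [(1000, 1e-8), (1010, 0.01), (1040, 0.03),
        (1060, 0.5), (990, 0.02), (900, 0.8)]
decay = flanking_medians([1000], snps)
```

Applying the same function to the randomly drawn SNPs would give the background profile against which the decay around candidates is compared.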

We used MimicrEE simulations (Kofler and Schlötterer 2013) to estimate the degree of long-range LD achievable under a simple scenario in which strong selection acted on a singleton located on a single haplotype. The selection strength (0.5) was based on increasing a single allele from 1% to 30% over 15 generations (i.e., the mean allele frequency change observed for the top 2,000 candidate SNPs in the hot treatment). This was derived using $s = 2\ln(p_0 q_1 / p_1 q_0)/t$, where s is the selection coefficient, p and q are the biallelic frequencies at the start (subscript 0) and end (subscript 1) generations, and t is the number of generations elapsed (eq. 3.3 in Charlesworth and Charlesworth [2010]). Other than the selected site, all parameters were the same as in the neutral simulations outlined above. Nine simulations in which the singleton increased in frequency by 25–35% were included in the analyses. These were evenly split into three subsets, with each subset being downsampled and submitted to CMH testing (in the same manner as the neutral simulations discussed earlier). The resulting data were then used to produce Manhattan plots that enabled visual assessment of the degree of long-range LD (fig. 7).
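As a check on the arithmetic, the selection coefficient can be recomputed from the stated frequencies. The sketch assumes that q denotes the frequency of the selected allele and p = 1 - q, which is the reading under which the formula yields a positive value; the function name is hypothetical.

```python
import math

def selection_coefficient(q0, q1, t):
    """s = 2 ln(p0 q1 / (p1 q0)) / t, with q the frequency of the
    selected allele (an assumption) and p = 1 - q."""
    p0, p1 = 1.0 - q0, 1.0 - q1
    return 2.0 * math.log((p0 * q1) / (p1 * q0)) / t

# Allele rising from 1% to 30% over 15 generations.
s = selection_coefficient(q0=0.01, q1=0.30, t=15)
print(round(s, 2))  # 0.5
```

Here the ratio inside the logarithm is (0.99 × 0.30)/(0.70 × 0.01) ≈ 42.4, so s ≈ 2 × 3.75 / 15 ≈ 0.5, matching the value given in the text.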

Supplementary Material

Supplementary figures S1–S4 and tables S1–S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to A. Bergland and D. Lawrie for sharing the coordinates for the short introns and to Martin Kapun for sharing unpublished results. R.T. wishes to thank his colleagues at the Institut für Populationsgenetik and the Petrov laboratory at Stanford University for helpful discussions on data interpretation. This work was supported by a PhD fellowship from the Vetmeduni Vienna, the Austrian Science Fund (FWF, P19467-B11, W1225-B20), and the European Research Council (ERC) grant "ArchAdapt" awarded to C.S.

References

Agresti A. 2002. Categorical data analysis. New York: John Wiley & Sons.

Andolfatto P, Wall JD. 2003. Linkage disequilibrium patterns across a recombination gradient in African Drosophila melanogaster. Genetics 165:1289–1305.

Bastide H, Betancourt A, Nolte V, Tobler R, Stöbe P, Futschik A, Schlötterer C. 2013. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9:e1003534.

Bergelson J, Roux F. 2010. Towards identifying genes underlying ecologically relevant traits in Arabidopsis thaliana. Nat Rev Genet. 11:867–879.

Bergland AO, Behrman EL, O'Brien KR, Schmidt PS, Petrov DA. 2013. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1303.5044

Bubliy O, Loeschcke V. 2005. Correlated responses to selection for stress resistance and longevity in a laboratory population of Drosophila melanogaster. J Evol Biol. 18:789–803.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Caracristi G, Schlötterer C. 2003. Genetic differentiation between American and European Drosophila melanogaster populations could be attributed to admixture of African alleles. Mol Biol Evol. 20:792–799.

Carroll D. 2011. Genome engineering with zinc-finger nucleases. Genetics 188:773–782.

Charlesworth B, Charlesworth D. 2010. Elements of evolutionary genetics. Greenwood Village (CO): Roberts and Company Publishers.

Clemente F, Vogl C. 2012. Unconstrained evolution in short introns – An analysis of genome-wide polymorphism and divergence data from Drosophila. J Evol Biol. 25:1975–1990.

Comeron JM, Ratnappan R, Bailin S. 2012. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 8:e1002905.

Corbett-Detig RB, Hartl DL. 2012. Population genomics of inversion polymorphisms in Drosophila melanogaster. PLoS Genet. 8:e1003056.

Dettman JR, Rodrigue N, Melnyk AH, Wong A, Bailey SF, Kassen R. 2012. Evolutionary insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol. 21:2058–2077.

Farlow A, Dolezal M, Hua L, Schlötterer C. 2012. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol. 29:21–24.

Fiston-Lavier A-S, Singh ND, Lipatov M, Petrov DA. 2010. Drosophila melanogaster recombination rate calculator. Gene 463:18–20.

Futschik A, Schlötterer C. 2010. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186:207–218.

Haldane J. 1957. The cost of natural selection. J Genet. 55:511–524.

Halligan DL, Keightley PD. 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16:875–884.

Hoffman A, Hallas R, Sinclair C, Partridge L. 2001. Rapid loss of stress resistance in Drosophila melanogaster under adaptation to laboratory culture. Evolution 55:436–438.

Itoh M, Nanba N, Hasegawa M, Inomata N, Kondo R, Oshima M, Takano-Shimizu T. 2010. Seasonal changes in the long-distance linkage disequilibrium in Drosophila melanogaster. J Hered. 101:26–32.

Kapun M, van Schalkwyk H, McAllister B, Flatt T, Schlötterer C. 2013. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.2461

Kawecki TJ, Lenski RE, Ebert D, Hollis B, Olivieri I, Whitlock MC. 2012. Experimental evolution. Trends Ecol Evol. 27:547–560.

Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V, Futschik A, Kosiol C, Schlötterer C. 2011. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 6:e15925.

Kofler R, Pandey RV, Schlötterer C. 2011. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27:3435–3436.

Kofler R, Schlötterer C. 2012. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies. Bioinformatics 28:2084–2085.

Kofler R, Schlötterer C. 2013. Guidelines for the design of evolve and resequencing studies. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.4954

Kolss M, Kawecki TJ. 2008. Reduced learning ability as a consequence of evolutionary adaptation to nutritional stress in Drosophila melanogaster. Ecol Entomol. 33:583–588.

Lawrie DS, Messer PW, Hershberg R, Petrov DA. 2013. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 9:e1003527.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079.

Liu J, Li C, Yu Z, et al. (12 co-authors). 2012. Efficient and specific modifications of the Drosophila genome by means of an easy TALEN strategy. J Genet Genomics. 39:209–215.

Mackay TFC, Richards S, Stone EA, et al. (52 co-authors). 2012. The Drosophila melanogaster Genetic Reference Panel. Nature 482:173–178.

Miyashita N, Langley H. 1988. Molecular and phenotypic variation of the white locus region in Drosophila melanogaster. Genetics 120:199–212.

Nunney L. 2003. The cost of natural selection revisited. Ann Zool Fennici. 40:185–194.

Orozco-terWengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P. 2010. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol. 27:1226–1234.

Parts L, Cubillos FA, Warringer J, et al. (14 co-authors). 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Pool JE, Aquadro CF. 2006. History and structure of sub-Saharan populations of Drosophila melanogaster. Genetics 174:915–929.

Pool JE, Corbett-Detig RB, Sugino RP, et al. (11 co-authors). 2012. Population genomics of sub-Saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet. 8:e1003080.

Remolina S, Chang P, Leips J, Nuzhdin SV, Hughes K. 2012. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66:3390–3403.

Rose MR, Vu LN, Park SU, Graves JL Jr. 1992. Selection on stress resistance increases longevity in Drosophila melanogaster. Exp Gerontol. 27:241–250.

Stalker H. 1980. Chromosome inversion polymorphism in Drosophila melanogaster. II. Relationship of inversion frequencies to latitude, season, wing-loading and flight activity. Genetics 95:211–223.

Stapley J, Reger J, Feulner PGD, Smadja C, Galindo J, Ekblom R, Bennison C, Ball A, Beckerman AP, Slate J. 2010. Adaptation genomics: the next generation. Trends Ecol Evol. 25:705–712.

Takano-Shimizu T, Kawabe A, Inomata N, Nanba N, Kondo R, Inoue Y, Itoh M. 2004. Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci U S A. 101:14156–14161.

Tantawy A. 1964. Studies on natural populations of Drosophila. III. Morphological and genetic differences of wing length in Drosophila melanogaster and D. simulans in relation to. Evolution 18:560–570.

Turner TL, Miller PM, Cochrane VA. 2013. Combining genome-wide methods to investigate the genetic complexity of courtship song variation in Drosophila melanogaster. Mol Biol Evol. 30:2113–2020.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Yu Z, Ren M, Wang Z, Zhang B, Rong YS, Jiao R, Gao G. 2013. Highly efficient genome modifications mediated by CRISPR/Cas9 in Drosophila. Genetics 195:289–291.

Zhou D, Udpa N, Gersten M, et al. (11 co-authors). 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.

FIG. 5. Manhattan plots showing the genomic distribution of candidate SNPs. The negative log10-transformed P values of all SNPs are plotted relative to genomic position. The top 2,000 candidate SNPs for the hot (top panel) and cold (bottom panel) treatments are highlighted (see key). Candidates are categorized as being unique to the hot (red) or cold (blue) treatment, or common to both (green). Point size scales with significance, except for common SNPs, which are shown at a standard size for emphasis. Inset boxes indicate the positions of inversions and a putative selected haplotype block on 3R (large dashes, most proximal to the centromere on 3R). The inversions are as follows: In(2L)t, In(2R)NS, In(3L)P, In(3R)Payne (short dashes), In(3R)C (dots), and In(3R)Mo (solid line). The candidate SNPs typically appear either as isolated points or arranged in broad, flattened humps, and tend to be clustered on 3R where three large inversions overlap. These patterns further imply that inversions and long-range LD have generated many false positives in the study.

[Fig. 6 plot. x-axis: distance from candidate (bp), from −4,000 to 4,000; y-axis: −log10(P); key: Hot, Cold, Random, 200 bp.]

FIG. 6. Correlation between the top 2,000 candidates and flanking SNPs. The decay in LD (correlation) around candidates is measured as the rate of decline in neighboring (negative log10-transformed) P values to background levels. Data are averaged over successive 50-bp bins flanking focal SNPs. Solid black lines indicate the bin means. The red, blue, and green polygons around these lines indicate 95% confidence intervals for hot treatment, cold treatment, and random (i.e., background) SNPs, respectively. In accordance with reported results from D. melanogaster, most of the LD is lost within 200 bp either side of each focal SNP (indicated by dashed lines).

information lies in using the base population information, which has not been considered in most published E&R studies to date.

Despite strong evidence that differential thermal selection contributed to our top candidates in each treatment, the sheer number of putatively selected sites indicates that our results were confounded by a large number of false positives. Given that short-range LD and strong drift in low recombining regions were unlikely to be major contributors to the excess of false positives in our study, we hypothesized that LD over longer distances was responsible for much of this excess. Long-range LD in our experimental populations could have arisen as a byproduct of the founders being a small sample of the much larger natural population. In this case, chance correlations between sites separated by vast distances may result from the finite sampling of alleles. Long-range LD may also originate within natural populations. For instance, strong LD between loci separated by large distances has been reported for Japanese D. melanogaster populations (Takano-Shimizu et al. 2004; Itoh et al. 2010). Because the LD was much stronger in the spring than in the autumn, this finding was attributed to exaggerated sampling effects during the previous winter bottleneck, although there was evidence that recurrent epistatic selection was also responsible for some linked loci. Finally, the introgression of new haplotypes from migrant individuals is another natural process that could produce long-range LD. Recent and ongoing admixture has been reported for African (Pool and Aquadro 2006; Pool et al. 2012) and North American (Caracristi and Schlötterer 2003) D. melanogaster populations, suggesting that, in principle, this may also contribute to long-range LD in natural populations.

The systematic increase in frequency of alleles in the hot treatment that were broadly distributed across a 1 Mb region of 3R demonstrates the effects of long-range LD in our experiment. In combination with differences in the distribution of the starting allele frequencies, long-range LD may also help to explain the differences in the number of candidates and the level of concordance observed between the two treatments. Given that most candidates were rare in the hot treatment and common in the cold treatment (presumably resulting from prior selection against heat tolerance alleles and for cold tolerance alleles), long-range LD is expected to be more extensive among the hot candidates. This is because rare haplotypes are expected to harbor many more private SNPs than common ones, and all such sites will experience the same change in allele frequency until these associations are broken down by recombination. The difference in Ne between the two treatments is also consistent with more extensive long-range LD producing more widespread changes in allele frequencies in the hot treatment.

Segregating inversions are another probable source of false positives in our study. The large decrease in concordance that was observed when inversions and regions with low recombination were removed from the data set supports this idea (fig. 4). Additional confirmation comes from a recent study, which found that several large inversions were segregating in our populations over the course of the experiment, including three partially overlapping inversions on chromosome arm 3R (Kapun et al. 2013). By tracing the trajectory of alleles that were private to each inversion, a single inversion was found to increase across all three replicates in each treatment (In(3R)C by 25% in the hot and In(3R)Mo by 15% in the cold). Indeed, a large share of the top 2,000 candidate SNPs was observed on chromosome arm 3R in both treatments (~80% for the hot and ~40% for the cold; supplementary table S3, Supplementary Material online), with many candidate SNPs lying within these inversions. Furthermore, the greatest concentration of cold candidates was situated within the region where all three inversions overlap. Because recombination is likely to be suppressed in this region, the local effects of LD and drift are expected to be exacerbated relative to the rest of the genome. Thus, changes in inversion frequencies are also likely to have substantially inflated the number of false positives in our study.

FIG. 7. Simulated long-range LD generated by a singleton under strong selection. Manhattan plots showing the distribution of the top 1,000 candidate SNPs from three separate sets of simulated data (top to bottom panels) in which a singleton carried by a single chromosome-length haplotype is under strong selection (see Materials and Methods). The full chromosome is shown in the panels on the right, with a zoomed-in view shown in the panels on the left. Significant SNPs occur vast distances from the selected site (dashed line) and tend to be several thousands of base pairs apart. Thus, it appears likely that long-range LD between selected and neutral sites can readily inflate the number of false positives in an E&R study.

Implications for Future E&R Research

Our results indicate that E&R studies with small founder populations, strong selection, and large segregating inversions are likely to suffer from large numbers of false positives. These factors are likely to be present in other D. melanogaster E&R studies; indeed, several large inversions segregate at intermediate frequencies in natural D. melanogaster populations (Corbett-Detig and Hartl 2012; Kapun et al. 2013). Our findings therefore have important implications for future E&R research in Drosophila.

First, if seasonal selection is common in D. melanogaster, which accumulating evidence suggests may be the case (Stalker 1980; Takano-Shimizu et al. 2004; Itoh et al. 2010; Bergland et al. 2013), then the genetic composition of the experimental populations will depend on their time of collection. Evidence suggests that genes associated with stress resistance are highly pleiotropic in Drosophila (Rose et al. 1992; Hoffman et al. 2001; Bubliy and Loeschcke 2005; Kolss and Kawecki 2008), implying that a broad spectrum of traits may be affected in this way.

Second, large numbers of false positives are expected in E&R studies that use small founder populations, as a direct result of long-range LD between selected and neutral sites. The effect of founder population size on the performance of the experimental design has been explicitly addressed in a recent haplotype-based simulation study (Kofler and Schlötterer 2013). The authors demonstrate that both the power and specificity of E&R studies are increased when larger numbers of founders are used, with the gains in performance being particularly noticeable when selection per SNP is strong. Another plausible way to reduce the impact of long-range LD is to create independent replicates from different sets of founders. Although this does not remove long-range LD per se, it results in different associations across replicates, such that analyses that combine replicates should remove the majority of such effects.

Finally, our results indicate that inversions are likely to be a major hindrance in detecting causal loci. Large segregating inversions appear to be ubiquitous in natural D. melanogaster populations, and recent work suggests that newly introduced inversions may influence genetic diversity across entire chromosome arms, presumably by suppressing recombination well beyond inversion breakpoints (Corbett-Detig and Hartl 2012; Pool et al. 2012; Kapun et al. 2013). Consequently, any improvements in performance that might be achieved through the experimental design modifications suggested above are unlikely to be realized in chromosome arms carrying large inversions, and inversion-free populations may be necessary to ensure that such improvements are realized over the entire genome. Although inversion-free D. melanogaster populations can be obtained by karyotyping individuals prior to the study, the time and effort expended in such screens may become prohibitive if large founding populations are desired. Another possibility is to use an alternative Drosophila species in which segregating inversions are rare, such as D. simulans or D. mauritiana, instead of D. melanogaster. Although these species currently lack the detailed genetic and genomic resources available for D. melanogaster, the ongoing development of portable molecular tools has made functional validation of candidate loci in these species tenable (Carroll 2011; Liu et al. 2012; Yu et al. 2013).

Conclusion

Previous D. melanogaster E&R studies have not directly addressed the role that inversions or long-range LD may have had in generating the large numbers of reported candidate SNPs. Our results imply that such estimates were probably inflated by false positives resulting from these phenomena. Consequently, Drosophila E&R study designs will need to be modified before the true causal loci that contribute to adaptation and/or complex traits can be reliably identified. While we have outlined several possibilities here, we also point out that the study by Kofler and Schlötterer (2013) is intended as a general guideline for E&R study design. There, the authors use haplotype-based simulations to investigate E&R performance for a wide range of experimental and evolutionary variables and thereby provide an invaluable reference for future Drosophila E&R research.

Materials and Methods

Experimental Populations and Evolution Regimes

Full details on the experimental setup are outlined in Orozco-terWengel et al. (2012). Briefly, wild D. melanogaster were collected in Northern Portugal in the summer of 2008. The population was maintained as 113 isofemale lines for five generations in the laboratory prior to the creation of mass-bred populations. Ten replicated mass-bred populations were created, with each replicate combining five females from each isofemale line. This initial population of merged isofemale lines is referred to as the base population, or F0. Five replicate populations were maintained in each of two novel environments: one hot and one cold treatment. Each treatment cycled between two fixed temperature and light settings every 12 h (between 10 °C and 20 °C in the cold treatment and 18 °C and 28 °C in the hot treatment), with light and dark periods occurring with the hot and cold phase of each treatment, respectively. The heating/cooling period between the two fixed temperatures was relatively rapid, taking ~1 h before the new temperature was reached, such that most time was spent at either of the modal temperatures. Each replicate was maintained at ~1,000 individuals with ~50:50 sex ratios, evenly distributed across five ~225 ml bottles. Each bottle contained ~70 ml of standard Drosophila medium. Generations were nonoverlapping.

Genome Sequencing and Mapping

Illumina full genome sequencing was performed on three replicates for each of the base population and evolved populations. For the hot treatment, a single replicate of the 23rd generation and two from the 15th generation were sequenced (Orozco-terWengel et al. 2012), whereas for the cold treatment, all three sequenced replicates came from the 15th generation. Sequencing was performed using paired-end reads, 75 bp in length in the hot treatment (GAIIx) and 100 bp in length in the cold treatment (HiSeq). For each replicate and time point, ~500 flies were pooled and sequenced as one sample (i.e., Pool-Seq; Futschik and Schlötterer 2010). Two additional replicates were also sequenced from the 15th generation of the hot and cold treatments. These were used to check the reproducibility of candidate detection, as described below. Average genome-wide coverage ranged from ~35- to ~90-fold for each of the sequenced replicates (supplementary table S4, Supplementary Material online).

Mapping was carried out using a combination of purpose-built and third-party software, which is described in detail in Kofler, Orozco-terWengel, et al. (2011). Briefly, reads were trimmed to remove low-quality bases and then mapped with bwa (version 0.5.8c) (Li and Durbin 2009) against the D. melanogaster reference genome (version 5.18) and Wolbachia (NC_002978.6). The alignment files were converted to the SAM format (Li et al. 2009), with reads not mapped in proper pairs or with a mapping quality of less than 20 being removed. SAM files were converted into the pileup format, and indels and repeat sequences identified by RepeatMasker 3.2.9 (www.repeatmasker.org, last accessed November 5, 2013) were masked. The aligned pileup files were converted to a synchronized file (Kofler, Pandey, et al. 2011), from which SNPs and their accompanying allele frequencies were derived. All allele frequency changes presented herein were averaged across all three replicates (weighted by coverage). We further removed all SNPs whose coverage was either among the top 2% of a given replicate and time point or that had a minimum count of less than 30 across the 23 populations in our extended data set, which comprised additional replicates and time points that were not analyzed here. Because the average coverage per SNP across the extended data set was ~1,500, the minimum count of 30 equates to an average 2% minor allele frequency across all replicates and time points (although allele frequencies could be lower in the time points that were specifically analyzed in this study). Annotation was based on version 5.40 of the D. melanogaster reference genome. Approximately 1.45 million SNPs were called overall.
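The two SNP filters described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the input layout (one coverage value per population and one count per population), and the rank-based way of finding the top 2% cutoff are all assumptions made for the example.

```python
def coverage_cutoff(coverages, top_fraction=0.02):
    """Smallest coverage that falls within the top `top_fraction` of one population."""
    ranked = sorted(coverages)
    idx = int(len(ranked) * (1 - top_fraction))
    return ranked[min(idx, len(ranked) - 1)]


def passes_filters(snp_coverages, cutoffs, allele_counts, min_count=30):
    """Apply both filters to a single SNP.

    snp_coverages: coverage of this SNP in each population (hypothetical layout)
    cutoffs:       top-2% coverage cutoff of each population
    allele_counts: count of the rarer allele in each population
    """
    # Filter 1: discard SNPs with extreme coverage in any population.
    if any(cov >= cut for cov, cut in zip(snp_coverages, cutoffs)):
        return False
    # Filter 2: require a minimum total count across all populations.
    return sum(allele_counts) >= min_count


# A count of 30 at an average summed coverage of ~1,500 corresponds to 2%.
min_frequency = 30 / 1500
```

With an average summed coverage of ~1,500, `min_frequency` evaluates to 0.02, which matches the 2% figure given in the text.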

Effective Population Size, FDR, and Candidate Estimation

In order to estimate Ne and FDR in both treatments, forward Wright–Fisher simulations were performed to generate expected neutral changes during the experiment. Three sets of neutral allele frequency changes were simulated for each treatment, matching the number of replicates in the study. For each simulation, 1.45 million SNPs were independently sampled (i.e., approximating the number of SNPs in our real data) from the distribution of allele frequencies expected after 15 generations of neutral evolution, conditional on the starting frequency for each SNP. Starting frequencies for each SNP were randomly drawn from the empirical distribution of start frequencies in the experiment, with the frequencies being the coverage-weighted average across all replicates. The expected distribution of allele frequencies after 15 generations was created by generating an N_i × N_j transition matrix, where each cell is the binomial probability of seeing j alleles with a starting frequency i/N (i.e., N is the effective population size), and then raising this matrix to the 15th power. To simulate the effects of DNA pooling in our study, an additional round of binomial sampling was conducted on the start and end allele frequencies for each SNP. To do this, each SNP was assigned a random coverage value that was drawn from the empirical distribution of coverages from a specific generation and treatment. This was done separately for each replicate, thereby capturing the heterogeneity in coverage both within and between replicates in the study.
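The transition-matrix step and the extra round of binomial sampling can be sketched as below. This is a simplified illustration under stated assumptions: N is treated as the number of allele copies indexing the matrix, the function names are hypothetical, and the matrix is applied 15 times to a start vector, which gives the same row as raising the matrix to the 15th power.

```python
import random
from math import comb


def wf_transition_matrix(n):
    """Entry [i][j] is the binomial probability of j copies given i copies (frequency i/n)."""
    matrix = []
    for i in range(n + 1):
        p = i / n
        matrix.append([comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)])
    return matrix


def distribution_after(n, start_count, generations):
    """Distribution of allele counts after `generations` of neutral drift."""
    matrix = wf_transition_matrix(n)
    dist = [0.0] * (n + 1)
    dist[start_count] = 1.0
    for _ in range(generations):
        # One vector-matrix product per generation.
        dist = [sum(dist[i] * matrix[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return dist


def pooled_estimate(true_freq, coverage, rng):
    """Binomial sampling of reads at a given coverage, mimicking Pool-Seq noise."""
    reads = sum(rng.random() < true_freq for _ in range(coverage))
    return reads / coverage


# Example: a small N keeps the demonstration fast.
dist = distribution_after(20, 5, 15)
rng = random.Random(1)
end_count = rng.choices(range(21), weights=dist)[0]
observed_end = pooled_estimate(end_count / 20, coverage=60, rng=rng)
```

In this sketch the coverage value (60) is fixed for brevity; in the procedure described above it would be drawn from the empirical coverage distribution of the relevant replicate, generation, and treatment.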

The effective population size of the observed data was estimated as follows. First, multiple simulations were performed as outlined above, each with a differing Ne. Each simulation produced a set of expected start and end allele frequencies for a given Ne, from which an allele frequency change (AFC) distribution was obtained (i.e., the end frequency minus start frequency for all 1.45 million SNPs). The expected AFC distribution was then compared for goodness of fit with the observed AFC distribution by summing the squared difference of the AFC values evaluated in 1% frequency change bins. Because individual replicates were simulated, we averaged the goodness of fit across the three replicates to get a point estimate for each Ne. Additionally, the error around this mean was sufficiently large for both treatments that the best-fitting Ne was not significantly different from many neighboring Ne values. Consequently, we took the lowest Ne from amongst this range (~250; supplementary table S1, Supplementary Material online), as it is expected to provide the most conservative estimate of the FDR (because the strength of drift increases as Ne decreases). The Ne for individual chromosomes was also inferred by limiting the observed data to that specific to each chromosome and then repeating the steps above.
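The goodness-of-fit comparison can be sketched as follows. The sketch assumes that the binned AFC values are compared as proportions of SNPs per 1% bin; the text does not state whether counts or proportions were used, and all names are illustrative.

```python
def afc_histogram(afcs, bin_width=0.01):
    """Proportion of SNPs per AFC bin; AFC values lie between -1 and 1."""
    n_bins = round(2 / bin_width)
    counts = [0] * n_bins
    for afc in afcs:
        idx = min(int((afc + 1) / bin_width), n_bins - 1)
        counts[idx] += 1
    return [c / len(afcs) for c in counts]


def sum_squared_difference(observed_afcs, simulated_afcs):
    """Goodness of fit between two AFC distributions (lower is better)."""
    obs = afc_histogram(observed_afcs)
    sim = afc_histogram(simulated_afcs)
    return sum((o - s) ** 2 for o, s in zip(obs, sim))


def rank_ne_values(observed_by_replicate, simulated_by_ne):
    """Average the fit across replicates for each candidate Ne.

    simulated_by_ne maps each Ne to a list of simulated AFC lists, one per replicate.
    Returns (mean_fit, ne) pairs sorted from best to worst fit.
    """
    fits = []
    for ne, sims in simulated_by_ne.items():
        per_rep = [sum_squared_difference(o, s) for o, s in zip(observed_by_replicate, sims)]
        fits.append((sum(per_rep) / len(per_rep), ne))
    return sorted(fits)
```

Choosing the lowest Ne among values whose fit is not distinguishable from the best one, as described above, would be a further step applied to the ranked list.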

Following Ne estimation, the number of putative selected SNPs was inferred as follows. First, all ~1.45 million SNPs that

372

Tobler et al doi101093molbevmst205 MBED

ownloaded from

httpsacademicoupcom

mbearticle-abstract312364997936 by C

ardiff University user on 19 February 2019

remained after filtering were subjected to the CMH test (Agresti 2002). The CMH test was shown to outperform several other statistics that were used for candidate inference in recent Drosophila E&R studies (Kofler and Schlötterer 2013). Testing was conducted separately for the hot and cold treatments, with each being contrasted against the base population. Second, the CMH test was applied to simulated SNP data that was derived using the conservative Ne estimate of 250 (discussed earlier). Finally, the number of candidate SNPs was estimated by setting an empirical FDR of 0.001 for each treatment. Thus, the number of candidates for a given treatment was quantified as the number of observed SNPs that had CMH P values that were lower than the least significant CMH P value from among the 0.1% most extreme simulated SNPs. The number of candidate SNPs was also inferred for a range of different FDR values, and the results are shown in supplementary table S1, Supplementary Material online.
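The CMH statistic and the empirical cutoff can be sketched as below. This is an illustration rather than the software used in the study: the CMH statistic is computed without a continuity correction, each replicate is represented by a hypothetical 2 × 2 table of allele counts (base vs. evolved), and the function names are invented for the example.

```python
from math import erfc, sqrt


def cmh_p_value(tables):
    """CMH test for a list of 2x2 tables (a, b, c, d), one per replicate.

    Rows are base / evolved population, columns are allele 1 / allele 2.
    """
    diff = 0.0
    var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n                           # observed minus expected
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))  # hypergeometric variance
    if var == 0:
        return 1.0
    stat = diff**2 / var
    return erfc(sqrt(stat / 2))  # chi-square survival function with 1 df


def empirical_threshold(simulated_p_values, fdr=0.001):
    """Least significant P value among the `fdr` most extreme simulated SNPs."""
    ranked = sorted(simulated_p_values)
    k = max(int(len(ranked) * fdr), 1)
    return ranked[k - 1]


def count_candidates(observed_p_values, simulated_p_values, fdr=0.001):
    """Number of observed SNPs more significant than the simulated cutoff."""
    threshold = empirical_threshold(simulated_p_values, fdr)
    return sum(p < threshold for p in observed_p_values)
```

Because the cutoff is taken from neutral simulations rather than from a nominal significance level, it absorbs the extra variance introduced by drift and pooled sequencing.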

To examine the effects that linkage between loci had on the candidate inference, we also performed 30 forward neutral simulations starting from a population with 250 unique haplotypes (i.e., 250 homozygous diploid individuals) for 15 generations. Thus, the number of founding haplotypes and the effective population size during the simulations were both 250. Assuming that the isofemale founders maintained two segregating chromosomes per line on average (which is not unreasonable considering that they were five generations old when making the mass-bred populations), these numbers roughly accord with those expected within our experimental populations. The simulations were performed using MimicrEE (Kofler and Schlötterer 2013). This software enabled the modeling of recombination at rates that were derived from empirical estimates (Fiston-Lavier et al. 2010). Thirty simulations were performed and divided into 10 equal-sized subsets. The three simulations from each of the 10 subsets were then downsampled to match the coverage distributions of the three empirical replicates from the relevant generation and treatment. Next, we applied the CMH test to all sites in each subset and estimated the number of candidates thereof via the FDR method outlined in the preceding paragraph. The haplotypes used in the simulations were created by neutral coalescent simulations and are described in Bastide et al. (2013).

Enrichment in Thermal Tolerance Genes

We tested for treatment-specific evolution by looking for candidate enrichment in genes associated with heat and cold tolerance. Thermal tolerance gene sets were taken from the Candidate Stress Genes in Drosophila database, which was obtained from the Centre for Environmental Stress and Adaptation Research website (http://cesar.org.au/, last accessed November 5, 2013). Only genes with a significant and unique association with either heat or cold tolerance were included in the analyses (supplementary table S2, Supplementary Material online). Separate enrichment analyses were performed on the 2,000, 4,000, 6,000, 8,000, 10,000, 12,000, 14,000, 16,000, 18,000, 20,000, 25,000, 30,000, 35,000, 40,000, 45,000, 50,000, 60,000, 70,000, 80,000, 90,000, and 100,000 most significant SNPs in each treatment. Note that each successive set of SNPs was an aggregation of previous SNP sets and the subsequent set of most significant SNPs. For instance, the set that contained the top 4,000 SNPs comprised the set containing the top 2,000 SNPs along with the next 2,000 most significant SNPs. All enrichment analyses were performed using Gowinda software (Kofler and Schlötterer 2012). Gowinda generates P values by conducting multiple simulations in which the positions of candidate SNPs are randomly chosen from the ~1.45 million SNPs scattered throughout the genome. The P value was then quantified as the proportion of simulations that had more genes specific to a given category (in our study, genes associated either with hot or cold thermotolerance) containing at least one candidate SNP than the observed data set (Kofler and Schlötterer 2012). In total, we performed 100,000 permutations for each set of candidate SNPs in each treatment and for each set of thermal tolerance genes. In each case, a gene containing multiple candidate SNPs was only counted once (i.e., the "gene" option of the "mode" argument). Gene annotations were based on version 5.40 of the D. melanogaster reference genome, and only the full coding region of each gene, and not the upstream or downstream flanking regions, was considered (i.e., the "gene" option of the "gene-definition" argument). Because the thermal genes were specifically tested for enrichment, with the expectation that there would be enrichment for hot alleles in heat-tolerance genes and cold alleles in cold-tolerance genes, we report P values rather than q values adjusted for testing multiple categories. This software was specifically designed to deal with differences in average gene size among categories and therefore does not result in the overrepresentation of categories with larger than average gene sizes (Kofler and Schlötterer 2012).
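The permutation logic described for the enrichment test can be sketched as follows. This is a toy re-implementation of the idea, not Gowinda itself: positions are on a single hypothetical chromosome, gene boundaries are given as inclusive intervals, and the P value counts permutations with strictly more category genes than observed, following the wording above (counting ties as well would be a more conservative alternative).

```python
import random


def genes_hit(snp_positions, gene_intervals):
    """Genes containing at least one SNP; each gene is counted once."""
    hit = set()
    for pos in snp_positions:
        for gene, (start, end) in gene_intervals.items():
            if start <= pos <= end:
                hit.add(gene)
    return hit


def enrichment_p_value(candidates, all_snps, gene_intervals, category, n_perm=1000, seed=0):
    """Proportion of permutations with more category genes hit than in the observed data."""
    rng = random.Random(seed)
    observed = len(genes_hit(candidates, gene_intervals) & category)
    more = 0
    for _ in range(n_perm):
        # Draw as many random SNPs as there are candidates from all called SNPs.
        sample = rng.sample(all_snps, len(candidates))
        if len(genes_hit(sample, gene_intervals) & category) > observed:
            more += 1
    return more / n_perm
```

Sampling positions from the full set of called SNPs, rather than sampling genes directly, is what makes longer genes (which carry more SNPs) proportionally more likely to be hit under the null.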

Identification of Putative High LD Regions and Short Introns

LD can result in a large number of false positives, for instance, via allele frequency correlations between selected and neutral linked sites. Consequently, the following regions were designated as having putatively elevated LD and were removed from specific analyses, as indicated in the main text. First, regions covered by inversions or within 500 kb either side of the inversion breakpoints. Segregating inversions were identified in our experimental evolution populations by Kapun et al. (2013). Second, a 1 Mb region located within the inversion-free part of 3R. This region was found to harbor a large number of low-frequency alleles that showed a similar allele frequency change, a pattern consistent with selection upon a single haplotype block (Orozco-terWengel et al. 2012). Finally, regions with a local recombination rate less than 2 cM/Mb. The recombination rates were taken from Comeron et al. (2012). Approximately 300,000 of the original 1.45 million SNPs remained following the removal of these regions. Because 3R contained three segregating inversions and the putative selected haplotype block, only 15,000 SNPs remained following removal of these regions, whereas other chromosomes maintained at least 50,000 SNPs.
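The three exclusion criteria can be sketched as a simple SNP filter. The coordinates in the usage example are placeholders rather than real inversion breakpoints, and the function names are hypothetical.

```python
def pad_inversion(chrom, start, end, pad=500_000):
    """Extend an inversion interval by 500 kb on either side of its breakpoints."""
    return (chrom, max(start - pad, 1), end + pad)


def in_excluded_region(chrom, pos, excluded):
    """True if the SNP lies in any excluded interval (inclusive coordinates)."""
    return any(c == chrom and start <= pos <= end for c, start, end in excluded)


def keep_snp(chrom, pos, excluded, recomb_rate, min_rate=2.0):
    """Keep SNPs outside excluded regions with a local rate of at least 2 cM/Mb."""
    return recomb_rate >= min_rate and not in_excluded_region(chrom, pos, excluded)


# Placeholder intervals: one padded inversion and one 1 Mb haplotype block.
excluded = [
    pad_inversion("3R", 10_000_000, 15_000_000),
    ("3R", 20_000_000, 21_000_000),
]
```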

Adaptation Genomics of D melanogaster doi101093molbevmst205 MBED

Short intron coordinates were taken from Lawrie et al. (2013). To further ensure that short intron sites were behaving neutrally, we only took those that were located within the putative low LD regions identified above.

ROC Curves and Reproducibility

To test the reproducibility of our top candidates, two additional replicates were sequenced from generation 15 in each treatment, with all sites being processed through the same pipeline as the original candidates. Because we did not have additional replicates from the base population, two new base replicates were created. This was done by combining the reads from the separate base replicates into a single file, from which two-thirds of all reads were randomly sampled (without replacement) and split in half to generate two new files, each having one-third of the combined base population reads (~50-fold coverage). All SNPs from the two new sets of replicates were then subjected to CMH testing and compared for consistency with the original data set using ROC curves. For the sake of consistency, the ~1.45 million SNPs from the original data set were used in the comparisons.
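The construction of the two new base replicates can be sketched as below. Reads are represented here by arbitrary list items; in practice they would be read pairs from the combined alignment files, and the function name is an assumption.

```python
import random


def make_pseudo_replicates(reads, seed=0):
    """Sample about two-thirds of the reads without replacement and split them in half."""
    rng = random.Random(seed)
    n_sample = (2 * len(reads) // 3) // 2 * 2  # round down to an even number
    sampled = rng.sample(reads, n_sample)      # sampling without replacement
    half = n_sample // 2
    return sampled[:half], sampled[half:]
```

Because sampling is without replacement, no read appears in both new files, so the two pseudo-replicates are independent draws from the combined base population reads.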

We used a modified ROC curve analysis to provide a graphical representation of the consistency between the original and new data sets, that is, those based on three replicates and two replicates, respectively. The ROC curves quantify the overlap between the two data sets across successively larger groups of SNPs, with SNPs having been ranked by their CMH P value. Points falling on the 1:1 diagonal line indicate that overlap between the two sets of SNPs matches chance expectations of reproducibility, whereas points lying above the diagonal indicate excess reproducibility. Because we are interested in changes that were most likely to be caused by the deterministic process of selection, as opposed to the stochastic effects of genetic drift, only SNPs where the same allele was increasing across all five replicates were considered as being reproducible.
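One way to compute such an overlap curve is sketched below. The exact axes used in the study are not spelled out in this section, so the sketch assumes that the x value is the proportion of all SNPs considered (top k of n) and the y value is the proportion of those top-k SNPs shared between the two rankings; under random rankings the expected y value is approximately k/n, which gives the 1:1 diagonal.

```python
def overlap_curve(p_original, p_new, same_direction, steps):
    """Overlap between two P-value rankings at successively larger cutoffs.

    p_original, p_new: dicts mapping SNP identifiers to CMH P values
    same_direction:    SNPs whose same allele increased in all replicates
    steps:             list of cutoffs k (numbers of top-ranked SNPs)
    """
    snps = list(p_original)
    rank_a = sorted(snps, key=lambda s: p_original[s])
    rank_b = sorted(snps, key=lambda s: p_new[s])
    points = []
    for k in steps:
        # Only SNPs changing in a consistent direction count as reproducible.
        shared = set(rank_a[:k]) & set(rank_b[:k]) & same_direction
        points.append((k / len(snps), len(shared) / k))
    return points
```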

The expected degree of reproducibility under neutrality was quantified using neutral MimicrEE simulations (Kofler and Schlötterer 2013). All parameters used were identical to the simulations used for the FDR estimates outlined above, except that a total of 50 data sets were simulated in this case. The simulated data sets were divided into 10 equally sized subsets, with each subset being further split into a group of three replicates and a group of two replicates. Following downsampling of the data sets to match our empirical coverage distributions (discussed earlier) for each set of simulations, CMH tests were applied to each group of three and two replicates. The resulting data were compared for reproducibility using ROC curves, as above.

LD Analyses

The degree of LD around the top 2,000 candidates was estimated for both treatments following the same methods outlined in Orozco-terWengel et al. (2012). Briefly, we estimated the median P value for all SNPs flanking the top 2,000 candidates in nonoverlapping 50-bp windows either side of the focal candidates. The decay in the P value around the focal candidates provides a proxy for the strength and extent of LD in regions flanking putative selected sites. To ascertain the background level of LD among similar SNPs, 2,000 more SNPs were randomly drawn from the genome after excluding all regions within 200 bp of the top 2,000 candidates. The numbers of random SNPs drawn from each chromosome arm matched the distribution of the top 2,000 candidates in each treatment.
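The windowed summary can be sketched for a single focal candidate as follows. The study pooled flanking SNPs over all 2,000 candidates; the sketch handles one focal site for brevity, combines both flanks by absolute distance, and uses a hypothetical maximum distance (`max_dist`) that is not taken from the text.

```python
from statistics import median


def windowed_median_p(focal_pos, flanking, window=50, max_dist=200):
    """Median P value of flanking SNPs in nonoverlapping distance windows.

    flanking: list of (position, p_value) pairs on the same chromosome arm.
    Returns a dict keyed by (window_start, window_end) distances in bp.
    """
    bins = {}
    for pos, p in flanking:
        dist = abs(pos - focal_pos)
        if dist == 0 or dist > max_dist:
            continue  # skip the focal SNP itself and SNPs beyond the range
        bins.setdefault((dist - 1) // window, []).append(p)
    return {
        (w * window + 1, (w + 1) * window): median(ps)
        for w, ps in sorted(bins.items())
    }
```

Plotting the returned medians against distance for candidates and for the randomly drawn SNPs would give the decay curves that the comparison above relies on.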

We used MimicrEE simulations (Kofler and Schlötterer 2013) to estimate the degree of long-range LD achievable given a simple scenario where strong selection acted on a singleton located in a single haplotype. The selection strength (0.5) was based on increasing a single allele from 1% to 30% over 15 generations (i.e., the mean allele frequency change observed for the top 2,000 candidate SNPs in the hot treatment). This was derived using s = 2 ln(p0q1/p1q0)/t, where s is the selection coefficient, p and q the biallelic frequencies at the start (subscript 0) and end (subscript 1) generations, and t is the change in generations (eq. 3.3 in Charlesworth and Charlesworth [2010]). Other than the selected site, all other parameters were the same as the neutral simulations outlined above. Nine simulations in which the singleton increased in frequency by 25–35% were included in the analyses. These were evenly split into three subsets, with each subset being downsampled and submitted to CMH testing (in the same manner as the neutral simulations discussed earlier). The resulting data were then used to produce Manhattan plots that enabled visual assessment of the degree of long-range LD (fig. 7).
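The quoted selection strength can be checked with a few lines of arithmetic. The function below simply evaluates the expression given above; with the ratio ordered as written, an allele that increases in frequency yields a negative value, so the magnitude is what corresponds to the reported 0.5. The function name is illustrative.

```python
from math import log


def selection_coefficient(p0, p1, generations):
    """Evaluate s = 2 ln(p0*q1 / (p1*q0)) / t for a biallelic site."""
    q0, q1 = 1 - p0, 1 - p1
    return 2 * log((p0 * q1) / (p1 * q0)) / generations


# An allele rising from 1% to 30% over 15 generations.
s = selection_coefficient(0.01, 0.30, 15)
strength = abs(s)  # approximately 0.5
```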

Supplementary Material

Supplementary figures S1–S4 and tables S1–S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to A. Bergland and D. Lawrie for sharing the coordinates for the short introns and Martin Kapun for sharing unpublished results. R.T. wishes to thank his colleagues at the Institut für Populationsgenetik and the Petrov laboratory at Stanford University for helpful discussions on data interpretation. This work was supported by a PhD fellowship from the Vetmeduni Vienna, the Austrian Science Fund (FWF P19467-B11, W1225-B20), and the European Research Council (ERC) grant "ArchAdapt" awarded to C.S.

References

Agresti A. 2002. Categorical data analysis. New York: John Wiley & Sons.

Andolfatto P, Wall JD. 2003. Linkage disequilibrium patterns across a recombination gradient in African Drosophila melanogaster. Genetics 165:1289–1305.

Bastide H, Betancourt A, Nolte V, Tobler R, Stobe P, Futschik A, Schlötterer C. 2013. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9:e1003534.

Bergelson J, Roux F. 2010. Towards identifying genes underlying ecologically relevant traits in Arabidopsis thaliana. Nat Rev Genet. 11:867–879.

Bergland AO, Behrman EL, O'Brien KR, Schmidt PS, Petrov DA. 2013. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1303.5044

Bubliy O, Loeschcke V. 2005. Correlated responses to selection for stress resistance and longevity in a laboratory population of Drosophila melanogaster. J Evol Biol. 18:789–803.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Caracristi G, Schlötterer C. 2003. Genetic differentiation between American and European Drosophila melanogaster populations could be attributed to admixture of African alleles. Mol Biol Evol. 20:792–799.

Carroll D. 2011. Genome engineering with zinc-finger nucleases. Genetics 188:773–782.

Charlesworth B, Charlesworth D. 2010. Elements of evolutionary genetics. Greenwood Village (CO): Roberts and Company Publishers.

Clemente F, Vogl C. 2012. Unconstrained evolution in short introns – An analysis of genome-wide polymorphism and divergence data from Drosophila. J Evol Biol. 25:1975–1990.

Comeron JM, Ratnappan R, Bailin S. 2012. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 8:e1002905.

Corbett-Detig RB, Hartl DL. 2012. Population genomics of inversion polymorphisms in Drosophila melanogaster. PLoS Genet. 8:e1003056.

Dettman JR, Rodrigue N, Melnyk AH, Wong A, Bailey SF, Kassen R. 2012. Evolutionary insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol. 21:2058–2077.

Farlow A, Dolezal M, Hua L, Schlötterer C. 2012. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol. 29:21–24.

Fiston-Lavier A-S, Singh ND, Lipatov M, Petrov DA. 2010. Drosophila melanogaster recombination rate calculator. Gene 463:18–20.

Futschik A, Schlötterer C. 2010. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186:207–218.

Haldane J. 1957. The cost of natural selection. J Genet. 55:511–524.

Halligan DL, Keightley PD. 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16:875–884.

Hoffman A, Hallas R, Sinclair C, Partridge L. 2001. Rapid loss of stress resistance in Drosophila melanogaster under adaptation to laboratory culture. Evolution 55:436–438.

Itoh M, Nanba N, Hasegawa M, Inomata N, Kondo R, Oshima M, Takano-Shimizu T. 2010. Seasonal changes in the long-distance linkage disequilibrium in Drosophila melanogaster. J Hered. 101:26–32.

Kapun M, van Schalkwyk H, McAllister B, Flatt T, Schlötterer C. 2013. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.2461

Kawecki TJ, Lenski RE, Ebert D, Hollis B, Olivieri I, Whitlock MC. 2012. Experimental evolution. Trends Ecol Evol. 27:547–560.

Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V, Futschik A, Kosiol C, Schlötterer C. 2011. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 6:e15925.

Kofler R, Pandey RV, Schlötterer C. 2011. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27:3435–3436.

Kofler R, Schlötterer C. 2012. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies. Bioinformatics 28:2084–2085.

Kofler R, Schlötterer C. 2013. Guidelines for the design of evolve and resequencing studies. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.4954

Kolss M, Kawecki TJ. 2008. Reduced learning ability as a consequence of evolutionary adaptation to nutritional stress in Drosophila melanogaster. Ecol Entomol. 33:583–588.

Lawrie DS, Messer PW, Hershberg R, Petrov DA. 2013. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 9:e1003527.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079.

Liu J, Li C, Yu Z, et al. (12 co-authors). 2012. Efficient and specific modifications of the Drosophila genome by means of an easy TALEN strategy. J Genet Genomics. 39:209–215.

Mackay TFC, Richards S, Stone EA, et al. (52 co-authors). 2012. The Drosophila melanogaster Genetic Reference Panel. Nature 482:173–178.

Miyashita N, Langley H. 1988. Molecular and phenotypic variation of the white locus region in Drosophila melanogaster. Genetics 120:199–212.

Nunney L. 2003. The cost of natural selection revisited. Ann Zool Fennici. 40:185–194.

Orozco-terWengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P. 2010. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol. 27:1226–1234.

Parts L, Cubillos FA, Warringer J, et al. (14 co-authors). 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Pool JE, Aquadro CF. 2006. History and structure of sub-Saharan populations of Drosophila melanogaster. Genetics 174:915–929.

Pool JE, Corbett-Detig RB, Sugino RP, et al. (11 co-authors). 2012. Population Genomics of sub-saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet. 8:e1003080.

Remolina S, Chang P, Leips J, Nuzhdin SV, Hughes K. 2012. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66:3390–3403.

Rose MR, Vu LN, Park SU, Graves JL Jr. 1992. Selection on stress resistance increases longevity in Drosophila melanogaster. Exp Gerontol. 27:241–250.

Stalker H. 1980. Chromosome inversion polymorphism in Drosophila melanogaster. II. Relationship of inversion frequencies to latitude, season, wing-loading and flight activity. Genetics 95:211–223.

Stapley J, Reger J, Feulner PGD, Smadja C, Galindo J, Ekblom R, Bennison C, Ball A, Beckerman AP, Slate J. 2010. Adaptation genomics: the next generation. Trends Ecol Evol. 25:705–712.

Takano-Shimizu T, Kawabe A, Inomata N, Nanba N, Kondo R, Inoue Y, Itoh M. 2004. Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci U S A. 101:14156–14161.

Tantawy A. 1964. Studies on natural populations of Drosophila. III. Morphological and genetic differences of wing length in Drosophila melanogaster and D. simulans in relation to. Evolution 18:560–570.

Turner TL, Miller PM, Cochrane VA. 2013. Combining genome-wide methods to investigate the genetic complexity of courtship song variation in Drosophila melanogaster. Mol Biol Evol. 30:2113–2120.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Yu Z, Ren M, Wang Z, Zhang B, Rong YS, Jiao R, Gao G. 2013. Highly efficient genome modifications mediated by CRISPR/Cas9 in Drosophila. Genetics 195:289–291.

Zhou D, Udpa N, Gersten M, et al. (11 co-authors). 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.

information lies in using the base population information, which has not been considered in most published E&R studies to date.

Despite strong evidence that differential thermal selection contributed to our top candidates in each treatment, the sheer number of putatively selected sites indicates that our results were confounded by a large number of false positives. Given that short-range LD and strong drift in low recombining regions were unlikely to be major contributors to the excess of false positives in our study, we hypothesized that LD over longer distances was responsible for much of this excess. Long-range LD in our experimental populations could have arisen as a byproduct of the founders being a small sample of the much larger natural population. In this case, chance correlations between sites separated by vast distances may result from the finite sampling of alleles. Long-range LD may also originate within natural populations. For instance, strong LD between loci separated by large distances has been reported for Japanese D. melanogaster populations (Takano-Shimizu et al. 2004; Itoh et al. 2010). Because the LD was much stronger in the spring than in the autumn, this finding was attributed to exaggerated sampling effects during the previous winter bottleneck, although there was evidence that recurrent epistatic selection was also responsible for some linked loci. Finally, the introgression of new haplotypes from migrant individuals is another natural process that could produce long-range LD. Recent and ongoing admixture has been reported for African (Pool and Aquadro 2006; Pool et al. 2012) and North American (Caracristi and Schlötterer 2003) D. melanogaster populations, suggesting that, in principle, this may also contribute to long-range LD in natural populations.

The systematic increase in frequency of alleles in the hot treatment that were broadly distributed across a 1 Mb region of 3R demonstrates the effects of long-range LD in our experiment. In combination with differences in the distribution of the starting allele frequencies, long-range LD may also help to explain the differences in the number of candidates and the level of concordance observed between the two treatments. Given that most candidates were rare in the hot treatment and common in the cold treatment (presumably resulting from prior selection against heat tolerance alleles and for cold tolerance alleles), long-range LD is expected to be more extensive among the hot candidates. This is because rare haplotypes are expected to harbor many more private SNPs than common ones, and all such sites will experience the same change in allele frequency until these associations are broken down by recombination. The difference in Ne between the two treatments is also consistent with more extensive long-range LD producing more widespread changes in allele frequencies in the hot treatment.

Segregating inversions are another probable source of false positives in our study. The large decrease in concordance that was observed when inversions and regions with low recombination were removed from the data set supports this idea (fig. 4). Additional confirmation comes from a recent study, which found that several large inversions were segregating in our populations over the course of the experiment, including three partially overlapping inversions on chromosome arm 3R (Kapun et al. 2013). By tracing the trajectory of alleles that were private to each inversion, a single inversion was found to increase across all three replicates in each treatment (In(3R)c by 25% in the hot and In(3R)mo by 15% in the cold). Indeed, the majority of the top 2,000 candidate SNPs were observed on chromosome arm 3R for both treatments (~80% for the hot and ~40% for the cold; supplementary table S3, Supplementary Material online), with many candidate SNPs lying within these inversions. Furthermore, the greatest concentration of cold candidates was situated within the region where all three inversions overlap. Because recombination is likely to be suppressed in this region, the local effects of LD and drift are expected to be exacerbated relative to the rest of the genome. Thus, changes in inversion frequencies are also likely to have substantially inflated the number of false positives in our study.

FIG. 7. Simulated long-range LD generated by a singleton under strong selection. Manhattan plots showing the distribution of the top 1,000 candidate SNPs from three separate sets of simulated data (top to bottom panels), in which a singleton carried by a single chromosome-length haplotype is under strong selection (see Materials and Methods). The full chromosome is shown in the panels on the right, with a zoomed-in view shown in the panels on the left. Significant SNPs occur vast distances from the selected site (dashed line) and tend to be several thousands of base pairs apart. Thus, it appears likely that long-range LD between selected and neutral sites can readily inflate the number of false positives in an E&R study.

Implications for Future E&R Research

Our results indicate that E&R studies with small founder populations, strong selection, and large segregating inversions are likely to suffer from large numbers of false positives. These factors are likely to be present in other D. melanogaster E&R studies (indeed, several large inversions segregate at intermediate frequencies in natural D. melanogaster populations; Corbett-Detig and Hartl 2012; Kapun et al. 2013), meaning that our findings have important implications for future E&R research in Drosophila.

First, if seasonal selection is common in D. melanogaster, which accumulating evidence suggests may be the case (Stalker 1980; Takano-Shimizu et al. 2004; Itoh et al. 2010; Bergland et al. 2013), then the genetic composition of the experimental populations will be dependent on their time of collection. Evidence suggests that genes associated with stress resistance are highly pleiotropic in Drosophila (Rose et al. 1992; Hoffman et al. 2001; Bubliy and Loeschcke 2005; Kolss and Kawecki 2008), implying that a broad spectrum of traits may be affected in this way.

Second, large numbers of false positives are expected in E&R studies that use small founder populations, as a direct result of long-range LD between selected and neutral sites. The effect of founder population size on the performance of experimental design has been explicitly addressed in a recent haplotype-based simulation study (Kofler and Schlötterer 2013). The authors demonstrate that both the power and specificity of E&R studies are increased when larger numbers of founders are used, with the gains in performance being particularly noticeable when the strength of selection per SNP is strong. Another plausible way to reduce the impact of long-range LD is to create independent replicates from different sets of founders. Although this does not remove long-range LD per se, it results in different associations across replicates, such that analyses that combine replicates should remove the majority of such effects.

Finally our results indicate that inversions are likely to be amajor hindrance in detecting causal loci Large segregatinginversions appear to be ubiquitous in natural D melanogasterpopulations and recent work suggests that newly introducedinversions may influence genetic diversity across entire chro-mosome arms presumably by suppressing recombinationwell beyond inversion breakpoints (Corbett-Detig and Hartl2012 Pool et al 2012 Kapun et al 2013) Consequently anyimprovements in performance that might be achievedthrough the experimental design modifications suggestedabove are unlikely to be realized in chromosome arms carry-ing large inversions whereby inversion-free populations maybe necessary to ensure that such improvements are realizedover the entire genome Although inversion-free D melano-gaster populations can be obtained via karyotyping individ-uals prior to the study the time and effort expended in suchscreens may become prohibitive if large founding populationsare desired Another possibility is to use an alternativeDrosophila species for which segregating inversions are raresuch as D simulans or D mauritiana instead of D melano-gaster Although these species currently lack the detailedgenetic and genomic resources available for D melanogasterthe ongoing development of portable molecular tools hasmade functional validation of candidate loci in these speciestenable (Carroll 2011 Liu et al 2012 Yu et al 2013)

Conclusion

Previous D. melanogaster E&R studies have not directly addressed the role that inversions or long-range LD may have had in generating the large numbers of reported candidate SNPs. Our results imply that such estimates were probably inflated by false positives resulting from these phenomena. Consequently, Drosophila E&R study designs will need to be modified before the true causal loci that contribute to adaptation and/or complex traits can be reliably identified. While we have outlined several possibilities here, we also point out that the study by Kofler and Schlötterer (2013) is intended as a general guideline for E&R study design. There, the authors use haplotype-based simulations to investigate E&R performance for a wide range of experimental and evolutionary variables, and thereby provide an invaluable reference for future Drosophila E&R research.

Materials and Methods

Experimental Populations and Evolution Regimes

Full details on the experimental setup are outlined in Orozco-terWengel et al. (2012). Briefly, wild D. melanogaster were collected in Northern Portugal in the summer of 2008. The population was maintained as 113 isofemale lines for five generations in the laboratory prior to the creation of mass-bred populations. Ten replicated mass-bred populations were created, with each replicate combining five females from each isofemale line. This initial population of merged isofemale lines is referred to as the base population, or F0. Five replicate populations were maintained in each of two novel environments: one hot and one cold treatment. Each treatment cycled between two fixed temperature and light settings every 12 h (between 10 °C and 20 °C in the cold treatment, and between 18 °C and 28 °C in the hot treatment), with light and dark periods occurring with the hot and cold phase of each treatment, respectively. The heating/cooling period between the two fixed temperatures was relatively rapid, taking ~1 h before the new temperature was reached, such that most time was spent at either of the modal temperatures. Each replicate was maintained at ~1,000 individuals with ~50:50 sex ratios that were evenly distributed across five ~225 ml bottles. Each bottle contained ~70 ml of standard Drosophila medium. Generations were nonoverlapping.

Genome Sequencing and Mapping

Illumina full genome sequencing was performed on three replicates for each of the base population and evolved populations. For the hot treatment, a single replicate of the 23rd generation and two from the 15th generation were sequenced (Orozco-terWengel et al. 2012), whereas for the cold treatment, all three sequenced replicates came from the 15th generation. Sequencing was performed using paired-end reads, 75 bp in length in the hot treatment (GAIIx) and 100 bp in length in the cold treatment (HiSeq). For each replicate and time point, ~500 flies were pooled and sequenced as one sample (i.e., Pool-Seq; Futschik and Schlötterer 2010). Two additional replicates were also sequenced from the 15th generation of the hot and cold treatments. These were used to check the reproducibility of candidate detection, as described below. Average genome-wide coverage ranged from ~35- to ~90-fold for each of the sequenced replicates (supplementary table S4, Supplementary Material online).

Mapping was carried out using a combination of purpose-built and third-party software, which is described in detail in Kofler, Orozco-terWengel, et al. (2011). Briefly, reads were trimmed to remove low-quality bases and then mapped with bwa (version 0.5.8c) (Li and Durbin 2009) against the D. melanogaster reference genome (version 5.18) and Wolbachia (NC_002978.6). The alignment files were converted to the SAM format (Li et al. 2009), with reads not mapped in proper pairs or with a mapping quality of less than 20 being removed. SAM files were converted into the pileup format, and indels and repeat sequences identified by RepeatMasker 3.2.9 (www.repeatmasker.org, last accessed November 5, 2013) were masked. The aligned pileup files were converted to a synchronized file (Kofler, Pandey, et al. 2011), from which SNPs and their accompanying allele frequencies were derived. All allele frequency changes presented herein were averaged across all three replicates (weighted by coverage). We further removed all SNPs whose coverage was among the top 2% of a given replicate and time point, or that had a minimum count of less than 30 across the 23 populations in our extended data set, which comprised additional replicates and time points that were not analyzed here. Because the average coverage per SNP across the extended data set was ~1,500, the minimum count of 30 equates to an average 2% minor allele frequency across all replicates and time points (although allele frequencies could be lower in the time points that were specifically analyzed in this study). Annotation was based on version 5.40 of the D. melanogaster reference genome. Approximately 1.45 million SNPs were called overall.

Effective Population Size, FDR, and Candidate Estimation

In order to estimate Ne and FDR in both treatments, forward Wright–Fisher simulations were performed to generate expected neutral changes during the experiment. Three sets of neutral allele frequency changes were simulated for each treatment, matching the number of replicates in the study. For each simulation, 1.45 million SNPs were independently sampled (i.e., approximating the number of SNPs in our real data) from the distribution of allele frequencies expected after 15 generations of neutral evolution, conditional on the starting frequency for each SNP. Starting frequencies for each SNP were randomly drawn from the empirical distribution of start frequencies in the experiment, with the frequencies being the coverage-weighted average across all replicates. The expected distribution of allele frequencies after 15 generations was created by generating an Ni × Nj transition matrix, where each cell is the binomial probability of seeing j alleles with a starting frequency i/N (i.e., N is the effective population size), and then raising this matrix to the 15th power. To simulate the effects of DNA pooling in our study, an additional round of binomial sampling was conducted on the start and end allele frequencies for each SNP. To do this, each SNP was assigned a random coverage value that was drawn from the empirical distribution of coverages from a specific generation and treatment. This was done separately for each replicate, thereby capturing the heterogeneity in coverage both within and between replicates in the study.
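The transition-matrix step can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the authors' code: it uses a toy population of N = 20 allele copies (rather than the ~250 estimated in the study), treats N as the number of allele copies exactly as the text does, and replaces the explicit matrix power with repeated vector-matrix products, which give the same row of the 15th-power matrix for a given starting count. All function names are hypothetical.

```python
import math
import random

def wf_transition_matrix(n):
    """Neutral Wright-Fisher transition matrix of size (n+1) x (n+1).

    Cell [i][j] is the binomial probability of seeing j copies of an
    allele in the next generation when its current frequency is i/n.
    """
    matrix = []
    for i in range(n + 1):
        p = i / n
        row = [math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]
        matrix.append(row)
    return matrix

def distribution_after(matrix, start_count, generations):
    """Allele-count distribution after `generations` of drift.

    Equivalent to row `start_count` of the matrix raised to the power
    `generations`, but computed with vector-matrix products.
    """
    size = len(matrix)
    dist = [0.0] * size
    dist[start_count] = 1.0
    for _ in range(generations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(size))
                for j in range(size)]
    return dist

def pooled_frequency(true_freq, coverage, rng):
    """One round of binomial read sampling, mimicking Pool-Seq noise."""
    hits = sum(1 for _ in range(coverage) if rng.random() < true_freq)
    return hits / coverage

# Toy usage: N = 20 and coverage = 60 are illustrative values only.
rng = random.Random(42)
matrix = wf_transition_matrix(20)
dist = distribution_after(matrix, start_count=10, generations=15)
end_count = rng.choices(range(21), weights=dist)[0]
observed_end = pooled_frequency(end_count / 20, coverage=60, rng=rng)
```

In a full simulation the coverage for each SNP would be drawn from the empirical coverage distribution of the relevant replicate rather than fixed, as the paragraph above describes.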

The effective population size of the observed data was estimated as follows. First, multiple simulations were performed as outlined above, each with a differing Ne. Each simulation produced a set of expected start and end allele frequencies for a given Ne, from which an allele frequency change (AFC) distribution was obtained (i.e., the end frequency minus the start frequency for all 1.45 million SNPs). The expected AFC distribution was then compared for goodness of fit with the observed AFC distribution by summing the squared difference of the AFC values evaluated in 1% frequency change bins. Because individual replicates were simulated, we averaged the goodness of fit across the three replicates to get a point estimate for each Ne. Additionally, the error around this mean was sufficiently large for both treatments that the best-fitting Ne was not significantly different from many neighboring Ne values. Consequently, we took the lowest Ne from amongst this range (~250; supplementary table S1, Supplementary Material online), as it is expected to provide the most conservative estimate of the FDR (because the strength of drift increases as Ne decreases). The Ne for individual chromosomes was also inferred by limiting the observed data to that specific to each chromosome and then repeating the steps above.
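The goodness-of-fit comparison can be illustrated as follows. This is a sketch under assumptions: AFC histograms are expressed as proportions of SNPs per 1% bin (the text does not say whether counts or proportions were compared), and the function names are hypothetical. It returns only the single best-fitting Ne, whereas the study took the lowest Ne among those not significantly different from the best fit.

```python
def afc_histogram(start_freqs, end_freqs, bin_width=0.01):
    """Proportion of SNPs falling in each allele-frequency-change bin.

    AFC = end - start lies in [-1, 1], so there are 2 / bin_width bins.
    """
    n_bins = round(2 / bin_width)
    counts = [0] * n_bins
    for start, end in zip(start_freqs, end_freqs):
        index = int((end - start + 1.0) / bin_width)
        counts[min(index, n_bins - 1)] += 1  # AFC == +1 goes in the last bin
    return [c / len(start_freqs) for c in counts]

def sum_squared_difference(observed_hist, expected_hist):
    """Sum over bins of the squared difference between two histograms."""
    return sum((o - e) ** 2 for o, e in zip(observed_hist, expected_hist))

def mean_ssd(observed_hists, expected_hists):
    """Average the fit over paired replicates (one histogram per replicate)."""
    fits = [sum_squared_difference(o, e)
            for o, e in zip(observed_hists, expected_hists)]
    return sum(fits) / len(fits)

def best_ne(observed_hists, expected_by_ne):
    """Ne whose simulated histograms give the smallest mean SSD.

    expected_by_ne maps each candidate Ne to its per-replicate histograms.
    """
    return min(expected_by_ne,
               key=lambda ne: mean_ssd(observed_hists, expected_by_ne[ne]))
```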

Following Ne estimation, the number of putative selected SNPs was inferred as follows. First, all ~1.45 million SNPs that remained after filtering were subjected to the CMH test (Agresti 2002). The CMH test was shown to outperform several other statistics that were used for candidate inference in recent Drosophila E&R studies (Kofler and Schlötterer 2013). Testing was conducted separately for the hot and cold treatments, with each being contrasted against the base population. Second, the CMH test was applied to simulated SNP data that were derived using the conservative Ne estimate of 250 (discussed earlier). Finally, the number of candidate SNPs was estimated by setting an empirical FDR of 0.001 for each treatment. Thus, the number of candidates for a given treatment was quantified as the number of observed SNPs that had CMH P values that were lower than the least significant CMH P value from among the 0.1% most extreme simulated SNPs. The number of candidate SNPs was also inferred for a range of different FDR values, and the results are shown in supplementary table S1, Supplementary Material online.
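For readers unfamiliar with the two steps just described, the sketch below shows a textbook form of the CMH statistic for a set of 2 × 2 tables (one per replicate) and the empirical thresholding against simulated P values. Several points are assumptions rather than details confirmed by the text: the table layout (base vs. evolved population by allele read counts), the use of a continuity correction, and all function names. It is not the pipeline used in the study.

```python
import math

def cmh_test(tables, continuity=True):
    """Cochran-Mantel-Haenszel test across 2 x 2 tables (1 df).

    Each table is (a, b, c, d), assumed here to hold, for one replicate:
      a, b = read counts of allele 1 and allele 2 in the base population
      c, d = read counts of allele 1 and allele 2 in the evolved population
    Returns (statistic, p_value).
    """
    diff = 0.0
    var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue
        diff += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0
    delta = abs(diff)
    if continuity:
        delta = max(delta - 0.5, 0.0)
    stat = delta ** 2 / var
    # Survival function of a chi-square distribution with 1 df.
    return stat, math.erfc(math.sqrt(stat / 2))

def empirical_threshold(simulated_pvalues, fdr=0.001):
    """Least significant P value among the `fdr` most extreme simulated SNPs."""
    ranked = sorted(simulated_pvalues)
    k = max(1, round(len(ranked) * fdr))
    return ranked[k - 1]

def count_candidates(observed_pvalues, threshold):
    """Observed SNPs with a P value strictly below the simulated threshold."""
    return sum(1 for p in observed_pvalues if p < threshold)
```

With 1.45 million simulated SNPs and an FDR of 0.001, `empirical_threshold` would return the P value of roughly the 1,450th most significant simulated SNP.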

To examine the effects that linkage between loci had on the candidate inference, we also performed 30 forward neutral simulations starting from a population with 250 unique haplotypes (i.e., 250 homozygous diploid individuals) for 15 generations. Thus, the number of founding haplotypes and the effective population size during the simulations were both 250. Assuming that the isofemale founders maintained two segregating chromosomes per line on average (which is not unreasonable considering that they were five generations old when making the mass-bred populations), these numbers roughly accord with those expected within our experimental populations. The simulations were performed using MimicrEE (Kofler and Schlötterer 2013). This software enabled the modeling of recombination at rates that were derived from empirical estimates (Fiston-Lavier et al. 2010). The 30 simulations were divided into 10 equal-sized subsets. The three simulations from each of the 10 subsets were then downsampled to match the coverage distributions of the three empirical replicates from the relevant generation and treatment. Next, we applied the CMH test to all sites in each subset and estimated the number of candidates thereof via the FDR method outlined in the preceding paragraph. The haplotypes used in the simulations were created by neutral coalescent simulations and are described in Bastide et al. (2013).

Enrichment in Thermal Tolerance Genes

We tested for treatment-specific evolution by looking for candidate enrichment in genes associated with heat and cold tolerance. Thermal tolerance gene sets were taken from the Candidate Stress Genes in Drosophila database, which was obtained from the Centre for Environmental Stress and Adaptation Research website (http://cesar.org.au/, last accessed November 5, 2013). Only genes with a significant and unique association with either heat or cold tolerance were included in the analyses (supplementary table S2, Supplementary Material online). Separate enrichment analyses were performed on the 2,000; 4,000; 6,000; 8,000; 10,000; 12,000; 14,000; 16,000; 18,000; 20,000; 25,000; 30,000; 35,000; 40,000; 45,000; 50,000; 60,000; 70,000; 80,000; 90,000; and 100,000 most significant SNPs in each treatment. Note that each successive set of SNPs was an aggregation of previous SNP sets and the subsequent set of most significant SNPs. For instance, the set that contained the top 4,000 SNPs comprised the set containing the top 2,000 SNPs along with the next 2,000 most significant SNPs. All enrichment analyses were performed using Gowinda software (Kofler and Schlötterer 2012). Gowinda generates P values by conducting multiple simulations in which the positions of candidate SNPs are randomly chosen from the ~1.45 million SNPs scattered throughout the genome. The P value was then quantified as the proportion of simulations that had more genes specific to a given category (in our study, genes associated either with hot or cold thermotolerance) containing at least one candidate SNP than the observed data set (Kofler and Schlötterer 2012). In total, we performed 100,000 permutations for each set of candidate SNPs in each treatment and for each set of thermal tolerance genes. In each case, a gene containing multiple candidate SNPs was only counted once (i.e., the "gene" option of the mode argument). Gene annotations were based on version 5.40 of the D. melanogaster reference genome, and only the full coding region of each gene, and not the upstream or downstream flanking regions, was considered (i.e., the "gene" option of the gene-definition argument). Because the thermal genes were specifically tested for enrichment, with the expectation that there would be enrichment for hot alleles in heat-tolerance genes and cold alleles in cold-tolerance genes, we report P values rather than q values adjusted for testing multiple categories. This software was specifically designed to deal with differences in average gene size among categories and therefore does not result in the overrepresentation of categories with larger than average gene sizes (Kofler and Schlötterer 2012).
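The permutation logic described above can be illustrated with a generic sketch. It is not Gowinda itself and does not reproduce its options or internals; the data structures and function names are hypothetical. As a deliberate, slightly conservative assumption, permutations with at least as many category genes as the observed data are counted, whereas the text describes counting those with more.

```python
import random

def genes_hit(snps, gene_of_snp, category):
    """Number of category genes containing at least one of the given SNPs.

    Each gene is counted once, however many SNPs fall inside it.
    """
    return len({gene_of_snp[s] for s in snps
                if gene_of_snp.get(s) in category})

def permutation_p_value(candidates, all_snps, gene_of_snp, category,
                        n_perm, rng):
    """Share of random SNP draws that hit as many category genes as observed."""
    observed = genes_hit(candidates, gene_of_snp, category)
    as_extreme = 0
    for _ in range(n_perm):
        # Draw as many random SNPs as there are candidates, without replacement.
        sample = rng.sample(all_snps, len(candidates))
        if genes_hit(sample, gene_of_snp, category) >= observed:
            as_extreme += 1
    return as_extreme / n_perm
```

Because random SNPs are drawn from the full SNP set, longer genes are hit more often in the permutations as well as in the observed data, which is how sampling at the SNP level accounts for gene size.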

Identification of Putative High LD Regions and Short Introns

LD can result in a large number of false positives, for instance via allele frequency correlations between selected and neutral linked sites. Consequently, the following regions were designated as having putatively elevated LD and were removed from specific analyses, as indicated in the main text. First, regions covered by inversions or within 500 kb either side of the inversion breakpoints. Segregating inversions were identified in our experimental evolution populations by Kapun et al. (2013). Second, a 1 Mb region located within the inversion-free part of 3R. This region was found to harbor a large number of low-frequency alleles that showed a similar allele frequency change, a pattern consistent with selection upon a single haplotype block (Orozco-terWengel et al. 2012). Finally, regions with a local recombination rate of less than 2 cM/Mb. The recombination rates were taken from Comeron et al. (2012). Approximately 300,000 of the original 1.45 million SNPs remained following the removal of these regions. Because 3R contained three segregating inversions and the putative selected haplotype block, only 15,000 SNPs remained on this chromosome arm following removal of these regions, whereas other chromosomes maintained at least 50,000 SNPs.
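The three exclusion rules amount to a simple interval filter. The sketch below is illustrative only: the coordinates in the usage lines are made up, the data structures are hypothetical, and the local recombination rate is assumed to have been looked up already for each SNP.

```python
def pad(intervals, flank):
    """Extend each (start, end) interval by `flank` bp on both sides."""
    return [(max(1, start - flank), end + flank) for start, end in intervals]

def in_masked_region(chrom, pos, masked):
    """True if pos lies in any masked interval on that chromosome arm."""
    return any(start <= pos <= end for start, end in masked.get(chrom, []))

def keep_snp(chrom, pos, masked, rec_rate, min_rate=2.0):
    """Keep SNPs outside masked regions with recombination >= 2 cM/Mb."""
    return not in_masked_region(chrom, pos, masked) and rec_rate >= min_rate

# Hypothetical inversion spanning 1.0-2.0 Mb on 3R, padded by 500 kb.
masked = {"3R": pad([(1_000_000, 2_000_000)], 500_000)}
kept = keep_snp("3R", 3_000_000, masked, rec_rate=3.0)
```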

Short intron coordinates were taken from Lawrie et al. (2013). To further ensure that short intron sites were behaving neutrally, we only took those that were located within the putative low LD regions identified above.

ROC Curves and Reproducibility

To test the reproducibility of our top candidates, two additional replicates were sequenced from generation 15 in each treatment, with all sites being processed through the same pipeline as the original candidates. Because we did not have additional replicates from the base population, two new base replicates were created. This was done by combining the reads from the separate base replicates into a single file, from which two-thirds of all reads were randomly sampled (without replacement) and split in half to generate two new files, each having one-third of the combined base population reads (~50-fold coverage). All SNPs from the two new sets of replicates were then subjected to CMH testing and compared for consistency with the original data set using ROC curves. For the sake of consistency, the ~1.45 million SNPs from the original data set were used in the comparisons.
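The construction of the two new base replicates can be sketched as follows. This is a simplified illustration with a hypothetical function name: reads are represented by identifiers, and with paired-end data the sampling unit would presumably be the read pair rather than the individual read.

```python
import random

def split_base_reads(read_ids, rng):
    """Sample two-thirds of the reads without replacement, then halve them.

    Returns two disjoint lists, each holding about one-third of the input.
    """
    n_sample = (2 * len(read_ids)) // 3
    n_sample -= n_sample % 2  # keep the two halves equal in size
    sampled = rng.sample(read_ids, n_sample)
    half = n_sample // 2
    return sampled[:half], sampled[half:]

# Toy usage with 300 read identifiers.
new_base_1, new_base_2 = split_base_reads(list(range(300)), random.Random(1))
```

Sampling without replacement keeps the two new replicates disjoint, so they do not share sequencing reads with each other.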

We used a modified ROC curve analysis to provide a graphical representation of the consistency between the original and new data sets, that is, those based on three replicates and two replicates, respectively. The ROC curves quantify the overlap between the two data sets across successively larger groups of SNPs, with SNPs having been ranked by their CMH P value. Points falling on the 1:1 diagonal line indicate that overlap between the two sets of SNPs matches chance expectations of reproducibility, whereas points lying above the diagonal indicate excess reproducibility. Because we are interested in changes that were most likely to be caused by the deterministic process of selection, as opposed to the stochastic effects of genetic drift, only SNPs where the same allele was increasing across all five replicates were considered as being reproducible.
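One way to read this procedure is sketched below. It is an interpretation, not the authors' implementation: for each fraction x of top-ranked SNPs, it reports the proportion of the top SNPs in the first data set that are also among the top SNPs in the second and show a consistent direction of change. Under random rankings that proportion is expected to be about x, which corresponds to the 1:1 diagonal. All names are hypothetical.

```python
def overlap_curve(pvals_a, pvals_b, same_direction, fractions):
    """Overlap between two SNP rankings at increasing top fractions.

    pvals_a, pvals_b: dicts mapping SNP id -> CMH P value in each data set.
    same_direction:   dict mapping SNP id -> True if the same allele rose
                      in every replicate of both data sets.
    Returns a list of (fraction, overlap proportion) points.
    """
    ranked_a = sorted(pvals_a, key=pvals_a.get)
    ranked_b = sorted(pvals_b, key=pvals_b.get)
    points = []
    for frac in fractions:
        k = max(1, round(frac * len(ranked_a)))
        top_b = set(ranked_b[:k])
        shared = [s for s in ranked_a[:k] if s in top_b and same_direction[s]]
        points.append((frac, len(shared) / k))
    return points
```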

The expected degree of reproducibility under neutrality was quantified using neutral MimicrEE simulations (Kofler and Schlötterer 2013). All parameters used were identical to the simulations used for the FDR estimates outlined above, except that a total of 50 data sets were simulated in this case. The simulated data sets were divided into 10 equally sized subsets, with each subset being further split into a group of three replicates and a group of two replicates. Following downsampling of the data sets to match our empirical coverage distributions (discussed earlier), CMH tests were applied to each group of three and two replicates within each subset. The resulting data were compared for reproducibility using ROC curves, as above.

LD Analyses

The degree of LD around the top 2,000 candidates was estimated for both treatments following the same methods outlined in Orozco-terWengel et al. (2012). Briefly, we estimated the median P value for all SNPs flanking the top 2,000 candidates in nonoverlapping 50-bp windows either side of the focal candidates. The decay in the P value around the focal candidates provides a proxy for the strength and extent of LD in regions flanking putative selected sites. To ascertain the background level of LD among similar SNPs, 2,000 more SNPs were randomly drawn from the genome after excluding all regions within 200 bp of the top 2,000 candidates. The numbers of random SNPs drawn from each chromosome arm matched the distribution of the top 2,000 candidates in each treatment.
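A minimal sketch of the windowed summary follows. It makes several simplifying assumptions that are not taken from the text: all SNPs sit on one chromosome arm, windows on the two sides of a focal candidate are pooled by distance, and the maximum distance (`max_dist`) is an arbitrary illustrative value. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import median

def flanking_median_p(candidate_positions, snp_pvalues, window=50,
                      max_dist=500):
    """Median P value of flanking SNPs per distance window.

    snp_pvalues maps position -> P value on a single chromosome arm.
    Window 0 covers SNPs 1-50 bp from a focal candidate, window 1 covers
    51-100 bp, and so on; the focal candidates themselves are excluded.
    """
    bins = defaultdict(list)
    for focal in candidate_positions:
        for pos, p in snp_pvalues.items():
            dist = abs(pos - focal)
            if 0 < dist <= max_dist:
                bins[(dist - 1) // window].append(p)
    return {index: median(values) for index, values in sorted(bins.items())}
```

Running the same function on the randomly drawn SNPs would give the background profile against which the decay around the candidates is compared.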

We used MimicrEE simulations (Kofler and Schlötterer 2013) to estimate the degree of long-range LD achievable given a simple scenario where strong selection acted on a singleton located in a single haplotype. The selection strength (0.5) was based on increasing a single allele from 1% to 30% over 15 generations (i.e., the mean allele frequency change observed for the top 2,000 candidate SNPs in the hot treatment). This was derived using s = 2 ln[p0q1/(p1q0)]/t, where s is the selection coefficient, p and q are the biallelic frequencies at the start (subscript 0) and end (subscript 1) generations, and t is the change in generations (eq. 33 in Charlesworth and Charlesworth [2010]). Other than the selected site, all other parameters were the same as the neutral simulations outlined above. Nine simulations in which the singleton increased in frequency by 25–35% were included in the analyses. These were evenly split into three subsets, with each subset being downsampled and submitted to CMH testing (in the same manner as the neutral simulations discussed earlier). The resulting data were then used to produce Manhattan plots that enabled visual assessment of the degree of long-range LD (fig. 7).
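The quoted selection strength can be checked with a short calculation. The sketch assumes that q denotes the frequency of the allele that increases (so q0 = 0.01 and q1 = 0.30) and p = 1 − q; this reading is not stated explicitly in the text, but it is the one under which the formula gives a positive coefficient close to 0.5. The function name is hypothetical.

```python
import math

def selection_coefficient(start_freq, end_freq, generations):
    """s = 2 ln[p0 q1 / (p1 q0)] / t, with q the favoured allele's frequency.

    start_freq and end_freq are q0 and q1; p0 and p1 are their complements.
    """
    p0, q0 = 1 - start_freq, start_freq
    p1, q1 = 1 - end_freq, end_freq
    return 2 * math.log((p0 * q1) / (p1 * q0)) / generations

# An allele rising from 1% to 30% over 15 generations.
s_hot = selection_coefficient(0.01, 0.30, 15)
print(round(s_hot, 2))  # prints 0.5
```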

Supplementary Material

Supplementary figures S1–S4 and tables S1–S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to A. Bergland and D. Lawrie for sharing the coordinates for the short introns and Martin Kapun for sharing unpublished results. R.T. wishes to thank his colleagues at the Institut für Populationsgenetik and the Petrov laboratory at Stanford University for helpful discussions on data interpretation. This work was supported by a PhD fellowship from the Vetmeduni Vienna, the Austrian Science Fund (FWF, P19467-B11, W1225-B20), and the European Research Council (ERC) grant "ArchAdapt" awarded to C.S.

References

Agresti A. 2002. Categorical data analysis. New York: John Wiley & Sons.

Andolfatto P, Wall JD. 2003. Linkage disequilibrium patterns across a recombination gradient in African Drosophila melanogaster. Genetics 165:1289–1305.

Bastide H, Betancourt A, Nolte V, Tobler R, Stöbe P, Futschik A, Schlötterer C. 2013. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9:e1003534.

Bergelson J, Roux F. 2010. Towards identifying genes underlying ecologically relevant traits in Arabidopsis thaliana. Nat Rev Genet. 11:867–879.

Bergland AO, Behrman EL, O'Brien KR, Schmidt PS, Petrov DA. 2013. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1303.5044

Bubliy O, Loeschcke V. 2005. Correlated responses to selection for stress resistance and longevity in a laboratory population of Drosophila melanogaster. J Evol Biol. 18:789–803.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Caracristi G, Schlötterer C. 2003. Genetic differentiation between American and European Drosophila melanogaster populations could be attributed to admixture of African alleles. Mol Biol Evol. 20:792–799.

Carroll D. 2011. Genome engineering with zinc-finger nucleases. Genetics 188:773–782.

Charlesworth B, Charlesworth D. 2010. Elements of evolutionary genetics. Greenwood Village (CO): Roberts and Company Publishers.

Clemente F, Vogl C. 2012. Unconstrained evolution in short introns? – An analysis of genome-wide polymorphism and divergence data from Drosophila. J Evol Biol. 25:1975–1990.

Comeron JM, Ratnappan R, Bailin S. 2012. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 8:e1002905.

Corbett-Detig RB, Hartl DL. 2012. Population genomics of inversion polymorphisms in Drosophila melanogaster. PLoS Genet. 8:e1003056.

Dettman JR, Rodrigue N, Melnyk AH, Wong A, Bailey SF, Kassen R. 2012. Evolutionary insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol. 21:2058–2077.

Farlow A, Dolezal M, Hua L, Schlötterer C. 2012. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol. 29:21–24.

Fiston-Lavier A-S, Singh ND, Lipatov M, Petrov DA. 2010. Drosophila melanogaster recombination rate calculator. Gene 463:18–20.

Futschik A, Schlötterer C. 2010. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186:207–218.

Haldane J. 1957. The cost of natural selection. J Genet. 55:511–524.

Halligan DL, Keightley PD. 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16:875–884.

Hoffman A, Hallas R, Sinclair C, Partridge L. 2001. Rapid loss of stress resistance in Drosophila melanogaster under adaptation to laboratory culture. Evolution 55:436–438.

Itoh M, Nanba N, Hasegawa M, Inomata N, Kondo R, Oshima M, Takano-Shimizu T. 2010. Seasonal changes in the long-distance linkage disequilibrium in Drosophila melanogaster. J Hered. 101:26–32.

Kapun M, van Schalkwyk H, McAllister B, Flatt T, Schlötterer C. 2013. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.2461

Kawecki TJ, Lenski RE, Ebert D, Hollis B, Olivieri I, Whitlock MC. 2012. Experimental evolution. Trends Ecol Evol. 27:547–560.

Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V, Futschik A, Kosiol C, Schlötterer C. 2011. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 6:e15925.

Kofler R, Pandey RV, Schlötterer C. 2011. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27:3435–3436.

Kofler R, Schlötterer C. 2012. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies. Bioinformatics 28:2084–2085.

Kofler R, Schlötterer C. 2013. Guidelines for the design of evolve and resequencing studies. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.4954

Kolss M, Kawecki TJ. 2008. Reduced learning ability as a consequence of evolutionary adaptation to nutritional stress in Drosophila melanogaster. Ecol Entomol. 33:583–588.

Lawrie DS, Messer PW, Hershberg R, Petrov DA. 2013. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 9:e1003527.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079.

Liu J, Li C, Yu Z, et al. (12 co-authors). 2012. Efficient and specific modifications of the Drosophila genome by means of an easy TALEN strategy. J Genet Genomics. 39:209–215.

Mackay TFC, Richards S, Stone EA, et al. (52 co-authors). 2012. The Drosophila melanogaster Genetic Reference Panel. Nature 482:173–178.

Miyashita N, Langley H. 1988. Molecular and phenotypic variation of the white locus region in Drosophila melanogaster. Genetics 120:199–212.

Nunney L. 2003. The cost of natural selection revisited. Ann Zool Fennici. 40:185–194.

Orozco-terWengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P. 2010. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol. 27:1226–1234.

Parts L, Cubillos FA, Warringer J, et al. (14 co-authors). 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Pool JE, Aquadro CF. 2006. History and structure of sub-Saharan populations of Drosophila melanogaster. Genetics 174:915–929.

Pool JE, Corbett-Detig RB, Sugino RP, et al. (11 co-authors). 2012. Population Genomics of sub-saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet. 8:e1003080.

Remolina S, Chang P, Leips J, Nuzhdin SV, Hughes K. 2012. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66:3390–3403.

Rose MR, Vu LN, Park SU, Graves JL Jr. 1992. Selection on stress resistance increases longevity in Drosophila melanogaster. Exp Gerontol. 27:241–250.

Stalker H. 1980. Chromosome inversion polymorphism in Drosophila melanogaster. II. Relationship of inversion frequencies to latitude, season, wing-loading and flight activity. Genetics 95:211–223.

Stapley J, Reger J, Feulner PGD, Smadja C, Galindo J, Ekblom R, Bennison C, Ball A, Beckerman AP, Slate J. 2010. Adaptation genomics: the next generation. Trends Ecol Evol. 25:705–712.

Takano-Shimizu T, Kawabe A, Inomata N, Nanba N, Kondo R, Inoue Y, Itoh M. 2004. Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci U S A. 101:14156–14161.

Tantawy A. 1964. Studies on natural populations of Drosophila. III. Morphological and genetic differences of wing length in Drosophila melanogaster and D. simulans in relation to. Evolution 18:560–570.

Turner TL, Miller PM, Cochrane VA. 2013. Combining genome-wide methods to investigate the genetic complexity of courtship song variation in Drosophila melanogaster. Mol Biol Evol. 30:2113–2120.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Yu Z, Ren M, Wang Z, Zhang B, Rong YS, Jiao R, Gao G. 2013. Highly efficient genome modifications mediated by CRISPR/Cas9 in Drosophila. Genetics 195:289–291.

Zhou D, Udpa N, Gersten M, et al. (11 co-authors). 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.

Second large numbers of false positives are expected inEampR studies that use small founder populations as a directresult of long-range LD between selected and neutral sitesThe effect of founder population size on the performance ofexperimental design has been explicitly addressed in a recenthaplotype-based simulation study (Kofler and Schlotterer2013) The authors demonstrate that both the power andspecificity of EampR studies is increased when larger numbersof founders are used with the gains in performance beingparticularly noticeable when the strength of selection per SNPis strong Another plausible way to reduce the impact of long-range LD is to create independent replicates from differentsets of founders Although this does not remove long-rangeLD per se it results in different associations across replicates

such that analyses that combine replicates should remove themajority of such effects

Finally our results indicate that inversions are likely to be amajor hindrance in detecting causal loci Large segregatinginversions appear to be ubiquitous in natural D melanogasterpopulations and recent work suggests that newly introducedinversions may influence genetic diversity across entire chro-mosome arms presumably by suppressing recombinationwell beyond inversion breakpoints (Corbett-Detig and Hartl2012 Pool et al 2012 Kapun et al 2013) Consequently anyimprovements in performance that might be achievedthrough the experimental design modifications suggestedabove are unlikely to be realized in chromosome arms carry-ing large inversions whereby inversion-free populations maybe necessary to ensure that such improvements are realizedover the entire genome Although inversion-free D melano-gaster populations can be obtained via karyotyping individ-uals prior to the study the time and effort expended in suchscreens may become prohibitive if large founding populationsare desired Another possibility is to use an alternativeDrosophila species for which segregating inversions are raresuch as D simulans or D mauritiana instead of D melano-gaster Although these species currently lack the detailedgenetic and genomic resources available for D melanogasterthe ongoing development of portable molecular tools hasmade functional validation of candidate loci in these speciestenable (Carroll 2011 Liu et al 2012 Yu et al 2013)

Conclusion

Previous D. melanogaster E&R studies have not directly addressed the role that inversions or long-range LD may have had in generating the large numbers of reported candidate SNPs. Our results imply that such estimates were probably inflated by false positives resulting from these phenomena. Consequently, Drosophila E&R study designs will need to be modified before the true causal loci that contribute to adaptation and/or complex traits can be reliably identified. While we have outlined several possibilities here, we also point out that the study by Kofler and Schlötterer (2013) is intended as a general guideline for E&R study design. There, the authors use haplotype-based simulations to investigate E&R performance for a wide range of experimental and evolutionary variables and thereby provide an invaluable reference for future Drosophila E&R research.

Materials and Methods

Experimental Populations and Evolution Regimes

Full details on the experimental setup are outlined in Orozco-terWengel et al. (2012). Briefly, wild D. melanogaster were collected in Northern Portugal in the summer of 2008. The population was maintained as 113 isofemale lines for five generations in the laboratory prior to the creation of mass-bred populations. Ten replicated mass-bred populations were created, with each replicate combining five females from each isofemale line. This initial population of merged isofemale lines is referred to as the base population, or F0. Five replicate populations were maintained in each of two novel environments: one hot and one cold treatment. Each treatment cycled between two fixed temperature and light settings every 12 h (between 10 °C and 20 °C in the cold treatment and 18 °C and 28 °C in the hot treatment), with light and dark periods occurring with the hot and cold phase of each treatment, respectively. The heating/cooling period between the two fixed temperatures was relatively rapid, taking ~1 h before the new temperature was reached, such that most time was spent at either of the modal temperatures. Each replicate was maintained at ~1,000 individuals with ~50:50 sex ratios that were evenly distributed across five ~225 ml bottles. Each bottle contained ~70 ml of standard Drosophila medium. Generations were nonoverlapping.

Genome Sequencing and Mapping

Illumina full genome sequencing was performed on three replicates for each of the base population and evolved populations. For the hot treatment, a single replicate of the 23rd generation and two from the 15th generation were sequenced (Orozco-terWengel et al. 2012), whereas for the cold treatment, all three sequenced replicates came from the 15th generation. Sequencing was performed using paired-end reads, 75 bp in length in the hot treatment (GAIIx) and 100 bp long in the cold treatment (HiSeq). For each replicate and time point, ~500 flies were pooled and sequenced as one sample (i.e., Pool-Seq; Futschik and Schlötterer 2010). Two additional replicates were also sequenced from the 15th generation of the hot and cold treatments. These were used to check the reproducibility of candidate detection, as described below. Average genome-wide coverage ranged from ~35- to ~90-fold for each of the sequenced replicates (supplementary table S4, Supplementary Material online).

Mapping was carried out using a combination of purpose-built and third-party software, which is described in detail in Kofler, Orozco-terWengel, et al. (2011). Briefly, reads were trimmed to remove low-quality bases and then mapped with bwa (version 0.5.8c) (Li and Durbin 2009) against the D. melanogaster reference genome (version 5.18) and Wolbachia (NC_002978.6). The alignment files were converted to the SAM format (Li et al. 2009), with reads not mapped in proper pairs or with a mapping quality of less than 20 being removed. SAM files were converted into the pileup format, and indels and repeat sequences identified by RepeatMasker 3.2.9 (www.repeatmasker.org, last accessed November 5, 2013) were masked. The aligned pileup files were converted to a synchronized file (Kofler, Pandey, et al. 2011), from which SNPs and their accompanying allele frequencies were derived. All allele frequency changes presented herein were averaged across all three replicates (weighted by coverage). We further removed all SNPs whose coverage was among the top 2% of a given replicate and time point, or that had a minimum count of less than 30 across the 23 populations in our extended data set, which comprised additional replicates and time points that were not analyzed here. Because the average coverage per SNP across the extended data set was ~1,500, the minimum count of 30 equates to an average 2% minor allele frequency across all replicates and time points (although allele frequencies could be lower in the time points that were specifically analyzed in this study). Annotation was based on version 5.40 of the D. melanogaster reference genome. Approximately 1.45 million SNPs were called overall.
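The two SNP filters described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names, the in-memory data layout (one coverage value per population for each SNP) and the simple rank-based rule for the top 2% are all hypothetical.

```python
def coverage_cutoffs(coverage_rows, top_fraction=0.02):
    """Per-population coverage value at which the top `top_fraction` begins.

    coverage_rows: one list per SNP, holding its coverage in each population
    (hypothetical layout; the study used synchronized pileup files).
    """
    n_pops = len(coverage_rows[0])
    cutoffs = []
    for pop in range(n_pops):
        column = sorted(row[pop] for row in coverage_rows)
        # number of SNPs that make up the highest-coverage fraction
        n_top = max(1, round(len(column) * top_fraction))
        cutoffs.append(column[-n_top])
    return cutoffs


def keep_snp(coverages, minor_count, cutoffs, min_count=30):
    """Keep a SNP only if it passes both the count and the coverage filter."""
    if minor_count < min_count:
        return False
    # drop the SNP if it reaches the top-coverage cutoff in any population
    return all(cov < cut for cov, cut in zip(coverages, cutoffs))


# A minimum count of 30 at an average total coverage of ~1,500
# corresponds to an average minor allele frequency of 2%.
implied_maf = 30 / 1500
```

With tied coverage values this rule can remove slightly more than 2% of the SNPs in a population; a production filter would need to state how ties are handled.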

Effective Population Size, FDR, and Candidate Estimation

In order to estimate Ne and FDR in both treatments, forward Wright–Fisher simulations were performed to generate expected neutral changes during the experiment. Three sets of neutral allele frequency changes were simulated for each treatment, matching the number of replicates in the study. For each simulation, 1.45 million SNPs were independently sampled (i.e., approximating the number of SNPs in our real data) from the distribution of allele frequencies expected after 15 generations of neutral evolution, conditional on the starting frequency for each SNP. Starting frequencies for each SNP were randomly drawn from the empirical distribution of start frequencies in the experiment, with the frequencies being the coverage-weighted average across all replicates. The expected distribution of allele frequencies after 15 generations was created by generating an N_i × N_j transition matrix, where each cell is the binomial probability of seeing j alleles with a starting frequency i/N (i.e., N is the effective population size), and then raising this matrix to the 15th power. To simulate the effects of DNA pooling in our study, an additional round of binomial sampling was conducted on the start and end allele frequencies for each SNP. To do this, each SNP was assigned a random coverage value that was drawn from the empirical distribution of coverages from a specific generation and treatment. This was done separately for each replicate, thereby capturing the heterogeneity in coverage both within and between replicates in the study.
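The simulation scheme above can be illustrated with a small sketch. It is not the authors' code. Assumptions made here: `n` is treated as the number of gene copies in the population, a start distribution is pushed through the matrix generation by generation (which yields the same result as reading the corresponding row of the matrix raised to that power), and a single fixed coverage is used where the study drew coverages from an empirical distribution. All names are hypothetical.

```python
import math
import random


def transition_matrix(n):
    """(n+1) x (n+1) Wright-Fisher matrix: cell [i][j] = Binomial(j; n, i/n)."""
    matrix = []
    for i in range(n + 1):
        p = i / n
        matrix.append([math.comb(n, j) * p**j * (1 - p) ** (n - j)
                       for j in range(n + 1)])
    return matrix


def evolve(dist, matrix, generations):
    """Propagate a distribution over allele counts through neutral drift."""
    size = len(dist)
    for _ in range(generations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(size))
                for j in range(size)]
    return dist


def sample_pooled(freq, coverage, rng):
    """Binomial sampling of reads, mimicking Pool-Seq noise at one site."""
    return sum(rng.random() < freq for _ in range(coverage)) / coverage


def simulate_snp(start_count, n, generations, coverage, rng, matrix):
    """Return (observed start frequency, observed end frequency) for one SNP."""
    start = [0.0] * (n + 1)
    start[start_count] = 1.0
    end_dist = evolve(start, matrix, generations)
    end_count = rng.choices(range(n + 1), weights=end_dist)[0]
    return (sample_pooled(start_count / n, coverage, rng),
            sample_pooled(end_count / n, coverage, rng))


if __name__ == "__main__":
    rng = random.Random(42)
    matrix = transition_matrix(20)   # tiny n so the pure-Python demo is fast
    print(simulate_snp(5, 20, 15, 60, rng, matrix))
```

For realistic population sizes the matrix operations would normally be done with a numerical library; the plain-list version is used here only to keep the example self-contained.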

The effective population size of the observed data was estimated as follows. First, multiple simulations were performed as outlined above, each with a differing Ne. Each simulation produced a set of expected start and end allele frequencies for a given Ne, from which an allele frequency change (AFC) distribution was obtained (i.e., the end frequency minus the start frequency for all 1.45 million SNPs). The expected AFC distribution was then compared for goodness of fit with the observed AFC distribution by summing the squared difference of the AFC values evaluated in 1% frequency change bins. Because individual replicates were simulated, we averaged the goodness of fit across the three replicates to get a point estimate for each Ne. Additionally, the error around this mean was sufficiently large for both treatments that the best-fitting Ne was not significantly different from many neighboring Ne values. Consequently, we took the lowest Ne from amongst this range (~250; supplementary table S1, Supplementary Material online), as it is expected to provide the most conservative estimate of the FDR (because the strength of drift increases as Ne decreases). The Ne for individual chromosomes was also inferred by limiting the observed data to that specific to each chromosome and then repeating the steps above.
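The goodness-of-fit comparison can be sketched as below. This is a minimal illustration rather than the study's implementation; it assumes that the binned AFC distributions are compared as proportions (the text does not say whether counts or proportions were used), and all function names are hypothetical.

```python
def afc_histogram(afcs, bin_width=0.01):
    """Proportion of SNPs in each allele-frequency-change bin over [-1, 1]."""
    n_bins = round(2 / bin_width)
    counts = [0] * n_bins
    for x in afcs:
        k = min(int((x + 1.0) / bin_width), n_bins - 1)
        counts[k] += 1
    return [c / len(afcs) for c in counts]


def misfit(observed, simulated, bin_width=0.01):
    """Sum of squared differences between two binned AFC distributions."""
    obs = afc_histogram(observed, bin_width)
    sim = afc_histogram(simulated, bin_width)
    return sum((o - s) ** 2 for o, s in zip(obs, sim))


def mean_misfit(observed_reps, simulated_reps):
    """Average the misfit over paired observed and simulated replicates."""
    scores = [misfit(o, s) for o, s in zip(observed_reps, simulated_reps)]
    return sum(scores) / len(scores)


def best_ne(observed_reps, simulated_by_ne):
    """simulated_by_ne maps each candidate Ne to its simulated replicates."""
    scores = {ne: mean_misfit(observed_reps, sims)
              for ne, sims in simulated_by_ne.items()}
    return min(scores, key=scores.get), scores
```

The returned score table makes it possible to inspect neighboring Ne values with similar fit, from which the text above chooses the lowest as the conservative option.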

Following Ne estimation, the number of putative selected SNPs was inferred as follows. First, all ~1.45 million SNPs that remained after filtering were subjected to the CMH test (Agresti 2002). The CMH test was shown to outperform several other statistics that were used for candidate inference in recent Drosophila E&R studies (Kofler and Schlötterer 2013). Testing was conducted separately for the hot and cold treatments, with each being contrasted against the base population. Second, the CMH test was applied to simulated SNP data that were derived using the conservative Ne estimate of 250 (discussed earlier). Finally, the number of candidate SNPs was estimated by setting an empirical FDR of 0.001 for each treatment. Thus, the number of candidates for a given treatment was quantified as the number of observed SNPs that had CMH P values lower than the least significant CMH P value from among the 0.1% most extreme simulated SNPs. The number of candidate SNPs was also inferred for a range of different FDR values, and the results are shown in supplementary table S1, Supplementary Material online.
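As a rough sketch of the two steps above, the following code computes a CMH P value for one SNP from per-replicate 2 × 2 allele-count tables and then applies the empirical threshold derived from simulated P values. It is not the software used in the study. The table layout (base versus evolved population by allele, one table per replicate), the omission of a continuity correction, and the function names are all assumptions made for illustration.

```python
import math


def cmh_pvalue(tables):
    """CMH test for a list of 2x2 tables (a, b, c, d), one per replicate.

    Assumed layout: a, b = counts of the two alleles in the base population,
    c, d = counts in the evolved population. No continuity correction.
    """
    diff = 0.0
    var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var == 0.0:          # monomorphic site: no evidence of change
        return 1.0
    stat = diff * diff / var
    # survival function of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(stat / 2.0))


def fdr_threshold(simulated_pvalues, fdr=0.001):
    """Least significant P value among the `fdr` most extreme simulated SNPs."""
    ranked = sorted(simulated_pvalues)
    n_extreme = max(1, round(len(ranked) * fdr))
    return ranked[n_extreme - 1]


def count_candidates(observed_pvalues, simulated_pvalues, fdr=0.001):
    """Observed SNPs with P values below the simulation-derived cutoff."""
    cutoff = fdr_threshold(simulated_pvalues, fdr)
    return sum(1 for p in observed_pvalues if p < cutoff)
```

Because the cutoff is taken from neutral simulations at a fixed Ne, the resulting candidate count depends directly on that Ne, which is why the conservative (lowest plausible) estimate matters.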

To examine the effects that linkage between loci had on the candidate inference, we also performed 30 forward neutral simulations starting from a population with 250 unique haplotypes (i.e., 250 homozygous diploid individuals) for 15 generations. Thus, the number of founding haplotypes and the effective population size during the simulations were both 250. Assuming that the isofemale founders maintained two segregating chromosomes per line on average (which is not unreasonable considering that they were five generations old when the mass-bred populations were made), these numbers roughly accord with those expected within our experimental populations. The simulations were performed using MimicrEE (Kofler and Schlötterer 2013). This software enabled the modeling of recombination at rates that were derived from empirical estimates (Fiston-Lavier et al. 2010). The 30 simulations were divided into 10 equal-sized subsets. The three simulations from each of the 10 subsets were then downsampled to match the coverage distributions of the three empirical replicates from the relevant generation and treatment. Next, we applied the CMH test to all sites in each subset and estimated the number of candidates via the FDR method outlined in the preceding paragraph. The haplotypes used in the simulations were created by neutral coalescent simulations and are described in Bastide et al. (2013).

Enrichment in Thermal Tolerance Genes

We tested for treatment-specific evolution by looking for candidate enrichment in genes associated with heat and cold tolerance. Thermal tolerance gene sets were taken from the Candidate Stress Genes in Drosophila database, which was obtained from the Centre for Environmental Stress and Adaptation Research website (http://cesar.org.au/, last accessed November 5, 2013). Only genes with a significant and unique association with either heat or cold tolerance were included in the analyses (supplementary table S2, Supplementary Material online). Separate enrichment analyses were performed on the 2,000, 4,000, 6,000, 8,000, 10,000, 12,000, 14,000, 16,000, 18,000, 20,000, 25,000, 30,000, 35,000, 40,000, 45,000, 50,000, 60,000, 70,000, 80,000, 90,000, and 100,000 most significant SNPs in each treatment. Note that each successive set of SNPs was an aggregation of previous SNP sets and the subsequent set of most significant SNPs. For instance, the set that contained the top 4,000 SNPs comprised the set containing the top 2,000 SNPs along with the next 2,000 most significant SNPs. All enrichment analyses were performed using Gowinda software (Kofler and Schlötterer 2012). Gowinda generates P values by conducting multiple simulations in which the positions of candidate SNPs are randomly chosen from the ~1.45 million SNPs scattered throughout the genome. The P value was then quantified as the proportion of simulations that had more genes specific to a given category (in our study, genes associated with either hot or cold thermotolerance) containing at least one candidate SNP than the observed data set (Kofler and Schlötterer 2012). In total, we performed 100,000 permutations for each set of candidate SNPs in each treatment and for each set of thermal tolerance genes. In each case, a gene containing multiple candidate SNPs was only counted once (i.e., the gene option of the mode argument). Gene annotations were based on version 5.40 of the D. melanogaster reference genome, and only the full coding region of each gene, not the upstream or downstream flanking regions, was considered (i.e., the gene option of the gene-definition argument). Because the thermal genes were specifically tested for enrichment, with the expectation that there would be enrichment for hot alleles in heat tolerance genes and cold alleles in cold-tolerance genes, we report P values rather than q values adjusted for testing multiple categories. This software was specifically designed to deal with differences in average gene size among categories and therefore does not result in the overrepresentation of categories with larger than average gene sizes (Kofler and Schlötterer 2012).
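The permutation logic described above can be illustrated with a short sketch. This is not Gowinda's implementation; it only mirrors the description in the text, in which random candidate sets are drawn from all SNPs and the P value is the proportion of draws hitting more category genes than the observed set. The data structures and function names are hypothetical.

```python
import random


def genes_hit(candidate_snps, snp_to_genes, category):
    """Number of category genes containing at least one candidate SNP.

    Each gene is counted once, however many candidate SNPs it contains.
    """
    hit = set()
    for snp in candidate_snps:
        hit.update(g for g in snp_to_genes.get(snp, ()) if g in category)
    return len(hit)


def enrichment_pvalue(candidates, all_snps, snp_to_genes, category,
                      n_perm=1000, seed=0):
    """Proportion of random SNP draws that hit more category genes."""
    rng = random.Random(seed)
    observed = genes_hit(candidates, snp_to_genes, category)
    more = 0
    for _ in range(n_perm):
        # draw as many random SNP positions as there are candidates
        draw = rng.sample(all_snps, len(candidates))
        if genes_hit(draw, snp_to_genes, category) > observed:
            more += 1
    return more / n_perm
```

Drawing SNPs rather than genes is what accounts for gene size: a long gene carries more SNPs and is therefore hit more often in the random draws as well as in the observed data.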

Identification of Putative High LD Regions and Short Introns

LD can result in large numbers of false positives, for instance via allele frequency correlations between selected and neutral linked sites. Consequently, the following regions were designated as having putatively elevated LD and were removed from specific analyses, as indicated in the main text. First, regions covered by inversions or within 500 kb either side of the inversion breakpoints. Segregating inversions were identified in our experimental evolution populations by Kapun et al. (2013). Second, a 1 Mb region located within the inversion-free part of 3R. This region was found to harbor a large number of low-frequency alleles that showed a similar allele frequency change, a pattern consistent with selection upon a single haplotype block (Orozco-terWengel et al. 2012). Finally, regions with a local recombination rate of less than 2 cM/Mb. The recombination rates were taken from Comeron et al. (2012). Approximately 300,000 of the original 1.45 million SNPs remained following the removal of these regions. Because 3R contained three segregating inversions and the putative selected haplotype block, only 15,000 SNPs remained on this chromosome arm following removal of these regions, whereas other chromosomes maintained at least 50,000 SNPs.


Short intron coordinates were taken from Lawrie et al. (2013). To further ensure that short intron sites were behaving neutrally, we only took those that were located within the putative low LD regions identified above.

ROC Curves and Reproducibility

To test the reproducibility of our top candidates, two additional replicates were sequenced from generation 15 in each treatment, with all sites being processed through the same pipeline as the original candidates. Because we did not have additional replicates from the base population, two new base replicates were created. This was done by combining the reads from the separate base replicates into a single file, from which two-thirds of all reads were randomly sampled (without replacement) and split in half to generate two new files, each having one-third of the combined base population reads (~50-fold coverage). All SNPs from the two new sets of replicates were then subjected to CMH testing and compared for consistency with the original data set using ROC curves. For the sake of consistency, the ~1.45 million SNPs from the original data set were used in the comparisons.
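The construction of the two new base replicates can be sketched as follows. This is an illustration only: reads are represented as items in Python lists, whereas the study worked with sequence files, and the function name and seed handling are hypothetical.

```python
import random


def split_base_reads(replicate_reads, seed=0):
    """Pool reads from all base replicates and derive two pseudo-replicates.

    Two-thirds of the pooled reads are sampled without replacement and split
    in half, so each output holds about one-third of the pooled reads and
    the two outputs share no read.
    """
    rng = random.Random(seed)
    combined = [read for rep in replicate_reads for read in rep]
    n_sample = (2 * len(combined)) // 3
    n_sample -= n_sample % 2          # keep the two halves equal in size
    drawn = rng.sample(combined, n_sample)
    half = n_sample // 2
    return drawn[:half], drawn[half:]
```

Sampling without replacement ensures that the two pseudo-replicates are disjoint, although both still derive from the same pooled base population rather than from independent samples of flies.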

We used a modified ROC curve analysis to provide a graphical representation of the consistency between the original and new data sets, that is, those based on three replicates and two replicates, respectively. The ROC curves quantify the overlap between the two data sets across successively larger groups of SNPs, with SNPs having been ranked by their CMH P value. Points falling on the 1:1 diagonal line indicate that overlap between the two sets of SNPs matches chance expectations of reproducibility, whereas points lying above the diagonal indicate excess reproducibility. Because we are interested in changes that were most likely to be caused by the deterministic process of selection, as opposed to the stochastic effects of genetic drift, only SNPs where the same allele was increasing across all five replicates were considered as being reproducible.
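One way to compute such an overlap curve is sketched below. The exact axes used for the published curves are not given in this section, so the choice here is an assumption: the x value is the fraction of all SNPs in the top set (the overlap expected if the two rankings were unrelated and the direction filter were ignored), and the y value is the fraction of top SNPs shared between data sets with a consistent direction of change. All names are hypothetical.

```python
def overlap_curve(p_original, p_new, same_direction, sizes):
    """Overlap between the top-k SNPs of two P-value rankings.

    p_original, p_new: dicts mapping SNP id -> CMH P value in each data set.
    same_direction: dict mapping SNP id -> True if the same allele increased
        in every replicate of both data sets.
    sizes: iterable of top-set sizes k.
    Returns a list of (k / total, shared / k) points.
    """
    rank_a = sorted(p_original, key=p_original.get)
    rank_b = sorted(p_new, key=p_new.get)
    total = len(rank_a)
    points = []
    for k in sizes:
        top_b = set(rank_b[:k])
        shared = sum(1 for snp in rank_a[:k]
                     if snp in top_b and same_direction[snp])
        points.append((k / total, shared / k))
    return points
```

Under this convention, two unrelated rankings give points near the diagonal, and identical rankings with consistent directions give y = 1 for every k.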

The expected degree of reproducibility under neutrality was quantified using neutral MimicrEE simulations (Kofler and Schlötterer 2013). All parameters used were identical to the simulations used for the FDR estimates outlined above, except that a total of 50 data sets were simulated in this case. The simulated data sets were divided into 10 equally sized subsets, with each subset being further split into a group of three replicates and a group of two replicates. Following downsampling of the data sets to match our empirical coverage distributions (discussed earlier) for each set of simulations, CMH tests were applied to each group of three and two replicates. The resulting data were compared for reproducibility using ROC curves, as above.

LD Analyses

The degree of LD around the top 2,000 candidates was estimated for both treatments following the same methods outlined in Orozco-terWengel et al. (2012). Briefly, we estimated the median P value for all SNPs flanking the top 2,000 candidates in nonoverlapping 50-bp windows either side of the focal candidates. The decay in the P value around the focal candidates provides a proxy for the strength and extent of LD in regions flanking putative selected sites. To ascertain the background level of LD among similar SNPs, 2,000 more SNPs were randomly drawn from the genome after excluding all regions within 200 bp of the top 2,000 candidates. The numbers of random SNPs drawn from each chromosome arm matched the distribution of the top 2,000 candidates in each treatment.
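The windowed summary can be sketched as follows. This is a simplified illustration, not the original analysis code: it assumes all positions lie on a single chromosome arm, pools the windows on both sides of each focal SNP by distance, and uses hypothetical names throughout.

```python
from statistics import median


def flank_profile(focal_positions, snp_pvalues, window=50, n_windows=4):
    """Median P value of flanking SNPs per distance window.

    snp_pvalues: dict mapping position -> CMH P value (one chromosome arm).
    Window 0 covers distances 1..window from a focal SNP, window 1 covers
    window+1..2*window, and so on. Returns None for empty windows.
    """
    bins = [[] for _ in range(n_windows)]
    for focal in focal_positions:
        for pos, p in snp_pvalues.items():
            dist = abs(pos - focal)
            if dist == 0:
                continue              # skip the focal SNP itself
            k = (dist - 1) // window
            if k < n_windows:
                bins[k].append(p)
    return [median(b) if b else None for b in bins]
```

Running the same function on randomly drawn focal SNPs gives the background profile against which the decay around the candidates can be compared. The nested loop is quadratic and would need an indexed lookup for genome-scale data.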

We used MimicrEE simulations (Kofler and Schlötterer 2013) to estimate the degree of long-range LD achievable given a simple scenario where strong selection acted on a singleton located in a single haplotype. The selection strength (0.5) was based on increasing a single allele from 1% to 30% over 15 generations (i.e., the mean allele frequency change observed for the top 2,000 candidate SNPs in the hot treatment). This was derived using s = 2 ln[(p0 q1)/(p1 q0)]/t, where s is the selection coefficient, p and q are the biallelic frequencies at the start (subscript 0) and end (subscript 1) generations, and t is the change in generations (eq 33 in Charlesworth and Charlesworth [2010]). Other than the selected site, all other parameters were the same as in the neutral simulations outlined above. Nine simulations in which the singleton increased in frequency by 25–35% were included in the analyses. These were evenly split into three subsets, with each subset being downsampled and submitted to CMH testing (in the same manner as the neutral simulations discussed earlier). The resulting data were then used to produce Manhattan plots that enabled visual assessment of the degree of long-range LD (fig. 7).
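As a check on the arithmetic, the selection coefficient can be recomputed from the stated frequencies. The sketch assumes that q denotes the frequency of the selected allele and p = 1 − q, which is the reading under which the formula returns a positive value close to the reported 0.5; the function name is hypothetical.

```python
import math


def selection_coefficient(q0, q1, generations):
    """s = 2 ln[(p0 q1) / (p1 q0)] / t, with q the selected allele's frequency."""
    p0, p1 = 1.0 - q0, 1.0 - q1
    return 2.0 * math.log((p0 * q1) / (p1 * q0)) / generations


# Selected allele rising from 1% to 30% over 15 generations
s = selection_coefficient(0.01, 0.30, 15)
print(round(s, 2))
```

With these inputs the ratio inside the logarithm is (0.99 × 0.30)/(0.70 × 0.01) ≈ 42.4, so s ≈ 2 × 3.75/15 ≈ 0.5.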

Supplementary Material

Supplementary figures S1–S4 and tables S1–S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to A. Bergland and D. Lawrie for sharing the coordinates for the short introns and Martin Kapun for sharing unpublished results. R.T. wishes to thank his colleagues at the Institut für Populationsgenetik and the Petrov laboratory at Stanford University for helpful discussions on data interpretation. This work was supported by a PhD fellowship from the Vetmeduni Vienna, the Austrian Science Fund (FWF P19467-B11, W1225-B20), and the European Research Council (ERC) grant "ArchAdapt" awarded to C.S.

References

Agresti A. 2002. Categorical data analysis. New York: John Wiley & Sons.

Andolfatto P, Wall JD. 2003. Linkage disequilibrium patterns across a recombination gradient in African Drosophila melanogaster. Genetics 165:1289–1305.

Bastide H, Betancourt A, Nolte V, Tobler R, Stobe P, Futschik A, Schlötterer C. 2013. A genome-wide fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9:e1003534.

Bergelson J, Roux F. 2010. Towards identifying genes underlying ecologically relevant traits in Arabidopsis thaliana. Nat Rev Genet. 11:867–879.

Bergland AO, Behrman EL, O'Brien KR, Schmidt PS, Petrov DA. 2013. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1303.5044


Bubliy O, Loeschcke V. 2005. Correlated responses to selection for stress resistance and longevity in a laboratory population of Drosophila melanogaster. J Evol Biol. 18:789–803.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Caracristi G, Schlötterer C. 2003. Genetic differentiation between American and European Drosophila melanogaster populations could be attributed to admixture of African alleles. Mol Biol Evol. 20:792–799.

Carroll D. 2011. Genome engineering with zinc-finger nucleases. Genetics 188:773–782.

Charlesworth B, Charlesworth D. 2010. Elements of evolutionary genetics. Greenwood Village (CO): Roberts and Company Publishers.

Clemente F, Vogl C. 2012. Unconstrained evolution in short introns – An analysis of genome-wide polymorphism and divergence data from Drosophila. J Evol Biol. 25:1975–1990.

Comeron JM, Ratnappan R, Bailin S. 2012. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 8:e1002905.

Corbett-Detig RB, Hartl DL. 2012. Population genomics of inversion polymorphisms in Drosophila melanogaster. PLoS Genet. 8:e1003056.

Dettman JR, Rodrigue N, Melnyk AH, Wong A, Bailey SF, Kassen R. 2012. Evolutionary insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol. 21:2058–2077.

Farlow A, Dolezal M, Hua L, Schlötterer C. 2012. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol. 29:21–24.

Fiston-Lavier A-S, Singh ND, Lipatov M, Petrov DA. 2010. Drosophila melanogaster recombination rate calculator. Gene 463:18–20.

Futschik A, Schlötterer C. 2010. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186:207–218.

Haldane J. 1957. The cost of natural selection. J Genet. 55:511–524.

Halligan DL, Keightley PD. 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16:875–884.

Hoffman A, Hallas R, Sinclair C, Partridge L. 2001. Rapid loss of stress resistance in Drosophila melanogaster under adaptation to laboratory culture. Evolution 55:436–438.

Itoh M, Nanba N, Hasegawa M, Inomata N, Kondo R, Oshima M, Takano-Shimizu T. 2010. Seasonal changes in the long-distance linkage disequilibrium in Drosophila melanogaster. J Hered. 101:26–32.

Kapun M, van Schalkwyk H, McAllister B, Flatt T, Schlötterer C. 2013. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.2461

Kawecki TJ, Lenski RE, Ebert D, Hollis B, Olivieri I, Whitlock MC. 2012. Experimental evolution. Trends Ecol Evol. 27:547–560.

Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V, Futschik A, Kosiol C, Schlötterer C. 2011. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 6:e15925.

Kofler R, Pandey RV, Schlötterer C. 2011. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27:3435–3436.

Kofler R, Schlötterer C. 2012. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies. Bioinformatics 28:2084–2085.

Kofler R, Schlötterer C. 2013. Guidelines for the design of evolve and resequencing studies. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.4954

Kolss M, Kawecki TJ. 2008. Reduced learning ability as a consequence of evolutionary adaptation to nutritional stress in Drosophila melanogaster. Ecol Entomol. 33:583–588.

Lawrie DS, Messer PW, Hershberg R, Petrov DA. 2013. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 9:e1003527.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079.

Liu J, Li C, Yu Z, et al. (12 co-authors). 2012. Efficient and specific modifications of the Drosophila genome by means of an easy TALEN strategy. J Genet Genomics. 39:209–215.

Mackay TFC, Richards S, Stone EA, et al. (52 co-authors). 2012. The Drosophila melanogaster Genetic Reference Panel. Nature 482:173–178.

Miyashita N, Langley H. 1988. Molecular and phenotypic variation of the white locus region in Drosophila melanogaster. Genetics 120:199–212.

Nunney L. 2003. The cost of natural selection revisited. Ann Zool Fennici. 40:185–194.

Orozco-terWengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P. 2010. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol. 27:1226–1234.

Parts L, Cubillos FA, Warringer J, et al. (14 co-authors). 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Pool JE, Aquadro CF. 2006. History and structure of sub-Saharan populations of Drosophila melanogaster. Genetics 174:915–929.

Pool JE, Corbett-Detig RB, Sugino RP, et al. (11 co-authors). 2012. Population genomics of sub-Saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet. 8:e1003080.

Remolina S, Chang P, Leips J, Nuzhdin SV, Hughes K. 2012. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66:3390–3403.

Rose MR, Vu LN, Park SU, Graves JL Jr. 1992. Selection on stress resistance increases longevity in Drosophila melanogaster. Exp Gerontol. 27:241–250.

Stalker H. 1980. Chromosome inversion polymorphism in Drosophila melanogaster. II. Relationship of inversion frequencies to latitude, season, wing-loading and flight activity. Genetics 95:211–223.

Stapley J, Reger J, Feulner PGD, Smadja C, Galindo J, Ekblom R, Bennison C, Ball A, Beckerman AP, Slate J. 2010. Adaptation genomics: the next generation. Trends Ecol Evol. 25:705–712.

Takano-Shimizu T, Kawabe A, Inomata N, Nanba N, Kondo R, Inoue Y, Itoh M. 2004. Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci U S A. 101:14156–14161.

Tantawy A. 1964. Studies on natural populations of Drosophila. III. Morphological and genetic differences of wing length in Drosophila melanogaster and D. simulans in relation to. Evolution 18:560–570.

Turner TL, Miller PM, Cochrane VA. 2013. Combining genome-wide methods to investigate the genetic complexity of courtship song variation in Drosophila melanogaster. Mol Biol Evol. 30:2113–2020.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Yu Z, Ren M, Wang Z, Zhang B, Rong YS, Jiao R, Gao G. 2013. Highly efficient genome modifications mediated by CRISPR/Cas9 in Drosophila. Genetics 195:289–291.

Zhou D, Udpa N, Gersten M, et al. (11 co-authors). 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.

375

Adaptation Genomics of D melanogaster doi101093molbevmst205 MBED

ownloaded from

httpsacademicoupcom

mbearticle-abstract312364997936 by C

ardiff University user on 19 February 2019

Page 9: Massive Habitat-Specific Genomic Response in D. melanogaster …orca.cf.ac.uk/63273/1/mst205.pdf · 2019. 2. 19. · Article Massive Habitat-Specific Genomic Response in D. melanogaster

environments one hot and one cold treatment Each treat-ment cycled between two fixed temperature and light set-tings every 12 hmdashbetween 10 C and 20 C in the coldtreatment and 18 C and 28 C in the hot treatmentmdashwithlight and dark periods occurring with the hot and cold phaseof each treatment respectively The heatingcooling periodbetween the two fixed temperatures was relatively rapidtaking ~1 h before the new temperature is reached suchthat most time is spent at either of the modal temperaturesEach replicate was maintained at ~1000 individuals with~5050 sex ratios that were evenly distributed across five~225 ml bottles Each bottle contained ~70 ml of standardDrosophila medium Generations were nonoverlapping

Genome Sequencing and Mapping

Illumina full genome sequencing was performed on threereplicates for each of the base population and evolvedpopulations For the hot treatment a single replicate of the23rd generation and two from the 15th generation weresequenced (Orozco-terWengel et al 2012) whereas for thecold treatment all three sequenced replicates came from the15th generation Sequencing was performed using paired-endreads 75 bp in length in the hot treatment (GAIIx) and 100 bplong in the cold treatment (HiSeq) For each replicate andtime point ~500 flies were pooled and sequenced as onesample (ie Pool-Seq Futschik and Schlotterer 2010) Twoadditional replicates were also sequenced from the 15th gen-eration of the hot and cold treatments These were used tocheck the reproducibility of candidate detection as describedbelow Average genome-wide coverage ranged from ~35- to~90-fold for each of the sequenced replicates (supplementarytable S4 Supplementary Material online)

Mapping was carried out using a combination of purpose-built and third-party software which is described in detail inKofler Orozco-terWengel et al (2011) Briefly reads weretrimmed to remove low-quality bases and then mappedwith bwa (version 058c) (Li and Durbin 2009) against theD melanogaster reference genome (version 518) andWolbachia (NC_0029786) The alignment files were con-verted to the SAM format (Li et al 2009) with reads notmapped in proper pairs or with a mapping quality of lessthan 20 being removed SAM files were converted into thepileup format and indels and repeat sequences identified byRepeatMasker 329 (wwwrepeatmaskerorg last accessedNovember 5 2013) masked The aligned pileup files wereconverted to a synchronized file (Kofler Pandey et al 2011)from which SNPs and their accompanying allele frequencieswere derived All allele frequency changes presented hereinwere averaged across all three replicates (weighted by cover-age) We further removed all SNPs whose coverage was eitheramong the top 2 of a given replicate and time point or had aminimum count less than 30 across the 23 populations in ourextended data set that comprised additional replicates andtime points that were not analyzed here Because the averagecoverage per SNP across the extended data set was ~1500the minimum count of 30 equates to an average 2 minorallele frequency across all replicates and time points (although

allele frequencies could be lower in the time points that werespecifically analyzed in this study) Annotation was based onversion 540 of the D melanogaster reference genomeApproximately 145 million SNPs were called overall

Effective Population Size FDR andCandidate Estimation

In order to estimate Ne and FDR in both treatments, forward Wright–Fisher simulations were performed to generate expected neutral changes during the experiment. Three sets of neutral allele frequency changes were simulated for each treatment, matching the number of replicates in the study. For each simulation, 1.45 million SNPs were independently sampled (i.e., approximating the number of SNPs in our real data) from the distribution of allele frequencies expected after 15 generations of neutral evolution, conditional on the starting frequency for each SNP. Starting frequencies for each SNP were randomly drawn from the empirical distribution of start frequencies in the experiment, with the frequencies being the coverage-weighted average across all replicates. The expected distribution of allele frequencies after 15 generations was created by generating an Ni × Nj transition matrix, where each cell is the binomial probability of seeing j alleles with a starting frequency i/N (i.e., N is the effective population size), and then raising this matrix to the 15th power. To simulate the effects of DNA pooling in our study, an additional round of binomial sampling was conducted on the start and end allele frequencies for each SNP. To do this, each SNP was assigned a random coverage value that was drawn from the empirical distribution of coverages from a specific generation and treatment. This was done separately for each replicate, thereby capturing the heterogeneity in coverage both within and between replicates in the study.
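The drift and pooling steps described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy population size of 40, the start count of 8, and the coverage of 60 are all assumptions chosen only to keep the example small.

```python
import math
import random

def wf_transition_matrix(n):
    """Cell [i][j] is the binomial probability of drawing j copies in the
    next generation given a current frequency of i / n."""
    matrix = []
    for i in range(n + 1):
        p = i / n
        matrix.append([math.comb(n, j) * p**j * (1 - p)**(n - j)
                       for j in range(n + 1)])
    return matrix

def mat_mul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def mat_pow(matrix, power):
    """Raise a square matrix to an integer power by repeated multiplication."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = mat_mul(result, matrix)
    return result

def pooled_frequency(true_freq, coverage, rng):
    """Extra round of binomial sampling that mimics Pool-Seq read sampling."""
    hits = sum(rng.random() < true_freq for _ in range(coverage))
    return hits / coverage

N = 40            # hypothetical toy size, far smaller than the real experiment
GENERATIONS = 15  # number of generations, as in the text
START_COUNT = 8   # hypothetical starting allele count
COVERAGE = 60     # hypothetical coverage; the study draws it empirically

drift = mat_pow(wf_transition_matrix(N), GENERATIONS)

rng = random.Random(1)
# Draw the allele count after 15 generations, conditional on the start count.
end_count = rng.choices(range(N + 1), weights=drift[START_COUNT])[0]
start_obs = pooled_frequency(START_COUNT / N, COVERAGE, rng)
end_obs = pooled_frequency(end_count / N, COVERAGE, rng)
afc = end_obs - start_obs  # simulated neutral allele frequency change
```

Each row of `drift` is the distribution of allele counts after 15 generations for one starting count, so a single matrix can serve every SNP that shares a starting frequency.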

The effective population size of the observed data was estimated as follows. First, multiple simulations were performed as outlined above, each with a differing Ne. Each simulation produced a set of expected start and end allele frequencies for a given Ne, from which an allele frequency change (AFC) distribution was obtained (i.e., the end frequency minus start frequency for all 1.45 million SNPs). The expected AFC distribution was then compared for goodness of fit with the observed AFC distribution by summing the squared difference of the AFC values evaluated in 1% frequency change bins. Because individual replicates were simulated, we averaged the goodness of fit across the three replicates to get a point estimate for each Ne. Additionally, the error around this mean was sufficiently large for both treatments that the best-fitting Ne was not significantly different from many neighboring Ne values. Consequently, we took the lowest Ne from amongst this range (~250; supplementary table S1, Supplementary Material online), as it is expected to provide the most conservative estimate of the FDR (because the strength of drift increases as Ne decreases). The Ne for individual chromosomes was also inferred by limiting the observed data to that specific to each chromosome and then repeating the steps above.
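A minimal sketch of the goodness-of-fit comparison follows. Several details are assumptions rather than statements from the text: the histograms are normalized to proportions, the function names are hypothetical, and the toy AFC values are Gaussian stand-ins, not Wright–Fisher output.

```python
import random

def afc_histogram(afcs, bin_width=0.01):
    """Proportion of SNPs per AFC bin; AFC values lie in [-1, 1]."""
    n_bins = round(2 / bin_width)
    counts = [0] * n_bins
    for value in afcs:
        index = int((value + 1) / bin_width)
        counts[max(0, min(index, n_bins - 1))] += 1
    return [c / len(afcs) for c in counts]

def goodness_of_fit(observed, simulated):
    """Sum of squared differences between the two binned AFC distributions."""
    pairs = zip(afc_histogram(observed), afc_histogram(simulated))
    return sum((o - s) ** 2 for o, s in pairs)

def best_ne(observed_reps, simulated_by_ne):
    """simulated_by_ne maps each candidate Ne to one simulated AFC list per
    replicate; the fit is averaged across replicates for each Ne."""
    scores = {}
    for ne, sims in simulated_by_ne.items():
        fits = [goodness_of_fit(o, s) for o, s in zip(observed_reps, sims)]
        scores[ne] = sum(fits) / len(fits)
    return min(scores, key=scores.get), scores

# Toy data: arbitrary standard deviations stand in for two drift strengths.
rng = random.Random(7)
observed = [[rng.gauss(0, 0.10) for _ in range(4000)] for _ in range(3)]
simulated = {
    250: [[rng.gauss(0, 0.10) for _ in range(4000)] for _ in range(3)],
    1000: [[rng.gauss(0, 0.05) for _ in range(4000)] for _ in range(3)],
}
ne_hat, scores = best_ne(observed, simulated)
```

The sketch returns only the single best-fitting value; choosing the lowest Ne among the statistically indistinguishable fits, as the text describes, would be an additional step.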

Following Ne estimation, the number of putative selected SNPs was inferred as follows. First, all ~1.45 million SNPs that remained after filtering were subjected to the CMH test (Agresti 2002). The CMH test was shown to outperform several other statistics that were used for candidate inference in recent Drosophila E&R studies (Kofler and Schlötterer 2013). Testing was conducted separately for the hot and cold treatments, with each being contrasted against the base population. Second, the CMH test was applied to simulated SNP data that were derived using the conservative Ne estimate of 250 (discussed earlier). Finally, the number of candidate SNPs was estimated by setting an empirical FDR of 0.001 for each treatment. Thus, the number of candidates for a given treatment was quantified as the number of observed SNPs that had CMH P values lower than the least significant CMH P value from among the 0.1% most extreme simulated SNPs. The number of candidate SNPs was also inferred for a range of different FDR values, and the results are shown in supplementary table S1, Supplementary Material online.
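The two ingredients of this step, a CMH test across replicates and an empirical threshold taken from neutral simulations, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the CMH statistic is computed without a continuity correction, each replicate is summarized as a 2 × 2 table of allele counts (base vs. evolved), and all names and counts are hypothetical.

```python
import math

def cmh_p_value(tables):
    """CMH test for K 2x2 tables, one (a, b, c, d) tuple per replicate:
    a, b = counts of the two alleles in the base population,
    c, d = counts of the two alleles in the evolved population."""
    diff, var = 0.0, 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        diff += a - (a + b) * (a + c) / n            # observed minus expected
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if var == 0:
        return 1.0
    stat = diff * diff / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))

def count_candidates(observed_p, simulated_p, fdr=0.001):
    """Count observed SNPs whose P value is lower than the least significant
    P value among the `fdr` fraction of most extreme simulated SNPs."""
    n_extreme = max(1, int(len(simulated_p) * fdr))
    threshold = sorted(simulated_p)[n_extreme - 1]
    return sum(p < threshold for p in observed_p), threshold

# Toy example with hypothetical counts for three replicates.
p_null = cmh_p_value([(50, 50, 50, 50)] * 3)
p_shift = cmh_p_value([(50, 50, 80, 20)] * 3)

simulated_p = [i / 1000 for i in range(1, 1001)]
n_candidates, threshold = count_candidates([0.0001, 0.5, 0.0005, 0.001], simulated_p)
```

Because the threshold comes from simulated neutral data rather than from the nominal P value scale, it absorbs the effects of drift and pooling noise that the simulations model.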

To examine the effects that linkage between loci had on the candidate inference, we also performed 30 forward neutral simulations, starting from a population with 250 unique haplotypes (i.e., 250 homozygous diploid individuals), for 15 generations. Thus, the number of founding haplotypes and the effective population size during the simulations was 250. Assuming that the isofemale founders maintained two segregating chromosomes per line on average (which is not unreasonable considering that they were five generations old when making the mass-bred populations), these numbers roughly accord with those expected within our experimental populations. The simulations were performed using MimicrEE (Kofler and Schlötterer 2013). This software enabled the modeling of recombination at rates that were derived from empirical estimates (Fiston-Lavier et al. 2010). Thirty simulations were performed and divided into 10 equal-sized subsets. The three simulations from each of the 10 subsets were then downsampled to match the coverage distributions of the three empirical replicates from the relevant generation and treatment. Next, we applied the CMH test to all sites in each subset and estimated the number of candidates thereof via the FDR method outlined in the preceding paragraph. The haplotypes used in the simulations were created by neutral coalescent simulations and are described in Bastide et al. (2013).

Enrichment in Thermal Tolerance Genes

We tested for treatment-specific evolution by looking for candidate enrichment in genes associated with heat and cold tolerance. Thermal tolerance gene sets were taken from the Candidate Stress Genes in Drosophila database, which was obtained from the Centre for Environmental Stress and Adaptation Research website (http://cesar.org.au, last accessed November 5, 2013). Only genes with a significant and unique association with either heat or cold tolerance were included in the analyses (supplementary table S2, Supplementary Material online). Separate enrichment analyses were performed on the 2,000, 4,000, 6,000, 8,000, 10,000, 12,000, 14,000, 16,000, 18,000, 20,000, 25,000, 30,000, 35,000, 40,000, 45,000, 50,000, 60,000, 70,000, 80,000, 90,000, and 100,000 most significant SNPs in each treatment. Note that each successive set of SNPs was an aggregation of previous SNP sets and the subsequent set of most significant SNPs. For instance, the set that contained the top 4,000 SNPs comprised the set containing the top 2,000 SNPs along with the next 2,000 most significant SNPs. All enrichment analyses were performed using Gowinda software (Kofler and Schlötterer 2012). Gowinda generates P values by conducting multiple simulations in which the positions of candidate SNPs are randomly chosen from the ~1.45 million SNPs scattered throughout the genome. The P value was then quantified as the proportion of simulations that had more genes specific to a given category (in our study, genes associated either with hot or cold thermotolerance) containing at least one candidate SNP than the observed data set (Kofler and Schlötterer 2012). In total, we performed 100,000 permutations for each set of candidate SNPs in each treatment and for each set of thermal tolerance genes. In each case, a gene containing multiple candidate SNPs was only counted once (i.e., the "gene" option of the "mode" argument). Gene annotations were based on version 5.40 of the D. melanogaster reference genome, and only the full coding region of each gene, not the upstream or downstream flanking regions, was considered (i.e., the "gene" option of the "gene-definition" argument). Because the thermal genes were specifically tested for enrichment, with the expectation that there would be enrichment for hot alleles in heat tolerance genes and cold alleles in cold-tolerance genes, we report P values rather than q values adjusted for testing multiple categories. This software was specifically designed to deal with differences in average gene size among categories and therefore does not result in the overrepresentation of categories with larger than average gene sizes (Kofler and Schlötterer 2012).
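The permutation logic described for the enrichment test can be sketched in a few lines. This is not Gowinda itself and does not reproduce its interface: the function names, the SNP-to-gene mapping, and the toy gene sets are hypothetical, and the comparison uses "more than observed", following the wording of the text.

```python
import random

def genes_hit(snp_ids, snp_to_gene, category):
    """Number of category genes containing at least one of the given SNPs;
    a gene with several SNPs is counted once."""
    return len({snp_to_gene[s] for s in snp_ids if snp_to_gene[s] in category})

def permutation_p_value(candidates, snp_to_gene, category, n_perm, rng):
    """Proportion of random SNP draws that hit more category genes than the
    observed candidate set."""
    observed = genes_hit(candidates, snp_to_gene, category)
    all_snps = list(snp_to_gene)
    exceed = 0
    for _ in range(n_perm):
        draw = rng.sample(all_snps, len(candidates))  # random SNP positions
        if genes_hit(draw, snp_to_gene, category) > observed:
            exceed += 1
    return exceed / n_perm

# Toy genome: 1,000 SNPs, 10 per gene; five hypothetical "tolerance" genes.
snp_to_gene = {i: f"g{i // 10}" for i in range(1000)}
category = {"g0", "g1", "g2", "g3", "g4"}
candidates = [0, 10, 20, 30, 40, 500, 510, 520, 530, 540]

observed_hits = genes_hit(candidates, snp_to_gene, category)
p_enriched = permutation_p_value(candidates, snp_to_gene, category, 200, random.Random(3))
p_background = permutation_p_value([500, 510, 520], snp_to_gene, category, 200, random.Random(3))
```

Sampling SNPs rather than genes is what corrects for gene size: a long gene carries more SNPs and is therefore hit more often in the random draws as well as in the observed data.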

Identification of Putative High LD Regions and Short Introns

LD can result in a large number of false positives, for instance via allele frequency correlations between selected and neutral linked sites. Consequently, the following regions were designated as having putatively elevated LD and were removed from specific analyses, as indicated in the main text. First, regions covered by inversions or within 500 kb either side of the inversion breakpoints. Segregating inversions were identified in our experimental evolution populations by Kapun et al. (2013). Second, a 1 Mb region located within the inversion-free part of 3R. This region was found to harbor a large number of low-frequency alleles that showed a similar allele frequency change, a pattern consistent with selection upon a single haplotype block (Orozco-terWengel et al. 2012). Finally, regions with a local recombination rate less than 2 cM/Mb. The recombination rates were taken from Comeron et al. (2012). Approximately 300,000 of the original 1.45 million SNPs remained following the removal of these regions. Because 3R contained three segregating inversions and the putative selected haplotype block, only 15,000 SNPs remained following removal of these regions, whereas other chromosomes maintained at least 50,000 SNPs.


Short intron coordinates were taken from Lawrie et al. (2013). To further ensure that short intron sites were behaving neutrally, we only took those that were located within the putative low LD regions identified above.

ROC Curves and Reproducibility

To test the reproducibility of our top candidates, two additional replicates were sequenced from generation 15 in each treatment, with all sites being processed through the same pipeline as the original candidates. Because we did not have additional replicates from the base population, two new base replicates were created. This was done by combining the reads from the separate base replicates into a single file, from which two-thirds of all reads were randomly sampled (without replacement) and split in half to generate two new files, each having one-third of the combined base population reads (~50-fold coverage). All SNPs from the two new sets of replicates were then subjected to CMH testing and compared for consistency with the original data set using ROC curves. For the sake of consistency, the ~1.45 million SNPs from the original data set were used in the comparisons.
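The construction of the two new base replicates can be illustrated as below. This is a sketch only: reads are represented by hypothetical string identifiers, and the function name is an assumption.

```python
import random

def make_pseudo_replicates(base_replicates, rng):
    """Pool all base reads, sample two-thirds without replacement, and split
    the sample into two equally sized, non-overlapping halves."""
    pooled = [read for replicate in base_replicates for read in replicate]
    n_sample = 2 * len(pooled) // 3
    n_sample -= n_sample % 2                 # keep the two halves equal in size
    sampled = rng.sample(pooled, n_sample)   # sampling without replacement
    half = n_sample // 2
    return sampled[:half], sampled[half:]

# Three hypothetical base replicates with 300 reads each.
base = [[f"rep{r}_read{i}" for i in range(300)] for r in range(3)]
new_a, new_b = make_pseudo_replicates(base, random.Random(11))
```

Because sampling is without replacement, no read is shared between the two new files, although both draw on the same three original base replicates.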

We used a modified ROC curve analysis to provide a graphical representation of the consistency between the original and new data sets, that is, those based on three replicates and two replicates, respectively. The ROC curves quantify the overlap between the two data sets across successively larger groups of SNPs, with SNPs having been ranked by their CMH P value. Points falling on the 1:1 diagonal line indicate that overlap between the two sets of SNPs matches chance expectations of reproducibility, whereas points lying above the diagonal indicate excess reproducibility. Because we are interested in changes that were most likely to be caused by the deterministic process of selection, as opposed to the stochastic effects of genetic drift, only SNPs where the same allele was increasing across all five replicates were considered as being reproducible.
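One plausible way to build such an overlap curve is sketched below. The exact axes used in the study are not given in this section, so the construction here is an assumption: for each rank cutoff k, the x value is the fraction of all SNPs in the top set (k/n) and the y value is the fraction of the top k SNPs shared by both rankings. Without the direction filter, two unrelated rankings are expected to share about k²/n SNPs, which places chance overlap on the 1:1 diagonal. The function and variable names are hypothetical.

```python
def overlap_curve(p_original, p_new, consistent, cutoffs):
    """p_original, p_new: dicts mapping SNP id to CMH P value in each data set.
    consistent: set of SNPs whose same allele increased in all replicates.
    Returns one (k / n, shared fraction) point per rank cutoff k."""
    n = len(p_original)
    rank_original = sorted(p_original, key=p_original.get)
    rank_new = sorted(p_new, key=p_new.get)
    points = []
    for k in cutoffs:
        shared = set(rank_original[:k]) & set(rank_new[:k]) & consistent
        points.append((k / n, len(shared) / k))
    return points

# Toy data: 100 SNPs ranked identically, then in reverse order.
p_a = {f"snp{i}": (i + 1) / 100 for i in range(100)}
p_same = dict(p_a)
p_reversed = {name: 1.01 - p for name, p in p_a.items()}
all_consistent = set(p_a)

curve_same = overlap_curve(p_a, p_same, all_consistent, [10, 50])
curve_reversed = overlap_curve(p_a, p_reversed, all_consistent, [10, 50])
```

Intersecting with `consistent` implements the requirement that only SNPs with a concordant direction of change count as reproducible; it can only lower the curve relative to an unfiltered overlap.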

The expected degree of reproducibility under neutrality was quantified using neutral MimicrEE simulations (Kofler and Schlötterer 2013). All parameters used were identical to the simulations used for the FDR estimates outlined above, except that a total of 50 data sets were simulated in this case. The simulated data sets were divided into 10 equally sized subsets, with each subset being further split into a group of three replicates and a group of two replicates. Following downsampling of the data sets to match our empirical coverage distributions (discussed earlier), for each set of simulations CMH tests were applied to each group of three and two replicates. The resulting data were compared for reproducibility using ROC curves, as above.

LD Analyses

The degree of LD around the top 2,000 candidates was estimated for both treatments following the same methods outlined in Orozco-terWengel et al. (2012). Briefly, we estimated the median P value for all SNPs flanking the top 2,000 candidates in nonoverlapping 50-bp windows either side of the focal candidates. The decay in the P value around the focal candidates provides a proxy for the strength and extent of LD in regions flanking putative selected sites. To ascertain the background level of LD among similar SNPs, 2,000 more SNPs were randomly drawn from the genome after excluding all regions within 200 bp of the top 2,000 candidates. The numbers of random SNPs drawn from each chromosome arm matched the distribution of the top 2,000 candidates in each treatment.
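The windowed summary described above might be computed along the following lines. The sketch makes several assumptions that the text does not specify: windows are signed (negative on the left, positive on the right of the focal SNP), window 1 covers distances of 1–50 bp, P values are pooled over all focal candidates before the median is taken, and all names are hypothetical.

```python
import statistics
from collections import defaultdict

def median_p_by_window(focal_positions, snps, window=50, n_windows=4):
    """snps: list of (position, p_value) pairs on one chromosome arm.
    Returns {signed window index: median P value} pooled over all focal SNPs."""
    bins = defaultdict(list)
    for center in focal_positions:
        for position, p_value in snps:
            offset = position - center
            if offset == 0:
                continue                      # skip the focal SNP itself
            index = (abs(offset) - 1) // window + 1
            if index <= n_windows:
                bins[index if offset > 0 else -index].append(p_value)
    return {k: statistics.median(v) for k, v in sorted(bins.items())}

# Toy example: one focal candidate at position 1000 and four flanking SNPs.
toy_snps = [(990, 0.01), (1000, 1e-9), (1010, 0.03), (1020, 0.05), (1060, 0.5)]
profile = median_p_by_window([1000], toy_snps)
```

Applying the same function to the randomly drawn background SNPs gives the reference profile against which the decay around the candidates can be judged.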

We used MimicrEE simulations (Kofler and Schlötterer 2013) to estimate the degree of long-range LD achievable given a simple scenario where strong selection acted on a singleton located in a single haplotype. The selection strength (0.5) was based on increasing a single allele from 1% to 30% over 15 generations (i.e., the mean allele frequency change observed for the top 2,000 candidate SNPs in the hot treatment). This was derived using $s = 2\ln(p_0 q_1 / p_1 q_0)/t$, where s is the selection coefficient, p and q the biallelic frequencies at the start (subscript 0) and end (subscript 1) generations, and t is the change in generations (eq. 3.3 in Charlesworth and Charlesworth [2010]). Other than the selected site, all other parameters were the same as the neutral simulations outlined above. Nine simulations in which the singleton increased in frequency by 25–35% were included in the analyses. These were evenly split into three subsets, with each subset being downsampled and submitted to CMH testing (in the same manner as the neutral simulations discussed earlier). The resulting data were then used to produce Manhattan plots that enabled visual assessment of the degree of long-range LD (fig. 7).
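As a quick check of the stated selection strength, the formula can be evaluated for an allele rising from 1% to 30% in 15 generations. The sketch assumes that q in the formula denotes the frequency of the selected allele (so that the logarithm is positive); the function name is hypothetical.

```python
import math

def selection_coefficient(freq_start, freq_end, generations):
    """s = 2 ln(p0 q1 / (p1 q0)) / t, with q taken as the frequency of the
    selected allele and p = 1 - q as the frequency of the alternative allele."""
    q0, q1 = freq_start, freq_end
    p0, p1 = 1 - q0, 1 - q1
    return 2 * math.log((p0 * q1) / (p1 * q0)) / generations

s = selection_coefficient(0.01, 0.30, 15)  # roughly 0.5
```

The result agrees with the value of 0.5 used in the simulations.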

Supplementary Material

Supplementary figures S1–S4 and tables S1–S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to A. Bergland and D. Lawrie for sharing the coordinates for the short introns and Martin Kapun for sharing unpublished results. R.T. wishes to thank his colleagues at the Institut für Populationsgenetik and the Petrov laboratory at Stanford University for helpful discussions on data interpretation. This work was supported by a PhD fellowship from the Vetmeduni Vienna, the Austrian Science Fund (FWF P19467-B11, W1225-B20), and the European Research Council (ERC) grant “ArchAdapt” awarded to C.S.

References

Agresti A. 2002. Categorical data analysis. New York: John Wiley & Sons.

Andolfatto P, Wall JD. 2003. Linkage disequilibrium patterns across a recombination gradient in African Drosophila melanogaster. Genetics 165:1289–1305.

Bastide H, Betancourt A, Nolte V, Tobler R, Stobe P, Futschik A, Schlötterer C. 2013. A genome-wide fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9:e1003534.

Bergelson J, Roux F. 2010. Towards identifying genes underlying ecologically relevant traits in Arabidopsis thaliana. Nat Rev Genet. 11:867–879.

Bergland AO, Behrman EL, O'Brien KR, Schmidt PS, Petrov DA. 2013. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1303.5044


Bubliy O, Loeschcke V. 2005. Correlated responses to selection for stress resistance and longevity in a laboratory population of Drosophila melanogaster. J Evol Biol. 18:789–803.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Caracristi G, Schlötterer C. 2003. Genetic differentiation between American and European Drosophila melanogaster populations could be attributed to admixture of African alleles. Mol Biol Evol. 20:792–799.

Carroll D. 2011. Genome engineering with zinc-finger nucleases. Genetics 188:773–782.

Charlesworth B, Charlesworth D. 2010. Elements of evolutionary genetics. Greenwood Village (CO): Roberts and Company Publishers.

Clemente F, Vogl C. 2012. Unconstrained evolution in short introns? – An analysis of genome-wide polymorphism and divergence data from Drosophila. J Evol Biol. 25:1975–1990.

Comeron JM, Ratnappan R, Bailin S. 2012. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 8:e1002905.

Corbett-Detig RB, Hartl DL. 2012. Population genomics of inversion polymorphisms in Drosophila melanogaster. PLoS Genet. 8:e1003056.

Dettman JR, Rodrigue N, Melnyk AH, Wong A, Bailey SF, Kassen R. 2012. Evolutionary insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol. 21:2058–2077.

Farlow A, Dolezal M, Hua L, Schlötterer C. 2012. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol. 29:21–24.

Fiston-Lavier A-S, Singh ND, Lipatov M, Petrov DA. 2010. Drosophila melanogaster recombination rate calculator. Gene 463:18–20.

Futschik A, Schlötterer C. 2010. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186:207–218.

Haldane J. 1957. The cost of natural selection. J Genet. 55:511–524.

Halligan DL, Keightley PD. 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16:875–884.

Hoffman A, Hallas R, Sinclair C, Partridge L. 2001. Rapid loss of stress resistance in Drosophila melanogaster under adaptation to laboratory culture. Evolution 55:436–438.

Itoh M, Nanba N, Hasegawa M, Inomata N, Kondo R, Oshima M, Takano-Shimizu T. 2010. Seasonal changes in the long-distance linkage disequilibrium in Drosophila melanogaster. J Hered. 101:26–32.

Kapun M, van Schalkwyk H, McAllister B, Flatt T, Schlötterer C. 2013. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.2461

Kawecki TJ, Lenski RE, Ebert D, Hollis B, Olivieri I, Whitlock MC. 2012. Experimental evolution. Trends Ecol Evol. 27:547–560.

Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V, Futschik A, Kosiol C, Schlötterer C. 2011. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 6:e15925.

Kofler R, Pandey RV, Schlötterer C. 2011. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27:3435–3436.

Kofler R, Schlötterer C. 2012. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies. Bioinformatics 28:2084–2085.

Kofler R, Schlötterer C. 2013. Guidelines for the design of evolve and resequencing studies. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.4954

Kolss M, Kawecki TJ. 2008. Reduced learning ability as a consequence of evolutionary adaptation to nutritional stress in Drosophila melanogaster. Ecol Entomol. 33:583–588.

Lawrie DS, Messer PW, Hershberg R, Petrov DA. 2013. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 9:e1003527.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079.

Liu J, Li C, Yu Z, et al. (12 co-authors). 2012. Efficient and specific modifications of the Drosophila genome by means of an easy TALEN strategy. J Genet Genomics. 39:209–215.

Mackay TFC, Richards S, Stone EA, et al. (52 co-authors). 2012. The Drosophila melanogaster Genetic Reference Panel. Nature 482:173–178.

Miyashita N, Langley H. 1988. Molecular and phenotypic variation of the white locus region in Drosophila melanogaster. Genetics 120:199–212.

Nunney L. 2003. The cost of natural selection revisited. Ann Zool Fennici. 40:185–194.

Orozco-terWengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P. 2010. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol. 27:1226–1234.

Parts L, Cubillos FA, Warringer J, et al. (14 co-authors). 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Pool JE, Aquadro CF. 2006. History and structure of sub-Saharan populations of Drosophila melanogaster. Genetics 174:915–929.

Pool JE, Corbett-Detig RB, Sugino RP, et al. (11 co-authors). 2012. Population genomics of sub-Saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet. 8:e1003080.

Remolina S, Chang P, Leips J, Nuzhdin SV, Hughes K. 2012. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66:3390–3403.

Rose MR, Vu LN, Park SU, Graves JL Jr. 1992. Selection on stress resistance increases longevity in Drosophila melanogaster. Exp Gerontol. 27:241–250.

Stalker H. 1980. Chromosome inversion polymorphism in Drosophila melanogaster. II. Relationship of inversion frequencies to latitude, season, wing-loading and flight activity. Genetics 95:211–223.

Stapley J, Reger J, Feulner PGD, Smadja C, Galindo J, Ekblom R, Bennison C, Ball A, Beckerman AP, Slate J. 2010. Adaptation genomics: the next generation. Trends Ecol Evol. 25:705–712.

Takano-Shimizu T, Kawabe A, Inomata N, Nanba N, Kondo R, Inoue Y, Itoh M. 2004. Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci U S A. 101:14156–14161.

Tantawy A. 1964. Studies on natural populations of Drosophila. III. Morphological and genetic differences of wing length in Drosophila melanogaster and D. simulans in relation to. Evolution 18:560–570.

Turner TL, Miller PM, Cochrane VA. 2013. Combining genome-wide methods to investigate the genetic complexity of courtship song variation in Drosophila melanogaster. Mol Biol Evol. 30:2113–2020.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Yu Z, Ren M, Wang Z, Zhang B, Rong YS, Jiao R, Gao G. 2013. Highly efficient genome modifications mediated by CRISPR/Cas9 in Drosophila. Genetics 195:289–291.

Zhou D, Udpa N, Gersten M, et al. (11 co-authors). 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.

375

Adaptation Genomics of D melanogaster doi101093molbevmst205 MBED

ownloaded from

httpsacademicoupcom

mbearticle-abstract312364997936 by C

ardiff University user on 19 February 2019

Page 10: Massive Habitat-Specific Genomic Response in D. melanogaster …orca.cf.ac.uk/63273/1/mst205.pdf · 2019. 2. 19. · Article Massive Habitat-Specific Genomic Response in D. melanogaster

remained after filtering were subjected to the CMH test(Agresti 2002) The CMH test was shown to outperform sev-eral other statistics that were used for candidate inference inrecent Drosophila EampR studies (Kofler and Schlotterer 2013)Testing was conducted separately for the hot and cold treat-ments with each being contrasted against the base popula-tion Second the CMH test was applied to simulated SNPdata that was derived using the conservative Ne estimate of250 (discussed earlier) Finally the number of candidate SNPswas estimated by setting an empirical FDR of 0001 for eachtreatment Thus the number of candidates for a given treat-ment was quantified as the number of observed SNPs thathad CMH P values that were lower than the least significantCMH P value from among the 01 most extreme simulatedSNPs The number of candidate SNPs was also inferred for arange of different FDR values and the results are shown insupplementary table S1 Supplementary Material online

To examine the effects that linkage between loci had onthe candidate inference we also performed 30 forward neu-tral simulations starting from a population with 250 uniquehaplotypes (ie 250 homozygous diploid individuals) for 15generations Thus the number of founding haplotypes andthe effective population size during the simulations was 250Assuming that the isofemale founders maintained two segre-gating chromosomes per line on averagemdashwhich is notunreasonable considering that they were five generationsold when making the mass-bred populationsmdashthen thesenumbers roughly accord with those expected within ourexperimental populations The simulations were performedusing MimicrEE (Kofler and Schlotterer 2013) This softwareenabled the modeling of recombination at rates that werederived from empirical estimates (Fiston-Lavier et al 2010)Thirty simulations were performed and divided into 10 equal-sized subsets The three simulations from each of the 10subsets were then downsampled to match the coveragedistributions of the three empirical replicates from the rele-vant generation and treatment Next we applied the CMHtest to all sites in each subset and estimated the number ofcandidates thereof via the FDR method outlined in the pre-ceding paragraph The haplotypes used in the simulationswere created by neutral coalescent simulations and aredescribed in Bastide et al (2013)

Enrichment in Thermal Tolerance Genes

We tested for treatment-specific evolution by looking forcandidate enrichment in genes associated with heat andcold tolerance Thermal tolerance gene sets were takenfrom the Candidate Stress Genes in Drosophila databasewhich was obtained from the Centre for EnvironmentalStress and Adaptation Research website (httpcesarorgaulast accessed November 5 2013) Only genes with a signifi-cant and unique association with either heat or cold tol-erance were included in the analyses (supplementary tableS2 Supplementary Material online) Separate enrichmentanalyses were performed on the 2000 4000 6000 800010000 12000 14000 16000 18000 20000 25000 3000035000 40000 45000 50000 60000 70000 80000 90000

and 100000 most significant SNPs in each treatment Notethat each successive set of SNPs was an aggregation of pre-vious SNP sets and the subsequent set of most significantSNPs For instance the set that contained the top 4000 SNPscomprised the set containing the top 2000 SNPs along withthe next 2000 most significant SNPs All enrichment analyseswere performed using Gowinda software (Kofler andSchlotterer 2012) Gowinda generates P values by conductingmultiple simulations in which the position of candidate SNPsare randomly chosen from the ~145 million of SNPs scat-tered throughout the genome The P value was then quan-tified as the proportion of simulations that had more genesspecific to a given category (in our study genes associatedeither with hot or cold thermotolerance) containing at leastone candidate SNP than the observed data set (Kofler andSchlotterer 2012) In total we performed 100000 permuta-tions for each set of candidate SNPs in each treatment andfor each set of thermal tolerance genes In each case a genecontaining multiple candidate SNPs was only counted once(ie the gene option of the mode argument) Gene annota-tions based on version 540 of D melanogaster referencegenome and only the full coding region of each gene andnot the upstream or downstream flanking regions (ie thegene option of the gene-definition argument) wereconsidered Because the thermal genes were specificallytested for enrichment with the expectation that therewould be enrichment for hot alleles in heat tolerancegenes and cold alleles in cold-tolerance genes we report Pvalues rather q values adjusted for testing multiple categoriesThis software was specifically designed to deal with differ-ences in average gene size among categories and thereforedoes not result in the overrepresentation of categories withlarger than average gene sizes (Kofler and Schlotterer 2012)

Identification of Putative High LD Regions andShort Introns

LD can result in large number of false positives for instancevia allele frequency correlations between selected and neutrallinked sites Consequently the following regions were desig-nated as having putatively elevated LD and were removedfrom specific analyses as indicated in the main text Firstregions covered by inversions or within 500 kb either side ofthe inversion breakpoints Segregating inversions were inden-tified in our experimental evolution populations by Kapunet al (2013) Second a 1 Mb region located within the inver-sion-free part of 3R This region was found to harbor a largenumber of low-frequency alleles that showed a similar allelefrequency change a pattern consistent with selection upon asingle haplotype block (Orozco-terWengel et al 2012) Finallyregions with a local recombination rate less than 2 cMMbThe recombination rates were taken from Comeron et al(2012) Approximately 300000 of the original 145 millionSNPs remained following the removal of these regionsBecause 3R contained three segregating inversions and theputative selected haplotype block only 15000 SNPs remainedfollowing removal of these regions whereas other chromo-somes maintained at least 50000 SNPs

373

Adaptation Genomics of D melanogaster doi101093molbevmst205 MBED

ownloaded from

httpsacademicoupcom

mbearticle-abstract312364997936 by C

ardiff University user on 19 February 2019

Short introns coordinates were taken from Lawrie et al(2013) To further ensure that short introns sites were behav-ing neutrally we only took those that were located within theputative low LD regions identified above

ROC Curves and Reproducibility

To test the reproducibility of our top candidates two addi-tional replicates were sequenced from generation 15 in eachtreatment with all sites being processed through the samepipeline as the original candidates Because we did not haveadditional replicates from the base population two new basereplicates were created This was done by combining thereads from the separate base replicates into a single filefrom which two thirds of all reads were randomly sampled(without replacement) and split in half to generate two newfiles each having one-third of the combined base populationreads (~50-fold coverage) All SNPs from the two new sets ofreplicates were then subjected to CMH testing and comparedfor consistency with the original data set using ROC curvesFor the sake of consistency the ~145 million SNPs from theoriginal data set were used in the comparisons

We used a modified ROC curve analysis to provide agraphical representation of the consistency between theoriginal and new data sets that is those based on threereplicates and two replicates respectively The ROC curvesquantify the overlap between the two data sets across suc-cessively larger groups of SNPs with SNPs having beenranked by their CMH P value Points falling on the 11 diag-onal line indicate that overlap between the two sets of SNPsmatches chance expectations of reproducibility whereaspoints lying above the diagonal indicate excess reproducibil-ity Because we are interested in changes that were mostlikely to be caused by the deterministic process of selectionas opposed to the stochastic effects of genetic drift onlySNPs where the same allele was increasing across all fivereplicates were considered as being reproducible

The expected degree of reproducibility under neutralitywas quantified using neutral MimicrEE simulations (Koflerand Schlotterer 2013) All parameters used were identical tothe simulations used for the FDR estimates outlined aboveexcept that a total of 50 data sets were simulated in this caseThe simulated data sets were divided into 10 equally sizedsubsets with each subset being further split into a group ofthree replicates and a group of two replicates Followingdownsampling of the data sets to match our empirical cov-erage distributions (discussed earlier) for each set of simula-tions CMH tests were applied to each group of three and tworeplicates The resulting data were compared for reproduc-ibility using ROC curves as above

LD Analyses

The degree of LD around the top 2,000 candidates was estimated for both treatments following the same methods outlined in Orozco-terWengel et al. (2012). Briefly, we estimated the median P value for all SNPs flanking the top 2,000 candidates in nonoverlapping 50-bp windows on either side of the focal candidates. The decay in the P value around the focal candidates provides a proxy for the strength and extent of LD in regions flanking putative selected sites. To ascertain the background level of LD among similar SNPs, 2,000 more SNPs were randomly drawn from the genome after excluding all regions within 200 bp of the top 2,000 candidates. The numbers of random SNPs drawn from each chromosome arm matched the distribution of the top 2,000 candidates in each treatment.
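
The windowed summary can be sketched as below. This is a simplified illustration under several assumptions: all SNPs lie on one chromosome arm, both flanks are folded together by absolute distance, and the cutoff `max_dist` is a hypothetical parameter that is not taken from the study.

```python
from collections import defaultdict
from statistics import median

def flanking_median_p(focal_positions, snps, window=50, max_dist=200):
    """Median P value in nonoverlapping distance windows around focal SNPs.

    focal_positions : positions of the focal candidate SNPs
    snps            : list of (position, p_value) tuples on the same chromosome arm
    Returns a dict mapping (window_start, window_end) distances in bp to the median P value.
    """
    bins = defaultdict(list)
    for focal in focal_positions:
        for pos, p in snps:
            dist = abs(pos - focal)
            if 0 < dist <= max_dist:  # skip the focal SNP itself and distant SNPs
                bins[(dist - 1) // window].append(p)
    return {(b * window + 1, (b + 1) * window): median(vals)
            for b, vals in sorted(bins.items())}

snps = [(1010, 0.01), (990, 0.03), (1060, 0.2), (1120, 0.5), (1000, 1e-9), (1500, 0.9)]
decay = flanking_median_p([1000], snps)
```

Running the same function on randomly drawn SNPs gives the background profile against which the decay around the candidates can be compared.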

We used MimicrEE simulations (Kofler and Schlötterer 2013) to estimate the degree of long-range LD achievable given a simple scenario where strong selection acted on a singleton located in a single haplotype. The selection strength (0.5) was based on increasing a single allele from 1% to 30% over 15 generations (i.e., the mean allele frequency change observed for the top 2,000 candidate SNPs in the hot treatment). This was derived using $s = \frac{2}{t}\ln\left(\frac{p_0 q_1}{p_1 q_0}\right)$, where s is the selection coefficient, p and q the biallelic frequencies at the start (subscript 0) and end (subscript 1) generations, and t is the change in generations (eq. 3.3 in Charlesworth and Charlesworth [2010]). Other than the selected site, all other parameters were the same as the neutral simulations outlined above. Nine simulations in which the singleton increased in frequency by 25–35% were included in the analyses. These were evenly split into three subsets, with each subset being downsampled and submitted to CMH testing (in the same manner as the neutral simulations discussed earlier). The resulting data were then used to produce Manhattan plots that enabled visual assessment of the degree of long-range LD (fig. 7).
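
As a worked check of the selection coefficient, the short sketch below evaluates the formula for an allele rising from 1% to 30% in 15 generations. It assumes that q denotes the frequency of the selected allele (and p = 1 - q), which is the reading that reproduces the stated value of 0.5; the function name is hypothetical.

```python
import math

def selection_coefficient(q0, q1, generations):
    """s = (2 / t) * ln(p0 * q1 / (p1 * q0)), with q the selected-allele frequency."""
    p0, p1 = 1.0 - q0, 1.0 - q1
    return 2.0 * math.log((p0 * q1) / (p1 * q0)) / generations

s = selection_coefficient(q0=0.01, q1=0.30, generations=15)
print(round(s, 2))  # 0.5
```

The unrounded value is slightly below 0.5, so the selection strength of 0.5 used in the simulations corresponds to this estimate rounded to one decimal place.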

Supplementary Material

Supplementary figures S1–S4 and tables S1–S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to A. Bergland and D. Lawrie for sharing the coordinates for the short introns and Martin Kapun for sharing unpublished results. R.T. wishes to thank his colleagues at the Institut für Populationsgenetik and the Petrov laboratory at Stanford University for helpful discussions on data interpretation. This work was supported by a PhD fellowship from the Vetmeduni Vienna, the Austrian Science Fund (FWF, P19467-B11, W1225-B20), and the European Research Council (ERC) grant "ArchAdapt" awarded to C.S.

References

Agresti A. 2002. Categorical data analysis. New York: John Wiley & Sons.

Andolfatto P, Wall JD. 2003. Linkage disequilibrium patterns across a recombination gradient in African Drosophila melanogaster. Genetics 165:1289–1305.

Bastide H, Betancourt A, Nolte V, Tobler R, Stöbe P, Futschik A, Schlötterer C. 2013. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9:e1003534.

Bergelson J, Roux F. 2010. Towards identifying genes underlying ecologically relevant traits in Arabidopsis thaliana. Nat Rev Genet. 11:867–879.

Bergland AO, Behrman EL, O'Brien KR, Schmidt PS, Petrov DA. 2013. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1303.5044


Bubliy O, Loeschcke V. 2005. Correlated responses to selection for stress resistance and longevity in a laboratory population of Drosophila melanogaster. J Evol Biol. 18:789–803.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Caracristi G, Schlötterer C. 2003. Genetic differentiation between American and European Drosophila melanogaster populations could be attributed to admixture of African alleles. Mol Biol Evol. 20:792–799.

Carroll D. 2011. Genome engineering with zinc-finger nucleases. Genetics 188:773–782.

Charlesworth B, Charlesworth D. 2010. Elements of evolutionary genetics. Greenwood Village (CO): Roberts and Company Publishers.

Clemente F, Vogl C. 2012. Unconstrained evolution in short introns – An analysis of genome-wide polymorphism and divergence data from Drosophila. J Evol Biol. 25:1975–1990.

Comeron JM, Ratnappan R, Bailin S. 2012. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 8:e1002905.

Corbett-Detig RB, Hartl DL. 2012. Population genomics of inversion polymorphisms in Drosophila melanogaster. PLoS Genet. 8:e1003056.

Dettman JR, Rodrigue N, Melnyk AH, Wong A, Bailey SF, Kassen R. 2012. Evolutionary insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol. 21:2058–2077.

Farlow A, Dolezal M, Hua L, Schlötterer C. 2012. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol. 29:21–24.

Fiston-Lavier A-S, Singh ND, Lipatov M, Petrov DA. 2010. Drosophila melanogaster recombination rate calculator. Gene 463:18–20.

Futschik A, Schlötterer C. 2010. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186:207–218.

Haldane J. 1957. The cost of natural selection. J Genet. 55:511–524.

Halligan DL, Keightley PD. 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16:875–884.

Hoffman A, Hallas R, Sinclair C, Partridge L. 2001. Rapid loss of stress resistance in Drosophila melanogaster under adaptation to laboratory culture. Evolution 55:436–438.

Itoh M, Nanba N, Hasegawa M, Inomata N, Kondo R, Oshima M, Takano-Shimizu T. 2010. Seasonal changes in the long-distance linkage disequilibrium in Drosophila melanogaster. J Hered. 101:26–32.

Kapun M, van Schalkwyk H, McAllister B, Flatt T, Schlötterer C. 2013. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.2461

Kawecki TJ, Lenski RE, Ebert D, Hollis B, Olivieri I, Whitlock MC. 2012. Experimental evolution. Trends Ecol Evol. 27:547–560.

Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V, Futschik A, Kosiol C, Schlötterer C. 2011. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 6:e15925.

Kofler R, Pandey RV, Schlötterer C. 2011. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27:3435–3436.

Kofler R, Schlötterer C. 2012. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies. Bioinformatics 28:2084–2085.

Kofler R, Schlötterer C. 2013. Guidelines for the design of evolve and resequencing studies. [cited 2013 Aug 26]. Available from: http://arxiv.org/abs/1307.4954

Kolss M, Kawecki TJ. 2008. Reduced learning ability as a consequence of evolutionary adaptation to nutritional stress in Drosophila melanogaster. Ecol Entomol. 33:583–588.

Lawrie DS, Messer PW, Hershberg R, Petrov DA. 2013. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 9:e1003527.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079.

Liu J, Li C, Yu Z, et al. (12 co-authors). 2012. Efficient and specific modifications of the Drosophila genome by means of an easy TALEN strategy. J Genet Genomics. 39:209–215.

Mackay TFC, Richards S, Stone EA, et al. (52 co-authors). 2012. The Drosophila melanogaster Genetic Reference Panel. Nature 482:173–178.

Miyashita N, Langley H. 1988. Molecular and phenotypic variation of the white locus region in Drosophila melanogaster. Genetics 120:199–212.

Nunney L. 2003. The cost of natural selection revisited. Ann Zool Fennici. 40:185–194.

Orozco-terWengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P. 2010. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol. 27:1226–1234.

Parts L, Cubillos FA, Warringer J, et al. (14 co-authors). 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Pool JE, Aquadro CF. 2006. History and structure of sub-Saharan populations of Drosophila melanogaster. Genetics 174:915–929.

Pool JE, Corbett-Detig RB, Sugino RP, et al. (11 co-authors). 2012. Population Genomics of sub-saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet. 8:e1003080.

Remolina S, Chang P, Leips J, Nuzhdin SV, Hughes K. 2012. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66:3390–3403.

Rose MR, Vu LN, Park SU, Graves JL Jr. 1992. Selection on stress resistance increases longevity in Drosophila melanogaster. Exp Gerontol. 27:241–250.

Stalker H. 1980. Chromosome inversion polymorphism in Drosophila melanogaster. II. Relationship of inversion frequencies to latitude, season, wing-loading and flight activity. Genetics 95:211–223.

Stapley J, Reger J, Feulner PGD, Smadja C, Galindo J, Ekblom R, Bennison C, Ball A, Beckerman AP, Slate J. 2010. Adaptation genomics: the next generation. Trends Ecol Evol. 25:705–712.

Takano-Shimizu T, Kawabe A, Inomata N, Nanba N, Kondo R, Inoue Y, Itoh M. 2004. Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci U S A. 101:14156–14161.

Tantawy A. 1964. Studies on natural populations of Drosophila. III. Morphological and genetic differences of wing length in Drosophila melanogaster and D. simulans in relation to. Evolution 18:560–570.

Turner TL, Miller PM, Cochrane VA. 2013. Combining genome-wide methods to investigate the genetic complexity of courtship song variation in Drosophila melanogaster. Mol Biol Evol. 30:2113–2020.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Yu Z, Ren M, Wang Z, Zhang B, Rong YS, Jiao R, Gao G. 2013. Highly efficient genome modifications mediated by CRISPR/Cas9 in Drosophila. Genetics 195:289–291.

Zhou D, Udpa N, Gersten M, et al. (11 co-authors). 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.


