reconciling differences in pool-gwas between populations ... · 5 scott et al. 2011; david et al....

41
1 1 2 3 Reconciling differences in Pool-GWAS between populations: a case study of 4 female abdominal pigmentation in Drosophila melanogaster 5 6 7 Lukas Endler 1 , Andrea J. Betancourt 1 , Viola Nolte 1 , and Christian Schlötterer 1* 8 9 10 11 1 Institut für Populationsgenetik , Vetmeduni Vienna, 1210 Wien, Austria 12 13 14 * Corresponding author 15 16 Data submitted to European Nucleotide Archive (ENA): 17 ERP001827 and ERP013300 18 19 Genetics: Early Online, published on December 29, 2015 as 10.1534/genetics.115.183376 Copyright 2015.

Upload: dangtruc

Post on 22-Apr-2018

221 views

Category:

Documents


3 download

TRANSCRIPT

1

1

2

3

Reconciling differences in Pool-GWAS between populations: a case study of 4

female abdominal pigmentation in Drosophila melanogaster 5

6

7

Lukas Endler1, Andrea J. Betancourt1, Viola Nolte1, and Christian Schlötterer1* 8

9

10

11

1Institut für Populationsgenetik , Vetmeduni Vienna, 1210 Wien, Austria 12

13

14

*Corresponding author 15

16

Data submitted to European Nucleotide Archive (ENA): 17

ERP001827 and ERP013300 18

19

Genetics: Early Online, published on December 29, 2015 as 10.1534/genetics.115.183376

Copyright 2015.

2

Running Title: Pool-GWAS in multiple populations 1

2

Key words: GWAS, Genome wide association study, pigmentation, dominance, 3

Drosophila melanogaster, pooled sequencing 4

5 Corresponding author: 6

Christian Schlötterer 7

Institut für Populationsgenetik 8

Vetmeduni Vienna 9

Veterinärplatz 1, 1210 Wien, Austria 10

E-mail: [email protected] (CS) 11

Telephone: +43 1 25077-4300 12

13

14

15

3

1 Abstract 2

The degree of concordance between populations in the genetic architecture of a given 3

trait is an important issue in medical and evolutionary genetics. Here, we address this 4

problem, using a replicated pooled genome wide association approach (Pool-GWAS) 5

to compare the genetic basis of variation in abdominal pigmentation in female 6

European and South African Drosophila melanogaster. We find that, in both the 7

European and the South African flies, variants near the tan and bric-à-brac 1 (bab1) 8

genes are most strongly associated with pigmentation. However, the relative 9

contribution of these loci differs: in the European populations, tan outranks bab1, 10

while the converse is true for the South African flies. Using simulations, we show that 11

this result can be explained parsimoniously, without invoking different causal variants 12

between the populations, by a combination of frequency differences between the two 13

populations and dominance for the causal alleles at the bab1 locus. Our results 14

demonstrate the power of cost-effective, replicated Pool-GWAS to shed light on 15

differences in the genetic architecture of a given trait between populations. 16

17

Introduction 18

Insight into the genetic basis of phenotypic variation within species is essential 19

to understanding of short-term evolutionary change, as phenotypic variation is an 20

important substrate for natural selection. This endeavor has benefited from the steep 21

decrease in the cost of rapid genotyping techniques, including DNA sequencing, 22

which facilitates the mapping of traits using either genetic crosses (i.e., quantitative 23

trait mapping), or statistical association studies (i.e., genome-wide association studies, 24

or GWAS, reviewed in Visscher et al. 2012). As a result, it is now feasible to replicate 25

4

association studies in multiple samples. One question we can therefore address is 1

whether the same loci and alleles underlie phenotypic variation in different 2

populations of species with broad geographic distributions, or whether the alleles tend 3

to be population-specific. Human disease studies sometimes attempt to replicate the 4

results of genome-wide association studies in additional populations, with varying 5

success (e.g, Li et al. 2008; Ng et al. 2008; Siontis et al. 2010). Results for disease 6

traits, however, may not be representative of those for adaptive traits. Genetic 7

diseases are probably most often due to unconditionally deleterious alleles kept at low 8

frequency by purifying selection, and thus likely to be population specific due to their 9

rarity (Ioannidis et al. 2009, 2011; Gorlov et al. 2014). Data for adaptively varying 10

traits are sparser than those for diseases. The best-studied examples are lactase 11

persistence in humans, which is due to several independently derived alleles, the 12

frequency of which differs among geographic regions (Ingram et al. 2009; Itan et al. 13

2010; Ranciaro et al. 2014), and cryptic coat coloration in mice, for which the genetic 14

basis also differs between populations (Hoekstra and Nachman 2003; Hoekstra et al. 15

2006). 16

Here, we replicate a study of abdominal pigmentation in Drosophila 17

melanogaster (Bastide et al. 2013), originally done in two European populations, in a 18

geographically distant African population. Pigmentation in D. melanogaster is a 19

useful trait for this purpose for several reasons. For one, the underlying genetic 20

pathways are well-described, with most of the characterized genes playing a role in 21

either biochemical synthesis of the pigments, or in sexually dimorphic and spatial 22

regulation of the pigment-synthesis genes (Wittkopp et al. 2003; Gompel et al. 2005). 23

For another, this trait is both highly variable and highly heritable (Robertson et al. 24

1977; Gibert et al. 1998a; Pool and Aquadro 2007; Dembeck et al. 2015), with a 25

5

reasonably simple genetic basis, qualities which render the genetics underlying this 1

variation tractable to statistical analysis . Finally, spatial patterns of variation in this 2

trait suggest that it is ecologically important. In particular, both thoracic and 3

abdominal pigmentation patterns vary strongly with altitude and latitude (Telonis-4

Scott et al. 2011; David et al. 1985; Munjal et al. 1997; Pool and Aquadro 2007; 5

Capy et al. 1988), with independent clines found on different continents. In the 6

ancestral range in Africa, geographic variation is most strongly associated with levels 7

of UV-radiation, consistent with higher UV tolerance of darkly pigmented flies 8

compared to light ones (Bastide et al. 2014). 9

There have been several previous studies of pigmentation variation in 10

Drosophila, describing the genetic basis of variation in abdominal pigmentation 11

between species and between sexes (Llopart et al. 2002; Wittkopp et al. 2003; 12

Gompel et al. 2005; Jeong et al. 2006, 2008; Williams et al. 2008; Rogers et al. 2013; 13

Rogers et al. 2014; Salomone et al. 2013), and on variation in thoracic and abdominal 14

pigmentation within species (Robertson et al. 1977; Pool and Aquadro 2007; 15

Takahashi et al. 2007; Bickel et al. 2011; Takahashi and Takano-Shimizu 2011; 16

Bastide et al. 2013; Rogers et al. 2014; Dembeck et al. 2015; Wittkopp et al. 2010; 17

Cooley et al. 2012). The latter studies are the most relevant here. Thoracic 18

pigmentation phenotypes, including differences between African strains and variation 19

along a latitudinal cline, map to major effect loci near ebony, a pigment synthesis 20

gene (Takahashi et al. 2007; Telonis-Scott et al. 2011; Takahashi and Takano-21

Shimizu 2011). Abdominal pigmentation variation has a slightly more complex 22

genetic basis, and has been attributed to variants at three loci: bric-à-brac (bab), tan, 23

and ebony (Robertson et al. 1977; Kopp et al. 2003; Pool and Aquadro 2007; Bickel 24

et al. 2011; Bastide et al. 2013; Dembeck et al. 2015). All three of these loci have 25

6

well-described roles in pigmentation. Bab, which comprises two protein coding genes 1

(bab1 and bab2) is a spatial regulator of sexual dimorphic pigmentation at the tip of 2

the abdomen (Robertson et al. 1977; Kopp et al. 2000; Williams et al. 2008), and tan, 3

like ebony, is involved in pigment synthesis (True et al. 2005). Two independent 4

association studies have resulted in genome-wide, fine-resolution maps of 5

associations at these loci (Bastide et al. 2013; Dembeck et al. 2015). Both implicate 6

variants in cis-regulatory elements at tan, bab, and ebony, though the loci with the 7

largest effect varied between studies. In the Bastide et al. 2013 study, the variants 8

with the strongest effect lay upstream of the tan gene, in a cis-regulatory element 9

called the male specific enhancer (MSE). The MSE has also been implicated in the 10

pigmentation differences between D. santomea and D. yakuba (Jeong et al. 2008). 11

Variants in a regulatory element of bab1 (the dimorphic element, or DME), played a 12

significant, but minor role, and those upstream of ebony were of borderline 13

significance. Dembeck et al. (2015), in contrast, found the bab locus had a major 14

effect on pigmentation variation, while tan and ebony and other loci had minor 15

effects. 16

In this study, we replicate the Bastide et al. (2013) study in a South African 17

population of D. melanogaster, using the Pool-GWAS approach, but with some 18

important modifications. Although we find that same variants appear to be causal in 19

both populations, bab, rather than tan, shows the strongest association with 20

pigmentation in the South African population, similar to results from the Dembeck et 21

al. (2015) study and in contrast to the Bastide et al. (2013) study. We analyze these 22

results in detail to explain the reasons for the contrasting genetic associations between 23

the South African and European populations. 24

7

Results and Discussion 1

Variation within and between samples: We used samples of D. 2

melanogaster from three locations for this study. After adding an additional replicate 3

for the Viennese population, we reanalyzed the population samples from Bastide et al. 4

(2013), which consisted of two European populations, from near Bolzano, Italy, and 5

Vienna, Austria, about 200 miles apart. We also collected and analyzed data from a 6

new population, from Kanonkop, near Cape Town, South Africa. We divided each of 7

the three population samples into three replicates, which were then scored for female 8

abdominal pigmentation. Within each replicate, we identified individuals with 9

extreme phenotypes — those with the greatest and least extent of darkly pigmented 10

area on segment A7, as in Bastide et al. (2013) — pooled them, and subjected the 11

pools of light and dark individuals from each replicate to whole genome sequencing 12

(see Table S1 for coverage depth). We then analyzed these pools for allele frequency 13

differences that suggest an association with the pigmentation phenotype. The six 14

replicates from the European populations consisted of 1500 offspring of wild-caught 15

females, with 100 individuals (~7%) per replicate contributing to each extreme pool. 16

The South African samples, for logistical reasons, were kept as 700 isofemale lines 17

for 15 generations before they were scored for pigmentation. Each of the three 18

replicates from this sample consisted of a single individual sampled from each 19

isofemale line, with a similar fraction of the individuals contributing to the extreme 20

pools as in the European populations (55-80 females, or 7-11% of each replicate). 21

After mapping and filtering, we identified ~3.2 million single nucleotide 22

polymorphisms (SNPs) segregating in the European population and ~4 million SNPs 23

in the South African isofemale lines with ~1.9 million common to both. The SNPs 24

shared by both populations had significantly higher minor allele frequencies (MAF) 25

8

than the SNPs found in only one of the two populations (shared SNPs: median MAF 1

EU and SA: 0.12, private SNPs: median MAF EU: 0.006, SA: 0.015; paired Wilcoxon 2

rank sum test EU: W= 1e+12, p < 2.2e-16, SA: W= 3.5e+12, p < 2.2e-16). To estimate 3

divergence between the populations, we calculated pairwise FST values for 200kb 4

windows using PoPoolation2 (Kofler et al. 2011b). As differentiation between the 5

Austrian and Italian populations was low (median FST = 0.0107, 95% CI: [0.0105, 6

0.011]), we combined these samples for subsequent analysis, as in Bastide et al. 7

(2013). The South African sample showed higher, but still moderate, levels of 8

divergence from the European populations [median FST(Aut/SA) = 0.044 (95% CI: 9

[0.043, 0.046]), median FST(Ita/SA) = 0. 040 (95% CI: [0.039, 0.041])], lower than 10

those typically reported for comparisons between European and other African D. 11

melanogaster samples (FST ~ 0.20; Pool et al. 2012). The low levels of population 12

differentiation between Europe and South Africa seen here are probably at least 13

partially due to high levels of migration from cosmopolitan flies, shown to occur in 14

many African populations (Pool and Aquadro 2006), but which may be particularly 15

high for our South African sample due to Cape Town’s status as a shipping port. 16

For the South African population, we used flies from recently established 17

isofemale lines, which have a higher degree of inbreeding than the wild-derived F1s 18

used in the European study. As inbreeding in the South African samples may affect 19

the GWAS results, we first estimated the degree of inbreeding in these lines. To this 20

end, we sequenced 12 individual females from different isofemale lines and compared 21

these to 12 resequenced individual flies from the population collected in Bolzano, 22

Italy. In the South African flies, the median inbreeding coefficient, FIT, is 0.35 (95% 23

CI = [0.337, 0.361]) see Figure S1). This level of inbreeding can be explained by an 24

initial bottleneck of 2 progenitor flies and 15 generations at an effective population 25

9

size (Ne) of around 10-25, consistent with the conditions under which the isofemale 1

lines were propagated. FIT varied moderately but significantly among chromosomes, 2

with 2L and 3L showing the lowest inbreeding (median FIT= 0.28 both) and X and 2R 3

the highest (median FIT=0.43 and 0.5 respectively; Dunn’s test with Benjamin-4

Hochberg correction, comparison: 2L:3L and 2R:X p > 0.05, all other comparisons: p 5

< 10-6). In contrast, resequenced individual flies from the European populations, 6

which have not been maintained as isofemale lines, show little evidence of inbreeding 7

(median FIT ≈ 0.00, 95% CI: [-0.009, 0.001]), as expected for flies with essentially the 8

same genotypes as wild flies. 9

SNPs associated with variation in abdominal pigmentation: To find SNPs 10

associated with abdominal pigmentation, we compared the allele frequencies of the 11

extreme light and dark pools of the European and South African populations across 12

replicates using the Cochran-Mantel-Haenszel (CMH) test as in Bastide et al. (2013). 13

In both populations, most of the 100 highest ranked SNPs (ranked by p-value from the 14

CMH test) are located in non-coding regions – 73% in the European population, and 15

95% in the South African one (see Tables S2-S7). Nevertheless, in the European 16

population, the highly ranked SNPs are enriched for those in coding regions compared 17

to random samples matched in size, chromosomal distribution, and allele frequency 18

(100,000 samples: p < 10-4). This enrichment is, however, mostly due to the overlap 19

of the tan regulatory region with the coding sequence of various flanking genes (14 20

synonymous and 5 non-synonymous SNPs). 21

In general, the results for the European population (Figure 1) are essentially 22

the same as in Bastide et al. (2013), with minor differences likely due to the inclusion 23

of an additional replicate for the Viennese population and a slightly altered SNP 24

calling and filtering protocol. The strongest associations are due to SNPs found 25

10

upstream of the tan locus, with the highest-ranking SNPs in the MSE region (Figure 1

1A,C, Table S3, S6, S7). The remaining strongly associated SNPs are near the bric-a-2

brac locus (hereafter called bab), and weakly associated SNPs occur upstream of the 3

ebony gene on 3R and close to the pdm3 gene on 2R. While only weakly associated 4

with pigmentation, it is plausible that both ebony and pdm3 are causal, as both have 5

been implicated in abdominal pigmentation in other studies (Pool and Aquadro 2007; 6

Rogers et al. 2014). 7

In contrast, in the South African sample (Figure 1B, Table S2, S4, S5), the 8

SNPs with the strongest associations occur at bab, not tan. Specifically, these SNPs 9

lie in the third intron of bab1, near a cis-regulatory region called the dimorphic 10

element (Figure 1D; Williams et al. 2008). The next strongest associations were with 11

SNPs at the tan locus. The change in ranks for the tan and bab loci between Europe 12

and South Africa is investigated in detail below. Finally, we again found SNPs 13

weakly associated with pigmentation at the pdm3 locus, as in the European sample, 14

but none near the ebony locus (Figure 1, Figures S2 & S3). 15

Using D. simulans as an outgroup, we inferred the ancestral states of the 16

strongly associated SNPs via parsimony. Interestingly, the inferred ancestral 17

phenotype differs at the tan and bab loci: For tan, most of the ten highest ranking 18

SNPs with a consistent effect in both samples had an inferred ancestral light allele 19

(Eu: 7/10, SA: 9/10). At the bab locus, in contrast, the majority of SNPs had an 20

ancestral dark allele. (Eu: 8/10, SA: 7/10; Tables S2 and S3). 21

Non-SNP variants associated with abdominal pigmentation: Strongly 22

associated SNPs may not directly cause abdominal pigmentation differences, but may 23

instead be associated with pigmentation via linkage to causal alleles consisting of 24

other kinds of genetic variation, namely, short insertion and deletion (indel) variants, 25

11

and large chromosomal inversions. To investigate these data for differences in indel 1

frequencies between the pools, we identified 279,016 segregating short insertions and 2

deletions in one of the European pools and 371,146 in the South African samples, 3

with 199,676 of these segregating in all populations (with the requirement that they 4

segregate with a minor allele frequencies of at least 15%). As with the SNPs, we 5

compared allele counts of the indels in the extreme pools over replicates using a CMH 6

test. While indels and SNPs could, in principle, be combined and analyzed together, 7

we examined the indel results separately. The reason is that the indel allele counts are 8

likely less reliable than the SNP allele counts, as calling indels is technically difficult 9

compared to identifying SNPs, especially in pooled sequencing samples. 10

Overall, we find patterns similar to those obtained with SNPs (Figures S4 and 11

S5). For the European samples, highly associated indels are found around the same 12

regions close to tan and bab1, while for the South African flies, the only strongly 13

associated indels were near bab1, with none near the tan locus. In general, indels 14

associated with pigmentation variation showed higher p-values and smaller average 15

allele frequency changes than the highly associated SNPs (median of the log(OR) of 16

the 10 most significant variants: Eu: SNPs=1.8, indels=1.3; SA: SNPs=2.1, 17

indels=1.7), suggesting that the observed phenotypic variation is more likely to be due 18

to the SNPs than the indels. 19

We further investigated the pools of light and dark flies for differences in the 20

frequencies of cosmopolitan inversions. The inversions themselves are unlikely to 21

cause pigmentation differences, but as they suppress recombination in heterozygotes, 22

any causal variants contained in the inverted type may be linked to many other 23

variants also associated with the inversion. To investigate this possibility, we looked 24

for differences in the frequencies of six common cosmopolitan inversions on 25

12

chromosomes 2 and 3 between the light and dark pools. We identified inversion types 1

indirectly, via previously identified SNP markers from Kapun et al. (2013). 2

While some inversions showed considerable frequency differences between 3

light and dark pools in the European populations (Figure S6), these differences were 4

not consistent between the Italian and Austrian samples, and were not replicated in the 5

South African pools. Thus, these frequency differences may lead to an overall 6

elevation in allele frequency differences between the pools, but, assuming that the 7

inversions are associated with similar alleles in both European samples, they are 8

unlikely to lead to consistently associated alleles even in the European samples. In 9

addition, linkage disequilibrium between alleles and inversions is likely strongest at 10

the inversion breakpoints (Andolfatto et al. 1999). The closest inversion breakpoint of 11

any of the investigated inversion is ~2MB away from the bab locus, possibly close 12

enough to affect allele frequencies (Stevison et al. 2011), but unlikely to result in 13

strong spurious association at bab and not elsewhere on 3L. The tan locus is X-linked, 14

and the X-chromosome harbors no known common inversions (Lemeunier and Aulard 15

1992). In any case, the inversions are unlikely to cause artifactual signals at tan and 16

bab. 17

Similarities and differences between the populations: We noted several 18

differences between the European and South African populations. In particular, the 19

weak effect of ebony is restricted to the European population, and the strongest 20

signals occur near tan in the European population, but near bab in the South African 21

population. Nevertheless, we identified similar sets of genes overall in the two 22

populations, raising the question as to whether pigmentation is affected by the same 23

alleles in the two populations, or by independent alleles at the same loci. We tested 24

this by asking if highly ranked SNPs (ranked separately by significance for each 25

13

population) had consistent effects in the two samples, using two different measures of 1

consistency. First, we asked the alleles at these SNPs had the same qualitative effect 2

for Europe and South Africa, assuming that the ‘dark’ allele at each SNP is that 3

enriched in the dark sample of flies (and vice-versa). For unassociated diallelic SNPs, 4

we expect half of them to have the same qualitative effect in the two samples. High-5

ranking SNPs, however, have the same inferred dark and light alleles much more 6

often than this (Figure 2A). Second, we asked if high-ranking SNPs in one population 7

had the strongest estimated phenotypic effects in the other population, as expected. 8

We estimated and ranked phenotypic effects using the logOR for dark vs. light pools 9

(Bastide et al. 2013; Figures S7 and S8). The results again show broad consistency 10

between the populations; in particular, high-ranking SNPs in Europe generally had the 11

highest ranked phenotypic effects in South Africa (Figure 2B). For both measures of 12

similarity, the concordance between the samples declines steeply as the effects of 13

highly ranked SNPs are diluted by SNPs with increasingly lower ranks (Figure 2). 14

Overall, we conclude that despite the striking differences between the populations, 15

this analysis suggests that some SNPs contribute similarly to female abdominal 16

pigmentation in both populations. Otherwise, we would not expect to find consistency 17

between the high-ranking SNPs of the two populations, as each would contain two 18

different sets of causal variants and linked SNPs. 19

That said, the ranks of the highest-ranking South African SNPs were poor 20

predictors of the ranks of their effect sizes in Europe compared to the reciprocal 21

comparison (Figure 2). Inspection of the data reveals that this discrepancy arises 22

mainly from the region near bab1, where some SNPs were high ranking in South 23

Africa, but not in Europe. Tellingly, these SNPs segregate at intermediate frequencies 24

in South Africa, and at low frequencies in Europe [median minor allele frequencies 25

14

(MAF) of the 50 highest ranked SNPs in South Africa = 0.42, median MAF in Europe 1

= 0.26; paired Wilcox rank-sum test p = 0.0004]. The logOR estimator is akin to the 2

average effect of an allelic substitution (Falconer and Mackay 1996), in that it is 3

affected in a similar way by dominance and allele frequency. That is, in the absence 4

of dominance, the logOR effect estimates are affected very little by differences in 5

allele frequencies (Figure S7A). With dominance though, logOR estimates do change 6

with allele frequency, reflecting the change in the proportion of individuals expressing 7

the full homozygous effects of the recessive allele (Figure S7B). Thus, the failure to 8

observe a strong effect for the bab1 SNPs in Europe might be explained by a lower 9

MAF of the recessive allele, so that its effects are rarely expressed and such that it 10

contributes little to phenotypic variation in Europe. At the tan gene, in contrast, the 50 11

highest ranked SNPs in the South African sample had comparable median MAFs in 12

reference samples from both populations (median MAF Europe: 0.75, SA: 0.70; 13

paired Wilcox rank-sum test p = 0.99; Figure S9), and the effects for these SNPs were 14

more consistent than for those at the bab locus. 15

Refining candidates using the combined analysis of multiple populations: 16

Despite the high resolution of Pool-GWAS, for closely linked sites, several alleles 17

from a single population are associated with a causal variant via linkage 18

disequilibrium, resulting in false positives. The combined analysis of multiple 19

populations may make use of more historical recombination events than have 20

occurred in a single population. As a result, provided that the genetic architecture of 21

the trait is identical among the populations, their joint analysis can yield fewer false 22

positives and higher confidence candidates than the analysis of a single population. 23

Considering only high ranking SNPs with consistent effects in the two populations, 24

15

for example, we can narrow down the SNPs at bab to a single best candidate SNP 1

(3L:1085454 at bab1). 2

For SNPs in very close proximity, we can use patterns of linkage to further 3

refine the location of SNPs that show consistent changes between populations. We 4

achieve this by using information from SNPs on the same read or read pair, and 5

estimating short-range linkage disequilibrium between the associated SNPs as in 6

Feder et al. (2012). We combined pairs of SNPs with estimated r2 values > 0.75 to 7

infer short haplotypes. We then determined the frequency changes and the associated 8

p-values of SNPs associated with these haplotypes (Figures S9-S11) to determine 9

whether candidate SNPs increase in frequency independently of nearby SNPs, or as 10

part of a haplotype. For example, the three most highly ranked tan SNPs in the 11

European population appear to occur primarily on the same haplotype, consistent with 12

their close physical distance (e.g,, the two most tightly linked SNPs, X:9121094 and 13

X:9121129, show r2(EU)~0.97, r2(SA)~0.83). Unfortunately, this does not implicate a 14

single best candidate SNP, as the response of all three SNPs is likely non-15

independent. In contrast, the analysis of three groups of SNPs upstream of the tan 16

gene (X:9119116-9119160,9120730&912683, 9123892-9123903) demonstrates the 17

usefulness of this approach. In the analysis of the European population, the SNPs 18

within a group show strong linkage disequilibrium resulting in almost identical allele 19

frequency change. In the South African population, these alleles do not show the same 20

pattern of linkage disequilibrium, and only one SNP in each of the groups exhibits 21

consistent allele frequency changes in the South African sample. yielding a single best 22

candidate causal allele for each of the three haplotypes (see green boxes in Figure 23

S10). 24

16

Contrasts between populations at the tan and bab loci: We used 1

simulations to ask if we can simply explain the contrasting effects of effects of tan 2

and bab in Europe and South Africa, assuming a identical genetic architecture of 3

causal loci underlying the phenotype. Specifically, we investigated the role of 4

dominance and allele frequency differences between the populations. We simulated as 5

closely as possible the actual Pool-GWAS procedure for both populations, using the 6

same number of sequenced individuals and replicates, and estimated allele 7

frequencies from the European and South African reference populations (median 8

frequency for the light allele of the 4 best candidates at each locus: Europe: tan: 0.2, 9

bab: 0.83; SA: tan: 0.17, bab: 0.47, see Table 1). We assigned genotypes to 10

individuals based on these allele frequencies, accounting for inbreeding in the South 11

African population (using FIT = 0.35), and phenotypes based on these genotypes, 12

assuming a narrow-sense heritability (h2) of 0.3. This value is conservative compared 13

to the typical estimate of h2 = 0.5 to 0.9 for abdominal pigmentation (Bickel et al. 14

2011; Dembeck et al. 2015), allowing for the effects of background loci. We assigned 15

the same mean phenotype to individuals with the same genotype, regardless of 16

population. As in the real experiment, simulated individuals were divided into 17

replicates, selected for their phenotypes, and their pooled allele counts analyzed with 18

a CMH test. 19

In the simulations, we varied the degree of dominance (h) of the light allele at 20

each locus and the relative additive effect sizes between the two loci (E), and then 21

accepted the 1% of simulations best fitting the following four criteria: the ratio of 22

effects of tan and bab in (i) Europe (actual results, tan-logOR/bab-logOR = 2.51) and 23

(ii) South Africa (tan-logOR/bab-logOR = 0.57), and the ratio of effects of these 24

SNPs across populations (iii) tan [Europe-logOR/South Africa-logOR = 1.89 and for 25

17

(iv) bab (Europe-logOR/South Africa-logOR = 0.44, see Table 1). Additional details 1

of the simulation methods are given in the Methods. The results of rejection sampling 2

are shown in Figure 3, S12 and S13. These most probable parameter combinations 3

had dominant light alleles at both bab (consistent with Robertson et al. 1977), and 4

tan, and a stronger effect of bab vs. tan alleles. Nearly all accepted parameter 5

combinations (92%) resulted in switching of the ranks of tan and bab for the 6

European and South African populations. 7

To explore potential genetic interactions between the two loci, as suggested by 8

the results of Robertson et al. (1977), we also considered a model with a simple 9

epistatic interaction term (see Methods), allowing for synergistic (positive) or 10

antagonistic (negative) effects of the light allele. Including a positive epistatic 11

interaction between light alleles at the two loci substantially improves the model fit 12

(Bayes Factor: P(Data|Model with epistasis)/ P(Data|Model without epistasis) = 644; 13

Csilléry et al. 2012). However, as other factors, such as subtle differences in 14

experimental methods or differences between the populations at other causal loci, may 15

yield a similar effect, we do not consider this analysis strong evidence for epistasis. In 16

any case, the main results remain qualitatively unchanged: Similar to the model 17

without epistasis, this model also suggests dominance of the light alleles of both tan 18

and bab, though support for tan dominance is reduced, and a stronger allelic effect of 19

variants at the bab locus than at tan (Figures S14-S16). 20

Overall, our simulations suggest that the contrasting results between the 21

European and South African population can be at least partly explained by dominance 22

and allele frequency differences between the populations: As the dark allele at bab is 23

at high frequency in the South African population, the recessive homozygote is easily 24

recovered even in the absence of inbreeding. In contrast, the recessive homozygote in 25

18

Europe is rare (at an expected frequency of 2.9%), reducing its effect on phenotypic 1

variation and in the experiment, in spite of its large allelic effect. 2

Comparison of the results with previous studies: We compared the results 3

of our approach to two studies on natural genetic variation of female abdominal 4

pigmentation. For this analysis, we chose to maximize the chance of overlap with the 5

other two studies with a more liberal cutoff than the 5% FDR used for the main 6

results, instead using the 1019 SNPs with p-values below the Bonferroni threshold (p 7

= 1.56e-8 for Europe and p = 1.25e-8 for South Africa). We then looked for overlaps 8

between these SNPs and those identified as candidates in the other studies. 9

The first study, Bickel et al. (2011), investigated variants at the bric-à-brac 10

locus (bab1 and 2) segregating in a sample of 96 inbred lines from an orchard near 11

Winters, CA, USA, for their effects on A6 pigmentation. While this study measured 12

pigmentation for a different tergite, genotypic values for A6 and A7 are strongly 13

correlated (Gibert et al. 1998b), so an overlap between alleles affecting these traits 14

seems plausible. In the 148kb region covered by Bickel et al. 2011, there were 3066 15

uniquely mappable biallelic SNPs with a minor allele frequency ≥ 0.05; of these, 2367 16

were segregating in at least one of the populations used in this study, and 167 of these 17

2367 were identified as candidates by Bickel et al. 2011. Of these SNPs, 15 and 2 18

were significant after Bonferroni correction for multiple testing in the South African 19

and European studies, respectively (Table S8), including the most strongly associated 20

ones at the bab locus in the South African (3L:1084990) and the European 21

(3L:1085454) samples. The overlapping SNPs fall into two regions, the regulatory 22

region in the first intron of bab1, and the noncoding intergenic region between bab1 23

and bab2, ~50kb apart, much further than the usual extent of linkage disequilibrium in 24

19

D. melanogaster (Mackay et al. 2012; Franssen et al. 2015), suggesting that both 1

regions may affect pigmentation variation independently. 2

The second study, Dembeck et al. (2015), looked genome-wide for variants 3

associated with pigmentation of the 5th and 6th abdominal tergites (A5 and A6) of 4

females in a panel of 175 inbred and sequenced lines created from a population at 5

Raleigh, NC, USA (Mackay et al. 2012). We looked for overlap between our 1019 6

SNPs, and the 154 variants with a p-value below 1 x 10-5 in Dembeck et al. (2015) 7

associated with effects on either or both segments. In all, we found 15 candidate SNPs 8

in common between Dembeck et al. (2015) and this study; of these, 11 are in the first 9

intron of bab1 and 4 near the MSE upstream of tan (see Table S9). Encouragingly, the 10

inferred light and dark alleles for the SNPs affecting A7 (present study) and those 11

significantly affecting A6 (8 SNPs) or A5 (3 SNPs, with 2 of them also affecting A6) 12

in Dembeck et al. (2015) were identical between the studies, consistent with similar 13

associations and with correlated trait values for these segments (Gibert et al. 1998b). 14

The remaining 6 of the 15 SNPs had a complex effect on pigmentation, affecting the 15

contrast between the A5 and A6 segments—e.g., at these SNPs, alleles that increase 16

pigmentation in A6, reduce it for A5. As pigmentation of adjacent segments is more 17

strongly correlated than that of more distant ones (Gibert et al. 1998b), we expect the 18

effect of these alleles on A6, rather than that on A5, to predict the effect the A7 19

segment measured here. Consistent with this, the inferred light and dark alleles for all 20

six SNPs were identical for the A6 segment in Dembeck et al. (2015) and for A7 in 21

the present study. 22

We next asked whether the 15 SNPs that overlap between this study and the 23

Dembeck et al. 2015 study show quantitatively similar effects, again using log(OR) as 24

a proxy for effect size in our study, and focusing on the effect on A6 in the Dembeck 25

20

et al. (2015) study. For the European population, we found no correlation, perhaps not 1

surprisingly as the strongest associations occurred at different loci in this comparison. 2

For the South African population, where the strongest associations were near the same 3

locus (i.e, near bab), the estimated effects of the alleles were strongly correlated 4

(Spearman’s ρ = 0.86, p = 0.0013). 5

We did not find any signs of association around the other genes identified to 6

influence pigmentation in Dembeck et al. (2015), possibly due to differences in the 7

trait measured (A6 vs. A7 pigmentation), or due to other differences in the 8

methodology and power of these approaches. Finally, only one SNP– the most highly 9

associated one in bab1 for South Africa (3L:1084990)– overlapped between all three 10

studies (Bickel et al. 2011, Dembeck et al. 2015, and this study). 11

Conclusions:. Using the Pool-GWAS method on three different D. 12

melanogaster populations, we show that variants at tan and bab are associated with 13

female abdominal pigmentation variation. In addition to the strong statistical support 14

from the current study, our results are also supported by other studies on pigmentation 15

variation (Robertson et al. 1977; Kopp et al. 2003; Bastide et al. 2013; Dembeck et 16

al. 2015), and by forward genetic studies showing that both tan and bab affect 17

pigmentation when mutated. While the results for the South African and European 18

populations are broadly consistent, the differences between them are potentially 19

informative. The European study points to a large effect of tan, while the results from 20

the South African population (and other studies on natural variants; Robertson et al. 21

1977; Dembeck et al. 2015) attribute the strongest effects to bab. We show that this 22

difference does not necessarily reflect alternative genetic architectures, but can be 23

most simply explained by a combination of dominance and allele frequency: e.g, in 24

South Africa, the dark allele of bab, which appears to have a strong effect (present 25

21

study and Robertson et al. 1977; Kopp et al. 2003; Dembeck et al. 2015), is at high 1

enough frequency to occur in homozygous form reasonably often, and thus can 2

contribute substantially to phenotypic variation even if recessive. In Europe, in 3

contrast, the dark bab allele is rare, and, if recessive, contributes little to phenotypic 4

variation in wild flies. Regardless of its status in natural populations, a recessive bab 5

allele can play a large role in differences between inbred lines selected for extreme 6

phenotypes, as shown in previous studies (Robertson et al. 1977; Kopp et al. 2003). 7

Further, because distant populations may show different patterns of linkage 8

disequilibrium, studying associations in different populations provides a means to 9

help distinguishing linked but non-causal variants from causal alleles contributing to 10

the phenotype — e.g, a SNP that is significant for one sample may show no 11

association in another sample, or have different inferred ‘dark’ and ‘light’ alleles 12

between samples. Thus, by looking for associations that are repeatable across 13

populations, repeated association studies allow further fine-mapping of potential 14

causal alleles. 15

Finally, we also show that Pool-GWAS can be used to analyze inbred 16

individuals from isofemale lines, a common tool in population genetics (David et al. 17

2005), offering a cost effective way to study the genetic basis of phenotypic variation, 18

including obtaining reasonable estimates of relative effect sizes for different loci. That 19

this modified experimental design yields reliable results is important for logistical 20

reasons, as it allows collection and phenotypic measurement of samples to be 21

separated in time, such that the latter can be done under controlled laboratory 22

conditions. 23

22

Materials and Methods 1

Sample collection, iso-female lines used, and pigmentation scoring: For the 2

European populations around 30,000 flies were collected in Vienna, Austria, in 2010 3

and in Bolzano, Italy, in 2011, and split into replicates and used to create an F1 4

generation in the laboratory under 25ºC and scored for pigmentation as described in 5

Bastide et al. (2013). Briefly, offspring were kept at 18ºC for a few days post-eclosion 6

to allow adult pigmentation pattern to develop and stabilize. Pigmentation was scored 7

by visually classifying each individual according to the extent of darkly pigmented 8

area on the most posterior abdominal segment (A7 tergite). For classification five 9

levels of pigmentation ranging from 0 (not pigmented) to 4 (completely pigmented) 10

were assigned, and 100 individuals with the lightest and darkest levels (0 and 4, 11

respectively) of each replicate were randomly chosen for pool sequencing. Around 12

1500 individuals were scored for each replicate. As a control three replicates of 100-13

160 individuals not selected for abdominal pigmentation were used. 14

The South African population consisted of 700 isofemale lines collected in March 15

2012 in Kanonkop, SA. These isofemale lines were kept at 25ºC and the offspring left 16

for several days to allow for the adult pigmentation to mature and fully develop. For 17

each replicate one female individual was taken from each isofemale line, leading to a 18

replicate size of 700 individuals. The individuals were then visually scored for 19

pigmentation of their A7 tergite, and the 135 lightest and 173 darkest flies from each 20

replicate pooled and sequenced (19.3% and 24.7%, respectively). The average allele 21

frequencies in the Kanonkop population were estimated from pooled females obtained 22

from a subset of 564 isofemale lines. 23

DNA extraction, library preparation and sequencing: DNA from pooled and 24

individual flies was extracted following a modified high salt protocol (Miller et al. 25

23

1988) and fragmented using a Covaris S2 device (Covaris, Inc. Woburn, MA, USA). 1

Paired-end library preparation was based on the NEBNext® DNA Library Prep 2

Master Mix Set (E6040L, New England Biolabs, Ipswich, MA) for the pooled South 3

African sample and the individually sequenced Viennese flies, whereas a modified 4

protocol of the NEBNext® Ultra DNA Library Prep Kit (E7370L) was used for the 5

individually sequenced South African flies. Size selection and final library 6

purification were performed using AMPureXP beads (Beckman Coulter, CA, USA) 7

with an additional gel-based size selection for the pooled South African sample and 8

the individually sequenced Viennese flies. The resulting insert sizes were 400bp for 9

these two samples and approximately 550bp for the individual South African libraries. 10

For amplification of individually barcoded libraries, the PCR mastermix included in 11

the TruSeq DNA LT Sample Prep Kit (FC-121-2001, Illumina, San Diego, CA) or 12

Phusion Polymerase (NEB) and 12 PCR cycles were used. All libraries were 13

sequenced on a HiSeq2000 following a 2x100bp protocol. 14

Read Mapping, Filtering of Contaminants, Realignment: Sequencing reads 15

were trimmed, mapped to a reference genome using a Hadoop based distributed 16

computation framework and filtered as previously described (Kofler et al. 2011a; 17

Pandey and Schlötterer 2013; Bastide et al. 2013). Shortly, the reads were trimmed 18

using PoPoolation and a base quality threshold of 18 (Kofler et al. 2011a), and 19

mapped to the combined genomes of D. melanogaster (v. 5.18), Wolbachia pipientis 20

(AE017196.1), Lactobacillus brevis (CP000416.1), Acetobacter pasteurianus 21

(AP011170), and phage phiX174 (NC_001422.1) using BWA aln v. 0.5.9 (Li and 22

Durbin 2009) without seeding allowing for two gap openings (-o 2), an alignment 23

distance threshold leading to loss of less than 1% of reads assuming a 2% error rate (-24

n 0.01), and up to 12bp gap extensions (-e 12 –d 12). The aligned reads were checked 25

24

for duplicates, and filtered for improper pairs, and a mapping quality of at least 20 1

using samtools v. 0.1.18 and Picard tools v. 1.79 (Li et al. 2009). Finally, all 2

alignments were realigned around short insertions and deletions, and in regions 3

showing high entropy, using GATK v. 2.5.2 (McKenna et al. 2010) considering both 4

the insertions and deletions occurring in the alignments, and those identified in the 5

second freeze version of the Drosophila Genetic Reference Panel (DGRP) lines 6

(Mackay et al. 2012) (DGRP freeze 2: 7

ftp://ftp.hgsc.bcm.edu/DGRP/freeze2_Feb_2013/ ; options RealignerTargetCreator: --8

maxIntervalSize 400 --minReadsAtLocus 15; IndelRealigner: -entropy 0.05 -LOD 3 -9

maxConsensuses 50 -greedy 250 -model USE_READS). 10

The Austrian and Italian populations were found to contain around 1% of D. 11

simulans contamination when inspecting male flies for diagnostic genital differences 12

(Sturtevant 1919). As previously described (Bastide et al. 2013), this can lead to false 13

associations, as D. melanogaster and D. simulans differ from each other in the degree 14

of abdominal pigmentation (Gibert et al. 1998b). To identify and remove reads 15

potentially stemming from D. simulans, the mapped and filtered reads were remapped 16

against five genomes each from D. melanogaster and D. simulans using GMAP (ver. 17

2012-07-20) as previously described (Bastide et al. 2013), and reads mapping to a D. 18

simulans genome with a higher mapping quality than that to the best fitting D. 19

melanogaster genome were removed as contaminations. Previously identified fixed 20

differences between the two fly species were used to independently estimate the 21

amount of D. simulans contamination (Bastide et al. 2013). 22

The cleaned and filtered alignment files were converted into synchronized 23

format requiring a base quality of at least 20 using PoPoolation2 (Kofler et al. 2011b). 24

Repetitive regions and regions around small insertions and deletions (5 bp up- or 25

25

downstream) were removed. RepeatMasker (ver. 3.2.8, www.repeatmasker.org) was 1

used for identifying repetitive regions without considering low complexity regions 2

(option -nolow). Overall, after all filtering steps, an average sequencing depth of 3

around 100 was obtained, with coverages ranging from 27 to 278 (for details see 4

Table S2). 5

Association Mapping and False Detection Rate Calculation: To test for 6

associations of variants with abdominal pigmentation we used a Cochran-Mantel-7

Haenszel (CMH) test as previously described (Bastide et al. 2013). Briefly, the CMH 8

test allows to test for independence of values in contingency tables over multiple 9

strata or - in our case - replicates. For each replicate a 2 x 2 contingency table 10

containing the reference and alternative allele counts of the extremely light and dark 11

pools was created, and the CMH test performed over all replicates for each individual 12

variant as implemented in PoPoolation 2 (Kofler et al. 2011b). For calculation of the 13

common odds ratios and confidence intervals we used custom python scripts using the 14

mantelhaen.test function of the stats package of R (R Core Team 2014) and rpy2 15

(rpy.sourceforge.net). All polymorphic sites were filtered for a minimum coverage of 16

15 for each sample, and a minimum overall count of the minor allele of 8 for each 17

sample We also removed the 2% highest covered sites from the calculations. To 18

correct for multiple testing, we enforced a false detection rate of 0.05 derived from an 19

empirical null distribution as described in Bastide et al. (2013). 20

Identification of small insertions and deletions: Small insertions and 21

deletions were identified and quantified using the UnifiedGenotyper of GATK (ver. 22

2.5.2; McKenna et al. 2010). The alignment files of each population were analyzed 23

together using the a sample ploidy of 25, a minimum base quality of 20, a minimum 24

InDel fraction of 5% and an overall occurrence of 20 counts for each InDel to be 25

26

considered (options: -maxAltAlleles 3 -glm GENERALPLOIDYINDEL -deletions 1

1.1 -mbq 20 -minIndelFrac 0.05 -minIndelCnt 20 -stand_call_conf 10.0 -2

stand_emit_conf 10.0 -sample_ploidy 25). A lower sample ploidy than actually 3

occurring in the pools had to be used for the calculations as GATK was prohibitively 4

slow for ploidies higher than 25. While this may lead to some inaccuracies in 5

genotyping, and scoring of low frequency variants, it should not influence the 6

estimated counts of variants with minor allele frequencies above 5%. The InDels were 7

filtered using GATK to perform variant quality score recalibration taking the DGRP 8

freeze 2 InDels as a training set, and choosing a sensitivity of 95%. Furthermore, all 9

InDels with a frequency below 5% were removed using bcftools (v. 1.0, 10

samtools.github.io/bcftools/). For each position only the two most common alleles 11

where considered and the counts of the most common InDel used in a modified CMH 12

testing against the reference allele count, a procedure similar to single nucleotide 13

variants. The CMH test was performed using the mantelhaen.test function of the stats 14

package of R (R Core Team 2014) using custom python scripts and rpy2 15

(rpy.sourceforge.net). 16

Estimation of inbreeding coefficients and genetic differentiation: 17

Inbreeding coefficients (FIT) were calculated for single females from 12 South 18

African isofemale lines, and 12 female offspring from the freshly collected Italian 19

population. For the calculation of inbreeding coefficients (FIT) 6 individual flies from 20

each the very dark and the very light fraction of the South African samples, and 12 21

individuals not scored for pigmentation from the Italian population. As we estimate 22

FIT genome-wide, we do not expect the values to be much affected by any selection 23

on the pigmentation loci. The reads of each individual were mapped to the reference 24

with BWA, filtered, and realigned with GATK as above. We called SNPs with the 25

27

UnifiedGenotyper of GATK (McKenna et al. 2010) using the default parameters apart 1

from allowing for 3 alleles, and slightly altered call and emit thresholds (-2

maxAltAlleles 3 -glm SNP --output_mode EMIT_VARIANTS_ONLY -3

stand_call_conf 30.0 -stand_emit_conf 15.0). Subsequently we recalibrated the 4

variant quality score based on the combined DGRP freeze 2 and Drosophila 5

Population Genomics Project 2 (DPGP2) (Pool et al. 2012) variants. To calculate 6

inbreeding coefficients, we used only high confidence SNPs called in the 12 South 7

African and the 12 Italian individuals, respectively (GATK Variant Quality Score 8

Recalibration tranche sensitivity threshold: 99%; allele frequency > 0.15). Inbreeding 9

coefficients were calculated for each variant from the fraction of heterozygous 10

individuals (Het) and the frequencies of the reference (fR) and alternative alleles (fA) 11

in the base populations :

FIT =1−Het /(2 fR ⋅ fA) . 12

Genetic differentiation was calculated by estimating pairwise FST values from the 13

pooled data of flies from each population that were unselected for pigmentation, using 14

PoPoolation 2 with default settings (Kofler et al. 2011b) and the same filtering criteria 15

and allele frequency cutoffs as for the CMH test. 16

Simulations: To better understand the combined effects of the experimental 17

setup, allele frequencies, inbreeding coefficients, additive effects, dominance, and 18

epistatic interactions – on the results of these experiments, we performed computer 19

simulations. 20

We simulated each experiment, following the experimental design as closely 21

as possible. For each simulated population, we generated individuals with genotypes 22

for the tan and bab loci according to the measured allele frequencies, assuming 23

Hardy-Weinberg equilibrium for the European population and using the estimated 24

inbreeding coefficient for the South African sample (FIT = 0.3). Phenotypic values for 25

28

each individual were calculated based on its genotype and the underlying genetic 1

architecture, and adding a normally distributed random effect with an environmental 2

variance, which was scaled by heritability. The narrow-sense heritability, h2, for 3

female abdominal pigmentation has been estimated to lie between 0.5 and 0.8 (Gibert 4

et al. 1998a). As only two of the contributing loci were simulated, we conservatively 5

assumed h2 = 0.3 for the European population, and applied the same environmental 6

variance VE applied to both the European and South African populations. 7

The genetic component of the phenotype, was calculated based on the additive 8

effect (eff) and the degree of dominance (h) for each locus. The dominance, h, ranges 9

from 0 for a recessive light allele, over 0.5 for codominance to 1 for dominance of the 10

light allele. We calculated the phenotypic value P depending on the state of loci g1 11

and g2 using

P(g1,g2) = a1(g1) + a2(g2) + ε , where ε represents the environmental 12

contribution to phenotypic variance, and is a normally distributed random variable 13

with mean 0 and variance VE with mean zero, and aN(gN) constitutes the genotypic 14

effect of the locus gN: 15

aN (gN ) = effN ⋅−0.5

hN − 0.50.5

⎨ ⎪

⎩ ⎪

if gN = AAif gN = Aaif gN = aa

16

In addition to dominance and additive effects, we also include potential 17

interactions between the two loci. We defined simple pairwise epistatic interactions 18

by an interaction constant (epsint). An interaction term of zero indicates no epistasis, a 19

negative value corresponds to negative epistasis between light alleles (e.g, with the 20

double light homozygote being darker than expected in the absence of epistasis), and 21

a positive value corresponds to positive epistasis (e.g, with the light double 22

homozygote being lighter than expected; see Figure S17). The effect of the epistatic 23

interaction between the loci g1 and g2 is calculated with24

29

eps(g1,g2) = epsint ⋅ a1(g1)⋅ a2(g2)⋅ (eff1 + eff2) /2 and the overall phenotypic value as 1

P(g1,g2) = a1(g1) + a2(g2) + eps(g1,g2) + ε . 2

For each set of dominance, epistasis, and additive effect parameters, we 3

performed simulations corresponding to the European and South African experiments, 4

with individuals divided into replicates as in the real experiment (1500 individuals 5

randomly assigned to six replicates in the European experiment, or 1 fly per replicate 6

from each of 700 isofemale lines in the South African experiment). The individuals 7

in each replicate were then ranked according to their phenotypic values, and a CMH 8

test was performed using the allele counts from extreme individuals (with 100 9

individuals selected from each extreme for Europe, and 65 individuals for each from 10

South Africa). 11

To infer the most likely combination of parameter values and to compare the 12

two models, we used rejection sampling as implemented in the abc R package 13

(Csilléry et al. 2012). For both models 2 million simulations with parameters drawn 14

from uniform prior distributions ([-0.125, 1.125] for dominances and [0.25,4.5] for 15

the additive effect of bab1 relative to tan and [-1.5,3.5] for the epistatic interaction 16

strength). The posterior distribution of parameters was estimated using a simple 17

rejection method using an acceptance threshold of 0.01 and the following the criteria 18

as summary statistics: logOR(tan)/logOR(bab1): Europe=2.51, South Africa=0.68; 19

logOR(tanEU)/logOR(tanSA)=2.28 and logOR(bab1EU)/logOR(bab1SA)=0.62. For 20

model selection, we used the postpr function of the abc R package with the 21

summary statistics calculated from simulations and a tolerance of 0.01. 22

Data availability: All custom scripts are available for download at github 23

under https://github.com/luenling/Pigmentation2015 . Raw sequencing reads are 24

available at the European Nucleotide Archive (ENA, http://www.ebi.ac.uk/ena) under 25

30

Accession numbers ERP001827 (http://www.ebi.ac.uk/ena/data/view/ERP001827), 1

and ERP013300 (http://www.ebi.ac.uk/ena/data/view/ERP013300). 2

3

Acknowledgments 4

We thank A. Futschik for assistance with the statistical analysis. H. Bastide, P. Stöbe 5

and R. Tobler contributed to the experiment with the European populations. We are 6

especially grateful to H. van Schalkwyk for fly collection and technical assistance. 7

This research was supported by the Austrian Science Funds (FWF, P22725) and the 8

European Research Council (ArchAdapt). The funders had no role in study design, 9

data collection and analysis, decision to publish, or preparation of the manuscript. 10

11

12

References 13

Andolfatto P., Wall J. D., Kreitman M., 1999 Unusual haplotype structure at the 14

proximal breakpoint of In(2L)t in a natural population of Drosophila melanogaster. 15

Genetics 153: 1297–1311. 16

Bastide H., Betancourt A., Nolte V., Tobler R., Stöbe P., et al., 2013 A genome-wide, 17

fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS 18

Genet. 9: e1003534. 19

Bastide H., Yassin0 A., Johanning E. J., Pool J. E., 2014 Pigmentation in Drosophila 20

melanogaster reaches its maximum in Ethiopia and correlates most strongly with 21

ultra-violet radiation in sub-Saharan Africa. BMC Evol. Biol. 14: 179. 22

31

Bickel R. D., Kopp A., Nuzhdin S. V., 2011 Composite effects of polymorphisms 1

near multiple regulatory elements create a major-effect QTL. PLoS Genet 7: 2

e1001275. 3

Capy P., David J., Robertson A., 1988 Thoracic trident pigmentation in natural 4

populations of Drosophila simulans: a comparison with D. melanogaster. Heredity 5

61: 263–268. 6

Cooley A. M., Shefner L., McLaughlin W. N., Stewart E. E., Wittkopp P. J., 2012 7

The ontogeny of color: developmental origins of divergent pigmentation in 8

Drosophila americana and D. novamexicana. Evol Dev 14: 317–325. 9

Csilléry K., François O., Blum M. G. B., 2012 abc: an R package for approximate 10

Bayesian computation (ABC). Methods Ecol Evol 3: 475–479. 11

David J., Capy P., Payant V., Tsakas S., 1985 Thoracic trident pigmentation in 12

Drosophila melanogaster: Differentiation of geographical populations. Génétique 13

Sélection Évolution 17: 211–24. 14

David J. R., Gibert P., Legout H., Pétavy G., Capy P., et al., 2005 Isofemale lines in 15

Drosophila: an empirical approach to quantitative trait analysis in natural populations. 16

Heredity 94: 3–12. 17

Dembeck L., Huang W., Magwire M., Lawrence F., Lyman R., et al., 2015 Genetic 18

architecture of abdominal pigmentation in Drosophila melanogaster. PLoS Genet. 19

Falconer D. S., Mackay T. F. C., 1996 Introduction to Quantitative Genetics (4th 20

Edition). Benjamin Cummings. 21

32

Feder A. F., Petrov D. A., Bergland A. O., 2012 LDx: estimation of linkage 1

disequilibrium from high-throughput pooled resequencing data. PLoS One 7: e48588. 2

Franssen S. U., Nolte V., Tobler R., Schlötterer C., 2015 Patterns of linkage 3

disequilibrium and long range hitchhiking in evolving experimental Drosophila 4

melanogaster populations. Mol Biol Evol 32: 495–509. 5

Gibert P., Moreteau B., Moreteau J., David J., 1998a Genetic variability of 6

quantitative traits in Drosophila melanogaster (fruit fly) natural populations: analysis 7

of wild-living flies and of several laboratory generations. Heredity 80: 326–335. 8

Gibert P., Moreteau B., Scheiner S. M., David J. R., 1998b Phenotypic plasticity of 9

body pigmentation in Drosophila: correlated variations between segments. Genet. Sel. 10

Evol. 30: 181. 11

Gompel N., Prud’homme B., Wittkopp P. J., Kassner V., Carroll S. B., 2005 Chance 12

caught on the wing: cis-regulatory evolution and the origin of pigment patterns in 13

Drosophila. Nature 433: 481–7. 14

Gorlov I. P., Moore J. H., Peng B., Jin J. L., Gorlova O. Y., et al., 2014 SNP 15

characteristics predict replication success in association studies. Hum. Genet. 16

Hoekstra H. E., Nachman M. W., 2003 Different genes underlie adaptive melanism in 17

different populations of rock pocket mice. Mol Ecol 12: 1185–1194. 18

Hoekstra H. E., Hirschmann R. J., Bundey R. A., Insel P. A., Crossland J. P., 2006 A 19

single amino acid mutation contributes to adaptive beach mouse color pattern. Science 20

313: 101–104. 21

33

Ingram C. J. E., Raga T. O., Tarekegn A., Browning S. L., Elamin M. F., et al., 2009 1

Multiple rare variants as a cause of a common phenotype: several different lactase 2

persistence associated alleles in a single ethnic group. J Mol Evol 69: 579–588. 3

Ioannidis J. P., Thomas G., Daly M. J., 2009 Validating, augmenting and refining 4

genome-wide association signals. Nat. Rev. Genet. 10: 318–29. 5

Ioannidis J. P., Tarone R., McLaughlin J. K., 2011 The false-positive to false-negative 6

ratio in epidemiologic studies. Epidemiol. Camb. Mass 22: 450–6. 7

Itan Y., Jones B. L., Ingram C. J. E., Swallow D. M., Thomas M. G., 2010 A 8

worldwide correlation of lactase persistence phenotype and genotypes. BMC Evol 9

Biol 10: 36. 10

Jeong S., Rokas A., Carroll S. B., 2006 regulation of body pigmentation by the 11

abdominal-b hox protein and its gain and loss in Drosophila evolution. Cell 125: 12

1387–1399. 13

Jeong S., Rebeiz M., Andolfatto P., Werner T., 2008 The evolution of gene regulation 14

underlies a morphological difference between two Drosophila sister species. Cell 132: 15

783–793. 16

Kapun M., Schalkwyk H. Van, McAllister B., Flatt T., Schalkwyk H. van, et al., 2013 17

Inference of chromosomal inversion dynamics from Pool-Seq data in natural and 18

laboratory populations of Drosophila melanogaster. Mol. Ecol. 23: 1813–1827. 19

Kofler R., Orozco-terWengel P., Maio N. De, Pandey R. V., Nolte V., et al., 2011a 20

PoPoolation: a toolbox for population genetic analysis of next generation sequencing 21

data from pooled individuals. PLoS One 6: e15925. 22

34

Kofler R., Pandey R. V., Schlötterer C., 2011b PoPoolation2: identifying 1

differentiation between populations using sequencing of pooled DNA samples (Pool-2

Seq). Bioinforma. Oxf. Engl. 27: 3435–6. 3

Kopp A., Duncan I., Carroll S. B., 2000 Genetic control and evolution of sexually 4

dimorphic characters in Drosophila. Nature 408: 553–9. 5

Kopp A., Graze R. M., Xu S., Carroll S. B., Nuzhdin S. V., 2003 Quantitative trait 6

loci responsible for variation in sexually dimorphic traits in Drosophila melanogaster. 7

Genetics 163: 771–787. 8

Lemeunier F., Aulard S., 1992 Inversion polymorphism in Drosophila Melanogaster. 9

In: Krimbas CB, R PJ (Eds.), Drosophila Inversion Polymorphism, CRC Press, pp. 10

339–407. 11

Li H., Wu Y., Loos R. J. F., Lin X., 2008 Variants in the Fat mass – and Obesity-12

associated (FTO) gene are not associated with obesity in a Chinese Han population. 13

Diabetes 57: 264–268. 14

Li H., Durbin R., 2009 Fast and accurate short read alignment with Burrows-Wheeler 15

transform. Bioinformatics 25: 1754–1760. 16

Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., et al., 2009 The Sequence 17

Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079. 18

Llopart A., Elwyn S., Lachaise D., Coyne J. A., 2002 Genetics of a difference in 19

pigmentation between Drosophila yakuba and Drosophila santomea. Evolution 56: 20

2262–2277. 21

35

Mackay T. F. C., Richards S., Stone E., Barbadilla A., Ayroles J. F., et al., 2012 The 1

Drosophila melanogaster Genetic Reference Panel. Nature 482: 173–178. 2

McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., et al., 2010 The 3

Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation 4

DNA sequencing data. Genome Res. 20: 1297–303. 5

Miller S. A., Dykes D. D., Polesky H. F., 1988 A simple salting out procedure for 6

extracting DNA from human nucleated cells. Nucleic Acids Res 16: 1215. 7

Munjal A., Karan D., Gibert P., Moreteau B., Parkash R., et al., 1997 Thoracic trident 8

pigmentation in Drosophila melanogaster: latitudinal and altitudinal clines in Indian 9

populations. Genet Sel Evol 29: 601–610. 10

Ng M. C. Y., Park K. S., Oh B., Tam C. H. T., Cho Y. M., et al., 2008 Implication of 11

genetic variants near TCF7L2, SLC30A8, HHEX, CDKAL1, CDKN2A/B, IGF2BP2, 12

and FTO in type 2 diabetes and obesity in 6,719 Asians. Diabetes 57: 2226–2233. 13

Pandey R. V., Schlötterer C., 2013 DistMap: a toolkit for distributed short read 14

mapping on a Hadoop cluster. PLoS One 8: e72614. 15

Pool J. E., Aquadro C. F., 2006 History and structure of sub-Saharan populations of 16

Drosophila melanogaster. Genetics 174: 915–29. 17

Pool J. E., Aquadro C. F., 2007 The genetic basis of adaptive pigmentation variation 18

in Drosophila melanogaster. Mol Ecol 16: 2844–2851. 19

Pool J. E., Corbett-Detig R. B., Sugino R. P., Stevens K., Cardeno C. M., et al., 2012 20

Population genomics of sub-saharan Drosophila melanogaster: African diversity and 21

non-African admixture. PLoS Genet. 8: e1003080. 22

36

Ranciaro A., Campbell M. C., Hirbo J. B., Ko W.-Y., Froment A., et al., 2014 Genetic 1

origins of lactase persistence and the spread of pastoralism in Africa. Am J Hum 2

Genet 94: 496–510. 3

R Core Team, 2014 R: A Language and Environment for Statistical Computing. R 4

Foundation for Statistical Computing, Vienna, Austria. 5

Robertson A., Briscoe D., Louw J., 1977 Variation in abdomen pigmentation in 6

Drosophila melanogaster females. Genetica 47: 73–76. 7

Rogers W. A., Salomone J. R., Tacy D. J., Camino E. M., Davis K. A., et al., 2013 8

Recurrent modification of a conserved cis-regulatory element underlies fruit fly 9

pigmentation diversity. PLoS Genet 9: e1003740. 10

Rogers W. A., Grover S., Stringer S. J., Parks J., Rebeiz M., et al., 2014 A survey of 11

the trans-regulatory landscape for Drosophila melanogaster abdominal pigmentation. 12

Dev. Biol. 385: 417–32. 13

Salomone J. R., Rogers W. A., Rebeiz M., Williams T. M., 2013 The evolution of 14

Bab paralog expression and abdominal pigmentation among Sophophora fruit fly 15

species. Evol Dev 15: 442–457. 16

Siontis K. C. M., Patsopoulos N., Ioannidis J. P., 2010 Replication of past candidate 17

loci for common diseases and phenotypes in 100 genome-wide association studies. 18

Eur. J. Hum. Genet. EJHG 18: 832–7. 19

Stevison L. S., Hoehn K. B., Noor M. A. F., 2011 Effects of inversions on within- and 20

between-species recombination and divergence. Genome Biol Evol 3: 830–841. 21

37

Sturtevant A. H., 1919 A New Species Closely Resembling Drosophila 1

Melanogaster. Psyche J. Entomol. 26: 153–155. 2

Takahashi A., Takahashi K., Ueda R., Takano-Shimizu T., 2007 Natural variation of 3

ebony gene controlling thoracic pigmentation in Drosophila melanogaster. Genetics 4

177: 1233–1237. 5

Takahashi A., Takano-Shimizu T., 2011 Divergent enhancer haplotype of ebony on 6

inversion In(3R)Payne associated with pigmentation variation in a tropical population 7

of Drosophila melanogaster. Mol Ecol: 4277–4287. 8

Telonis-Scott M., Hoffmann A., Sgrò C. M., 2011 The molecular genetics of clinal 9

variation: a case study of ebony and thoracic trident pigmentation in Drosophila 10

melanogaster from eastern Australia. Mol. Ecol. 20: 2100–10. 11

True J. R., Yeh S.-D., Hovemann B. T., Kemme T., Meinertzhagen I. A., et al., 2005 12

Drosophila tan encodes a novel hydrolase required in pigmentation and vision. PLoS 13

Genet 1: e63. 14

Visscher P. M., Brown M., McCarthy M. I., Yang J., 2012 Five years of GWAS 15

discovery. Am. J. Hum. Genet. 90: 7–24. 16

Williams T. M., Selegue J. E., Werner T., Gompel N., Kopp A., et al., 2008 The 17

regulation and evolution of a genetic switch controlling sexually dimorphic traits in 18

Drosophila. Cell 134: 610–23. 19

Wittkopp P. J., Carroll S. B., Kopp A., 2003 Evolution in black and white: genetic 20

control of pigment patterns in Drosophila. Trends Genet. TIG 19: 495–504. 21

38

Wittkopp P. J., Smith-Winberry G., Arnold L. L., Thompson E. M., Cooley A. M., 1

Yuan D. C., Song Q., McAllister B. F., 2010 Local adaptation for body color in 2

Drosophila americana. Heredity 106: 592–602. 3

4

5

6

7

Figure 1 Manhattan plots. Manhattan plots for the effect of SNPs on female abdominal 8

pigmentation for segment A7. The horizontal axis shows the genomic location of each tested 9

SNP, with chromosomal location indicated alternately by black and gray and with vertical lines 10

indicating the locations of the tan (red), pdm3 (yellow), bab1 (green), and ebony (blue) genes. 11

The vertical axis shows the –log10 p-value from the CMH test, with a horizontal line indicating 12

the 0.05 FDR threshold. Results for the experiments with European (A) and South African (B) 13

flies, with detailed views of regions of the tan (C), and bab (D) loci that contain SNPs showing 14

strong associations. In the detailed views, only SNPs with p-values < 0.1 are shown. Grey 15

010

2030

40

X 2L 2R 3L 3R

-log 10

(P)

South Africa

X 2L 2R 3L 3R 4

010

2030

40-lo

g 10(P

)50

60 EuropeA

B

C D

4

1040 1080 1120 1160

0

20

40

60

80bab region

DME AE 2bab1bab

9117 9118 9119 9120 9121

0

20

40

60

80

tan Gr8aCG15370 MSE

EuropeSouth Africa

-log 10

(P)

tan region

-log 10

(P)

EuropeSouth Africa

position on X chromosome (kb)1060 1100 1140 1180

position on chromosome 3L (kb)

●●

●●

●●

●● ●●

●●● ●●●●

●●●

●● ●

●●

●●●

●●●●

●●

●●●

●●

●●●●

●●

●● ●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

> > > > > >> > >

●●

●●●●●●●●●●●●●●

●●●●

●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●

●●

●●●●●

●●

●●

●●●●●●

●●●●●

●●●●●●●●●

●●●

●●●●●●

●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●● ●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●

●●●●●

●●●●●●●●

●●●●●●●

●●●●●●●●●●●●●●●●●●●●●

●●

●●●

●●

●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●

●●●●

●●●●●●●●●●●●●

●●●

●●●●●●●●●

●●●

●●●●●●●●●●●●●●●●●

●●●

●●●●

●●●●●●●●●●●●●●●●●●●●●

●●●●

●●

●●●●●●●●

●●●●

●●●●●●

●●●●●●

●●

●●

●●

●●

●●

●●●●●

●●

●●●●●

●●

●●

●●●●●

●●

●●●●●

●●

●●●●●●

●●

●●

●●

●●

●●●

●●●●●●

●●

●●●●

●●●●

●●●

●●●

●●●●●●●

●●●

●●●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●●

●●●●●●●●●●●

●●●●●●●●●●●●

●●●

●●●

●●

●●●

●●●●●

●●

●●

●●●

●●●●●

●●●●●●●●●

●●

●●●●●

●●

●●●●●●●●●

●●●●

●●●●●●●

●●

●●

●●●

●●

●●●●●

●●

●●●●●

●●●

●●●●●●

●●●

●●●●

●●●●

●●

●●

●●●●

●●

●●●●

●●●●●

●●

●●●●●●

●●

●●

●●●

●●

●●

●●●●

●●

●●●

●●●●●

●●●

●●●●●

●●●●●

●●●●●●●●●●●●●●●●

●●

●●

●●

●●●●●●●

●●

●●

●●●●●●●●

●●

●●●

●●

●●

●●●

●●●

●●●●●●●●●●

●●●●

●●●●●●

●●●●

●●●●●●●

●●●

●●

●●●●

●●●

●●●●●●

●●●●●

●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●

●●●

●●

●●●●●●●●●

●●

●●●●

●●

●●●

●●●●●

●●●

●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●

●●●

●●●●●●●●●●

●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●●●●●●●●●●●

●●

●●●

●●●●●●●

●●●●●●●●●●●●●●

●●●●

●●●●

●●●●●●●●●●●●●●

●●

●●●●●●●●●●

●●●●●●●●

●●●

●●

●●●●●●●●

●●

●●●●

●●

●●●

●●

●●●●

●●●

●●●●●

●●●

●●

●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●

●●●●●●●●●●●●

●●●

●●●

●●●●●●

●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●

●●●●●

●●●●●●

●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●● ●

●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●

●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●●●●●●

●●

●●●●●●●●

●●●

●●●●●●●●●●

●●●●●●●●●

●●●

●●●●●●●●●●●●●●●

●●●●●

●●●●

●●●●●

●●●●●●●●●●●●●●●●

●●

●●●

●●●●●●●

●●

●●●●●●●●●●●●●●●●●●●●●

●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●

●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●

●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●

●●●●●

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●

●●●

●●

●●

●●●●●●●

●●

●●●●●●●

●●●●

●●●●

●●●●●●●

●●●●●●

●●●●●●●●●●●●●●

●●●●●●●●●●●●●●

●●●●●

●●●

●●●

●●●●●●●●

●●●

●●●●●●

●●

●●●●●●●

●●

●●

●●●●●●●●●●●

●●●

●●●●●

●●●

●●●●●●●●●●●●●

●●

●●●

●●●●●●

●●

●●

●●●●●●●●●●●●●●●

●●●●●

●●●

●●●●

●●●

●●●●

●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●

●●

●●

●●

●●●●

●●

●●●●●

●●●●●●

●●

●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●

●●●●●

●●●●●●●●●●●●

●●●●●●●●●●●

●●●●●●●●●●

●●

●●●●●●

●●●

●●●●●●●●●●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●

●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●● ●

●●●●●

●●●●●●●●●●

●●●

●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●●●●●●●●●●●●●●●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

39

rectangles mark regions containing coding sequence of the genes, white rectangles indicate 1

known cis-regulatory elements. The arrowheads indicate the direction of transcription. MSE 2

(male-specific enhancer), DME (dimorphic enhancer) and AE (anterior element) indicate 3

previously described cis-regulatory regions (Jeong et al. 2008; Williams et al. 2008). 4

5

Figure 2 Consistency between European and South African samples. SNPs were 6

ranked by CMH test p-values, with different cutoffs shown on the x-axis. SNPs were ranked 7

according to the results in the European (red line) or South African populations (blue line). 8

One thousand random samples of unassociated SNPs— matched in number, allele 9

frequency, and chromosomal distribution with the analyzed SNPs were used as a comparison 10

(dashed lines). (A) The fraction of SNPs with consistent effects, i.e, where light and dark 11

alleles inferred between the two samples were the same. (B) Correlation coefficients 12

(Spearman’s ρ) for the logOR values of the two populations; the logOR is a measure of the 13

effect of the SNP, with the direction of the effect indicated by its sign. 14

15

40

1 2

Figure 3 Rejection Sampling Results. Heatmap showing the percentage of accepted 3

simulations using an ABC rejection method with 2,000,000 simulations uniformly sampling 4

parameter space and an acceptance rate of 1% using these criteria: 5

(logOR(tan)/logOR(bab1): Europe=2.51, South Africa=0.68; logOR(tanEU)/logOR(tanSA)=2.28 6

and logOR(bab1EU)/logOR(bab1SA)=0.62). Parameters were sampled uniformly from [-0.125, 7

1.125] for dominances and [0.25,4.5] for the additive effect of bab1 relative to tan. For 8

presentation in a grid, the parameter values were binned into intervals of 0.25 for dominance 9

and 1 for relative additive effect. The axes labels indicate the center of each binning interval. 10

The numbers in the cells give the percentage of accepted simulations in each binned 11

parameter interval. Each cell represents a particular range of combinations of dominance of 12

the light alleles at bab1 (top axis) and tan (bottom axis), and the relative additive effect of 13

bab1 to tan loci (left axis). 14

15

16

17

0.5

12

34

rela

tive

effe

ct b

ab1:

tan

bab1 locus

0 0.25 0.5

0.75 1 0 0.25 0.5

0.75 1 0 0.25 0.5

0.75 1 0 0.25 0.5

0.75 1 0 0.25 0.5

0.75 1

0 0.25 0.5 0.75 1

1.2 5.71.7

1.61.3

1912

3.3 7.66.81.4

8.1267.62.8

0 10 20 30Percent Accepted

9.1

dominance

tan locusdominance

41

1 2

3

Table 1. The four best candidate SNPs at the tan and bab loci. SNPs were selected based 4

on their rank in both studies, and had consistent allelic effects in both populations (i.e, the 5

identity of the putative light and dark alleles were consistent between Europe and South 6

Africa). The mean allele frequencies for these SNPs and their ratios of log odds ratios were 7

used in simulations and rejections sampling procedures, as described in the text. AF: allele 8

frequency of the ‘light’ allele, OR: odds ratio. P-values and odds ratios were obtained from the 9

CMH procedure. 10

11

EUROPE SOUTH AFRICA

CHR BPS AF -log10P logOR AF -log10P logOR logOR (EU)/

logOR (SA)

X 9120922 0.22 56.78 2.39 0.29 10.64 1.26

X 9121094 0.18 60.35 3.14 0.17 8.58 1.18

X 9121129 0.18 60.35 2.86 0.16 13.50 1.66

X 9121177 0.21 36.45 1.83 0.06 6.96 1.43

tan MEAN 0.20 2.55 0.17 1.38 1.89

3L 1084990 0.90 8.38 1.20 0.52 42.41 3.03

3L 1085137 0.85 4.35 0.65 0.47 37.04 2.54

3L 1085454 0.85 23.41 1.64 0.58 25.65 2.00

3L 1086356 0.73 4.54 0.57 0.30 23.81 2.06

bab MEAN 0.83 1.02 0.47 2.41 0.44

logOR (tan) / logOR

(bab) 2.51

0.57