

Interspecific introgressive origin of genomic diversity in the house mouse

Kevin J. Liu (a,1,2), Ethan Steinberg (a), Alexander Yozzo (a), Ying Song (b,3), Michael H. Kohn (b,1), and Luay Nakhleh (a,b,1)

(a) Department of Computer Science and (b) BioSciences, Rice University, Houston, TX 77005

Edited by John C. Avise, University of California, Irvine, CA, and approved November 12, 2014 (received for review April 4, 2014)

We report on a genome-wide scan for introgression between the house mouse (Mus musculus domesticus) and the Algerian mouse (Mus spretus), using samples from the ranges of sympatry and allopatry in Africa and Europe. Our analysis reveals wide variability in introgression signatures along the genomes, as well as across the samples. We find that fewer than half of the autosomes in each genome harbor all detectable introgression, whereas the X chromosome has none. Further, European mice carry more M. spretus alleles than the sympatric African ones. Using the length distribution and sharing patterns of introgressed genomic tracts across the samples, we infer, first, that at least three distinct hybridization events involving M. spretus have occurred, one of which is ancient, and the other two are recent (one presumably due to warfarin rodenticide selection). Second, several of the inferred introgressed tracts contain genes that are likely to confer adaptive advantage. Third, introgressed tracts might contain driver genes that determine the evolutionary fate of those tracts. Further, functional analysis revealed introgressed genes that are essential to fitness, including the Vkorc1 gene, which is implicated in rodenticide resistance, and olfactory receptor genes. Our findings highlight the extent and role of introgression in nature and call for careful analysis and interpretation of house mouse data in evolutionary and genetic studies.

Mus musculus | Mus spretus | hybridization | adaptive introgression | PhyloNet-HMM

Classical laboratory mouse strains, as well as newly established wild-derived ones, are widely used by geneticists for answering a diverse array of questions (1). Understanding the genome contents and architecture of these strains is important for studies of natural variation and complex traits, as well as evolutionary studies in general (2). Mus spretus, a sister species of Mus musculus, impacts the findings in M. musculus investigations for at least two reasons. First, it was deliberately interbred with laboratory M. musculus strains to introduce genetic variation (3). Second, Mus musculus domesticus is partially sympatric (naturally cooccurring) with M. spretus (Fig. 1).

Recent studies have examined admixture between subspecies of house mice (5–8), but have not studied introgression with M. spretus. In at least one case (5), the introgressive descent of the mouse genome was hidden due to data postprocessing that masked introgressed genomic regions as missing data. In another study reporting whole-genome sequencing of 17 classical laboratory strains (6), M. spretus was used as an outgroup for phylogenetic analysis. The authors were surprised to find that 12.1% of loci failed to place M. spretus as an outgroup to the M. musculus clade. The authors concluded that M. spretus was not a reliable outgroup but did not pursue their observation further. On the other hand, in a 2002 study (9), Orth et al. compiled data on allozyme, microsatellite, and mitochondrial variation in house mice from Spain (sympatry) and nearby countries in western and central Europe. Interestingly, allele sharing between the species was observed in the range of sympatry but not outside in the range of allopatry. The studies demonstrated the possibility of natural hybridization between these two sister species. Further, the study of Song et al. (10) demonstrated a recent adaptive introgression from M. spretus into some M. m. domesticus populations in the wild, involving the vitamin K epoxide reductase subcomponent 1 (Vkorc1) gene, which was later shown to be more widespread in Europe, albeit geographically restricted to parts of southwestern and central Europe (11).

Major, unanswered questions arise from these studies. First, is the vicinity around the Vkorc1 gene an isolated case of adaptive introgression in the house mouse genome, or do many other such regions exist? Second, is introgression between M. spretus and M. m. domesticus common outside the range of sympatry? Third, have there been other hybridization events, and, in particular, more ancient ones? Fourth, what role do introgressed genes, and, more generally, genomic regions, play?

To investigate these open questions, we used genome-wide variation data from 20 M. m. domesticus samples (wild and wild-derived) from the ranges of sympatry and allopatry, as well as two M. spretus samples. For detecting introgression, we used PhyloNet-HMM (12), a newly developed method for statistical inference of introgression in genomes while accounting for other evolutionary processes, most notably incomplete lineage sorting (ILS).

Our analysis provides answers to the questions posed above.

First, we find signatures of introgression between M. spretus and each of the M. m. domesticus samples. The amount of introgression varies across the autosomes of each genome, with a few chromosomes harboring all detectable introgression, and most of the chromosomes have none. We detected no introgression on the X chromosome. Further, the amount of introgression varied widely across the samples. Our analyses demonstrate introgression outside the range of sympatry. In fact, our results show more signatures of introgression in the genomes of allopatric samples from Europe than in sympatric samples from Africa. For the third question, we used the length distribution and sharing patterns of introgressed regions across the samples to show support for at least three hybridization events: an ancient hybridization event that predates the colonization of Europe by M. m. domesticus and two more recent events, one of which presumably occurred about 50 y ago and is related to warfarin resistance selection (10). For the fourth question, our functional analysis of the introgressed genes shows enrichment for certain categories, most notably olfaction, an essential trait for the fitness of rodents. Understanding the genomic architecture and evolutionary history of the house mouse has broad implications on various aspects of evolutionary, genetic, and biomedical research endeavors that use this model organism. The PhyloNet-HMM method (12) can be used to detect introgression in other eukaryotic species, further broadening the impact of this work.

Significance

The mouse has been one of the main mammalian model organisms used for genetic and biomedical research. Understanding the evolution of house mouse genomes would shed light not only on genetic interactions and their interplay with traits in the mouse but would also have significant implications for human genetics and health. Analysis using a recently developed statistical method shows that the house mouse genome is a mosaic that contains previously unrecognized contributions from a different mouse species. We traced these contributions to ancient and recent interbreeding events. Our findings reveal the extent of introgression in an important mammalian genome and provide an approach for genome-wide scans of introgression in other eukaryotic genomes.

Author contributions: K.J.L., M.H.K., and L.N. designed research; K.J.L. performed research; K.J.L., E.S., A.Y., and Y.S. contributed new reagents/analytic tools; K.J.L., M.H.K., and L.N. analyzed data; and K.J.L., M.H.K., and L.N. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession no. GSE62906).

1 To whom correspondence may be addressed. Email: [email protected], [email protected], or [email protected].

2 Present address: Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824.

3 Present address: The State Key Laboratory for Biology of Plant Diseases and Insect Pests and Key Laboratory of Weed and Rodent Biology and Management, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1406298111/-/DCSupplemental.

196–201 | PNAS | January 6, 2015 | vol. 112 | no. 1 www.pnas.org/cgi/doi/10.1073/pnas.1406298111

Results

We now describe our findings of introgression within the individual genomes, as well as across the genomes of the 20 M. m. domesticus samples (40 haploid genomes). The four African samples, as well as the two samples from Spain, are sympatric with M. spretus, whereas the other samples are allopatric (Fig. 1).

Genome-Wide Signals of Introgression. Our analysis detected introgression between M. spretus and M. m. domesticus in the genomes of all 20 M. m. domesticus samples (Fig. 2; SI Appendix provides complete scans of introgression of all 20 samples). However, the patterns of introgression varied across the chromosomes within each individual genome, as well as across the genomes. In terms of within-genome variability, a few chromosomes in each genome carried almost all of the introgressed regions. For example, all detected introgressed regions resided on five chromosomes in the sample from La Roca del Vallès, Spain. For all samples, fewer than half the chromosomes of a sample’s genome carried any detected introgression (SI Appendix, Figs. S2–S20). The analysis did not detect any introgression on chromosome X (SI Appendix, Fig. S21). Further, in the two samples from Spain and the six Germany–Hamm samples, one or two chromosomes carried over 50% of all detected introgression. Generally, the percentage of introgressed sites in a genome ranged from about 0.02% in a sample from Tunisia to about 0.8% in samples from Germany (Fig. 2). The large extent of detected introgression between M. spretus and M. m. domesticus seen on chromosome 17 in the samples from Spain (see SI Appendix) merits further investigation. The introgressed regions spatially coincide with the known polymorphic recombination-suppressing inversions and t-haplotypes in house mice (13).

The amount of introgression in the genomes of the 20 samples points qualitatively to three groups of samples: Group I, which includes the two samples from Spain and the six Germany–Hamm samples; Group II, which includes the two other Germany samples and the Italy and Greece samples; and Group III, which includes the samples from Africa. Variability in the amount of introgression across samples within each group is much smaller than that across groups, as is the amount of sharing of introgressed regions. Further, Group I has the most introgression, and Group III has the least. Notice that all samples within Group I, except for the one from Spain–Arenal, contain the introgression with M. spretus that carries Vkorc1 (10). Group II contains all of the allopatric European mice that do not carry Vkorc1, and Group III contains all of the sympatric African samples. This categorization guides the displays and analyses of our results below. These results answer the first two questions we posed above in the affirmative: there are introgressed genomic regions beyond the region that contains Vkorc1, and introgression is present in all 20 samples, pointing to the spread of introgressions beyond the range of sympatry. Quantifying how common such introgressions are outside the range of sympatry, however, requires denser sampling that is beyond the scope of this work.

Fig. 1. Species ranges and samples used in our study. The species range of M. spretus is shown in green (4), and the species range of M. m. domesticus includes the blue regions, the range of M. spretus, and beyond (1). M. m. domesticus and M. spretus samples were obtained from locations marked with red circles and purple diamonds, respectively. The samples originated from within and outside the area of sympatry between the two species. (SI Appendix, Table S1, provides additional details about the samples used in our study.)

[Figure 2 image not reproduced. Two bar charts with one bar per sample: (A) Introgressed length (Mb), axis 0 to 45; (B) Introgressed percentage (%), axis 0 to 0.8. Samples: Spain Arenal, Spain RocadelVallès, Germany Hamm A–F, Germany, Germany Remderoda, Greece Korinthos, Greece Laganas, Italy Menconico, Italy SanGirogio, Italy Cassino, Italy Milazzo, Tunisia Monastir A, Tunisia Monastir B, Morocco, Algeria.]

Fig. 2. Amount of introgressed genetic material in the 20 M. m. domesticus samples. (A) The amount of introgressed genetic material in Mb per sample. (B) The amount of introgressed genetic material as a percentage of the genome length per sample.

Support for More Than a Single Hybridization Event. To answer the third question of whether multiple, distinct hybridization events involving M. spretus and M. m. domesticus have occurred, we focused on two analyses: inspecting the introgressed tract length distribution, where an introgressed tract is defined as a maximally contiguous introgressed region, and inspecting the sharing pattern of introgressed regions across the samples. Repeated back-crossing, recombination, and drift result in fragmentation of introgressed regions, with very long regions pointing to recent hybridization events. On the other hand, selection on adaptively introgressed regions could also maintain them for long periods, confounding the tract length-based analysis of the age of hybridization. However, if a long region is shared across some, but not all, samples from the population, that increases the likelihood of a recent hybridization hypothesis.

For each of the three groups, we plotted the distribution of introgressed tract lengths, where an introgressed tract is defined as a maximally contiguous introgressed region (Fig. 3). The figure shows that Group I contains the only samples that have introgressed tracts of lengths exceeding 4 Mb. All these tracts correspond to the adaptively introgressed region that contains Vkorc1 between positions 122 and 132 Mb on chromosome 7 (Fig. 4A). The exclusivity of these very long introgressed tracts to Group I points to a very recent hybridization event involving M. spretus, in agreement with the assessment of ref. 10. Except for this group of introgressed tracts, the three distributions are very similar, with an excess of very short tracts and a smaller number of longer tracts (up to 4 Mb). The very short tracts could be a signal of ancient hybridization or just incorrect inference by the method (detecting very short introgressed regions is very hard due to low signal-to-noise ratio). However, the pattern of sharing of introgressed regions across the samples supports a hypothesis of ancient hybridization, as we now discuss.

Fig. 4 shows examples of three different patterns of introgression across the samples (full genome-wide scans of all samples can be found in the SI Appendix). Fig. 4A shows introgressed tracts that are shared exclusively among samples in Group I (we hypothesize that the Spain–Arenal sample underwent a secondary loss of the introgressed Vkorc1-containing region that it once had). As we discussed above, these point to at least one very recent hybridization event involving M. spretus. Fig. 4B shows introgressed tracts that are shared across samples from all three groups. This pattern points to an ancient hybridization event that involved M. spretus and that precedes the ancestor of all M. m. domesticus samples in the study. It is important to note here that this pattern could also be a signature of balancing selection on standing variation before the split of M. musculus and M. spretus. We discuss this possibility in the Discussion section below. Fig. 4C shows introgressed regions, of considerable length, in the sample from Morocco. This is, again, a signature of a recent hybridization event that is different from that involving the large tracts on chromosome 7 in Group I.

Putting together all of the evidence, the data support a hypothesis of at least three distinct hybridization events. One hybridization event is ancient, predating the colonization of Europe by M. m. domesticus upward of 2,000 y ago (15). The other two hybridization events are more recent, and one of them presumably occurred about 50 y ago and is related to warfarin resistance selection (10).
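The link between tract length and the age of hybridization can be illustrated with a back-of-the-envelope calculation. The sketch below is not part of the analysis reported here. It assumes a simple neutral model in which recombination breaks an introgressed tract at a constant rate, so that tract lengths t generations after hybridization are roughly exponentially distributed with a mean of about 1/t Morgans (ignoring admixture proportion, drift, and selection). The recombination rate of 0.5 cM/Mb and the generation counts are illustrative assumptions, and the function names are hypothetical.

```python
import math


def expected_tract_length_mb(generations, cm_per_mb=0.5):
    """Mean introgressed tract length in Mb under a simple neutral model.

    Assumption: after `generations` generations the mean tract length is
    about 1 / generations Morgans. `cm_per_mb` is an assumed genome-wide
    average recombination rate, not a value taken from the paper.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    mean_cm = 100.0 / generations  # 1 Morgan = 100 cM
    return mean_cm / cm_per_mb


def prob_tract_longer_than(length_mb, generations, cm_per_mb=0.5):
    """P(tract length > length_mb) for an exponential length distribution."""
    mean_mb = expected_tract_length_mb(generations, cm_per_mb)
    return math.exp(-length_mb / mean_mb)


# Hypothetical ages, in generations, chosen only to show the trend.
for g in (100, 1000, 10000):
    print(g, expected_tract_length_mb(g), prob_tract_longer_than(4.0, g))
```

Under these assumptions, multi-megabase tracts become rare within a few hundred generations unless something maintains them, which is why selection on adaptively introgressed regions can confound age estimates based on tract length alone.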

Adaptive Signals of Introgression. Introgressed genomic tracts and the genes they carry are generally assumed to be neutral or deleterious. Further, such tracts would naturally be expected to be present in the genomes of sympatric hybridizing taxa. Consequently, in the samples we considered here, one would expect to find more introgressed tracts, if any, in the sympatric mice (the African and Spanish samples) than in the allopatric ones. However, our results give a very different picture from these expectations. We hypothesize that some of these introgressed tracts have conferred selective advantage on the mice that carry them. For example, the introgressed region on chromosome 7 in Group I contains the Vkorc1 gene whose introgression and adaptive role were discussed in the context of warfarin resistance selection in ref. 10. To identify whether other introgressed regions of adaptive roles are associated with that Vkorc1-containing region, we applied the selective sweep measure of ref. 14 to a comparison of rodenticide-resistant to rodenticide-susceptible wild M. m. domesticus samples, which favors detection of the recent rodenticide-related selective sweep (within the last ∼50 y). Not surprisingly, the selective sweep statistics in the Vkorc1-containing region were among the largest of any detected in our study (Fig. 4A) (see SI Appendix for full results). We also detected selective sweeps outside the Vkorc1 region.

To assess the potential adaptive benefit of other introgressed regions, we used the frequencies of the introgressed regions, as reflected by the sharing patterns. For example, the shared introgressed regions across sympatric and allopatric samples on chromosomes 1 and 7, as shown in Fig. 4B, point to a hypothesis of adaptive roles of parts of these regions. To further zoom in on these shared regions, we analyzed the sharing patterns of genes across the introgressed regions in all samples. Fig. 5 shows the Venn diagram of the sets of introgressed genes in Groups I, II, and III.

Fig. 5 shows that the two European groups have 399 introgressed genes in common, almost twice the number of introgressed genes that are in common between either of them and the African group. We hypothesize that the set of 157 genes that are shared across all three groups contains a subset that we call “driver genes”: those that have driven the maintenance of those introgressed regions for a long time across the samples. In our proposed classification, driver genes would be beneficial upon introgression and would be subject to selection. Genes that are introgressed in one group, but not the others, are potential neutral, linked “passenger” genes. Although passenger genes would be expected to be neutral, they could also introduce new polymorphisms into M. m. domesticus genomes and could become subject to selection at some point during their sojourn time.
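The gene-sharing comparison behind a three-group Venn diagram can be sketched with plain set operations. The example below is only an illustration of that bookkeeping: the function name `venn_regions` and the gene identifiers `g1` to `g6` are hypothetical placeholders, not data from this study.

```python
def venn_regions(group_i, group_ii, group_iii):
    """Partition three gene sets into the seven regions of a Venn diagram."""
    a, b, c = set(group_i), set(group_ii), set(group_iii)
    return {
        "I only": a - b - c,
        "II only": b - a - c,
        "III only": c - a - b,
        "I & II only": (a & b) - c,
        "I & III only": (a & c) - b,
        "II & III only": (b & c) - a,
        "I & II & III": a & b & c,
    }


# Placeholder gene lists (not real introgressed genes).
regions = venn_regions(
    {"g1", "g2", "g3", "g4"},
    {"g2", "g3", "g5"},
    {"g3", "g4", "g6"},
)

# Genes shared by all three groups are the pool in which candidate
# "driver" genes would be sought; genes seen in a single group are
# candidate "passenger" genes in the terminology used above.
for name, genes in regions.items():
    print(name, sorted(genes))
```

Because the seven regions are disjoint, their sizes sum to the number of distinct genes across the three groups, which is a convenient sanity check on the counts shown in such a diagram.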

[Figure 3 image not reproduced. Three histograms (panels A, B, C). x axis: Tract length (100 kb), ranging 0 to 120 in A, 0 to 40 in B, and 0 to 45 in C; y axis: Count, with ticks at 1, 10, and 100.]

Fig. 3. Distributions of introgressed tract lengths detected in the 40 haploid genomes. (A) Group I: The six Germany–Hamm samples and two Spain samples. (B) Group II: The four Italy, two Greece, and two other Germany samples. (C) Group III: The Algeria, Morocco, and two Tunisia samples. Note the x axis scale difference between panel A and the other two panels. (See main text for the rationale behind the grouping.)


Individual driver genes on introgressed tracts are not expected to result in functional enrichment scores. For example, the introgressed region that contains Vkorc1 on chromosome 7 has many other genes, yet is not enriched for any functional categories. On the other hand, introgressed tracts with an abundance of genes from a given family tend to result in significant enrichment scores of the tracts. We illustrate this with two examples related to olfaction, a multigenic trait known to be essential for the fitness of rodents (full lists of the genes in introgressed regions and their Gene Ontology functional enrichment are given in the SI Appendix). The introgressed tract on chromosome 1 that is shared by mice from Africa and Europe, including allopatric mice, is significantly enriched [P = 5.9E-8 after Benjamini and Hochberg (16) correction] for genes involved in olfactory transduction and encodes olfactory receptor genes (13 out of 36 genes) located on the contiguous tracts (Fig. 4B). It is conceivable that this group of genes, or a subset of these, may have acted as a driver for this introgressed fragment carrying at least another set of 23 passenger genes. Similarly, the region on chromosome 7 shared by the African samples is highly enriched (P = 3.6E-6) for genes involved in olfactory transduction and encodes at least 15 olfactory receptors among at least 62 genes situated on this tract. Evidently, large tracts have become polymorphic for introgressed and native repertoires of olfactory receptor alleles.
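The enrichment P values above are reported after Benjamini and Hochberg correction. For readers unfamiliar with it, the following is a minimal sketch of the standard Benjamini-Hochberg step-up adjustment. It is not the enrichment tool used in this study, which is not specified here; the function name and the input P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values are
    # monotone: each is the minimum of p * m / rank over all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four functional categories.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A category is then called significantly enriched when its adjusted value falls below the chosen false discovery rate threshold.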

Discussion

The biological significance of hybridization and introgression in the evolution of new traits in natural eukaryotic populations has ignited much research into these two processes (17). Introgressed genetic material can be neutral and go unnoticed in terms of phenotypes but can also be adaptive and affect phenotypes (10, 18). Notably, these processes have played a crucial role in the domestication of plants and animals and appear to be common in natural populations of plants (17). Additionally, the importance of introgression has become a central discussion point when reconstructing the evolution of primates, including humans (19). Further, it has now become clear that the genomes of model organisms before their adoption as laboratory models by humans have been shaped by hybridization and introgression in their natural ancestral populations, such as in mice and macaques (9, 20, 21). Such influx of genetic variation of intersubspecific or interspecific origins is expected to continue, as wild-derived strains of mice will contribute to the Collaborative Cross in laboratory mice (22), and primate research centers continue to rely on imports of macaques from Asia.

Large-scale efforts have been made to decode the genetic background of most commonly used laboratory mouse strains, including inbred and wild-derived strains of M. m. domesticus, and of other subspecies of the laboratory mouse, including M. m. musculus and M. m. castaneus (2). Among the numerous insights of the evolutionary genomic analyses of the laboratory mouse and its wild relatives was that intersubspecific introgression between strains has been common (2). In addition to understanding the ancestry and mosaic structure of laboratory mouse genomes, detecting introgression is also of biomedical significance. In a recent study (10), the authors discovered in a mouse model resistance to the commonly used anticoagulant warfarin (23) through the acquisition of a mutated version of a key enzyme of the vitamin K cycle, Vkorc1, that is targeted by warfarin. Whereas previous genome-wide studies in mice focused on polymorphism and introgression within the M. musculus group (2, 5), we focused here on introgression involving genomic material of M. spretus and the genomes of several M. m. domesticus samples from the regions of sympatry and allopatry in Africa and Europe. These analyses are now enabled by our recently developed method for statistical inference of introgression in the presence of other evolutionary events, most notably incomplete lineage sorting (12).

[Figure 4, panels A–C. Rows in the top part of each panel: Algeria; Morocco; Tunisia Monastir A and B; Italy Milazzo, Cassino, SanGirogio, and Menconico; Greece Lagunas and Korinthos; Germany Remderoda, Germany, and Germany Hamm A–F; Spain RocadelVallès and Arenal. Bottom part of each panel: normalized XP-CLR score (0–12). Regions shown, in megabases: (A) chr1 186–189, chr5 59–63, chr7 122–132 (Vkorc1), chr15 58–61, chr15 74–79, chr15 102–105; (B) chr1 172–175, chr6 67–71, chr7 101–108, chr10 22–26, chr12 19–22, chr18 66–69; (C) chr3 57–64, chr7 82–89.]

Fig. 4. Three different introgression patterns across the 20 M. m. domesticus samples. (A) Introgressed regions that are exclusive to the Germany–Hamm and Spain samples. (B) Introgressed regions that are shared across the samples. (C) Introgressed regions that are exclusive to African samples. For each sample, scans from both haploid chromosomes are shown. A posterior decoding cutoff of 95% was used to declare a site introgressed (see main text for more details). The red squares on the x axis of the top part of each panel denote the locations of genes in introgressed regions (given the scale, the squares appear overlapping, but the genes are not overlapping). The location of Vkorc1 on chromosome 7 is indicated with a dashed vertical line in A. The bottom part of each panel shows selective sweep statistics, which are normalized XP-CLR scores (14) based on a comparison of rodenticide-resistant to rodenticide-susceptible wild M. m. domesticus samples. Scale of x axis is in megabases.
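The 95% posterior decoding cutoff mentioned in the caption can be illustrated with a minimal sketch. It assumes that per-site posterior probabilities of the introgressed state are already available as a plain list; the function name `call_tracts` and the input layout are hypothetical and are not part of PhyloNet-HMM. Consecutive sites that pass the cutoff are merged into one tract.

```python
from typing import List, Sequence, Tuple


def call_tracts(positions: Sequence[int],
                posteriors: Sequence[float],
                cutoff: float = 0.95) -> List[Tuple[int, int]]:
    """Merge runs of consecutive sites with posterior >= cutoff into tracts.

    positions  -- site coordinates, sorted in increasing order (assumed input)
    posteriors -- posterior probability of the introgressed state per site
    Returns a list of (start, end) coordinates, both inclusive.
    """
    if len(positions) != len(posteriors):
        raise ValueError("positions and posteriors must have the same length")

    tracts: List[Tuple[int, int]] = []
    start = None  # first site of the tract currently being extended
    last = None   # most recent site that passed the cutoff
    for pos, prob in zip(positions, posteriors):
        if prob >= cutoff:
            if start is None:
                start = pos
            last = pos
        elif start is not None:
            # A site below the cutoff closes the current tract.
            tracts.append((start, last))
            start = None
    if start is not None:
        tracts.append((start, last))
    return tracts


# Toy input (made-up numbers, not data from the study).
example = call_tracts([100, 200, 300, 400, 500, 600],
                      [0.99, 0.97, 0.50, 0.96, 0.20, 0.99])
```

A real pipeline would probably also merge tracts separated by short gaps and filter very short tracts; those choices are left out here because the text does not specify them.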

Liu et al. PNAS | January 6, 2015 | vol. 112 | no. 1 | 199


In terms of the debate surrounding the importance of introgression in animal evolution, an important result of our genome-wide study is that it is not only recent, strong, and potentially human-driven selection (warfarin rodenticide and Vkorc1) that has promoted introgression in natural populations of mice. In fact, hybridization and introgression between M. spretus and M. m. domesticus appear to be natural processes spanning at least several thousand years. Nevertheless, even from a rather dense genome-wide survey such as ours it is difficult to discern how frequently introgression occurs. This is because most hybridization does not lead to introgression, as drift and selection tend to remove introgressed regions. However, here we infer at least three hybridization events, one in the distant past and two of more recent timing (including the introgression of Vkorc1). We suspect that this is an underestimate of the frequency of hybridization and introgression between M. spretus and M. m. domesticus in the wild because the species established secondary contact a few thousand years ago when house mice reached the Mediterranean Basin on their westward spread into northern Africa and Europe.

In terms of a role of selection on introgressed tracts, our genome-wide scan revealed informative patterns. First, introgression is limited to a few autosomes and absent from the X chromosome. This is consistent with a strong role of purifying selection and drift in the removal of introgressed material. We find it noteworthy that tracts are very frequently found in the homozygous state, which indicates that introgressed variants can be recessive as well as dominant. The sharing of tracts across samples is consistent with positive selection on introgressed material (adaptive introgression). Such tract sharing is observed over long time scales, such as for chromosome 1, as well as over shorter time scales and locally, as is suggested by tract sharing by subsets of samples from presumably local populations, such as Hamm in Germany or African samples. Finally, it is common to infer that introgressed regions are adaptive if these are found outside the area of sympatry. As we observed numerous introgressed tracts, both presumably old and young, as judged by their tract lengths, it is reasonable to assume that selection might have favored the spread of some of these variants into the allopatric range of house mice. It is important to note here that more sampling from the two species and, potentially, subspecies of M. musculus, would be necessary to determine, with more certainty, the directionality of the introgression between the two species’ genomes.

Selection could have confounding effects on our analyses and inferences. Specifically, it is important to note that various evolutionary events could give rise to genomic patterns and signals that resemble those created by introgression, including ILS, convergence, and ancestral polymorphism coupled with balancing selection (24). Indeed, all of these processes could confound the detection of introgression (25). The introgression detection method (12) that we use here accounts for ILS (hence, ancestral polymorphism that is not under balancing selection) and employs finite-site models in a statistical inference framework, which helps account for convergence at the nucleotide level. Currently, however, methods that automatically distinguish introgression from balancing selection do not exist. For example, recent studies of adaptive introgression (26, 27) focused solely on introgression. Two studies have recently reported on signatures of balancing selection in house mouse genomes (28, 29). However, each of the studies reported on a single, very short region that was shared across all of the samples, including from the various subspecies considered. Our current analysis of multiple M. m. domesticus samples from the ranges of sympatry and allopatry in Africa and Europe mostly points to a very polymorphic signal of introgression across these samples, which, consequently, decreases the likelihood that ancestral polymorphism with balancing selection acting on it could explain the patterns we see in the data. We also scanned genomes of samples from M. m. musculus, which is a sister subspecies of M. m. domesticus (SI Appendix, Fig. S23). As the figure shows, no introgression was detected in the M. m. musculus sample (a similar result to that shown in ref. 12), which further weakens the plausibility of balancing selection acting from the most recent common ancestor of M. musculus and M. spretus until the present day. Furthermore, many of the introgressed regions we detect are much longer than regions with balancing selection signatures reported in previous studies. For example, in humans, DeGiorgio et al. (30) reported that the maximum contiguous genomic region with balancing selection signal was of length ∼40 kb (near the FANK1 gene), which “surprised” them because it was “abnormally large for balancing selection” (ref. 30, p. 11). Introgressed tract lengths on the order of megabases would require much stronger balancing selection than has been previously reported in the literature.

Still, for some of the shared introgressed regions, we cannot rule out the possibility of balancing selection as an alternative to introgression. Whereas our method for detecting introgression can be confounded by balancing selection, a similar issue might arise for methods that detect balancing selection. For example, DeGiorgio et al. noted recently that gene flow is very likely to confound their test for balancing selection (30). New methods need to be developed to account for the two processes simultaneously, as detecting balancing selection could shed light on the evolution of species (31).

Adaptive introgressive hybridization may be an important source for novel functional genetic variants, and combinations thereof, that encode novel traits upon which selection could act. Here, one objective of the study was to discern whether multiple introgressed genes encode the previously reported warfarin resistance trait. Our functional annotation of the introgressed tract did not reveal any discernible enrichment for genes that clearly modulate warfarin resistance. Moreover, we did not observe a pattern where all mice that carry the Vkorc1 introgression share introgressed tracts of comparable length. We therefore conclude that, until further samples are analyzed and tracts are annotated in terms of function more comprehensively, the adaptive introgressive hybridization leading to warfarin resistance in house mice requires the introgression of only Vkorc1. Whether the introgressed Vkorc1 interacts with the native genes in pathways resulting in warfarin resistance cannot be deduced from our analysis at this time.

We observed that the raw material for a complex trait known to be important to rodent life history could be introgressed. We noticed the large numbers of olfactory receptor genes for which now polymorphic and divergent copies segregate in the populations of house mice (32). It is known that both minor nucleotide differences, as well as larger scale differences in the olfactory receptor repertoire, have measurable phenotypic consequences in mice (32). Thus, we hypothesize that wild mice from Europe that have experienced gene flow, such as seen for chromosome 1 which is enriched for olfactory receptor clusters, may indeed be the subject of natural selection.
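The Discussion judges introgressed tracts to be old or young by their lengths. As a rough illustration of that reasoning (not the authors' dating procedure), a common back-of-the-envelope approximation treats introgression as a single pulse and takes the mean tract length after t generations as about 1 / (r · t), where r is the per-base-pair recombination rate per generation. The sketch below assumes neutrality and ignores drift and selection, both of which the text says act on introgressed material. The function names and the example rate of 5e-9 per bp (about 0.5 cM/Mb) are illustrative assumptions.

```python
def expected_tract_length_bp(generations: float, rec_rate_per_bp: float) -> float:
    """Approximate mean tract length in bp, using L = 1 / (r * t).

    generations     -- time since the hybridization pulse, in generations
    rec_rate_per_bp -- recombination rate per bp per generation (assumed value)
    """
    if generations <= 0 or rec_rate_per_bp <= 0:
        raise ValueError("generations and rec_rate_per_bp must be positive")
    return 1.0 / (rec_rate_per_bp * generations)


def generations_from_tract_length(length_bp: float, rec_rate_per_bp: float) -> float:
    """Invert the same approximation: t = 1 / (r * L)."""
    if length_bp <= 0 or rec_rate_per_bp <= 0:
        raise ValueError("length_bp and rec_rate_per_bp must be positive")
    return 1.0 / (rec_rate_per_bp * length_bp)


# Illustrative assumption only: r = 5e-9 per bp per generation.
ASSUMED_RATE = 5e-9
length_after_1000_gen = expected_tract_length_bp(1000, ASSUMED_RATE)
age_of_megabase_tract = generations_from_tract_length(1_000_000, ASSUMED_RATE)
```

Under these assumptions, a pulse 1,000 generations old would leave tracts of roughly 200 kb on average, while megabase-scale tracts would point to a few hundred generations. Selection on a tract, as discussed above, can make it longer or more widely shared than this neutral expectation.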

Fig. 5. Venn diagram of the three sets of genes in introgressed regions in Groups I, II, and III of samples. Introgression was called based on a posterior decoding probability cutoff of 95%. For each circle in the Venn diagram, two quantities are shown: (Top) the number of genes and (Bottom) the percentage of all introgressed genes found in our study.
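The two quantities shown for each circle in Fig. 5 can be computed from three gene sets as in the following minimal sketch. The gene identifiers are placeholders, not genes from the study, and the function name `circle_summary` is hypothetical. The percentage is taken relative to the union of all three sets, which is one reading of "all introgressed genes found in our study".

```python
from typing import Dict, Set, Tuple


def circle_summary(groups: Dict[str, Set[str]]) -> Dict[str, Tuple[int, float]]:
    """Return (gene count, percentage of all introgressed genes) per group.

    groups -- maps a group label to the set of genes found in introgressed
              regions for that group of samples
    """
    all_genes: Set[str] = set().union(*groups.values()) if groups else set()
    total = len(all_genes)
    summary: Dict[str, Tuple[int, float]] = {}
    for label, genes in groups.items():
        pct = 100.0 * len(genes) / total if total else 0.0
        summary[label] = (len(genes), pct)
    return summary


# Placeholder gene identifiers (hypothetical, for illustration only).
toy_groups = {
    "I": {"g1", "g2", "g3"},
    "II": {"g2", "g3", "g4"},
    "III": {"g3", "g5"},
}
toy_summary = circle_summary(toy_groups)
shared_by_all = toy_groups["I"] & toy_groups["II"] & toy_groups["III"]
```

Set intersections such as `shared_by_all` give the overlapping regions of the diagram. Because the groups overlap, the per-circle percentages need not sum to 100.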

200 | www.pnas.org/cgi/doi/10.1073/pnas.1406298111 Liu et al.


Materials and Methods

Our study used two M. spretus samples and twenty M. m. domesticus samples from the ranges of sympatry and allopatry that were either newly sampled or from previous publications. The work relied on tissue sharing and was exempted by Rice University’s institutional review board. For details on the samples, compiling the sequence data, genotyping and phasing it, as well as single nucleotide polymorphism (SNP) calling, see the SI Appendix. We used PhyloNet-HMM (12) to scan M. musculus genomes for segments with introgressed origin from M. spretus. We analyzed every haploid M. m. domesticus genome along with the M. spretus genomes to detect introgression. See the SI Appendix for full details on how the analysis was done. We used XP-CLR (14) version 1.0 to scan for selective sweep patterns. The method was run using its default settings. See the SI Appendix for the set of samples used in these scans.
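Because each haploid genome is scanned separately, tracts in the homozygous state (noted in the Discussion) can be found by intersecting the tract lists of a sample's two haplotypes. The sketch below shows one way to do that. It assumes each list holds inclusive `(start, end)` intervals that are sorted and non-overlapping; the function name `intersect_tracts` is hypothetical and is not part of PhyloNet-HMM.

```python
from typing import List, Tuple

Tract = Tuple[int, int]  # (start, end), both inclusive


def intersect_tracts(hap_a: List[Tract], hap_b: List[Tract]) -> List[Tract]:
    """Return regions called introgressed on both haplotypes of one sample.

    Both inputs must be sorted by start and non-overlapping within each list.
    """
    shared: List[Tract] = []
    i = j = 0
    while i < len(hap_a) and j < len(hap_b):
        start = max(hap_a[i][0], hap_b[j][0])
        end = min(hap_a[i][1], hap_b[j][1])
        if start <= end:
            shared.append((start, end))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if hap_a[i][1] < hap_b[j][1]:
            i += 1
        else:
            j += 1
    return shared


# Toy coordinates (made up, not from the study).
homozygous = intersect_tracts([(100, 500), (800, 900)],
                              [(300, 600), (850, 1000)])
```

The same two-pointer intersection could be applied across samples to look for shared tracts, although phasing errors may split or shift tract boundaries in real data.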

ACKNOWLEDGMENTS. We thank Yun Yu for help with the PhyloNet-HMM software and Stefan Endepols for sharing mouse tissues. The work was partially supported by R01-HL091007-01A1 (to M.H.K.) from the National Institutes of Health, National Heart, Lung and Blood Institute, and by startup funds from Rice University (to M.H.K.). L.N. was supported in part by National Science Foundation Grants DBI-1062463 and CCF-1302179 and Grant R01LM009494 from the National Library of Medicine (NLM). K.J.L. was partially supported by a training fellowship from the Keck Center of the Gulf Coast Consortia, on the NLM Training Program in Biomedical Informatics, NLM T15LM007093.

1. Guénet J-L, Bonhomme F (2003) Wild mice: An ever-increasing contribution to a popular mammalian model. Trends Genet 19(1):24–31.

2. Yang H, Bell TA, Churchill GA, Pardo-Manuel de Villena F (2007) On the subspecific origin of the laboratory mouse. Nat Genet 39(9):1100–1107.

3. Bonhomme F, Martin S, Thaler L (1978) Hybridation en laboratoire de Mus musculus L. et Mus spretus Lataste. Experientia 34(9):1140–1141.

4. Palomo L, Justo E, Vargas J (2009) Mus spretus (Rodentia: Muridae). Mamm Species 840:1–10.

5. Yang H, et al. (2011) Subspecific origin and haplotype diversity in the laboratory mouse. Nat Genet 43(7):648–655.

6. Keane TM, et al. (2011) Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477(7364):289–294.

7. Staubach F, et al. (2012) Genome patterns of selection and introgression of haplotypes in natural populations of the house mouse (Mus musculus). PLoS Genet 8(8):e1002891.

8. Teeter KC, et al. (2008) Genome-wide patterns of gene flow across a house mouse hybrid zone. Genome Res 18(1):67–76.

9. Orth A, et al. (2002) [Natural hybridization between 2 sympatric species of mice, Mus musculus domesticus L. and Mus spretus Lataste]. C R Biol 325(2):89–97.

10. Song Y, et al. (2011) Adaptive introgression of anticoagulant rodent poison resistance by hybridization between old world mice. Curr Biol 21(15):1296–1301.

11. Pelz H-J, et al. (2012) Distribution and frequency of Vkorc1 sequence variants conferring resistance to anticoagulants in Mus musculus. Pest Manag Sci 68(2):254–259.

12. Liu KJ, et al. (2014) An HMM-based comparative genomic framework for detecting introgression in eukaryotes. PLOS Comput Biol 10(6):e1003649.

13. Hammer MF, Schimenti J, Silver LM (1989) Evolution of mouse chromosome 17 and the origin of inversions associated with t haplotypes. Proc Natl Acad Sci USA 86(9):3261–3265.

14. Chen H, Patterson N, Reich D (2010) Population differentiation as a test for selective sweeps. Genome Res 20(3):393–402.

15. Auffray J-C, Britton-Davidian J (1992) When did the house mouse colonize Europe? Biol J Linn Soc Lond 45(2):187–190.

16. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc, B 57(1):289–300.

17. Arnold ML (2004) Transfer and origin of adaptations through natural hybridization: Were Anderson and Stebbins right? Plant Cell 16(3):562–570.

18. Schmidt LH, Fradkin R, Harrison J, Rossan RN (1977) Differences in the virulence of Plasmodium knowlesi for Macaca irus (fascicularis) of Philippine and Malayan origins. Am J Trop Med Hyg 26(4):612–622.

19. Arnold ML, Meyer A (2006) Natural hybridization in primates: One evolutionary mechanism. Zoology (Jena) 109(4):261–276.

20. Stevison LS, Kohn MH (2008) Determining genetic background in captive stocks of cynomolgus macaques (Macaca fascicularis). J Med Primatol 37(6):311–317.

21. Osada N, et al. (2010) Ancient genome-wide admixture extends beyond the current hybrid zone between Macaca fascicularis and M. mulatta. Mol Ecol 19(14):2884–2895.

22. Churchill GA, et al. (2004) The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nat Genet 36(11):1133–1137.

23. Scully M (2002) Warfarin therapy. The Biochemist 24:15–17.

24. Hedrick PW (2013) Adaptive introgression in animals: Examples and comparison to new mutation and standing variation as sources of adaptive variation. Mol Ecol 22(18):4606–4618.

25. Nakhleh L (2013) Computational approaches to species phylogeny inference and gene tree reconciliation. Trends Ecol Evol 28(12):719–728.

26. Green RE, et al. (2010) A draft sequence of the Neandertal genome. Science 328(5979):710–722.

27. Heliconius Genome Consortium (2012) Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature 487(7405):94–98.

28. Ferguson W, Dvora S, Gallo J, Orth A, Boissinot S (2008) Long-term balancing selection at the West Nile virus resistance gene, Oas1b, maintains transspecific polymorphisms in the house mouse. Mol Biol Evol 25(8):1609–1618.

29. Linnenbrink M, et al. (2011) Long-term balancing selection at the blood group-related gene B4galnt2 in the genus Mus (Rodentia; Muridae). Mol Biol Evol 28(11):2999–3003.

30. DeGiorgio M, Lohmueller KE, Nielsen R (2014) A model-based approach for identifying signatures of ancient balancing selection in genetic data. PLoS Genet 10(8):e1004561.

31. Leffler EM, et al. (2013) Multiple instances of ancient balancing selection shared between humans and chimpanzees. Science 339(6127):1578–1582.

32. Young JM, Trask BJ (2002) The sense of smell: Genomics of vertebrate odorant receptors. Hum Mol Genet 11(10):1153–1160.
