a rare deep-rooting d0 african y-chromosomal haplogroup ......2019/06/13  · email:...

22
1 A Rare Deep-Rooting D0 African Y-chromosomal Haplogroup and its Implications for the Expansion of Modern Humans Out of Africa Marc Haber,* Abigail L. Jones,† Bruce A. Connell,‡ Asan, § Elena Arciero,* Huanming Yang, §, ** Mark G. Thomas,†† Yali Xue,* and Chris Tyler-Smith* *The Wellcome Sanger Institute, Hinxton, Cambs. CB10 1SA, UK, †Liverpool Women's Hospital, Liverpool L8 7SS, UK, ‡Glendon College, York University, Toronto, Ontario M4N 3N6, Canada, § BGI-Shenzhen, Shenzhen 518083, China, **James D. Watson Institute of Genome Science, 310008 Hangzhou, China, ††Research Department of Genetics, Evolution and Environment and UCL Genetics Institute, University College London, Gower Street, London WC1E 6BT, UK. Corresponding author: Chris Tyler-Smith, The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambs. CB10 1SA, UK. Tel: [+44] [0] 1223 495376. Email: [email protected] ABSTRACT Present-day humans outside Africa descend mainly from a single expansion out ~50,000-70,000 years ago, but many details of this expansion remain unclear, including the history of the male-specific Y chromosome at this time. Here, we re-investigate a rare deep-rooting African Y-chromosomal lineage by sequencing the whole genomes of three Nigerian men described in 2003 as carrying haplogroup DE* Y-chromosomes, and analyzing them in the context of a calibrated worldwide Y-chromosomal phylogeny. We confirm that these three chromosomes do represent a deep-rooting DE lineage, branching close to the DE bifurcation, but place them on the D branch as an outgroup to all other known D chromosomes, and designate the new lineage D0. We consider three models for the expansion of Y lineages out of Africa ~50,000-100,000 years ago, incorporating migration back to Africa where necessary to explain present-day Y-lineage distributions. Considering both the Y-chromosomal phylogenetic structure incorporating the D0 lineage, and published evidence for modern humans outside Africa, the most favored model involves an origin of the DE lineage within Africa with D0 and E remaining there, and migration out of the three lineages (C, D and FT) that now form the vast majority of non-African Y chromosomes. The exit took place 50,300-81,000 years ago (latest date for FT lineage expansion outside Africa - earliest date for the D/D0 lineage split inside Africa), and most likely 50,300-59,400 years ago (considering Neanderthal admixture). This work resolves a long-running debate about Y-chromosomal out- of-Africa/back-to-Africa migrations, and provides insights into the out-of-Africa expansion more generally. umans outside Africa derive most of their genetic ancestry from a single migration event 50,000- 70,000 years ago, according to the current model supported by genetic data from genome-wide (Mallick et al. 2016; Pagani et al. 2016), mtDNA (Van Oven and Kayser 2009) and Y-chromosomal (Wei et al. 2013; Hallast et al. 2015; Karmin et al. 2015; Poznik et al. 2016) analyses. The migrating population carried only a small subset of African genetic diversity, particularly strikingly for the non-recombining mtDNA and Y chromosome where robust calibrated high-resolution phylogenies can be constructed, and in each case all non-African lineages descend from a single African lineage, L3 for mtDNA or CT-M168 for the Y chromosome. Yet there has been a long-running debate about the early spread of Y- chromosomal lineages because their current distributions do not fit a simple phylogeographical model. The CT-M168 branch diverged within a short time interval into three lineages (C-M130, DE-M145 and FT- M89), and just a few thousand years later the lineage DE- M145 further split into D-M174 and E-M96 (Poznik et al. 2016), illustrated in Figure S1. Thus around the time of the expansion out of Africa, between one (CT-M168) and four (C-M130, D-M174, E-M96 and F-M89) of the known extant non-African lineages were in existence (plus additional African lineages). The complexity arises because three of these four early lineages (C-M130, D- M174 and FT-M89) are exclusively non-African, apart from recent gene flow; while the fourth lineage (E-M96) is largely African, where it constitutes the major lineage in most African populations. The debate began in the absence of reliable calibration, and these distributions were interpreted as arising in two contrasting ways: (1) an Asian origin of DE-M145 (also known as the YAP+ lineage), implying migration of CT-M168 out of Africa followed by divergence into the four lineages outside Africa and then migration of E-M96 back to Africa (Altheide and Hammer 1997; Hammer et al. 1998; Bravi et al. 2000), or (2) an African origin of DE-M145, implying divergence of CT-M168 within Africa followed by migration of C-M130, D-M174 and FT-M89 out (Underhill et al. 2001; Underhill and Roseman 2001). The first scenario requires two inter-continental lineage migrations, while the second requires three and is thus slightly less parsimonious. An additional very rare haplogroup, DE*, carrying variants that define DE but none of those that define D H Genetics: Early Online, published on June 13, 2019 as 10.1534/genetics.119.302368 Copyright 2019.

Upload: others

Post on 13-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

1

A Rare Deep-Rooting D0 African Y-chromosomal Haplogroup and its Implications for the Expansion of Modern Humans Out of Africa

Marc Haber,* Abigail L. Jones,† Bruce A. Connell,‡ Asan,§ Elena Arciero,* Huanming Yang, §,** Mark G. Thomas,†† Yali Xue,* and Chris Tyler-Smith* *The Wellcome Sanger Institute, Hinxton, Cambs. CB10 1SA, UK, †Liverpool Women's Hospital, Liverpool L8 7SS, UK, ‡Glendon College, York University, Toronto, Ontario M4N 3N6, Canada, §BGI-Shenzhen, Shenzhen 518083, China, **James D. Watson Institute of Genome Science, 310008 Hangzhou, China, ††Research Department of Genetics, Evolution and Environment and UCL Genetics Institute, University College London, Gower Street, London WC1E 6BT, UK. Corresponding author: Chris Tyler-Smith, The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambs. CB10 1SA, UK. Tel: [+44] [0] 1223 495376. Email: [email protected]

ABSTRACT Present-day humans outside Africa descend mainly from a single expansion out ~50,000-70,000 years ago, but many details of this expansion remain unclear, including the history of the male-specific Y chromosome at this time. Here, we re-investigate a rare deep-rooting African Y-chromosomal lineage by sequencing the whole genomes of three Nigerian men described in 2003 as carrying haplogroup DE* Y-chromosomes, and analyzing them in the context of a calibrated worldwide Y-chromosomal phylogeny. We confirm that these three chromosomes do represent a deep-rooting DE lineage, branching close to the DE bifurcation, but place them on the D branch as an outgroup to all other known D chromosomes, and designate the new lineage D0. We consider three models for the expansion of Y lineages out of Africa ~50,000-100,000 years ago, incorporating migration back to Africa where necessary to explain present-day Y-lineage distributions. Considering both the Y-chromosomal phylogenetic structure incorporating the D0 lineage, and published evidence for modern humans outside Africa, the most favored model involves an origin of the DE lineage within Africa with D0 and E remaining there, and migration out of the three lineages (C, D and FT) that now form the vast majority of non-African Y chromosomes. The exit took place 50,300-81,000 years ago (latest date for FT lineage expansion outside Africa - earliest date for the D/D0 lineage split inside Africa), and most likely 50,300-59,400 years ago (considering Neanderthal admixture). This work resolves a long-running debate about Y-chromosomal out-of-Africa/back-to-Africa migrations, and provides insights into the out-of-Africa expansion more generally.

umans outside Africa derive most of their genetic ancestry from a single migration event 50,000-

70,000 years ago, according to the current model supported by genetic data from genome-wide (Mallick et al. 2016; Pagani et al. 2016), mtDNA (Van Oven and Kayser 2009) and Y-chromosomal (Wei et al. 2013; Hallast et al. 2015; Karmin et al. 2015; Poznik et al. 2016) analyses. The migrating population carried only a small subset of African genetic diversity, particularly strikingly for the non-recombining mtDNA and Y chromosome where robust calibrated high-resolution phylogenies can be constructed, and in each case all non-African lineages descend from a single African lineage, L3 for mtDNA or CT-M168 for the Y chromosome. Yet there has been a long-running debate about the early spread of Y-chromosomal lineages because their current distributions do not fit a simple phylogeographical model. The CT-M168 branch diverged within a short time interval into three lineages (C-M130, DE-M145 and FT-M89), and just a few thousand years later the lineage DE-M145 further split into D-M174 and E-M96 (Poznik et al. 2016), illustrated in Figure S1. Thus around the time of the expansion out of Africa, between one (CT-M168) and four (C-M130, D-M174, E-M96 and F-M89) of the known

extant non-African lineages were in existence (plus additional African lineages). The complexity arises because three of these four early lineages (C-M130, D-M174 and FT-M89) are exclusively non-African, apart from recent gene flow; while the fourth lineage (E-M96) is largely African, where it constitutes the major lineage in most African populations. The debate began in the absence of reliable calibration, and these distributions were interpreted as arising in two contrasting ways: (1) an Asian origin of DE-M145 (also known as the YAP+ lineage), implying migration of CT-M168 out of Africa followed by divergence into the four lineages outside Africa and then migration of E-M96 back to Africa (Altheide and Hammer 1997; Hammer et al. 1998; Bravi et al. 2000), or (2) an African origin of DE-M145, implying divergence of CT-M168 within Africa followed by migration of C-M130, D-M174 and FT-M89 out (Underhill et al. 2001; Underhill and Roseman 2001). The first scenario requires two inter-continental lineage migrations, while the second requires three and is thus slightly less parsimonious.

An additional very rare haplogroup, DE*, carrying variants that define DE but none of those that define D

H

Genetics: Early Online, published on June 13, 2019 as 10.1534/genetics.119.302368

Copyright 2019.

2

or E individually, added to this complexity. First identified in five out of 1247 Nigerians within a worldwide study of >8000 men (Weale et al. 2003), DE* chromosomes were subsequently reported in a single man among 282 from Guinea-Bissau in West Africa (Rosa et al. 2007) and in two out of 722 Tibetans within a study of 5783 East Asians (Shi et al. 2008). While the phylogeographic significance of these rare lineages was immediately recognized, their interpretation was hindered by the incomplete resolution of the phylogenetic branching pattern and the possibility that they might originate from back-mutations at the small numbers of variants used to define the key D and E haplogroups, or genotyping errors rather than representing deeply divergent lineages, plus the lack of a robust timescale. Large-scale sequencing of Y chromosomes has now provided both the full phylogenetic resolution and the timescale needed (Wei et al. 2013; Hallast et al. 2015; Karmin et al. 2015; Poznik et al. 2016), so we have therefore reinvestigated the original Nigerian DE* chromosomes using whole-genome sequencing to clarify their phylogenetic position. We then consider the implications for the out-of-Africa/back-to-Africa debate related to Y-chromosomal lineages, and the expansion out of Africa more generally.

METHODS

Samples and sequencing

We analyzed five DE* samples described previously (Weale et al. 2003), in the context of published worldwide Y-chromosomal sequences including Japanese D and many E Y chromosomes (Mallick et al. 2016). We also included four haplogroup D samples from Tibet (Xue et al. 2006), which were newly sequenced for this study; the Japanese and Tibetan D chromosomes represent the deepest known split within D, since Andamanese D chromosomes lie on the same branch as the Japanese (Mondal et al. 2017). Sequencing of the Nigerian samples was carried out at the Wellcome Sanger institute on the Illumina HiSeq X Ten platform (paired-end read length 150bp) to a Y chromosome mean coverage of ~16X. Sequences were processed using biobambam version 2.0.79 to remove adapters, mark duplicates, and sort reads. bwa-mem version 0.7.16a was used to map the reads to the hs37d5 reference genome. We found that two pairs of individuals were likely duplicates (Figure S2) and thus one of each pair was removed, leaving three Nigerian individuals for further analysis. The four individuals from Tibet were sequenced in the same way to a Y chromosome mean coverage of ~18X.

For comparative data from other haplogroups, we obtained Y chromosome bam files for 173 males representing worldwide populations from the Simons Genome Diversity Project (SGDP) (Mallick et al. 2016).

Data analysis

Y chromosome genotypes were called jointly from all 180 samples using FreeBayes v1.2.0 (Garrison and Marth 2012) with the arguments “--report-monomorphic” and “--ploidy 1”. Calling was restricted to 10.3 Mb of the Y chromosome previously determined to be accessible to short-read sequencing (Poznik et al. 2013). Then sites with depth across all samples <1900 or >11500 (corresponding to DP/2 or DP*3), or missing in more than 20% of the samples, were filtered. In individuals, alleles with DP<5 or GQ<30 were excluded and if multiple alleles were observed at a position, the fraction of reads supporting the called allele was required to be >0.8.

Genome-wide genotypes from the Nigerian samples were called using BCFtools version 1.6 (bcftools mpileup -C50 -q30 -Q30 | bcftools call -c), then merged with data from ~2,500 people genotyped on the Affymetrix Human Origins array (Patterson et al. 2012; Lazaridis et al. 2016). Principal Component Analysis (PCA) using genome-wide SNPs was performed using EIGENSOFT v7.2.1 (Patterson et al. 2006) and plotted using R (R Core Team 2017).

We inferred a maximum likelihood phylogeny of Y chromosomes using RAxML v8.2.10 (Stamatakis 2014) with the arguments “-m ASC_GTRGAMMA” and “--asc-corr=stamatakis”, using only variable sites with QUAL ≥1, and selecting the tree with the best likelihood from 100 runs, then replicating the tree 1000 times for bootstrap values. The tree was plotted using Interactive Tree Of Life (iTOL) v3 (Letunic and Bork 2016) and annotated with haplogroup names assigned using yHaplo (Poznik 2016) from SNPs reported by the International Society of Genetic Genealogy (ISOGG v11.01).

The ages of the internal nodes in the tree were estimated using the ρ statistic (Forster et al. 1996), the standard approach for the Y chromosome. We defined the ancestral state of a site by assigning alleles as ancestral when they were monomorphic in the nine samples belonging to the A and B haplogroups in our dataset. We then determined the age of a node as follows: Having an ancestral node leading to two clades, we select one sample from each clade and divide the number of derived variants found in the first sample but absent from the second, by the total number of sites having the ancestral state in both samples. We compare all possible pairs under a node and report the average value of divergence times in units of years by applying a point mutation rate of 0.76 × 10−9 mutations per site per year (Fu et al. 2014). We report 95% confidence intervals of the divergence times based on the 95% highest posterior density (HPD) when estimating the mutation rate (0.67-0.86 × 10-9) (Fu et al. 2014). This model assumes that mutations accumulated on the chromosomes in the different lineages at similar rates, and thus expects all individuals in our dataset to have comparable branch

3

lengths from the AB root. But we found considerable differences among individuals in the number of their derived mutations from the root. This heterogeneity in the accumulation of mutations has been previously reported (Scozzari et al. 2014; Barbieri et al. 2016) and appears to be haplogroup-specific (Figure S3), and therefore in our divergence time estimates, we calibrate all lineages to have identical branch length from the root, equal to the average branch length estimated from all individuals in our data set. We first calculated the average number of mutations which accumulated on the branches of all individuals in our dataset and found 768.59 derived mutations on average from the root (corresponding to ~100,000 years). We then derived a calibration coefficient α for each individual by dividing 768.59 by the normalized (in 10,000,000 bp) number of derived mutations an individual has accumulated from the root. And thus for calibrating the branches length between any two samples when calculating the split times, we multiply α by the number of derived variants found in the first sample but absent from the second.

Data Availability Statement

New sequence data from the Nigerian samples are available through the European Genome-phenome Archive (EGA) under study accession number EGAS00001002674, and for Tibetan samples under study accession number EGAS00001003500.

Three supplementary figures and two supplementary tables accompany this paper:

Figure S1 Y-chromosomal phylogeny as understood before the current study.

Figure S2 PCA of worldwide populations.

Figure S3 Number of mutations from the AB root.

Table S1. SNPs defining the D0 haplogroup.

Table S2. Split-time estimates using the ρ statistic.

RESULTS

Construction of a calibrated Y-chromosomal phylogeny

We constructed a series of phylogenetic trees based on all the Y-chromosomal sequences in our dataset, or subsets of them. All showed a consistent structure, in which the Nigerian DE* chromosomes formed a clade branching from the DE lineage close to the divergence of D and E chromosomes (Figure 1A) in comparison with a set of Y chromosomes representing most of the world (Figure 1B). The Nigerian chromosomes had 489 derived SNPs exclusive to their branch in addition to a large deletion spanning ~118,000 bp (Y:28,457,736-28,576,276). All DE-M154 chromosomes shared 29 SNPs.

The Nigerian chromosomes shared seven SNPs with other D chromosomes, one SNP with E chromosomes, one SNP with C1b2a chromosomes, and one SNP with an F2 chromosome (Table S1). The reads overlapping these SNPs were visually investigated using the Integrative Genomics Viewer (IGV) version 2.4.10 and seen to support the calls. We consider sharing of a single SNP as a recurrent mutation in different lineages and interpret the Nigerian chromosomes as lying on the D lineage, diverging from other D chromosomes at 71,400 years ago (Figure 1C), very soon after its divergence from E at 73,200 years ago. We name the lineage formed by the Nigerian samples D0, in order to reflect its position on the tree and avoid the need to rename all the other D lineages.

The three D0 chromosomes are distinguishable from one another, and have a coalescence time of ~2,500 years (Figure 1C), consistent with their collection from different villages, languages, ethnic backgrounds, and paternal birthplaces (Weale et al. 2003). The autosomal genomes of these individuals confirm their genetic ancestry as West Africans (Figure S2).

Models for the expansion of Y-chromosomal lineages out of Africa

The updated phylogeny including the D0 lineage adds two key pieces of information to the debate about the phylogeography of the Y lineages ~50,000-100,000 years ago and the mode of expansion out of Africa. First, it increases the number of relevant lineages at this early time period from four to five, and second, it provides a reliable timescale for the branching times of these lineages, and thus for the lineages in existence at any particular time point.

In the phylogeny (Figure 1A), the DE lineage now contains three, rather than two early sub-lineages: one exclusively African (D0), one mainly African (E), and one exclusively non-African (D). We therefore consider the implications of this revised structure for interpreting the present-day Y phylogeography as the result of male movements at different times between 28,000 and 100,000 years ago (Figure 2). In order to do this, we need to calibrate the phylogeny, and for this use the ancient-DNA-based mutation rate (Fu et al. 2014), which has been widely adopted (e.g. Poznik et al. 2016); we consider in the Discussion the implications of alternative mutation rates and some of the other simplifying assumptions we make here.

We consider three scenarios based on our split-time estimates of the Y-chromosomal lineages (Table S2). First, between 101,000 years ago (divergence of the B and CT lineages) and 77,000 years ago (divergence of the DE and CF lineages) only one lineage with present-day non-African descendants is present in the phylogeny (CT;

4

Figure 2A), so present-day Y lineage distributions could be explained by migration of the single lineage CT out of Africa, followed by back-migration of the D0 and E lineages between 71,000 years ago (origin of D0) and 59,000 years ago (divergence of E within Africa) (Figure 2B). This and all other scenarios require migration out of E-M35 after 47,000 years ago (its origin) and before 28,500 years ago (its divergence) in order to explain its presence outside Africa (Figure 2B-D). Second, between 76,000 years ago (divergence between C and FT) and 73,000 years ago (divergence between D and E), three relevant lineages are present (the C, DE and FT lineages, Figure 2A), so migration out of these three followed by back-migration of D0 and E as above (Figure 2C) would explain the distributions. Third, between 71,000 years ago (split of D and of D0) and 57,000 years ago (divergence within FT), five relevant lineages are present, and migration out of three of these (C, D and FT) would explain the present-day distributions without requiring back-migration (Figure 2D). For simplicity, we do not include the short intervals between these three scenarios of 500 years and 1,800 years (Figure 2A, Table S2).

DISCUSSION

The new D0 data presented in this work are based on just three Y chromosomes, but have far-ranging implications for the structure of the Y-chromosomal phylogeny and hence male movements and migration out of Africa more generally. Our phylogenetic results are consistent with three scenarios (Figure 2B-D), and we now consider some of the complexities associated with these, and how they fit with non-genetic data.

Complexities arise because although the phylogenetic structure, including the branching order, is very robust (Wei et al. 2013; Hallast et al. 2015; Karmin et al. 2015; Poznik et al. 2016), its calibration depends entirely on the mutation rate used. The mutation rate chosen above, based on the number of mutations ‘missing’ in a 45,000-year-old Siberian Y chromosome (Fu et al. 2014), has been widely adopted (Poznik et al. 2016; Balanovsky 2017), but a large-scale study of Icelandic pedigrees encompassing the last few centuries suggested a rate about 14% faster (Helgason et al. 2015). This faster mutation rate would translate directly into 14% more recent time estimates so that, for example, the migrations out of Africa in the three scenarios presented above would be 87,000-66,000, 65,000-63,000 and 61,000-49,000 years ago, respectively. These differences between mutation rates inferred in different ways may be seen within the context of a wider debate about human mutation rates, previously based largely on autosomal data (Scally and Durbin 2012). Each mutation rate is also accompanied by its own uncertainty, leading to the 95% confidence intervals in Table S2, which

include the mutation rate uncertainty. We also assume that the mutation rate is constant over time and does not differ between lineages. The first assumption is very reasonable for the time period of most interest here, 50,000-60,000 years, when the mutation rate averaged over 45,000 years (Fu et al. 2014) is used. A flexible mutation rate that assumed a real increase in recent times would not influence these estimates since the Fu et al. rate already includes the last few centuries. Differences in mutation rate between lineages need further investigation, but would not be sufficient to affect the scenarios presented in Figure 2. For these reasons, we believe that the Fu et al rate, averaged over 45,000 years, is the appropriate one to use for the times of interest here.

These genetic times can be compared with dates from non-genetic sources for modern humans outside Africa. The 45,000-year-old Siberian fossil (Fu et al. 2014) was reliably dated using carbon-14, while a ~43,000-year-old fragment of human maxilla from the Kent’s Cavern site in the UK was dated using Bayesian modelling of stratigraphic, chronological and archaeological data (Higham et al. 2011). Archaeological deposits at Boodie Cave in Australia were dated to ~50,000 years ago using optically-stimulated luminescence (Veth et al. 2017). Thus there is strong support for the widespread presence of modern humans outside Africa 45,000-50,000 years ago. Earlier dates have also been reported, for example the Madjedbebe rock shelter in northern Australia dated by optically-stimulated luminescence to at least 65,000 years ago (Clarkson et al. 2017), a modern human cranium from Tam Pa Ling, Laos was dated by Uranium-Thorium to ~63,000 years ago (Demeter et al. 2012) and 80 teeth from Fuyan Cave in southern China dated using the same method to 80,000-120,000 years ago (Liu et al. 2015), raising the possibility of a substantially earlier exit (Bae et al. 2017). Such early archaeological dates also, however, raise the question of whether or not the humans associated with them contributed genetically to present-day populations (Mallick et al. 2016; Pagani et al. 2016). Archaeological data alone therefore do not provide an unequivocal date for the migration of the ancestors of present-day humans out of Africa.

All non-Africans carry around 2% Neanderthal DNA in their genomes (Green et al. 2010), and Neanderthal fossils have only been reported outside Africa. The geographical distribution of Neanderthals thus suggests that the mixing probably occurred outside Africa, and the ubiquitous presence of Neanderthal DNA in present-day non-Africans is most easily explained if the mixing took place once, soon after the migration out. This mixing has been dated with some precision using the length of the introgressed segments in the 45,000-year-old (43,210-46,880 years) Siberian (Ust’-Ishim) to 232-430 generations before he lived, i.e. 49,900-59,400 years ago

5

assuming a generation time of 29 years (Fu et al. 2014). If this date represented the time of the migration out of Africa, it would exclude the first two scenarios (Figure 2B and 2C). Thus the combination of Y phylogenetic structure and dating of the out-of-Africa migration based on the 45,000-year-old Siberian fossil (Fu et al. 2014) favors the third scenario (Figure 2D) involving the migration out of C, D and FT between 50,300 years ago (lower bound of the FT diversification, Table S2) and 59,400 years ago (upper bound of the introgression; see Figure 3) which is in accordance with suggested models incorporating an African origin of the DE lineages (Underhill et al. 2001; Underhill and Roseman 2001). According to this interpretation, the reported Tibetan DE* chromosomes (Shi et al. 2008) would most likely represent back-mutations or genotyping errors at the one SNP used to define haplogroup D, but require further investigation.

mtDNA sequences also provide a robust phylogeny which demonstrates that non-African mtDNAs descend from a single African branch with rapid diversification outside Africa into the M and N lineages and many subsequent branches (Ingman et al. 2000; Deviese et al. 2019). Dating using ancient mtDNA suggests a separation of non-African from African lineages after 62,000-95,000 years ago (Fu et al. 2013), while an analysis of present-day mtDNAs suggested divergence outside Africa 57,000-65,000 years ago (Fernandes et al. 2012). These estimates are based on <1% of the sequence length used from the Y chromosome, but are nonetheless very consistent. This Discussion has thus far assumed that present-day distributions of Y haplogroups are relevant to events 50,000-100,000 years ago and thus that Y phylogeography carries information informative about the major migration out of Africa. Ancient population structure within Africa that segregated C, D and FT from other Y haplogroups beginning after 76,000 years ago with migration out only 50,000-59,000 years ago would also fit the evidence presented above. Present-day Y-chromosomal structure in Africa has been massively re-shaped by events in the last 10,000 years, including the Bantu-speaker expansion in central and southern Africa (Poznik et al. 2016; Patin et al. 2017) and entry of

Eurasian lineages into northern and central Africa (Haber et al. 2016; D'atanasio et al. 2018), and is thus a poor guide to structure before 10,000 years ago. Despite this, it is striking that western central Africa is the location of the deepest-rooting A00 lineage in Cameroon (Mendez et al. 2013), a major location of the A0 lineage in Cameroon, The Gambia and Ghana (Scozzari et al. 2014; Poznik et al. 2016) and the D0 lineage in Nigeria and Guinea-Bissau (Weale et al. 2003; Rosa et al. 2007). This retention of the deepest Y-chromosomal diversity in western central Africa contrasts with the autosomal genetic structure, where the deepest roots have been reported in southern African hunter-gatherers (Gronau et al. 2011; Schlebusch et al. 2012; Veeramah et al. 2012; Mallick et al. 2016; Schlebusch et al. 2017; Skoglund et al. 2017), perhaps supporting the hypothesis of deep population subdivision (Henn et al. 2018; Scerri et al. 2018). Analysis of ancient African DNA from 50,000-100,000 years ago would provide conclusive information on Y haplogroup distributions at this time, but is not currently available. In the meantime, further focus on present-day Y-chromosomal lineages in central and western Africa in order to understand more about deep African lineages seems warranted, and the current study illustrates the broad insights that can sometimes be revealed by very rare lineages.

In conclusion, sequencing of the D0 Y chromosomes and placing them on a calibrated Y-chromosomal phylogeny identifies the most likely model of Y-chromosomal exit from Africa: an origin of the DE lineage inside Africa and expansion out of the C, D and FT lineages. It suggests an exit time interval that overlaps with the time of Neanderthal admixture estimated from autosomal analyses, and slightly refines it. These findings are consistent with a shared history of Y chromosomes and autosomes, and illustrate how study of Y lineages may lead to general new insights.

ACKNOWLEDGEMENTS

We thank the sample donors for making this work possible, and the Sanger DNAP Bespoke Sequencing Team for especial efforts in generating sequences from small amounts of DNA. Our work was supported by Wellcome (098051).

LITERATURE CITED

Altheide, T. K., and M. F. Hammer, 1997 Evidence for a possible Asian origin of YAP+ Y chromosomes. Am. J. Hum. Genet. 61: 462-466.

Bae, C. J., K. Douka and M. D. Petraglia, 2017 On the origin of modern humans: Asian perspectives. Science 358.

Balanovsky, O., 2017 Toward a consensus on SNP and STR mutation rates on the human Y-chromosome. Hum. Genet. 136: 575-590.

Barbieri, C., A. Hubner, E. Macholdt, S. Ni, S. Lippold et al., 2016 Refining the Y chromosome phylogeny with southern African sequences. Hum Genet 135: 541-553.

Bravi, C. M., G. Bailliet, V. L. Martinez-Marignac and N. O. Bianchi, 2000 Origin of YAP+ lineages of the

6

human Y-chromosome. Am. J. Phys. Anthropol. 112: 149-158.

Clarkson, C., Z. Jacobs, B. Marwick, R. Fullagar, L. Wallis et al., 2017 Human occupation of northern Australia by 65,000 years ago. Nature 547: 306-310.

D'Atanasio, E., B. Trombetta, M. Bonito, A. Finocchio, G. Di Vito et al., 2018 The peopling of the last Green Sahara revealed by high-coverage resequencing of trans-Saharan patrilineages. Genome Biol. 19: 20.

Demeter, F., L. L. Shackelford, A. M. Bacon, P. Duringer, K. Westaway et al., 2012 Anatomically modern human in Southeast Asia (Laos) by 46 ka. Proc Natl Acad Sci U S A 109: 14375-14380.

Deviese, T., D. Massilani, S. Yi, D. Comeskey, S. Nagel et al., 2019 Compound-specific radiocarbon dating and mitochondrial DNA analysis of the Pleistocene hominin from Salkhit Mongolia. Nat Commun 10: 274.

Fernandes, V., F. Alshamali, M. Alves, M. D. Costa, J. B. Pereira et al., 2012 The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet 90: 347-355.

Forster, P., R. Harding, A. Torroni and H. J. Bandelt, 1996 Origin and evolution of Native American mtDNA variation: a reappraisal. Am. J. Hum. Genet. 59: 935-945.

Fu, Q., H. Li, P. Moorjani, F. Jay, S. M. Slepchenko et al., 2014 Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514: 445-449.

Fu, Q., A. Mittnik, P. L. F. Johnson, K. Bos, M. Lari et al., 2013 A revised timescale for human evolution based on ancient mitochondrial genomes. Curr Biol 23: 553-559.

Garrison, E., and G. Marth, 2012 Haplotype-based variant detection from short-read sequencing. arXiv.

Green, R. E., J. Krause, A. W. Briggs, T. Maricic, U. Stenzel et al., 2010 A draft sequence of the Neandertal genome. Science 328: 710-722.

Gronau, I., M. J. Hubisz, B. Gulko, C. G. Danko and A. Siepel, 2011 Bayesian inference of ancient human demography from individual genome sequences. Nat. Genet. 43: 1031-1034.

Haber, M., M. Mezzavilla, A. Bergstrom, J. Prado-Martinez, P. Hallast et al., 2016 Chad genetic diversity reveals an African history marked by multiple Holocene Eurasian migrations. Am. J. Hum. Genet. 99: 1316-1324.

Hallast, P., C. Batini, D. Zadik, P. Maisano Delser, J. H. Wetton et al., 2015 The Y-chromosome tree bursts into leaf: 13,000 high-confidence SNPs

covering the majority of known clades. Mol. Biol. Evol. 32: 661-673.

Hammer, M. F., T. Karafet, A. Rasanayagam, E. T. Wood, T. K. Altheide et al., 1998 Out of Africa and back again: nested cladistic analysis of human Y chromosome variation. Mol. Biol. Evol. 15: 427-441.

Helgason, A., A. W. Einarsson, V. B. Guethmundsdottir, A. Sigurethsson, E. D. Gunnarsdottir et al., 2015 The Y-chromosome point mutation rate in humans. Nat. Genet. 47: 453-457.

Henn, B. M., T. E. Steele and T. D. Weaver, 2018 Clarifying distinct models of modern human origins in Africa. Curr. Opin. Genet. Dev. 53: 148-156.

Higham, T., T. Compton, C. Stringer, R. Jacobi, B. Shapiro et al., 2011 The earliest evidence for anatomically modern humans in northwestern Europe. Nature 479: 521-524.

Ingman, M., H. Kaessmann, S. Paabo and U. Gyllensten, 2000 Mitochondrial genome variation and the origin of modern humans. Nature 408: 708-713.

Karmin, M., L. Saag, M. Vicente, M. A. Wilson Sayres, M. Jarve et al., 2015 A recent bottleneck of Y chromosome diversity coincides with a global change in culture. Genome Res. 25: 459-466.

Lazaridis, I., D. Nadel, G. Rollefson, D. C. Merrett, N. Rohland et al., 2016 Genomic insights into the origin of farming in the ancient Near East. Nature 536: 419-424.

Letunic, I., and P. Bork, 2016 Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 44: W242-245.

Liu, W., M. Martinon-Torres, Y. J. Cai, S. Xing, H. W. Tong et al., 2015 The earliest unequivocally modern humans in southern China. Nature 526: 696-699.

Mallick, S., H. Li, M. Lipson, I. Mathieson, M. Gymrek et al., 2016 The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538: 201-206.

Mendez, F. L., T. Krahn, B. Schrack, A. M. Krahn, K. R. Veeramah et al., 2013 An African American paternal lineage adds an extremely ancient root to the human Y chromosome phylogenetic tree. Am. J. Hum. Genet. 92: 454-459.

Mondal, M., A. Bergström, Y. Xue, F. Calafell, H. Laayouni et al., 2017 Y-chromosomal sequences of diverse Indian populations and the ancestry of the Andamanese. Hum. Genet. 136: 499-510.

Pagani, L., D. J. Lawson, E. Jagoda, A. Morseburg, A. Eriksson et al., 2016 Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538: 238-242.

Patin, E., M. Lopez, R. Grollemund, P. Verdu, C. Harmant et al., 2017 Dispersals and genetic adaptation of

7

Bantu-speaking populations in Africa and North America. Science 356: 543-546.

Patterson, N., P. Moorjani, Y. Luo, S. Mallick, N. Rohland et al., 2012 Ancient admixture in human history. Genetics 192: 1065-1093.

Patterson, N., A. L. Price and D. Reich, 2006 Population structure and eigenanalysis. PLoS Genet 2: e190.

Poznik, G. D., 2016 Identifying Y-chromosome haplogroups in arbitrarily large samples of sequenced or genotyped men. bioRxiv.

Poznik, G. D., B. M. Henn, M. C. Yee, E. Sliwerska, G. M. Euskirchen et al., 2013 Sequencing Y chromosomes resolves discrepancy in time to common ancestor of males versus females. Science 341: 562-565.

Poznik, G. D., Y. Xue, F. L. Mendez, T. F. Willems, A. Massaia et al., 2016 Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences. Nat. Genet. 48: 593-599.

R Core Team, 2017 R: A language and environment for statistical computing, pp. R Foundation for Statistical Computing, Vienna, Austria.

Rosa, A., C. Ornelas, M. A. Jobling, A. Brehm and R. Villems, 2007 Y-chromosomal diversity in the population of Guinea-Bissau: a multiethnic perspective. BMC Evol. Biol. 7: 124.

Scally, A., and R. Durbin, 2012 Revising the human mutation rate: implications for understanding human evolution. Nat. Rev. Genet. 13: 745-753.

Scerri, E. M. L., M. G. Thomas, A. Manica, P. Gunz, J. T. Stock et al., 2018 Did our species evolve in subdivided populations across Africa, and why does it matter? Trends Ecol. Evol. 33: 582-594.

Schlebusch, C. M., H. Malmstrom, T. Gunther, P. Sjodin, A. Coutinho et al., 2017 Southern African ancient genomes estimate modern human divergence to 350,000 to 260,000 years ago. Science 358: 652-655.

Schlebusch, C. M., P. Skoglund, P. Sjodin, L. M. Gattepaille, D. Hernandez et al., 2012 Genomic variation in seven Khoe-San groups reveals adaptation and complex African history. Science 338: 374-379.

Scozzari, R., A. Massaia, B. Trombetta, G. Bellusci, N. M. Myres et al., 2014 An unbiased resource of novel SNP markers provides a new chronology for the human Y chromosome and reveals a deep phylogenetic structure in Africa. Genome Res. 24: 535-544.

Shi, H., H. Zhong, Y. Peng, Y. L. Dong, X. B. Qi et al., 2008 Y chromosome evidence of earliest modern human settlement in East Asia and multiple origins of Tibetan and Japanese populations. BMC Biol. 6: 45.

Skoglund, P., J. C. Thompson, M. E. Prendergast, A. Mittnik, K. Sirak et al., 2017 Reconstructing prehistoric African population structure. Cell 171: 59-71 e21.

Stamatakis, A., 2014 RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312-1313.

Underhill, P. A., G. Passarino, A. A. Lin, P. Shen, M. Mirazon Lahr et al., 2001 The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Ann. Hum. Genet. 65: 43-62.

Underhill, P. A., and C. C. Roseman, 2001 The case for an African rather than an Asian origin of the human Y-chromosome YAP insertion, pp. 43-56 in Genetic, Linguistic and Archaeological Perspectives on Human Diversity in Southeast Asia: Recent Advances in Human Biology, edited by L. Jin, M. Seielstad and C. Xiao. World Scientific Publishing, Singapore.

van Oven, M., and M. Kayser, 2009 Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30: E386-394.

Veeramah, K. R., D. Wegmann, A. Woerner, F. L. Mendez, J. C. Watkins et al., 2012 An early divergence of KhoeSan ancestors from those of other modern humans is supported by an ABC-based analysis of autosomal resequencing data. Mol. Biol. Evol. 29: 617-630.

Veth, P., I. Ward, T. Manne, S. Ulm, K. Ditchfield et al., 2017 Early human occupation of a maritime desert, Barrow Island, North-West Australia. Quat. Sci. Rev. 168: 19-29.

Weale, M. E., T. Shah, A. L. Jones, J. Greenhalgh, J. F. Wilson et al., 2003 Rare deep-rooting Y chromosome lineages in humans: lessons for phylogeography. Genetics 165: 229-234.

Wei, W., Q. Ayub, Y. Chen, S. McCarthy, Y. Hou et al., 2013 A calibrated human Y-chromosomal phylogeny based on resequencing. Genome Res. 23: 388-395.

Xue, Y., T. Zerjal, W. Bao, S. Zhu, Q. Shu et al., 2006 Male demography in East Asia: a north-south contrast in human population expansion times. Genetics 172: 2431-2439.

8

Figure 1 Y Chromosome phylogenetic tree from worldwide samples. (A) A maximum-likelihood tree of 180 Y Chromosome sequences from worldwide populations. Different branch colours and symbols represent different haplogroups assigned based on ISOGG v11.01. The Nigerian chromosomes sequenced in this study are highlighted in blue and assigned to the novel D0 haplogroup. Bootstrap values from 1000 replications are shown on the branches. (B) Map showing location of the studied individuals with coloured symbols reflecting the haplogroups assigned in Figure 1A. The clade consisting of the D0 and D haplogroups is represented by blue squares and is observed in Africa and East Asia. (C) Ages of the nodes leading to haplogroup D0 in the phylogenetic tree (point estimates; branch lengths are not to scale). Haplogroups D0 and D are estimated to have split 71,400 (63,100-81,000) years ago while the D0 individuals in this study coalesced 2,500 (2,200-2,800) years ago.

Figure 2 Models for the early movements of Y-chromosomal haplogroups out of Africa and back. (A) Simplified Y-chromosomal phylogeny showing the key lineages, including D0. Lineages currently located in Africa are coloured yellow; those currently outside Africa are blue. Triangle widths are not meaningful, except that they show that E and FT are the predominant lineages inside and outside Africa, respectively. The small orange triangle in FT represents the R1b-V88 back-to-Africa migration that took place after the time period considered here. (B-D) Models for lineage movements that could lead to the present-day African or non-African distributions of lineages, using point estimates of dates derived from the phylogeny (see Table S2). The three models represent migrations out of Africa at different

9

time intervals, indicated by the purple, brown and green shading in A. Arrows in B-D indicate inter-continental movements and their direction, but do not represent particular locations or routes. The first coloured arrow(s) represent the lineage(s) that migrated out, during the time period shown at the top of the maps. Additional uncoloured arrows represent subsequent migrations and their time intervals needed to produce the present-day distributions.

Figure 3 Estimation of the time of the out-of-Africa migration incorporating information from Y-chromosomal lineages (green, this work), archaeological dates (brown, Fu et al. 2014) and ancient DNA (red, Fu et al. 2014).

Figure S1 Y-chromosomal phylogeny as understood before the current study. This simplified phylogeny shows the lineages discussed in the Introduction as relevant to the understanding of the expansion of Y chromosomes out of Africa. Lineages currently located in Africa are coloured yellow; those currently outside Africa are blue.

10

Figure S2 PCA of worldwide populations. (A) PCA using ~600K genome-wide SNPs from global populations shows the Nigerian D0 individuals cluster with sub-Saharan Africans. (B) Magnified area of the PCA containing the Nigerian samples and other genetically close populations.

Figure S3 Number of mutations from the AB root. Violin plots show heterogeneity in the number of mutations that have accumulated within different Y haplogroups.

11

Table S1. SNPs defining the D0 haplogroup

CHR POS REF ALT Ancestral Allele

Haplogroup D0

Haplogroup D

Haplogroup E Notes

Y 2668496 G T G T G G SNP on branch D0

Y 2684790 G A,T G A G G SNP on branch D0

Y 2746443 T G T G T T SNP on branch D0

Y 2759823 C A,G C G C C SNP on branch D0

Y 2763977 T C,G T G T T SNP on branch D0

Y 2774691 C T C T C C SNP on branch D0

Y 2794108 C G C G C C SNP on branch D0

Y 2824252 G A G A G G SNP on branch D0

Y 2848910 A G A G A A SNP on branch D0

Y 2850963 C T C T C C SNP on branch D0

Y 2853798 C A C A C C SNP on branch D0

Y 6676225 C T C T C C SNP on branch D0

Y 6708971 G A G A G G SNP on branch D0

Y 6721247 C A,T C T C C SNP on branch D0

Y 6747175 C T C T C C SNP on branch D0

Y 6747201 T A T A T T SNP on branch D0

Y 6753316 C A,G C G C C SNP on branch D0

Y 6797098 A G A G A A SNP on branch D0

Y 6821366 T C T C T T SNP on branch D0

Y 6831372 A C,G A G A A SNP on branch D0

Y 6868284 C G C G C C SNP on branch D0

Y 6889983 C T C T C C SNP on branch D0

Y 6893436 G A G A G G SNP on branch D0

Y 6895235 G A,C G A G G SNP on branch D0

Y 6917321 T C,G T C T T SNP on branch D0

Y 6941294 A T A T A A SNP on branch D0

Y 6949806 G A G A G G SNP on branch D0

Y 7011465 G A G A G G SNP on branch D0

Y 7034919 C A C A C C SNP on branch D0

Y 7079439 G A G A G G SNP on branch D0

Y 7188219 A T A T A A SNP on branch D0

Y 7210681 C T C T C C SNP on branch D0

Y 7242606 G C G C G G SNP on branch D0

Y 7255622 A T A T A A SNP on branch D0

Y 7265843 T C T C T T SNP on branch D0

Y 7270582 G T G T G G SNP on branch D0

Y 7273551 G A G A G G SNP on branch D0

Y 7277533 A G A G A A SNP on branch D0

Y 7341810 C T C T C C SNP on branch D0

Y 7347990 A C,G A G A A SNP on branch D0

Y 7378003 A G A G A A SNP on branch D0

Y 7410122 C T C T C C SNP on branch D0

Y 7570251 A G A G A A SNP on branch D0

Y 7578738 G A,C G A G G SNP on branch D0

Y 7579824 G C G C G G SNP on branch D0

Y 7582262 G T G T G G SNP on branch D0

Y 7620047 A G A G A A SNP on branch D0

12

Y 7622045 C T C T C C SNP on branch D0

Y 7678762 A T A T A A SNP on branch D0

Y 7694383 A C A C A A SNP on branch D0

Y 7716819 A T A T A A SNP on branch D0

Y 7774460 G A G A G G SNP on branch D0

Y 7774466 A C A C A A SNP on branch D0

Y 7774495 G A G A G G SNP on branch D0

Y 7801104 C G C G C C SNP on branch D0

Y 7851738 C G,T C G C C SNP on branch D0

Y 7876921 A G A G A A SNP on branch D0

Y 7893690 C A C A C C SNP on branch D0

Y 7901352 G T G T G G SNP on branch D0

Y 7929183 G T G T G G SNP on branch D0

Y 7982853 A G A G A A SNP on branch D0

Y 8016678 C T C T C C SNP on branch D0

Y 8058955 G A G A G G SNP on branch D0

Y 8060600 A G A G A A SNP on branch D0

Y 8063190 G C,T G T G G SNP on branch D0

Y 8081026 A G A G A A SNP on branch D0

Y 8088679 C G C G C C SNP on branch D0

Y 8090376 A G A G A A SNP on branch D0

Y 8095674 A G A G A A SNP on branch D0

Y 8121376 G A G A G G SNP on branch D0

Y 8155013 C G C G C C SNP on branch D0

Y 8156277 G A G A G G SNP on branch D0

Y 8174610 G A G A G G SNP on branch D0

Y 8184538 A G A G A A SNP on branch D0

Y 8200057 T C T C T T SNP on branch D0

Y 8226806 A G,T A T A A SNP on branch D0

Y 8227145 T A,C T C T T SNP on branch D0

Y 8240623 C A,T C T C C SNP on branch D0

Y 8254941 A C A C A A SNP on branch D0

Y 8280068 C T C T C C SNP on branch D0

Y 8310894 A G A G A A SNP on branch D0

Y 8323918 G T G T G G SNP on branch D0

Y 8329083 T C T C T T SNP on branch D0

Y 8343705 G A G A G G SNP on branch D0

Y 8356071 A C A C A A SNP on branch D0

Y 8361702 C T C T C C SNP on branch D0

Y 8364936 T C T C T T SNP on branch D0

Y 8372651 A G A G A A SNP on branch D0

Y 8445499 C G C G C C SNP on branch D0

Y 8465733 C T C T C C SNP on branch D0

Y 8466101 C T C T C C SNP on branch D0

Y 8467167 G A G A G G SNP on branch D0

Y 8503452 C G C G C C SNP on branch D0

Y 8538145 G T G T G G SNP on branch D0

Y 8575106 T A T A T T SNP on branch D0

Y 8584879 A G A G A A SNP on branch D0

Y 8602076 A C A C A A SNP on branch D0

13

Y 8691053 A C A C A A SNP on branch D0

Y 8722220 C T C T C C SNP on branch D0

Y 8793332 C T C T C C SNP on branch D0

Y 8794276 T G T G T T SNP on branch D0

Y 8797330 G A G A G G SNP on branch D0

Y 8824517 T C,G T C T T SNP on branch D0

Y 8839643 G A G A G G SNP on branch D0

Y 8842697 C T C T C C SNP on branch D0

Y 8856116 C T C T C C SNP on branch D0

Y 8859409 G A G A G G SNP on branch D0

Y 8901707 A T A T A A SNP on branch D0

Y 8989907 G C G C G G SNP on branch D0

Y 9017409 G C G C G G SNP on branch D0

Y 9025308 C A C A C C SNP on branch D0

Y 9037457 G A G A G G SNP on branch D0

Y 9038621 A G A G A A SNP on branch D0

Y 9143448 G A G A G G SNP on branch D0

Y 9393586 A G A G A A SNP on branch D0

Y 9446396 G C G C G G SNP on branch D0

Y 9799632 C G C G C C SNP on branch D0

Y 9806232 A C,T A T A A SNP on branch D0

Y 9832852 T A,C,G T G T T SNP on branch D0

Y 9836568 T G T G T T SNP on branch D0

Y 9869610 C T C T C C SNP on branch D0

Y 13888821 C T C T C C SNP on branch D0

Y 13930633 A C A C A A SNP on branch D0

Y 13992933 T G T G T T SNP on branch D0

Y 13994915 A T A T A A SNP on branch D0

Y 14002365 G A G A G G SNP on branch D0

Y 14003204 G A G A G G SNP on branch D0

Y 14021992 G A G A G G SNP on branch D0

Y 14053839 C T C T C C SNP on branch D0

Y 14081375 C T C T C C SNP on branch D0

Y 14097346 G A G A G G SNP on branch D0

Y 14103822 T C T C T T SNP on branch D0

Y 14109752 C T C T C C SNP on branch D0

Y 14134168 G A,C G A G G SNP on branch D0

Y 14141253 G A G A G G SNP on branch D0

Y 14179641 C T C T C C SNP on branch D0

Y 14187102 T C T C T T SNP on branch D0

Y 14194942 C A C A C C SNP on branch D0

Y 14208960 C T C T C C SNP on branch D0

Y 14211461 A G A G A A SNP on branch D0

Y 14235752 G A G A G G SNP on branch D0

Y 14237523 T C T C T T SNP on branch D0

Y 14245933 A C,T A T A A SNP on branch D0

Y 14246908 G A G A G G SNP on branch D0

Y 14248863 G T G T G G SNP on branch D0

Y 14257701 C G C G C C SNP on branch D0

Y 14260577 A T A T A A SNP on branch D0

14

Y 14280476 T C T C T T SNP on branch D0

Y 14288503 T C T C T T SNP on branch D0

Y 14288505 A T A T A A SNP on branch D0

Y 14305631 G C G C G G SNP on branch D0

Y 14312629 A G A G A A SNP on branch D0

Y 14318958 T C T C T T SNP on branch D0

Y 14330845 G A G A G G SNP on branch D0

Y 14374090 A T A T A A SNP on branch D0

Y 14374809 G A G A G G SNP on branch D0

Y 14399354 G A G A G G SNP on branch D0

Y 14414388 G T G T G G SNP on branch D0

Y 14419460 G C,T G C G G SNP on branch D0

Y 14450770 C T C T C C SNP on branch D0

Y 14500439 G A G A G G SNP on branch D0

Y 14525419 C T C T C C SNP on branch D0

Y 14585420 C G C G C C SNP on branch D0

Y 14609328 G A,C G A G G SNP on branch D0

Y 14666408 G A G A G G SNP on branch D0

Y 14670529 C T C T C C SNP on branch D0

Y 14719988 G A G A G G SNP on branch D0

Y 14747047 C T C T C C SNP on branch D0

Y 14749175 A G A G A A SNP on branch D0

Y 14779913 T C T C T T SNP on branch D0

Y 14804607 T C T C T T SNP on branch D0

Y 14871358 G T G T G G SNP on branch D0

Y 14880739 G A G A G G SNP on branch D0

Y 14899954 G C G C G G SNP on branch D0

Y 14906414 C A C A C C SNP on branch D0

Y 14932294 T A T A T T SNP on branch D0

Y 14937914 T C T C T T SNP on branch D0

Y 14939843 C T C T C C SNP on branch D0

Y 14958661 T G T G T T SNP on branch D0

Y 14968252 G C G C G G SNP on branch D0

Y 14988080 C A C A C C SNP on branch D0

Y 15008294 G A,T G A G G SNP on branch D0

Y 15027821 A C A C A A SNP on branch D0

Y 15091079 A G A G A A SNP on branch D0

Y 15093392 T C T C T T SNP on branch D0

Y 15134900 A C,T A T A A SNP on branch D0

Y 15156004 G A G A G G SNP on branch D0

Y 15263809 G A G A G G SNP on branch D0

Y 15267283 C T C T C C SNP on branch D0

Y 15304575 T C T C T T SNP on branch D0

Y 15329617 C A C A C C SNP on branch D0

Y 15346240 C T C T C C SNP on branch D0

Y 15359753 C T C T C C SNP on branch D0

Y 15361848 G A G A G G SNP on branch D0

Y 15386553 C T C T C C SNP on branch D0

Y 15405581 A G A G A A SNP on branch D0

Y 15407380 C G C G C C SNP on branch D0

15

Y 15409664 T C T C T T SNP on branch D0

Y 15421012 C T C T C C SNP on branch D0

Y 15430391 T C T C T T SNP on branch D0

Y 15454212 T C T C T T SNP on branch D0

Y 15485975 T C T C T T SNP on branch D0

Y 15492613 A G A G A A SNP on branch D0

Y 15556620 G A G A G G SNP on branch D0

Y 15565074 C G C G C C SNP on branch D0

Y 15569204 T C T C T T SNP on branch D0

Y 15573158 G A G A G G SNP on branch D0

Y 15576256 T A,C T C T T SNP on branch D0

Y 15586075 T C,G T C T T SNP on branch D0

Y 15590430 C A C A C C SNP on branch D0

Y 15606025 C T C T C C SNP on branch D0

Y 15649155 C T C T C C SNP on branch D0

Y 15650150 A C,G A G A A SNP on branch D0

Y 15650584 C G C G C C SNP on branch D0

Y 15654719 A G A G A A SNP on branch D0

Y 15654720 G A G A G G SNP on branch D0

Y 15720552 G T G T G G SNP on branch D0

Y 15728677 T C T C T T SNP on branch D0

Y 15741085 G A,C G A G G SNP on branch D0

Y 15753305 G A G A G G SNP on branch D0

Y 15759024 T C T C T T SNP on branch D0

Y 15773171 C T C T C C SNP on branch D0

Y 15813009 C A C A C C SNP on branch D0

Y 15857645 G A G A G G SNP on branch D0

Y 15877770 A C,G A G A A SNP on branch D0

Y 15904335 A G A G A A SNP on branch D0

Y 15915838 G A G A G G SNP on branch D0

Y 15919672 T C T C T T SNP on branch D0

Y 15928533 A C,G A G A A SNP on branch D0

Y 15940113 C T C T C C SNP on branch D0

Y 15949650 T C T C T T SNP on branch D0

Y 15976108 C T C T C C SNP on branch D0

Y 15995957 C T C T C C SNP on branch D0

Y 15997967 C A,T C A C C SNP on branch D0

Y 16207875 T C T C T T SNP on branch D0

Y 16215815 C T C T C C SNP on branch D0

Y 16247178 C T C T C C SNP on branch D0

Y 16250850 T A,C T C T T SNP on branch D0

Y 16285874 C T C T C C SNP on branch D0

Y 16304373 C A,T C T C C SNP on branch D0

Y 16312046 A G A G A A SNP on branch D0

Y 16351218 C T C T C C SNP on branch D0

Y 16362847 G A G A G G SNP on branch D0

Y 16398835 G C G C G G SNP on branch D0

Y 16428359 C T C T C C SNP on branch D0

Y 16435213 A C A C A A SNP on branch D0

Y 16500555 A C A C A A SNP on branch D0

16

Y 16504575 T A T A T T SNP on branch D0

Y 16508626 G A G A G G SNP on branch D0

Y 16547686 G A G A G G SNP on branch D0

Y 16578892 A G A G A A SNP on branch D0

Y 16648170 A T A T A A SNP on branch D0

Y 16688265 G A G A G G SNP on branch D0

Y 16756545 C A C A C C SNP on branch D0

Y 16766202 G T G T G G SNP on branch D0

Y 16790529 C A C A C C SNP on branch D0

Y 16801665 C A,T C T C C SNP on branch D0

Y 16835918 T G T G T T SNP on branch D0

Y 16852006 C A C A C C SNP on branch D0

Y 16874041 A T A T A A SNP on branch D0

Y 16929836 G T G T G G SNP on branch D0

Y 16972336 C A,T C T C C SNP on branch D0

Y 16975173 C T C T C C SNP on branch D0

Y 16981686 G T G T G G SNP on branch D0

Y 16982518 T C T C T T SNP on branch D0

Y 16999327 A C A C A A SNP on branch D0

Y 17070326 C T C T C C SNP on branch D0

Y 17098346 C T C T C C SNP on branch D0

Y 17111150 C T C T C C SNP on branch D0

Y 17131671 G T G T G G SNP on branch D0

Y 17152734 G T G T G G SNP on branch D0

Y 17221253 G A G A G G SNP on branch D0

Y 17232821 A G A G A A SNP on branch D0

Y 17238111 G A G A G G SNP on branch D0

Y 17277598 A G A G A A SNP on branch D0

Y 17300065 C A,T C T C C SNP on branch D0

Y 17323499 G C,T G C G G SNP on branch D0

Y 17372204 C T C T C C SNP on branch D0

Y 17420429 G A G A G G SNP on branch D0

Y 17426707 T C T C T T SNP on branch D0

Y 17465843 G A G A G G SNP on branch D0

Y 17496835 G A,C G A G G SNP on branch D0

Y 17530902 A C A C A A SNP on branch D0

Y 17546996 C T C T C C SNP on branch D0

Y 17569596 G A G A G G SNP on branch D0

Y 17572599 G C G C G G SNP on branch D0

Y 17583888 G T G T G G SNP on branch D0

Y 17600941 T A T A T T SNP on branch D0

Y 17605224 C A C A C C SNP on branch D0

Y 17611578 A G A G A A SNP on branch D0

Y 17612716 C A C A C C SNP on branch D0

Y 17616784 A C A C A A SNP on branch D0

Y 17625292 T C T C T T SNP on branch D0

Y 17631594 C T C T C C SNP on branch D0

Y 17635514 G A G A G G SNP on branch D0

Y 17673481 A G A G A A SNP on branch D0

Y 17708490 A C,G A G A A SNP on branch D0

17

Y 17713132 T A,C T C T T SNP on branch D0

Y 17730899 T A T A T T SNP on branch D0

Y 17774409 G A G A G G SNP on branch D0

Y 17795962 G A G A G G SNP on branch D0

Y 17837771 C T C T C C SNP on branch D0

Y 17839512 T C T C T T SNP on branch D0

Y 17840656 A G A G A A SNP on branch D0

Y 17854404 G A G A G G SNP on branch D0

Y 17871478 A G A G A A SNP on branch D0

Y 17905232 C G C G C C SNP on branch D0

Y 17957591 T C T C T T SNP on branch D0

Y 17963043 G T G T G G SNP on branch D0

Y 17965193 A C A C A A SNP on branch D0

Y 18063984 C T C T C C SNP on branch D0

Y 18066111 C G C G C C SNP on branch D0

Y 18096914 G C,T G C G G SNP on branch D0

Y 18101403 C T C T C C SNP on branch D0

Y 18159807 C A C A C C SNP on branch D0

Y 18206875 C T C T C C SNP on branch D0

Y 18211128 T C T C T T SNP on branch D0

Y 18254196 C T C T C C SNP on branch D0

Y 18562963 T C T C T T SNP on branch D0

Y 18592596 G A G A G G SNP on branch D0

Y 18620031 G A,T G A G G SNP on branch D0

Y 18687010 C A C A C C SNP on branch D0

Y 18708674 C A,T C T C C SNP on branch D0

Y 18749926 T C T C T T SNP on branch D0

Y 18753056 G A G A G G SNP on branch D0

Y 18777364 A G A G A A SNP on branch D0

Y 18806613 C T C T C C SNP on branch D0

Y 18889866 G T G T G G SNP on branch D0

Y 18911543 A G A G A A SNP on branch D0

Y 18964792 G C G C G G SNP on branch D0

Y 18966290 G A G A G G SNP on branch D0

Y 19025196 A G A G A A SNP on branch D0

Y 19092442 C T C T C C SNP on branch D0

Y 19133279 T C T C T T SNP on branch D0

Y 19134383 C T C T C C SNP on branch D0

Y 19155428 G T G T G G SNP on branch D0

Y 19160994 G A G A G G SNP on branch D0

Y 19209312 C T C T C C SNP on branch D0

Y 19267189 A C,G A G A A SNP on branch D0

Y 19277949 T G T G T T SNP on branch D0

Y 19284117 C T C T C C SNP on branch D0

Y 19328534 A T A T A A SNP on branch D0

Y 19367635 T G T G T T SNP on branch D0

Y 19411026 A G A G A A SNP on branch D0

Y 19424791 C T C T C C SNP on branch D0

Y 19426445 C T C T C C SNP on branch D0

Y 19437058 C A,G,T C T C C SNP on branch D0

18

Y 19438023 G A G A G G SNP on branch D0

Y 19448656 C T C T C C SNP on branch D0

Y 19495738 G A G A G G SNP on branch D0

Y 19496243 T C,G T C T T SNP on branch D0

Y 19499938 G A G A G G SNP on branch D0

Y 19501594 C T C T C C SNP on branch D0

Y 19509942 C G,T C T C C SNP on branch D0

Y 19527979 C A C A C C SNP on branch D0

Y 19538674 T C T C T T SNP on branch D0

Y 21073079 C A C A C C SNP on branch D0

Y 21082726 C T C T C C SNP on branch D0

Y 21134961 G A,C G A G G SNP on branch D0

Y 21138948 G T G T G G SNP on branch D0

Y 21143021 A C A C A A SNP on branch D0

Y 21143443 T C,G T C T T SNP on branch D0

Y 21250107 G A G A G G SNP on branch D0

Y 21260548 G A G A G G SNP on branch D0

Y 21263431 C T C T C C SNP on branch D0

Y 21270765 T A T A T T SNP on branch D0

Y 21282240 T C T C T T SNP on branch D0

Y 21284949 A C,G A G A A SNP on branch D0

Y 21297607 G A G A G G SNP on branch D0

Y 21318800 G A G A G G SNP on branch D0

Y 21326964 C T C T C C SNP on branch D0

Y 21353436 G A G A G G SNP on branch D0

Y 21369802 A G A G A A SNP on branch D0

Y 21379315 T G T G T T SNP on branch D0

Y 21447391 A T A T A A SNP on branch D0

Y 21472585 A T A T A A SNP on branch D0

Y 21472598 A G A G A A SNP on branch D0

Y 21481057 G A G A G G SNP on branch D0

Y 21491737 A G A G A A SNP on branch D0

Y 21493978 G T G T G G SNP on branch D0

Y 21537520 A C C A A A SNP on branch D0

Y 21539806 G A,T G A G G SNP on branch D0

Y 21545069 G A G A G G SNP on branch D0

Y 21551016 C T C T C C SNP on branch D0

Y 21555791 G C G C G G SNP on branch D0

Y 21559261 G T G T G G SNP on branch D0

Y 21562163 A C A C A A SNP on branch D0

Y 21571786 C T C T C C SNP on branch D0

Y 21579488 T C T C T T SNP on branch D0

Y 21626431 C A C A C C SNP on branch D0

Y 21635462 G A G A G G SNP on branch D0

Y 21674409 A G A G A A SNP on branch D0

Y 21677887 A G A G A A SNP on branch D0

Y 21686075 G A G A G G SNP on branch D0

Y 21718932 C A C A C C SNP on branch D0

Y 21738640 A T A T A A SNP on branch D0

Y 21817774 G T G T G G SNP on branch D0

19

Y 21862376 G A G A G G SNP on branch D0

Y 21870701 A G A G A A SNP on branch D0

Y 21888494 G A G A G G SNP on branch D0

Y 21914507 C T C T C C SNP on branch D0

Y 21963685 C A C A C C SNP on branch D0

Y 21980440 A T A T A A SNP on branch D0

Y 21991481 C T C T C C SNP on branch D0

Y 22023057 T C T C T T SNP on branch D0

Y 22066023 C T C T C C SNP on branch D0

Y 22090416 C A C A C C SNP on branch D0

Y 22091959 A C,G A G A A SNP on branch D0

Y 22095736 A C A C A A SNP on branch D0

Y 22204423 C A C A C C SNP on branch D0

Y 22206482 T C T C T T SNP on branch D0

Y 22545071 A C A C A A SNP on branch D0

Y 22554520 G T G T G G SNP on branch D0

Y 22570658 C G C G C C SNP on branch D0

Y 22656596 T C T C T T SNP on branch D0

Y 22695457 C A,T C T C C SNP on branch D0

Y 22695458 A G A G A A SNP on branch D0

Y 22713623 G A,C G A G G SNP on branch D0

Y 22741850 C G C G C C SNP on branch D0

Y 22762400 C T C T C C SNP on branch D0

Y 22795562 C A C A C C SNP on branch D0

Y 22828227 C A C A C C SNP on branch D0

Y 22831056 C T C T C C SNP on branch D0

Y 22834876 A G A G A A SNP on branch D0

Y 22859408 G T G T G G SNP on branch D0

Y 22877026 A G A G A A SNP on branch D0

Y 22891209 A G A G A A SNP on branch D0

Y 22957062 T G T G T T SNP on branch D0

Y 22981592 G A G A G G SNP on branch D0

Y 22983357 A T A T A A SNP on branch D0

Y 22989279 T C T C T T SNP on branch D0

Y 22989825 T G T G T T SNP on branch D0

Y 23021904 A G A G A A SNP on branch D0

Y 23036719 C A,T C T C C SNP on branch D0

Y 23045034 T G T G T T SNP on branch D0

Y 23046699 A G A G A A SNP on branch D0

Y 23056406 G C G C G G SNP on branch D0

Y 23079345 G A G A G G SNP on branch D0

Y 23089277 C T C T C C SNP on branch D0

Y 23096248 G A G A G G SNP on branch D0

Y 23098745 A G A G A A SNP on branch D0

Y 23108446 G A G A G G SNP on branch D0

Y 23111012 C A C A C C SNP on branch D0

Y 23122442 A T A T A A SNP on branch D0

Y 23124543 G A G A G G SNP on branch D0

Y 23144492 G A,T G A G G SNP on branch D0

Y 23146811 G C G C G G SNP on branch D0

20

Y 23154044 C T C T C C SNP on branch D0

Y 23155355 A G A G A A SNP on branch D0

Y 23273837 C G C G C C SNP on branch D0

Y 23292618 C T C T C C SNP on branch D0

Y 23348356 T C,G T G T T SNP on branch D0

Y 23350698 T C T C T T SNP on branch D0

Y 23369345 G A G A G G SNP on branch D0

Y 23377470 A G A G A A SNP on branch D0

Y 23382836 C A C A C C SNP on branch D0

Y 23436143 A C,T A T A A SNP on branch D0

Y 23441139 G C G C G G SNP on branch D0

Y 23450248 A G A G A A SNP on branch D0

Y 23462529 G A G A G G SNP on branch D0

Y 23499910 A G A G A A SNP on branch D0

Y 23504298 A T A T A A SNP on branch D0

Y 23523496 G C G C G G SNP on branch D0

Y 23575674 G A G A G G SNP on branch D0

Y 23583903 T C,G T C T T SNP on branch D0

Y 23594139 C T C T C C SNP on branch D0

Y 23749105 T C,G T C T T SNP on branch D0

Y 23758375 C G C G C C SNP on branch D0

Y 23761946 A G A G A A SNP on branch D0

Y 23797822 C A,T C T C C SNP on branch D0

Y 23849932 G A G A G G SNP on branch D0

Y 23852103 G A G A G G SNP on branch D0

Y 23857560 A C A C A A SNP on branch D0

Y 23860258 A G A G A A SNP on branch D0

Y 23872967 G A G A G G SNP on branch D0

Y 23968194 C T C T C C SNP on branch D0

Y 24459882 A G A G A A SNP on branch D0

Y 28604656 G C G C G G SNP on branch D0

Y 28615774 G T G T G G SNP on branch D0

Y 28639397 T C T C T T SNP on branch D0

Y 28654892 G A G A G G SNP on branch D0

Y 28655745 T C T C T T SNP on branch D0

Y 28676222 G A G A G G SNP on branch D0

Y 28705352 C T C T C C SNP on branch D0

Y 28731389 G A G A G G SNP on branch D0

Y 28741469 C T C T C C SNP on branch D0

Y 28747656 T A T A T T SNP on branch D0

Y 28750611 T A,G T A T T SNP on branch D0

Y 28761112 G A G A G G SNP on branch D0

Y 8685674 C T C T C C SNP on branch D0 recurs elsewhere (C1b2a)

Y 2708779 A C,G A G A A SNP on branch D0 recurs elsewhere (F2)

Y 23554761 A C A C A C SNP on branch D0 recurs on E (or back mutation on D)

Y 15281995 G A G A A G SNP on branch D0|D

Y 15343571 T C T C C T SNP on branch D0|D

Y 15343575 T G T G G T SNP on branch D0|D

Y 17545328 C T C T T C SNP on branch D0|D

21

Y 17839184 C T C T T C SNP on branch D0|D

Y 19025435 A T A T T A SNP on branch D0|D

Y 23286394 T C T C C T SNP on branch D0|D

Y 2728927 A G A G G G SNP on branch D0|D|E

Y 6931590 T A,C,G T C C C SNP on branch D0|D|E

Y 7169899 C T C T T T SNP on branch D0|D|E

Y 7320623 G A G A A A SNP on branch D0|D|E

Y 7332132 C T C T T T SNP on branch D0|D|E

Y 7540450 A G A G G G SNP on branch D0|D|E

Y 8072664 C T C T T T SNP on branch D0|D|E

Y 8816007 T G T G G G SNP on branch D0|D|E

Y 9142309 C T C T T T SNP on branch D0|D|E

Y 9851457 G T G T T T SNP on branch D0|D|E

Y 14842223 C T C T T T SNP on branch D0|D|E

Y 15543875 C A,G,T C T T T SNP on branch D0|D|E

Y 15590674 T A T A A A SNP on branch D0|D|E

Y 15591537 G A,C,T G C C C SNP on branch D0|D|E

Y 17011456 G A,T G A A A SNP on branch D0|D|E

Y 18561042 T G T G G G SNP on branch D0|D|E

Y 18907927 A G A G G G SNP on branch D0|D|E

Y 19338287 T C T C C C SNP on branch D0|D|E

Y 19370916 G A G A A A SNP on branch D0|D|E

Y 21227423 G T G T T T SNP on branch D0|D|E

Y 21306315 G A G A A A SNP on branch D0|D|E

Y 21398584 T A,G T A A A SNP on branch D0|D|E

Y 21458735 A G,T A G G G SNP on branch D0|D|E

Y 21582351 A C,G A G G G SNP on branch D0|D|E

Y 21717208 C T C T T T SNP on branch D0|D|E

Y 22014732 C A,G C G G G SNP on branch D0|D|E

Y 22721522 T C T C C C SNP on branch D0|D|E

Y 23109543 T C,G T C C C SNP on branch D0|D|E

Y 24392653 C G C G G G SNP on branch D0|D|E

22

Haplogroup 1† Haplogroup 2 T (kya) Lower Upper

AB CT 101.1 89.4 114.7

DE CF 76.5 67.6 86.7

C F 76.0 67.2 86.2

D0|D E 73.2 64.7 83.0

D0 D 71.4 63.1 81.0

Nigerian-1|Nigerian-2 Nigerian-3 2.5 2.2 2.8

Nigerian-1 Nigerian-2 1.9 1.7 2.2

E1 E2 58.9 52.1 66.8

E1b1a E1b1b 47.4 41.9 53.7

E1b1b1a E1b1b1b 39.6 35.0 44.9

E1b1b1a1a E1b1b1a1b 14.8 13.1 16.8

C1 C2 53.3 47.1 60.5

G HIJK 56.9 50.3 64.6

H IJK 56.4 49.9 64.0

IJ K 55.2 48.8 62.6

I J 49.9 44.1 56.6

I1 I2 32.9 29.0 37.3

J1 J2 37.1 32.8 42.1

J2a J2b 32.6 28.8 37.0

LT MNOPS 53.4 47.2 60.6

L T 51.0 45.0 57.8

NO MPS 52.8 46.7 59.9

MS P 52.8 46.7 59.9

M S 52.7 46.6 59.8

N O 46.7 41.2 52.9

O1 O2 38.5 34.0 43.7

Q R 37.7 33.3 42.8

Q1a Q1b 35.4 31.2 40.1

R1 R2 32.4 28.7 36.8

R1a R1b 25.6 22.6 29.0

R1b1a2a2 R1b1a2a1 7.8 6.9 8.9

R1a1a1b1 R1a1a1b2 6.2 5.5 7.1†Assigned using ISOGG v11.01‡Based on 95% HPD Interval of the mutation rate estimate

95% CI‡

Table S2. Split-time estimates using the ρ statistic