extreme patterns of variance in small populations: placing ... · human communities, and...

17
doi: 10.1111/j.1469-1809.2006.00327.x Extreme Patterns of Variance in Small Populations: Placing Limits on Human Y-Chromosome Diversity through Time in the Vanuatu Archipelago M. Cox Arizona Research Laboratories – Biotechnology, 1041 East Lowell Street, Biological Sciences West, Room 246B, University of Arizona, Tucson, AZ 85721, USA Summary Small populations are dominated by unique patterns of variance, largely characterized by rapid drift of allele frequen- cies. Although the variance components of genetic datasets have long been recognized, most population genetic studies still treat all sampling locations equally despite differences in sampling and effective population sizes. Be- cause excluding the effects of variance can lead to significant biases in historical reconstruction, variance components should be incorporated explicitly into population genetic analyses. The possible magnitude of variance effects in small populations is illustrated here via a case study of Y-chromosome haplogroup diversity in the Vanuatu Archipelago. Deme-based modelling is used to simulate allele frequencies through time, and conservative confidence bounds are placed on the accumulation of stochastic variance effects, including diachronic genetic drift and contempo- rary sampling error. When the information content of the dataset has been ascertained, demographic models with parameters falling outside the confidence bounds of the variance components can then be accepted with some statistical confidence. Here I emphasize how aspects of the demographic history of a population can be disentangled from stochastic variance effects, and I illustrate the extreme roles of genetic drift and sampling error for many small human population datasets. Keywords: Demography, Variance Components, Monte Carlo Simulation Introduction A detailed reconstruction of prehistoric demography remains the primary goal of human molecular an- thropology. One standard approach towards this aim involves screening history-informative genetic loci (usually mitochondrial or from the non-recombining Y-chromosome, NRY) in a small number of individuals sampled at discrete locations over a broad geographical area (e.g. Cox & Lahr 2006). Subsequently, past relation- ships are inferred by calculating similarities and differ- ences within and between these contemporary groups, and extrapolating the patterns observed today backwards Corresponding Author: Dr. Murray P. Cox. Tel: +1-520- 621 9791; Fax: +1-520-626 8050, E-mail: mpcox@ email.arizona.edu into the past. This practice has proved highly informative for recognizing patterns in the modern genetic record on broad geographical scales (e.g. Cavalli-Sforza et al. 1994; Jobling & Tyler-Smith 2003), but has proved less success- ful at reconstructing the historical processes underlying those patterns. In particular, few contemporary studies explicitly account for the extensive stochastic variance that occurs from generation to generation in small hu- man communities – even though this effect has long been recognized as an important component of genetic demography (Wright, 1931). Even fewer studies quan- titatively address the effects that variance components have on historical reconstructions from contemporary patterns of genetic diversity. However, because demo- graphic processes in small groups are the currency of large-scale demographic events, a better appreciation of stochasticity in small human populations is a prerequisite 390 Annals of Human Genetics (2006) 71,390–406 C 2006 The Author Journal compilation C 2006 University College London

Upload: others

Post on 28-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

doi: 10.1111/j.1469-1809.2006.00327.x

Extreme Patterns of Variance in Small Populations:Placing Limits on Human Y-Chromosome Diversitythrough Time in the Vanuatu Archipelago

M. CoxArizona Research Laboratories – Biotechnology, 1041 East Lowell Street, Biological Sciences West, Room 246B, University ofArizona, Tucson, AZ 85721, USA

Summary

Small populations are dominated by unique patterns of variance, largely characterized by rapid drift of allele frequen-

cies. Although the variance components of genetic datasets have long been recognized, most population genetic

studies still treat all sampling locations equally despite differences in sampling and effective population sizes. Be-

cause excluding the effects of variance can lead to significant biases in historical reconstruction, variance components

should be incorporated explicitly into population genetic analyses. The possible magnitude of variance effects in small

populations is illustrated here via a case study of Y-chromosome haplogroup diversity in the Vanuatu Archipelago.

Deme-based modelling is used to simulate allele frequencies through time, and conservative confidence bounds

are placed on the accumulation of stochastic variance effects, including diachronic genetic drift and contempo-

rary sampling error. When the information content of the dataset has been ascertained, demographic models with

parameters falling outside the confidence bounds of the variance components can then be accepted with some

statistical confidence. Here I emphasize how aspects of the demographic history of a population can be disentangled

from stochastic variance effects, and I illustrate the extreme roles of genetic drift and sampling error for many small

human population datasets.

Keywords: Demography, Variance Components, Monte Carlo Simulation

Introduction

A detailed reconstruction of prehistoric demography

remains the primary goal of human molecular an-

thropology. One standard approach towards this aim

involves screening history-informative genetic loci

(usually mitochondrial or from the non-recombining

Y-chromosome, NRY) in a small number of individuals

sampled at discrete locations over a broad geographical

area (e.g. Cox & Lahr 2006). Subsequently, past relation-

ships are inferred by calculating similarities and differ-

ences within and between these contemporary groups,

and extrapolating the patterns observed today backwards

Corresponding Author: Dr. Murray P. Cox. Tel: +1-520-

621 9791; Fax: +1-520-626 8050, E-mail: mpcox@

email.arizona.edu

into the past. This practice has proved highly informative

for recognizing patterns in the modern genetic record on

broad geographical scales (e.g. Cavalli-Sforza et al. 1994;

Jobling & Tyler-Smith 2003), but has proved less success-

ful at reconstructing the historical processes underlying

those patterns. In particular, few contemporary studies

explicitly account for the extensive stochastic variance

that occurs from generation to generation in small hu-

man communities – even though this effect has long

been recognized as an important component of genetic

demography (Wright, 1931). Even fewer studies quan-

titatively address the effects that variance components

have on historical reconstructions from contemporary

patterns of genetic diversity. However, because demo-

graphic processes in small groups are the currency of

large-scale demographic events, a better appreciation of

stochasticity in small human populations is a prerequisite

390 Annals of Human Genetics (2006) 71,390–406 C© 2006 The AuthorJournal compilation C© 2006 University College London

Diachronic Variance in Small Populations

to understanding major demographic events in human

history. Here I illustrate the extreme effects of histori-

cal and contemporary variance components on a rep-

resentative genetic dataset, and present simple analytical

framework to address them.

Monte Carlo genetic simulation can be used to infer

minimum levels of stochastic variance in experimentally

determined genetic profiles from small populations. Be-

cause sampling effects involve random events, only sim-

ulation methods can estimate the relative contributions

of the two main variance factors: diachronic (through

time) genetic drift and modern sampling error. How-

ever, this approach is not readily available to population

genetic researchers. Because the timeframe over which

small population studies are useful is so short, mutation

events are rare, and therefore simulation under the coa-

lescent (e.g. Hudson, 2002) is impractical. Deme-based

simulation is a more practical approach, but generally

requires relatively sophisticated programming skills (e.g.

Thomas et al. 2006). Software intended for evolution-

ary prediction in a classroom setting (e.g. Alstad, 2003;

Felsenstein, 2005) is often less useful for research infer-

ence. The approach implemented here is a simple way to

fit historical reconstructions with statistical confidence

bounds inferred from contemporary genetic datasets.

This allows the researcher to address two fundamental

questions: which aspects of a genetic pattern likely re-

flect real underlying demographic processes, and which

are probably just random distortions in the data?

The Indo-Pacific region provides a textbook model

system in which to examine variance dynamics in small

human communities, and Y-chromosome haplogroup

data from ten islands in the Vanuatu Archipelago are

presented here as an exemplar. Genetic surveys in Ocea-

nia typically involve small sample sizes, although these

are often proportionally large relative to the magni-

tude of local communities. In Cox and Lahr’s (2006)

compilation of published Indo-Pacific Y-chromosome

data, samples had a median size of only 38 individuals

[range: 11-94]. Island communities are also small rela-

tive to many continental groups. Using ethno-linguistic

group sizes as a proxy for the maximum extent of lo-

cal communities, the median group size in Vanuatu is

only 450 individuals [range: 50-7,000], and this dis-

tribution is skewed markedly towards populations of

fewer than a thousand (Gordon, 2005). Because these

ethno-linguistic communities develop relatively slowly,

it seems reasonable that similar conditions have held in

Oceania through history, and the genetic profiles ob-

served in small human populations today must have been

influenced strongly by stochastic processes of genetic

drift in the past. Furthermore, although limited sam-

ple collection is a natural outcome of the difficulties

inherent in human population sampling (Cox & Lahr

2006), this factor still strongly biases the observed fre-

quencies of alleles and their inferred geographic dis-

tributions. Confounding influences such as these are

not commonly factored into reconstructions of human

genetic history, particularly those of the Indo-Pacific

region, although attempts to estimate the founding pop-

ulation size of New Zealand remain notable exceptions

(Murray-McIntosh et al. 1998; Whyte et al. 2005).

Here, I illustrate the extreme effects of variance com-

ponents in representative population genetic datasets

from small human communities, and I present an an-

alytical approach to help discriminate between demog-

raphy and stochastic variance using these data. Firstly, I

show how minimum bounds for the confounding effects

of diachronic genetic drift and modern sampling error

can be estimated from contemporary genetic data, and

secondly, I show how this information helps to deter-

mine how much confidence can be placed on historical

reconstructions inferred from a given genetic dataset.

This reconstructive approach is explained via a small

scale case study of Y-chromosome haplogroup varia-

tion in 235 men (called ni-Vanuatu) from ten island

groups in the Vanuatu Archipelago (Cox, 2003). Aspects

of the reconstructed genetic history are compared with

anthropological evidence for the initial settlement and

subsequent history of the Vanuatu archipelago, and the

tangled effects of historical demography and stochas-

tic variance are exposed within this exemplar dataset.

The primary focus of this paper is variance factors in

small communities. Readers interested in a broader his-

tory of the Indo-Pacific region as inferred from genetic

data should consult the recent extensive literature on

this topic (e.g. Capelli et al. 2001; Forster et al. 2001;

Kayser et al. 2001a; Underhill et al. 2001; Friedlaen-

der et al. 2002; Hurles et al. 2002; Lum et al. 2002;

Tommaseo-Ponzetta et al. 2002; Hurles et al. 2003;

Ingman & Gyllensten 2003; Kayser et al. 2003; Cann

& Lum 2004; Cox, 2005b; Friedlaender et al. 2005;

C© 2006 The AuthorJournal compilation C© 2006 University College London

Annals of Human Genetics (2006) 71,390–406 391

M. Cox

Table 1 Vanuatu Population Samples: island group, island sampling location, sample size, Y-chromosome haplogroup frequencies,

estimated total census size, and estimated male effective population size.

Y-Chromosome Haplogroup Frequencies

Group Sampling Location NS C K∗ K5 M O3 R Nc Nm

North Banks and Torres 3 0.33 0.33 0 0.33 0 0 - -

Central Lamenu 27 0.26 0.56 0.04 0.15 0 0 1,177 466

Maewo 20 0.40 0.25 0 0.25 0.10 0 1,679 664

Malekula 23 0.17 0.83 0 0 0 0 25,737 10,186

Maskelynes 22 0 0.05 0 0.95 0 0 1,883 745

Nguna 28 0 0.54 0 0.46 0 0 4,707 1,863

Paama 36 0 0.61 0 0.17 0.22 0 9,413 3,725

South Aniwa 27 0.74 0.11 0 0.04 0 0.11 471 186

Futuna 24 0.04 0.25 0.54 0.13 0 0.04 471 186

Tanna 25 0 0.40 0 0.60 0 0 26,278 10,400

Total 235 0.174 0.413 0.060 0.294 0.043 0.017 71,816 28,421

Karafet et al. 2005; Trejaut et al. 2005; Cox & Lahr

2006; Scheinfeldt et al. 2006).

Materials and Methods

Biological Samples

Don Bowden (Monash University, Australia) and Mary

Ganczakowski (Queen Alexandra Hospital, U.K.) ob-

tained blood samples from ni-Vanuatu volunteers

during the early 1990s. Biological collections were

made with the informed consent of participants, who

came from rural communities and displayed symp-

toms putatively associated with haemoglobin disor-

ders. A sampling strategy focused on individuals with

haemoglobinopathies should not bias genetic varia-

tion observed on the non-recombining Y-chromosome

(NRY), because there is complete lineage sorting

between NRY markers and autosomal loci, and

haemoglobin disorders are common in the archipelago

(and therefore not restricted to kin groups with cor-

respondingly biased Y-chromosome diversity; Williams

et al. 1996).

Screening NRY Variation

Male DNA samples (N = 235) from ten island popula-

tions in the Vanuatu Archipelago (Table 1) were screened

for Y-chromosome haplogroup lineages (Y Chromo-

some Consortium, 2002; Jobling & Tyler-Smith 2003).

Men were assigned to six Y-chromosome haplogroups:

C-M130, K∗-M9(xK1-M177, K5-M230, L-M11,

M-M106, NO-M214, P-92R7), K5-M230, M-M106,

O3-M122, and R-M207 (Cox, 2003). All lineages, ex-

cept K5-M230 and O3-M122, were screened using the

hierarchical protocols of Cox (2006). K5-M230 was

assayed using the method of Kayser et al. (2003). An

amplicon containing the O3-M122 marker was gen-

erated with primers M122f 5′-TGG TAA ACT CTA

CTT AGT TGC CTT T-3′ and M122r 5′-TTT GCT

TTG TTA GAT TCT GGT GTT G-3′ , and if ances-

tral, cleaved into two fragments of 147 and 74 bp with

the restriction endonuclease Hsp92II (Cox, 2003).

Estimating Demographic Parameters

The dynamics of human populations in the Vanuatu

Archipelago have been characterized only superficially.

The best information sources include the 1999 national

census (National Statistics Office, 2000), which is ac-

curate but does not present demographic data on geo-

graphic/linguistic groupings; and the 2005 Ethnologue

report (Gordon, 2005), which has older and less accurate

information, but provides more detailed regional data

focusing on the languages present in Vanuatu and the

numbers of their speakers. Census sizes (Nc) of the geo-

graphic/linguistic groups studied here (Table 1) are es-

timated from both sources. For each sampling location,

the number of individuals speaking each language was

calculated from Ethnologue statistics, and scaled pro-

portionally to contemporary levels by weighting against

the total population size registered by the 1999 national

census.

392 Annals of Human Genetics (2006) 71,390–406 C© 2006 The AuthorJournal compilation C© 2006 University College London

Diachronic Variance in Small Populations

Census population sizes include individuals who do

not contribute biologically to following generations,

and therefore overestimate effective population sizes

(Ne). For Y-chromosome studies, effective male popula-

tion sizes (Nm, Table 1) are more realistic estimates of the

number of men contributing to each study population

over time than raw census values (cf. Hedrick, 2005).

Nm estimates were calculated by scaling regional census

sizes (Nc) by a ratio of the total effective population size

to the census population size (Ne/Nc = 0.786) and a ra-

tio of female to male effective population sizes (Nf /Nm

= 0.984). These values were derived from ethnologi-

cal data of the Gainj (northern New Guinea), the only

Oceanic population for which such detailed informa-

tion is currently available (Storz et al. 2001). These val-

ues represent best approximations for the ni-Vanuatu.

Population History by Monte Carlo

Simulation

Estimates of sampling error and genetic drift were

inferred using time-reversible genetic simulation im-

plemented in the programe, DRIFTER (Cox, 2005a).

Isolated, panmictic and constant-sized Fisher-Wright

demes of size Nm were conditioned with allele fre-

quencies t0[A1, A2, . . . , An] at generation g0. Allele

frequencies were allowed to drift within demes for a dis-

crete number of non-overlapping generations until final

allele frequencies tn[A1, A2, . . . , An] were produced at

generation gn.

To mimic the effects of sampling error, a fraction

of each simulated deme equivalent to the experimen-

tal sample size, Ns, was chosen randomly from the final

generation. Sampling errors inferred by simulation were

checked by comparing against their deterministic solu-

tion,

s 2(pa ) =pa (1 − pa )

n(1)

where pa is the allele frequency, and n the sample size.

Final allele frequencies were recorded after this sampling

step, and the entire process repeated 10,000 times for

each population.

The demographic model employed here makes three

initial assumptions. Firstly, new mutations do not arise

during the course of the simulation (μ = 0). This can

be justified because the Vanuatu Archipelago was only

settled ca. 3,200 years ago (Spriggs, 1997; Bedford et al.

1998), by which time the observed Y-chromosome lin-

eages had already developed (e.g. Kayser et al. 2000,

2001b, 2003; Scheinfeldt et al. 2006). This absence of

mutation events over short time periods is a primary

reason why coalescent approaches cannot be used to

model the sort of demographic system under study here.

Secondly, population sizes are treated as constants. The

paucity of archaeological and genetic information on

population size change over time necessarily imposes

this assumption, although current archaeological evi-

dence is entirely consistent with populations spreading

rapidly across Vanuatu and quickly reaching carrying

capacity. Thirdly, no migration occurs between popula-

tions within the timeframe of the simulation (m = 0).

Deviations from these premises are discussed later in the

text.

Placed in real terms, this demographic equilibrium

model assumes that demes are geographically scattered

and genetically isolated before the simulation begins.

This can be interpreted as a near instantaneous disper-

sal of populations across the Vanuatu archipelago, which

seems a reasonable proxy for the incredibly swift settle-

ment of these islands by rapidly expanding agricultural

populations (Bedford, 2003). Nevertheless, all Oceanic

populations involved themselves in some level of gene

flow – soon after settlement and during later history.

Here, the effects of migration in Vanuatu are initially

assumed to be significantly less than the combined ef-

fects of genetic drift and sampling error. This assump-

tion is supported by some Oceanic ethnographic data

(Friedlaender, 1975; Long et al. 1986; M. Cox, un-

published genealogies from Solomon Islands, 2004), al-

though less well by others (Friedlaender et al. 2005).

However, the Y-chromosome data themselves further

espouse this assumption.

Deme-based genetic simulation must be conditioned

on a set of founding allele frequencies. Because historical

lineage frequencies cannot be determined directly from

modern sources, simulation studies must use practical

alternatives. Backwards-in-time simulation with initial

frequencies from experimental modern data is useful for

placing confidence limits on historical allele frequencies.

Within the limits of the simulation model, genetic drift

could not have produced the allele frequencies observed

C© 2006 The AuthorJournal compilation C© 2006 University College London

Annals of Human Genetics (2006) 71,390–406 393

M. Cox

in the present if historical allele frequencies had fallen

outside these bounds in the past.

Alternatively, we can mimic the effect of popula-

tion subdivision on allele frequency distributions using

forward-in-time simulation. Because the effect of ge-

netic drift is inversely proportional to Ne, large pop-

ulations diverge more slowly from ancestral lineage

frequencies than small populations. Therefore, for a

model of populations with homogeneous initial allele

frequencies, the averages of modern allele frequencies

weighted by each island group’s effective population size

are reasonable proxies:

f a =∑

Nmi pa i∑Nmi

(2)

where fa is the weighted frequency of allele a, and Nm

and pa are the male effective population size and fre-

quency of allele a in the ith population, respectively. For

a model of populations with heterogeneous initial allele

frequencies, simulations can be conditioned on allele

frequencies chosen by dual-step bootstrapping of the

experimental data. Demic arrangements with allele fre-

quencies showing the desired range of starting FST

values can be adopted to seed genetic simulations.

Given that historical allele frequency distributions are

inherently unknowable, approaches such as these are

necessary approximations for any deme-based simula-

tion framework.

Finally, times inferred in generation intervals in the

simulations were converted to chronological time using

a mean male inter-generation interval of 31 years, as cal-

culated from cross-cultural ethnographic data (Fenner,

2005).

Geographical Distances and Matrix

Correlation

Geographical distances between sampling locations were

calculated as geodesic arcs with EARTH (Byers, 2005),

and correlated against genetic distances using the

Mantel (1967) test implemented in the ADE4 module

of the R statistical package (R Project, 2006). Geo-

graphical distances were ln transformed to compensate

for two-dimensional spatial analysis, and pairwise FST

values were linearised by FST /(1 − FST ) to generate

values varying from 0 → ∞ in analogy with geograph-

ical distance (Rousset, 1997).

Moment Estimators for Migration and

Population Subdivision

Hierarchical F statistics, a moment estimator measure of

population subdivision, were calculated using the hierar-

chical method of Yang (1998) implemented in HIERFSTAT

(Goudet, 2005), a module of the statistical package R

(R Project, 2006). The significance of pairwise FST val-

ues was determined by bootstrapping the experimental

data (10,000 replicates), and setting the observed value’s

rank as its probability.

Equilibrium values for inter-deme migration rates

were inferred using Wright’s moment estimator

(Wright, 1931, 1951; Slatkin, 1985) for haploid data:

FST ≈ 1

2Nm m + 1(3)

where m is the migration rate per generation between

two given populations averaged over time. Although

the assumptions of this statistic (e.g. an island model,

no selection, no mutation) are invalid for many real-

world situations (Whitlock & McCauley 1999), they

are reasonable first approximations for the case study

region.

Results

Experimental Results

Cox (2003) identified six Y-chromosome haplogroup

lineages in 235 men from ten islands in the Vanuatu

Archipelago: C-M130, K∗-M9(xK1-M177, K5-M230,

L-M11, M-M106, NO-M214, P-92R7) – hereafter ab-

breviated to K∗-M9, K5-M230, M-M106, O3-M122,

and R-M207 (Figure 1). All lineages, except R-M207,

are prevalent in Oceania and Southeast Asia. The four

R-M207 carriers identified in this study most likely

descend from recent European ancestors, and are only

found on two southern islands (Aniwa and Futuna) with

a long history of Western influence (e.g. Buxton, 1926).

These individuals were therefore ignored in the fol-

lowing analysis, which has its primary focus on recon-

structing variance in pre-contact demography. The small

size of the Banks or Torres assemblage (N = 3) limits

its statistical power, and so these individuals were also

excluded.

394 Annals of Human Genetics (2006) 71,390–406 C© 2006 The AuthorJournal compilation C© 2006 University College London

Diachronic Variance in Small Populations

Figure 1 Sampling Locations and Observed Y-Chromosome

Haplogroup Frequencies in the Vanuatu Archipelago. Pie charts

are proportional to sample size (except for Banks and Torres, as

this pie chart has been enlarged for readability). Dashed lines

delineate the northern, central and southern regions of the

archipelago.

F statistics were adopted as a measure of popula-

tion subdivision. Panmictic populations are indicated

by FST ≈ 0, and populations are increasingly subdi-

vided as FST → 1. Extensive population substructuring

is not expected among communities in Vanuatu, because

the archipelago was settled only ∼ 3,200 years before

present (BP). Yet contemporary geographic/linguistic

groups show strong signals of subdivision (FST = 0.34,

p < 0.001) equivalent to those at inter-continental scales

(Hammer et al. 2001). However, Y-chromosome subdi-

vision (as measured by FST distances) is not correlated

with geographical distance (Mantel test, r = −0.077,

ns). This pattern – genetically divergent populations

with no geographical structuring – most likely reflects

stochastic, rather than deterministic, processes. There

is an increased rate of stochastic genetic drift both in

small island communities and with haploid loci, where

the effective population size (Ne) is one-fourth that of

autosomal regions (4Ne). To help place minimum con-

fidence limits on the demographic information content

of this dataset, Monte Carlo simulation was performed

to infer the relative contributions of stochastic genetic

drift and sampling error alone to the genetic profiles

observed from each population sample. Employing a

subtractive approach, confidence bounds could then be

placed on selected demographic parameters of inter-

est, such as estimates of historical allele frequencies and

the extent of ancestral population substructure. These

bounds are likely to underestimate the true uncertainty

of the genetic data – the vagaries of history will naturally

increase the variance of the dataset while simultaneously

preventing us from modelling it absolutely.

Confounding Effects of Migration

The demographic model implemented here assumes

that migration does not occur between populations

during the course of the simulation. Violations of this

assumption have little impact if migration rates are

low. In practice, some human populations can exhibit

very high rates of inter-deme mobility, but supporting

evidence suggests that inter-island male gene flow is

probably infrequent for the case study region. Firstly,

historical records of human mobility (male or female)

between island groups are sparse (e.g. Rivers, 1914, pp.

25, 47ff, 189, 340, 355) and suggest that men tra-

ditionally stayed in their home village after marriage

(i.e. patrilocality dominated; Rodman, 1979, p. 189).

Furthermore, the high rate of linguistic subdivision in

Vanuatu (greater than one hundred extant Austronesian

languages; Gordon, 2005), and the brief period in which

this must have developed (∼3,200 years), are also con-

sistent with limited historical mobility. Secondly (and

more quantitatively), approximate migration rate esti-

mates support historically low levels of gene flow within

Vanuatu (Table 2). The exceptions are three popula-

tion pairs – Lamenu/Maewo, Lamenu/Malekula and

Nguna/Tanna – which have genetic profiles consistent

C© 2006 The AuthorJournal compilation C© 2006 University College London

Annals of Human Genetics (2006) 71,390–406 395

M. Cox

Table 2 Moment Estimators for Population Subdivision and Migration Rates: pairwise FST values (below diagonal) and mean inter-

population migration rates (bidirectional number of individuals per generation, above diagonal). Bolded values are not significant and

reflect large inter-population migration rates. Highlighted areas represent comparisons between the Central and Southern groups of

Vanuatu populations; unhighlighted areas represent comparisons within the Central group of Vanuatu populations (top left) and within

the Southern group (bottom right).

Lamenu Maewo Malekula Maskelynes Nguna Paama Aniwa Futuna Tanna

Lamenu – 8.8 7.3 0.4 4.5 7.1 1.0 1.7 2.2

Maewo 0.054 – 1.2 0.6 2.5 2.4 2.4 1.7 2.3

Malekula 0.064 0.288 – 0.1 1.4 3.8 0.3 0.7 0.8

Maskelynes 0.551 0.456 0.795 – 0.7 0.4 0.1 0.4 1.4

Nguna 0.101 0.168 0.262 0.408 – 5.0 0.4 1.2 ∞Paama 0.066 0.174 0.115 0.554 0.091 – 0.5 1.3 2.2

Aniwa 0.342 0.174 0.600 0.791 0.548 0.511 – 0.5 0.4

Futuna 0.226 0.231 0.407 0.584 0.292 0.279 0.504 – 1.1

Tanna 0.183 0.177 0.399 0.267 -0.002 0.185 0.566 0.316 –

with substantial inter-island mobility (non-significant

FST values where FST → 0 and m → ∞ ). These island

pairs are effectively unified population groups from the

perspective of the NRY. However, even with elevated

levels of gene flow between some island groups, the me-

dian number of individuals moving between the sam-

pled locations are estimated at only ∼1.25 individuals

per generation. Male migration in Vanuatu has proba-

bly been negligible throughout much of prehistory (cf.

Spriggs, 1997, p. 136), especially when compared to

the effects of intra-deme genetic drift and sampling er-

ror.

Modelling Variance Factors via Monte Carlo

Simulation

Sampling Error

The effects of sampling error were estimated by deter-

ministic approximation (equation 1) and by genetic sim-

ulation. Theoretical uncertainty limits can be calculated

for specific allele frequency/sample size combinations

(Figure 2, solid and dotted lines), but this determinis-

tic solution masks important random deviations that are

observable in practice (Figure 2, circles). Sampling error

limits the accuracy of inferences about human history

that can be made from population genetic datasets. For

instance, allele frequency (p) estimates have an uncer-

tainty that decreases non-linearly with sample size as

p → [0,1]; alleles with intermediate frequency are in-

ferred less confidently. Increased sampling from a pop-

ulation also yields diminishing returns. Therefore, pop-

ulation samples with different sizes and different allele

frequency profiles are not directly comparable.

Genetic Drift

Genetic drift is a stochastic process, which is also

time-reversible under the specific demographic model

implemented here. Allele frequency change between

non-overlapping generations can be approximated by

a Markov process:

pt+1 = pt + ε (4)

where pt is the current allele frequency, pt+1 is the allele

frequency in the following generation, and ε is a normal

random variable with variance inversely proportional to

the effective population size,

ε ∼ N

(μ = 0, σ 2 ∝ 1

4Ne

)(5)

Because genetic drift involves a random component, it

has no strictly deterministic representation and must be

modelled by simulation.

Genetic drift is the variance factor most likely to con-

found inferences from modern experimental data be-

cause its effect increases as populations become smaller.

To illustrate the influence of historical genetic drift on

Y-chromosome variation in Vanuatu, simulations were

conditioned on the observed frequency data and ef-

fective male population size (Nm) of each sampling

location. Simulations were run for 102 generations,

a period equivalent to the settlement history of the

396 Annals of Human Genetics (2006) 71,390–406 C© 2006 The AuthorJournal compilation C© 2006 University College London

Diachronic Variance in Small Populations

Figure 2 95% Confidence Intervals for Lineage Frequencies

Relative to Sample Size. Analytical expectations for the effect of

sampling error with allele frequencies of p = [0.1, 0.9], p =[0.2, 0.8], p = [0.3, 0.7], and p = 0.5 (dotted lines, bottom to

top). The solid line represents the expected confidence interval

for alleles of p = [0.4, 0.6]; circles show variance in the mean

confidence interval by Monte Carlo simulation for p = 0.4

(10,000 replicates).

archipelago (∼3,200 years). Probable rates of allele fre-

quency drift over this timeframe were estimated from

10,000 simulation replicates, and 95% confidence inter-

vals for Y-chromosome lineage frequencies over time

were fitted to each Vanuatu population (Figure 3).

Within the limitations of the simulation model, these

intervals represent the probable frequency range (α =0.05) for a given Y-chromosome lineage at a particu-

lar moment backwards in time. Populations with small

Nm values (e.g. Futuna) have large minimum confidence

bounds, whereas populations with large Nm values (e.g.

Tanna) have smaller minimum confidence bounds. For

instance, even under the simple model implemented

here, we cannot exclude the possibility that Futuna

had a K∗-M9 frequency somewhere within the range

of 0-94% around 3,200 BP, whereas we can suggest

with the same level of confidence that the K∗-M9 fre-

quency on Tanna probably fell within a much tighter

range (31-50%). Population genetic studies typically

treat all samples equally, even though demes with larger

effective population sizes are buffered against genetic

drift and retain more historic information about greater

time depths. As a rough rule of thumb, Y-chromosome

SNP lineages from populations with small census sizes

(<1,000 men) will not be informative about demo-

graphic processes that occurred more than ∼1,000 years

ago. However, given that this relationship is non-linear

with time, larger populations are increasingly informa-

tive about processes further back in the past, just as

smaller populations are even less informative about more

recent historical processes.

Confidence bounds can also nominally be calculated

for the composite Vanuatu dataset as if it were an

unstructured population sample. However, ignor-

ing population subdivision artificially increases the

bias and decreases the variance of composite allele

frequency estimates. Confidence intervals are better

calculated for each sampling location separately, and av-

eraged by weighting against each population’s effective

size (equation 2). The final panel of Figure 3 (Van-

uatu) shows allele frequency bounds inferred for the

K∗-M9 lineage, firstly, assuming an unstructured

composite dataset (lower 95% confidence intervals), and

secondly, accounting for population sizes and K∗-M9

frequencies of the dataset’s constituent groups (upper

95% confidence intervals). Treating the Vanuatu dataset

as a composite sample not only underestimates the true

variance of the data, but also significantly biases inferred

frequency estimates. Such assemblages are notably less

reliable statistically than weighted analyses of their con-

stituent parts, which is problematic because composite

samples are commonly employed in population studies.

Testing for Ancestral Genetic Structure

Homogeneous Founding Populations

Drift and sampling variance can be co-opted to help

determine the certainty of contrasting historical recon-

structions. As an example, we can address the question

of population structure early in the settlement his-

tory of Vanuatu. Modern ni-Vanuatu have both Asian

and Melanesian ancestries, as evidenced by their

Y-chromosome lineages (Table 1; Cox & Lahr 2006), but

little information addresses the nature of early popula-

tion structure. Were the archipelago’s founding popula-

tions genetically heterogeneous, as might be expected

if Vanuatu was settled by distinct immigrant groups

with quite different Asian and Melanesian genetic con-

stitutions? Or were they genetically homogeneous, as

might be expected if Asian and Melanesian admixture

C© 2006 The AuthorJournal compilation C© 2006 University College London

Annals of Human Genetics (2006) 71,390–406 397

M. Cox

Figure 3 95% Confidence Intervals inferred for Y-Chromosome Haplogroup Lineage

Frequencies in Nine Vanuatu Populations. Graphs are ranked by effective male population size

(least to greatest, left to right, top to bottom). Allele frequency bounds were generated from

observed modern population frequencies via backwards-in-time simulation towards the initial

settlement of the archipelago ∼3,200 years BP (102 generations ago). The final panel, Vanuatu,

shows 95% confidence bounds of the K∗-M9 lineage for the entire Vanuatu dataset as a

composite (unstructured) assemblage and as an aggregate (structured) assemblage.

Y-chromosome lineages: C-M130, dotted; K∗-M9, solid; K5-M230, dotted-dashed; M-M106,

dashed; and O3-M122, long dashed.

398 Annals of Human Genetics (2006) 71,390–406 C© 2006 The AuthorJournal compilation C© 2006 University College London

Diachronic Variance in Small Populations

primarily occurred farther north in Melanesia (Hill &

Serjeantson 1989) with only a few already-admixed

founding groups settling the archipelago. To address this

question, minimum confidence bounds were placed on

the level of substructuring likely in the founding popu-

lations by conditioning genetic simulations with either

homogeneous or heterogeneous starting conditions.

After accounting for modern sampling error and ge-

netic drift over 102 generations (∼3,200 years), subdi-

vision in the simulated data was compared to the level

of subdivision observed today. The null hypothesis for

homogeneous founding populations is:

H0: Homogeneous founding populations, FST = 0, are

consistent with the modern observed value of population

subdivision, FST = 0.34.

Nine demes with Nm values equivalent to the sampled

populations were conditioned with weighted experi-

mental allele frequencies (equation 2). Genetic drift was

simulated (10,000 replicates) for each population over

102 generations (∼ 3,200 years), the experimental sam-

ple size (NS) was chosen randomly from the final gen-

eration, and allele frequencies recorded. The aggregate

FST was calculated for each set of Vanuatu replicates,

and the probability of the observed FST value was ascer-

tained by its rank in the distribution of simulated FST

values.

The mean simulated FST (= −3.1×10−4 ≈ 0) had

a 95% confidence interval distributed about zero (95%

CI [−0.023, 0.033]) suggesting that the subdivision ob-

served in Vanuatu today (FST = 0.34) is highly sig-

nificantly unlikely (p << 0.0001) under a model with

no ancestral structure (Figure 4, FST 0). Simply boot-

strapping the observed experimental data gave identi-

cal results (Figure 4, Resampled). Therefore, drift alone

produces only low levels of population subdivision after

102 generations given homogeneous founding popu-

lations, and the null hypothesis can be discounted in

this instance. Modern ni-Vanuatu men display signifi-

cant demic structure, which is inconsistent with genet-

ically homogeneous founding populations.

Heterogeneous Founding Populations

Alternative hypotheses are those in which genetically

heterogeneous populations founded Vanuatu. Because

there is no a priori reason for assuming a particular level

Figure 4 Distributions of Simulated FST Values. Population

samples bootstrapped (10,000 replicates) using observed allele

frequencies from the composite Vanuatu sample (Resampled),

and FST ranges for simulated population samples (10,000

replicates) after 102 generations (∼3,200 years) of genetic drift

(incorporating final-generation sampling error) starting from

four founding arrangements with variable levels of substructure,

FST = [0, 0.05, 0.10, 0.15]. The lower horizontal lines

demarcate founding FST values, and the uppermost horizontal

line delineates the observed level of population subdivision (FST

= 0.34).

of heterogeneity in the founding populations, a range

of subdivision values were simulated:

H0: Heterogeneous founding populations, FST = [0.05,

0.10, 0.15], are consistent with the modern observed value,

FST = 0.34.

Population sets with desired FST values were chosen

randomly from the experimental data using a dual-step

selection scheme implemented in the statistical pack-

age R. Nine populations with experimental Nm val-

ues were conditioned to give the desired aggregate FST

value. Simulations were run as described above. The

FST value of the experimental data (= 0.34) did not

differ significantly from any of the simulated FST dis-

tributions (Figure 4), even when founding populations

were given only minor levels of subdivision (e.g. initial

FST = 0.05; final FST = 0.22, 95% Confidence Inter-

val [0.095-0.38]). The extent of modern subdivision is

therefore consistent with all of these models. It seems

C© 2006 The AuthorJournal compilation C© 2006 University College London

Annals of Human Genetics (2006) 71,390–406 399

M. Cox

probable that the founding populations of Vanuatu bore

at least low levels of genetic heterogeneity, but consis-

tent with the extreme effects of variance components,

the true extent of ancestral subdivision in Vanuatu can

no longer be reconstructed from the information con-

tent of this dataset.

Discussion

Dissecting the processes behind broad-scale genetic

patterns is currently a major focus in molecular anthro-

pology (e.g. Pinhasi et al. 2005; Thomas et al. 2006).

However, little research has been directed towards re-

constructing the unique population dynamics of small

human communities, even though small populations

are the currency of large demographic processes. The

genetics of small groups are often treated in a similar

fashion to the genetics of large groups, but the role of

variance factors differs markedly between the two. Pop-

ulation genetic studies in many parts of the world,

particularly the Indo-Pacific region, involve limited

sampling from very small human groups (Merriwether

et al. 1999; Cox & Lahr 2006; Friedlaender, 2007). In

this setting, two major variance factors strongly affect

the genetic record: stochastic genetic drift and sampling

error. Studies with significant sampling coverage (e.g.,

for Bali, Indonesia; Karafet et al. 2005) may be less af-

fected by sampling error, just as studies of large popu-

lations (e.g., for Japanese groups; Hammer et al. 2005)

may be less affected by genetic drift. However, vari-

ance from diachronic genetic drift and sampling error is

probably a significant bias for historical reconstructions

of many regions of the world, and are better incor-

porated explicitly into population genetic analyses. In

situations involving small effective population sizes, sig-

nificant population subdivision or suboptimal sampling,

robust reconstructions of prehistory require more than

a synchronic comparison of modern genetic diversity.

Genetic data from archaeological remains, sampled

consistently through time, is one theoretical approach

towards this problem (e.g. Hagelberg & Clegg 1993).

However, regular sampling of ancient DNA is unreal-

istic for most study regions. In practice, ancient DNA

data points are usually sparse, irregularly spaced through

time, unavailable for all geographical regions of inter-

est, extremely expensive, and often restricted solely to

analyses of mitochondrial DNA (but see Pearson, 2006).

Furthermore, the problems of drift and sampling vari-

ance apply equally to all sampling schemes, modern and

archaeological. Although ancient DNA approaches may

be useful in certain circumstances (e.g. Shapiro et al.

2004; Chan et al. 2006), less direct theoretical methods

will generally be necessary to reconstruct recent histor-

ical demography. Deme-based Monte Carlo simulation

represents one viable alternative.

Here, I have illustrated how genetic simulation can in-

corporate diachronic and contemporary variance com-

ponents into standard human population genetic analy-

ses. Minimum confidence limits are subsequently placed

on the magnitude of genetic drift and sampling error ef-

fects for experimentally derived datasets. The software

employed here was developed solely for this aim. Re-

lated programmes, such as PopG (Felsenstein, 2005) and

Populus (Alstad, 2003) are commonly employed for di-

dactic purposes, but cannot readily be conditioned on

real data and have limited utility as inference tools. The

R statistical package (R Project, 2006) has also proved

useful for studies of variance (Thomas et al. 2006), but

requires its user to have relatively sophisticated cod-

ing skills. Consequently, while most genetic studies ac-

knowledge the confounding effects of drift and sampling

error on observed genetic data (often as a one-line state-

ment), the paucity of ready tools leads few to try placing

quantitative bounds on their effects.

The experimental Vanuatu dataset considered here

for illustration contains only a limited number of

Y-chromosome haplogroups (C-M130, K∗-M9, K5-

M230, M-M106, O3-M122, R-M207), but lineage fre-

quencies vary widely even among populations in very

close geographic proximity (Figure 1). The FST value for

the entire Vanuatu dataset implies substantial population

subdivision, but this seems at odds with the archipelago’s

recent settlement. The discrepancy is probably a joint

outcome of three factors: firstly, inaccurate estimates of

lineage frequencies due to sampling error; secondly, di-

vergence from ancestral lineage frequencies over time

through genetic drift; and thirdly, a complex underlying

demographic history. Synchronic comparisons of con-

temporary data cannot disentangle these effects. How-

ever, analysis methods with a diachronic perspective can

help infer the relative contributions of each factor to the

observed modern data, and therefore increase a dataset’s

interpretative power.

400 Annals of Human Genetics (2006) 71,390–406 C© 2006 The AuthorJournal compilation C© 2006 University College London

Diachronic Variance in Small Populations

For example, simulation analyses on the Vanuatu

dataset reinforce two key points. Firstly, the inac-

curacy of observed allele frequencies is greatest for

small sample sizes and intermediate allele frequencies

(Figure 2). Secondly, observed allele frequencies deviate

more from ancestral allele frequencies for communities

with smaller effective population sizes (Figure 3). Given

the aim of reconstructing ancestral information about

a dataset, genetic profiles should be weighted towards

observed genetic profiles from populations with large

effective sizes (e.g. Paama, Malekula, Tanna) rather than

smaller habitation groups (e.g. Futuna, Aniwa, Lamenu).

In fact, by studying communities with different effec-

tive population sizes, it should theoretically be possi-

ble to query a region’s history at different stratigraphic

time depths. Importantly, however, reconstruction of

these variance components requires knowledge of three

variables: the effective population size (Ne), the sample

size (Ns), and the population’s allele frequencies (pa).

The latter two can be determined directly from genetic

data, but effective population sizes are harder to infer

from many datasets. (For instance, they cannot be esti-

mated with any accuracy from the Y-SNP data exam-

ined here). Therefore, census sizes, genealogical history

and other cultural information will prove necessary to

answer many genetic questions, even though this evi-

dence is not usually collated by geneticists. In practice,

surveying communities about parameters of their de-

mographic history are as central to population genetics

analyses as are sample collections.

Assuming such information is available, inference of

confidence bounds for these variance components pro-

vides one approach to more accurately reconstruct a

community’s genetic prehistory. For instance, here we

asked whether founding populations in Vanuatu were

genetically homogeneous or heterogeneous. The for-

mer scenario would fit a model in which Asian and

Melanesian individuals were already substantially ad-

mixed further north in Melanesia (Hill & Serjeantson

1989; Green, 1991, 2003), and the founding popula-

tions in Vanuatu were therefore approaching genetic

homogeneity. The latter scenario would fit a model in

which Asian and Melanesian populations in northern

Melanesia were only partially admixed (or not admixed

at all), and the groups founding Vanuatu were genetically

heterogeneous. This question cannot be answered ar-

chaeologically, or genetically using standard synchronic

analysis methods. Genetic methods that account for

variance components therefore seem obligatory. Within

the limitations of the demographic model, the first Van-

uatu populations were probably not genetically homo-

geneous, although even low levels of ancestral demic

heterogeneity would have been sufficient to produce the

population subdivision observed today. Figure 4 demon-

strates the extreme variance that can be generated even

over short timeframes. One repercussion that is not im-

mediately obvious from standard synchronic analysis is

that genetic studies on human remains from early ar-

chaeological sites (for instance, at Teouma, Efate; Bed-

ford et al. 2006) may well identify a near contemporary

range of Y-chromosome lineages. However, studies of

different archaeological sites from the same time period

should harbor these lineages at quite different frequen-

cies, and the results from one site are unlikely to directly

reflect the genetic profiles of surrounding regions, or

flanking time periods for the site locality. It follows that

the interpretative utility of ancient DNA studies on small

ancestral populations will probably be limited.

Simulation results also suggest that composite allele

frequencies are not well suited to reconstructions of

historic demography. Data from multiple small sampling

locations are often combined to form a larger sample for

analysis purposes. For instance, a Y-chromosome sample

commonly used to represent coastal Papua New Guinea

is really a composite assemblage (Kayser et al. 2003);

it represents individuals from several sampling locations

along the northern and southern coasts of New Guinea

(Stoneking et al. 1990; Kayser et al. 2000). Such struc-

tured datasets have biased allele frequencies and higher

levels of uncertainty than would be expected given their

nominal effective population and sample sizes (Vanuatu;

Figure 3). Corrected levels of uncertainty can be in-

ferred by weighting the uncertainty estimates of individ-

ual demes by their effective population sizes, but this re-

quires knowledge of the composite sample’s constituent

parts (unavailable in this coastal Papua New Guinea

case). All these findings suggest that additional research

on the meta-analysis of population samples with differ-

ent collection schemes, sample and effective population

sizes, and historical demographies are necessary prereq-

uisites for more sophisticated historical reconstructions

from single-locus, particularly haploid, genetic data.

C© 2006 The AuthorJournal compilation C© 2006 University College London

Annals of Human Genetics (2006) 71,390–406 401

M. Cox

Even though the sampling coverage in Vanuatu – 235

individuals out of a census population of 184,000 (or

0.1% of the national total) – is proportionally one of

the best in the region, the uncertainty bounds of ob-

served lineage frequencies are still large for most com-

munities. Simple variance processes, such as genetic drift

and sampling error, have obviously had a large impact

on genetic profiles in modern Vanuatu, which severely

limits the interpretative power of historical construc-

tions made from these particular data. By extension,

the extreme geographical structuring of mtDNA and

Y-chromosome diversity observed in other extant small

communities from Island Melanesia (e.g. Friedlaender

et al. 2005; Scheinfeldt et al. 2006) is often explained

as a relict patchwork of ancient population substruc-

ture. However, although these lineages are undoubtedly

old and have clear regional specificities, their current

spatial arrangement on the landscape far more likely re-

sults from recent demographic events and extreme lev-

els of drift acting in small human groups (cf. Figure 3).

Such strong population structuring is perhaps better in-

terpreted as being very young, which parsimoniously

circumvents many of the discrepancies that otherwise

appear to arise between arguments for ancient struc-

ture and the archaeological and linguistic records (e.g.

Wilder & Hammer 2007). Such interpretive problems

probably affect many human population genetic studies

where sampling sizes are limited and population groups

subdivided, including those from continental regions.

Although the approach used here improves on how

we view and interpret contemporary genetic data, it still

has important limitations. For instance, these simulation

analyses implement an equilibrium model, a represen-

tation of human demography that currently dominates

analytical approaches to human genetic history (Hey

& Machado 2003). However, equilibrium models force

unrealistic constraints on the data (e.g. depending on the

model: no inter-deme migration, no explicit geographi-

cal structure, population size stationary relative to time),

all of which probably bias historical inferences. Such

assumptions may never hold strictly time for human

populations. For instance, migrants move between all

human groups. Conversely, migration cannot be mod-

elled unless estimates of migration rates through time are

available, often from non-genetic sources. Furthermore,

human mobility is probably not linear with time. Migra-

tion may have peaked twice in Vanuatu; once soon after

the founding populations initially diverged while kin

relationships were still strong (Fix, 2004), and again in

the modern period when novel modes of transport and a

common lingua franca (Bislama) facilitated greater inter-

island mobility (see Friedlaender et al. 2005 for interpre-

tations on changing island migration rates). Other more

exceptional factors may also have influenced the popu-

lations under study here, possibly including: fluctuations

in population size (e.g. bottlenecks resulting from intro-

duced diseases and indentured labor in the immediate

post-European contact period; Buxton, 1926; Wawn,

1973; Spriggs, 1997, p. 261; Cox, 2003, p. 140); in-

ternal extinctions and the resettlement of communities

(e.g. following volcanic eruptions or endemic warfare;

Spriggs, 1997, p. 178 ff.); post-founding immigration

into the Vanuatu archipelago from the immediate north

(e.g. from the Solomons or Bismarcks; Cox, 2003, p. 140

ff.); back migrations from Polynesia (e.g. as evidenced by

the O3 lineage’s restricted distribution in the eastern-

most islands of Vanuatu; Figure 1; Spriggs, 1997, p. 187

ff.; Cox, 2003, p. 138 ff.); and extreme variance in male

reproductive rates (e.g. a biological ‘Big Man’ effect;

Eldon & Wakeley 2006). There is not sufficient infor-

mation on these factors in Vanuatu to incorporate them

into simulation models, although they do lend additional

uncertainty to the analyses undertaken here. Neverthe-

less, models are by necessity relatively simple; the real

world is more complex – and decidedly more human.

These simulations were designed solely to capture the

main effects of major variance factors (at which task

they succeed), but the confidence bounds given here

represent only conservative estimates of the stochastic

variation in these populations over time. Still, even the

magnitudes of these major variance components, which

are frequently ignored in historical reconstructions from

genetic data, should by themselves be sufficient cause for

concern.

One approach towards improving reconstructions of

human demography is to adopt non-equilibrium model

systems. Significant progress has also been made on

more sophisticated inference tools, including impor-

tance sampling (Griffiths & Tavare 1995; Stephens &

Donnelly 2000), Markov chain Monte Carlo (Kuhner

et al. 1995; Wilson & Balding 1998), and Bayesian tech-

niques (Beaumont et al. 2002; Beaumont, 2004). These

402 Annals of Human Genetics (2006) 71,390–406 C© 2006 The AuthorJournal compilation C© 2006 University College London

Diachronic Variance in Small Populations

are increasingly being used to address historical genetic

questions (Hey, 2005). However, the price of these im-

proved models is their large parameter space, and the

substantial information content of genetic data required

to implement them. Single loci with a common ge-

nealogical history (such as the NRY or mitochondrial

DNA) cannot meet this requirement, and if future hu-

man population studies wish to reconstruct human his-

tory with greater accuracy than represented here, these

traditional mainstays will have to be abandoned for au-

tosomal loci with unlinked (and therefore statistically

independent) genetic histories.

Regardless, the Monte Carlo simulation approach

employed here provides an intermediary step, through

which additional information about patterns of human

demography over time can be extracted from current

single-locus studies. Simulation approaches can never

infer exact values for demographic parameters (such as

absolute allele frequencies for a given location at a given

moment in prehistory), but they do allow conservative

confidence limits to be placed on the variance compo-

nents of genetic datasets. This exposes the non-trivial

effects of stochastic variance that have influenced the

population genetic profiles we observe today, which in

turn permits historical inferences from a more quanti-

tative, as opposed to qualitative, analytical basis.

Acknowledgements

I wish to thank the individuals and communities from Vanu-

atu whose participation made this study possible, and I of-

fer in return this partial reconstruction of their extraordi-

nary prehistory. I also extend my appreciation to D. Bowden

(Monash University, Australia) and M. Ganczakowski (Queen

Alexandra Hospital, U.K.) for kindly providing access to DNA

samples; F. Mendez (University of Arizona, U.S.A.) for help-

ful discussion; and E. Hagelberg (University of Oslo, Nor-

way) for research support. This work was partially supported

by grants from the Foundation for Research, Science and

Technology (New Zealand), Thor Heyerdahl and the Kon-

Tiki Museum (Norway), the University of Otago Research

Committee (New Zealand); and by the patronage of the

University of Otago (New Zealand) and the University of

Oslo (Norway).

References

Alstad, D. (2003) Populus: Simulations of population biology.

URL: http://www.cbs.umn.edu/software/populus.html.

Beaumont, M. A., Zhang, W. & Balding, D. J. (2002) Approxi-

mate Bayesian computation in population genetics. Genetics

162, 2025–2035.

Beaumont, M. A. (2004) Recent developments in genetic data

analysis: what can they tell us about human demographic

history? Heredity 92, 365–379.

Bedford, S., Spriggs, M. & Wilson, M. (1998) The Australian

National University–National Museum of Vanuatu Archae-

ological Project: a preliminary report on the establishment

of cultural sequences and rock art research. Asian Perspect

37, 165–193.

Bedford, S. (2003) NZAA 2003 Conference Abstract – From

gaps to galore: Lapita sites in Northern Vanuatu. Archaeology

in New Zealand 46, 62–63.

Bedford, S., Spriggs, M. & Regenvanu, R. (2006) Lapita pots

along with the people: preliminary archaeological investi-

gations at the Teouma Lapita site, Efate, Central Vanuatu.

Antiquity In Press.

Buxton, P. A. (1926) The depopulation of the New Hebrides

and other parts of Melanesia. T Roy Soc Trop Med H 19,

420–454.

Byers, J. A. (2005) Surface distance between two points

of latitude and longitude. URL: http://www.wcrl.

ars.usda.gov/cec/java/lat-long.htm.

Cann, R. L. & Lum, J. K. (2004) Dispersal ghosts in Oceania.

Am J Hum Biol 16, 440–451.

Capelli, C., Wilson, J. F., Richards, M., Stumpf, M. P. H., Gra-

trix, F., Oppenheimer, S., Underhill, P., Pascali, V. L., Ko,

T.-M. & Goldstein, D. B. (2001) A predominantly indige-

nous paternal heritage of the Austronesian-speaking peoples

of Insular Southeast Asia and Oceania. Am J Hum Genet 68,

432–443.

Cavalli-Sforza, L. L., Menozzi, P. & Piazza, A. (1994) The

history and geography of human genes. Princeton: Princeton

University Press.

Chan, Y. L., Anderson, C. N. K. & Hadly, E. A. (2006)

Bayesian estimation of the timing and severity of a pop-

ulation bottleneck from ancient DNA. PLoS Genet 2, e59.

Cox, M. P. (2003) Genetic Patterning at Austronesian Contact

Zones. Ph.D. thesis, University of Otago, Dunedin, New

Zealand.

Cox, M. P. (2005a) Drifter: Simulation software for modelling

demography in small Fisher-Wright populations. URL:

http://www.u.arizona.edu/∼mpcox/.

Cox, M. P. (2005b) Indonesian mitochondrial DNA and

its opposition to a Pleistocene era origin of proto-

Polynesians in Island Southeast Asia. Hum. Biol. 77, 179–

188.

Cox, M. P. (2006) Minimal hierarchical analysis of global

human Y-chromosome SNP diversity by PCR-RFLP.

Anthropol Sci 114, 69–74.

Cox, M. P. & Lahr, M. M. (2006) Y-Chromosome

diversity is inversely associated with language affiliation

C© 2006 The AuthorJournal compilation C© 2006 University College London

Annals of Human Genetics (2006) 71,390–406 403

M. Cox

in paired Austronesian- and Papuan-speaking communities

from Solomon Islands. Am J Hum Biol 18, 35–50.

Eldon, B. & Wakeley, J. (2006) Coalescent processes when

the distribution of offspring number among individuals is

highly skewed. Genetics 172, 2621–2633.

Felsenstein, J. (2005) PopG genetic simulation program. URL:

ftp://evolution.gs.washington.edu/pub/popgen/popg.html.

Fenner, J. N. (2005) Cross-cultural estimation of the hu-

man generation interval for use in genetics-based pop-

ulation divergence studies. Am J Phys Anthropol 128,

415–423.

Fix, A. G. (2004) Kin-structured migration: causes and con-

sequences. Am J Hum Biol 16, 387–394.

Forster, P., Torroni, A., Renfrew, C. & Rohl, A. (2001)

Phylogenetic star contraction applied to Asian and Papuan

mtDNA evolution. Mol Biol Evol 18, 1864–1881.

Friedlaender, J., Gentz, F., Friedlaender, F., Kaestle, F., Schurr,

T., Koki, G., Schanfield, M., McDonough, J., Smith, L.,

Cerchio, S., Mgone, C. & Merriwether, D. A. (2005) Mi-

tochondrial genetic diversity and its determinants in Island

Melanesia. In: Papuan Pasts: Cultural, Linguistic and Biologi-

cal Histories of Papuan-Speaking Peoples. Eds. Pawley, A., At-

tenborough, R., Golson, J., & Hide, R. Canberra: Pacific

Linguistics, pp. 693–716.

Friedlaender, J. S. (1975) The Demography, Genetics, and Phenet-

ics of Bougainville Islanders. Cambridge: Harvard University

Press.

Friedlaender, J. S., Gentz, F., Green, K. & Merriwether, D.

A. (2002) A cautionary tale on ancient migration detec-

tion: mitochondrial DNA variation in Santa Cruz Islands,

Solomon Islands. Hum Biol 74, 453–471.

Friedlaender, J. S. (Ed.). (2007). Population Genetics, Linguis-

tics, and Culture History in the Southwest Pacific: A Synthesis.

Oxford: Oxford University Press, In Press.

Gordon, R. G., Jr. (2005) Ethnologue: Languages of the World.

Dallas: SIL International.

Goudet, J. (2005) Hierfstat, a package for R to compute and

test hierarchical F-statistics. Mol Ecol Notes 5, 184–186.

Green, R. C. (1991) The Lapita Cultural Complex: Cur-

rent evidence and proposed models. In: Indo-Pacific Prehis-

tory 1990. Ed. Bellwood, P. S. Canberra and Jakarta: IPPA

and Asosiasi Prehistorisi Indonesia pp. 295–305.

Green, R. C. (2003) The Lapita Horizon and tradition – Sig-

nature for one set of oceanic migrations. In: Pacific Archaeol-

ogy: Assessments and Anniversary of the First Lapita Excavation

(July 1952) Kone, Noumea, 2002. Ed. Sand, C. Noumea:Le

Cahiers de l’Archeologie en Nouvelle-Caledonie, pp. 95–

120.

Griffiths, R. C. & Tavare, S. (1995) Unrooted genealogical

tree probabilities in the infinitely-many-sites model. Math

Biosci 127, 77–98.

Hagelberg, E. & Clegg, J. B. (1993) Genetic polymorphisms

in prehistoric Pacific Islanders determined by analysis of

ancient bone DNA. Proc R Soc Lond, B, Biol Sci 252, 163–

170.

Hammer, M., Karafet, T., Park, H., Omoto, K., Harihara,

S., Stoneking, M. & Horai, S. (2005) Dual origins of the

Japanese: common ground for hunter-gatherer and farmer

Y chromosomes. J Hum Genet 51, 47–58.

Hammer, M. F., Karafet, T. M., Redd, A. J., Jarjanazi,

H., Santachiara-Benerecetti, S., Soodyall, H. & Zegura,

S. L. (2001) Hierarchical patterns of global human Y-

chromosome diversity. Mol Biol Evol 18, 1189–1203.

Hedrick, P. (2005) Large variance in reproductive success and

the Ne/N ratio. Evolution 59, 1596–1599.

Hey, J. & Machado, C. A. (2003) The study of structured

populations–new hope for a difficult and divided science.

Nat Rev Genet 4, 535–543.

Hey, J. (2005) On the number of New World founders: a

population genetic portrait of the peopling of the Americas.

PLoS Biol 3, e193.

Hill, A. V. S. & Serjeantson, S. (1989) The colonization of the

Pacific: a genetic trail. Oxford: Oxford University Press.

Hudson, R. R. (2002) Generating samples under a Wright-

Fisher neutral model of genetic variation. Bioinformatics 18,

337–338.

Hurles, M. E., Nicholson, J., Bosch, E., Renfrew, C., Sykes,

B. C. & Jobling, M. A. (2002) Y chromosomal evidence

for the origins of Oceanic-speaking peoples. Genetics 160,

289–303.

Hurles, M. E., Matisoo-Smith, E., Gray, R. D. & Penny, D.

(2003) Untangling Oceanic settlement: The edge of the

knowable. Trends Ecol Evol 18, 531–540.

Ingman, M. & Gyllensten, U. (2003) Mitochondrial genome

variation and evolutionary history of Australian and New

Guinean Aborigines. Genome Res 13, 1600–1606.

Jobling, M. A. & Tyler-Smith, C. (2003) The human Y chro-

mosome: An evolutionary marker comes of age. Nature Rev

Genet 4, 598–612.

Karafet, T. M., Lansing, J. S., Redd, A. J., Reznikova, S.,

Watkins, J. C., Surata, S. P., Arthawiguna, W. A., Mayer, L.,

Bamshad, M., Jorde, L. B. & Hammer, M. F. (2005) Balinese

Y-chromosome perspective on the peopling of Indonesia:

genetic contributions from pre-Neolithic hunter-gatherers,

Austronesian farmers, and Indian traders. Hum Biol 77, 93–

114.

Kayser, M., Brauer, S., Weiss, G., Underhill, P. A., Roewer,

L., Schiefenhovel, W. & Stoneking, M. (2000) Melanesian

origin of Polynesian Y chromosomes. Curr Biol 10, 1237–

1246.

Kayser, M., Brauer, S., Weiss, G., Schiefenhovel, W., Under-

hill, P. A. & Stoneking, M. (2001a) Independent histories

of human Y chromosomes from Melanesia and Australia.

Am J Hum Genet 68, 173–190.

Kayser, M., Brauer, S., Weiss, G., Underhill, P. A., Roewer,

L., Schiefenhovel, W. & Stoneking, M. (2001b) Correction:

404 Annals of Human Genetics (2006) 71,390–406 C© 2006 The AuthorJournal compilation C© 2006 University College London

Diachronic Variance in Small Populations

Melanesian origin of Polynesian Y chromosomes. Curr Biol

11, 141–142.

Kayser, M., Brauer, S., Weiss, G., Schiefenhovel, W., Un-

derhill, P., Shen, P., Oefner, P., Tommaseo-Ponzetta, M.

& Stoneking, M. (2003) Reduced Y-chromosome, but not

mitochondrial DNA, diversity in human populations from

West New Guinea. Am J Hum Genet 72, 281–302.

Kuhner, M. K., Yamato, J. & Felsenstein, J. (1995) Estimat-

ing effective population size and mutation rate from se-

quence data using Metropolis-Hastings sampling. Genetics

140, 1421–1430.

Long, J. C., Naidu, J. M., Mohrenweiser, H. W., Gershowitz,

H., Johnson, P. L., Wood, J. W. & Smouse, P. E. (1986)

Genetic characterization of Gainj- and Kalam-speaking

peoples of Papua New Guinea. Am J Phys Anthropol 70,

75–96.

Lum, J. K., Jorde, L. B. & Schiefenhovel, W. (2002) Affinities

among Melanesians, Micronesians, and Polynesians: a neu-

tral biparental genetic perspective. Hum Biol 74, 413–430.

Mantel, N. (1967) The detection of disease clustering and a

generalized regression approach. Cancer Res 27, 209–220.

Merriwether, D. A., Friedlaender, J. S., Mediavilla, J., Mgone,

C., Gentz, F. & Ferrell, R. E. (1999) Mitochondrial DNA

variation is an indicator of Austronesian influence in Island

Melanesia. Am J Phys Anthropol 110, 243–270.

Murray-McIntosh, R. P., Scrimshaw, B. J., Hatfield, P. J. &

Penny, D. (1998) Testing migration patterns and estimat-

ing founding population size in Polynesia by using human

mtDNA sequences. Proc Natl Acad Sci USA 95, 9047–9052.

National Statistics Office. (2000) The 1999 Vanuatu National

Population and Housing Census. Port Vila: National Statistics

Office.

Pearson, H. (2006) Neanderthal DNA yields to genome foray.

Nature 441, 260–261.

Pinhasi, R., Fort, J. & Ammerman, A. J. (2005) Tracing the

origin and spread of agriculture in Europe. PLoS Biol 3,

e410.

R Project. (2006) The R project for statistical computing.

URL: http://www.r-project.org/.

Rivers, W. H. R. (1914) The history of Melanesian society. Cam-

bridge: Cambridge University Press.

Rodman, M. (1979) Following peace: Indigenous pacification

of a northern New Hebridean society. In: The Pacification

of Melanesia. Ann Arbor: University of Michigan Press.

Rousset, F. (1997) Genetic differentiation and estimation of

gene flow from F-statistics under isolation by distance. Ge-

netics 145, 1219–1228.

Scheinfeldt, L., Friedlaender, F. R., Friedlaender, J., Latham,

K., Koki, G., Karafet, T., Hammer, M. & Lorentz, J. (2006)

Unexpected NRY chromosome variation in Northern Is-

land Melanesia. Mol Biol Evol 23, 1628–1641.

Scheinfeldt, L., Friedlaender, F. R., Friedlaender, J., Latham,

K., Koki, G., Karafet, T., Hammer, M. & Lorentz, J.

(2006) Unexpected Y chromosome variation in Northern

Island Melanesia. Mol Biol Evol 23, 1628–1641.

Shapiro, B., Drummond, A. J., Rambaut, A., Wilson, M. C.,

Matheus, P. E., Sher, A. V., Pybus, O. G., Gilbert, M. T.

P., Barnes, I., Binladen, J., Willerslev, E., Hansen, A. J.,

Baryshnikov, G. F., Burns, J. A., Davydov, S., Driver, J. C.,

Froese, D. G., Harington, C. R., Keddie, G., Kosintsev, P.,

Kunz, M. L., Martin, L. D., Stephenson, R. O., Storer, J.,

Tedford, R., Zimov, S. & Cooper, A. (2004) Rise and fall

of the Beringian Steppe Bison. Science 306, 1561–1565.

Slatkin, M. (1985) Gene flow in natural populations. Ann Rev

Ecol Syst 16, 393–430.

Spriggs, M. (1997) The Island Melanesians. Oxford: Blackwell

Publishers.

Stephens, M. & Donnelly, P. (2000) Inference in molec-

ular population genetics. J R Statist Soc B 62, 605–

655.

Stoneking, M., Jorde, L. B., Bhatia, K. & Wilson, A. C. (1990)

Geographic variation in human mitochondrial DNA from

Papua New Guinea. Genetics 124, 717–733.

Storz, J. F., Ramakrishnan, U. & Alberts, S. C. (2001) Deter-

minants of effective population size for loci with different

modes of inheritance. J Hered 92, 497–502.

Thomas, M. G., Stumpf, M. P. H. & Harke, H. (2006) Evi-

dence for an apartheid-like social structure in early Anglo-

Saxon England. Proc R Soc B 273, 2651–2657.

Tommaseo-Ponzetta, M., Attimonelli, M., Roberts, M. D.,

Tanzariello, F. & Saccone, C. (2002) Mitochondrial DNA

variability of West New Guinea populations. Am J Phys

Anthropol 117, 49–67.

Trejaut, J. A., Kivisild, T., Loo, J. H., Lee, C. L., He, C. L.,

Hsu, C. J. & Li, Z. Y. (2005) Traces of archaic mitochon-

drial lineages persist in Austronesian-speaking Formosan

populations. PLoS Biology 3, e247.

Underhill, P. A., Passarino, G., Lin, A. A., Marzuki, S.,

Oefner, P. J., Cavalli-Sforza, L. L. & Chambers, G. K.

(2001) Maori origins, Y-chromosome haplotypes and im-

plications for human history in the Pacific. Hum Mutat 17,

271–280.

Wawn, W. T. (1973) The South Sea Islanders and the Queensland

labour trade. Canberra: Australian National University Press.

Whitlock, M. C. & McCauley, D. E. (1999) Indirect measures

of gene flow and migration: FST = 1/(4Nm + 1). Heredity

82, 117–125.

Whyte, A. L., Marshall, S. J. & Chambers, G. K. (2005) Hu-

man evolution in Polynesia. Hum Biol 77, 157–177.

Wilder, J. A. & Hammer, M. F. (2007) Extraordinary popula-

tion structure among the Baining of New Britain. In: Popu-

lation Genetics, Linguistics, and Culture History in the Southwest

Pacific: A Synthesis. Ed. Friedlaender, J. S. Oxford: Oxford

University Press. In Press.

Williams, T. N., Maitland, K., Bennett, S., Ganczakowski,

M., Peto, T. E. A., Newbold, C. I., Bowden, D. K.,

C© 2006 The AuthorJournal compilation C© 2006 University College London

Annals of Human Genetics (2006) 71,390–406 405

M. Cox

Weatherall, D. J. & Clegg, J. B. (1996) High inci-

dence of malaria in α-thalassaemic children. Nature 383,

522–525.

Wilson, I. A. & Balding, D. J. (1998) Genealogical inference

from microsatellite data. Genetics 150, 499–510.

Wright, S. (1931) Evolution in Mendelian populations. Ge-

netics 16, 97–159.

Wright, S. (1951) The genetical structure of populations. Ann

Eugenics 15, 323–354.

Y Chromosome Consortium. (2002) A nomenclature system

for the tree of human Y-chromosomal binary haplogroups.

Genome Res 12, 339–348.

Yang, R. C. (1998) Estimating hierarchical F-statistics. Evolu-

tion 52, 950–956.

Received: 29 May 2006

Accepted: 6 October 2006

406 Annals of Human Genetics (2006) 71,390–406 C© 2006 The AuthorJournal compilation C© 2006 University College London