Assignment Tests and Population Genetics
Michael Antolin, Department of Biology, Colorado State University
EEID 2012
Goals
• Understanding some old-school population genetics using discrete markers
– inferring IBD and ibd
• New School: Assignment Tests
– Individuals as sampling units
– Multi-locus genotypes
– Probability of exclusion
– Probability of assignment
Assumptions and needs
• Polymorphic, codominant discrete markers. >> N
– SSR, SNP, VNTR, etc.
• Hardy-Weinberg
– some inbreeding OK, selfing or clonal structure generally not
• Loci are independent
– no linkage between markers
• Necessary for estimation
– Population sample
– Suspects (sampled individuals)
– Samples to compare (offspring in paternity tests)
• Necessary technical issues
– Accurate genotyping
– Well-designed field sampling
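The Hardy-Weinberg assumption listed above can be checked locus by locus before running any assignment test. The following is a minimal sketch, assuming a single biallelic codominant locus; the genotype counts and the function names `hw_expected` and `chi_square` are hypothetical and not from the talk.

```python
def hw_expected(n_AA, n_Aa, n_aa):
    """Return allele frequency p and expected genotype counts under Hardy-Weinberg."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A among the 2N gene copies
    q = 1 - p
    return p, (n * p * p, n * 2 * p * q, n * q * q)


def chi_square(observed, expected):
    """Pearson chi-square statistic comparing observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = (30, 40, 30)  # hypothetical counts of AA, Aa, aa
p, expected = hw_expected(*observed)
stat = chi_square(observed, expected)
print(p, expected, stat)
```

With these made-up counts the sample has fewer heterozygotes than expected (40 observed against 50 expected), which is the pattern that inbreeding or hidden subdivision would produce. The statistic has one degree of freedom for a biallelic locus.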
Gene pools are structured
• N individuals
• Gene pool = 2N
• Population
– Local population
• Deme
– Family
• Density varies
• Mating opportunities vary
Population subdivision = population structure
• Limited migration between localities
– Limited mating between and within populations (inbreeding)
– Genetic drift in small populations
• Wahlund’s Effect
– Samples of separate H-W populations treated as one population will result in an apparent deficiency of heterozygotes
• BUT: gene flow is notoriously difficult to measure
– seldom directly observed!
– estimated from patterns of allele frequency in populations
– spatial patterning, temporal changes
Old-school Wright Model: Gene flow and genetic drift
The island model, with migration rate m equal between all islands, and N equal in all islands.
Rate of genetic drift ~ 1/(2N) per generation
Recursion for genetic drift and inbreeding within small populations:
$$f_{t+1} = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) f_t$$
Adding migration to the equation:
$$f_{t+1} = \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) f_t\right](1-m)^2$$
Set $f_{t+1} = f_t = F_{ST}$:
$$F_{ST} = \frac{(1-m)^2}{2N - (2N-1)(1-m)^2}$$
In terms of variance between populations (ignoring terms of $m$ and $m^2$):
$$F_{ST} \cong \frac{1}{4Nm + 1}$$
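To see how the island-model recursion settles to its equilibrium, one can iterate f(t+1) = [1/(2N) + (1 - 1/(2N)) f(t)] (1 - m)^2 numerically and compare the result with both the exact fixed point and the approximation 1/(4Nm + 1). This is only a sketch; the values of N and m and the function names are hypothetical.

```python
def next_f(f, N, m):
    """One generation of drift in a deme of size N followed by migration at rate m."""
    return (1.0 / (2 * N) + (1.0 - 1.0 / (2 * N)) * f) * (1.0 - m) ** 2


def equilibrium_f(N, m):
    """Exact fixed point obtained by setting f(t+1) = f(t) in the recursion."""
    a = (1.0 - m) ** 2
    return a / (2 * N - (2 * N - 1) * a)


def approx_fst(N, m):
    """Approximation that ignores terms of m and m^2."""
    return 1.0 / (4 * N * m + 1)


N, m = 100, 0.01  # hypothetical deme size and migration rate (Nm = 1)
f = 0.0
for _ in range(2000):  # iterate long enough for the recursion to converge
    f = next_f(f, N, m)

print(f, equilibrium_f(N, m), approx_fst(N, m))
```

For these illustrative values the iterated and exact results agree, and the approximation is within about 0.003 of them, which suggests why the short form is usually good enough when m is small.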
Estimating gene flow from Fst
$$F_{ST} \cong \frac{1}{4Nm + 1}$$
Solve for $Nm$, the effective number of migrants:
$$Nm = \frac{1 - F_{ST}}{4F_{ST}}$$
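As a worked example of solving for Nm, the sketch below converts an Fst value into the effective number of migrants per generation under the island model, Nm = (1 - Fst) / (4 Fst). The function names are hypothetical; the input 0.12 is the prairie dog Fst reported elsewhere in this talk, used here purely as an illustration of the arithmetic.

```python
def nm_from_fst(fst):
    """Effective number of migrants per generation implied by Fst (island model)."""
    if not 0 < fst <= 1:
        raise ValueError("Fst must be in (0, 1] for this estimate")
    return (1 - fst) / (4 * fst)


def fst_from_nm(nm):
    """Inverse relation: expected Fst for a given Nm."""
    return 1 / (4 * nm + 1)


nm = nm_from_fst(0.12)
print(round(nm, 2))  # about 1.83 migrants per generation
```

The estimate inherits every assumption of the island model (equal N and equal m everywhere, equilibrium between drift and migration), so it is better read as an order of magnitude than as a measured rate.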
Wahlund’s effect and Fst
• A subdivided population that’s lumped as one will have a deficiency of heterozygotes
• This can be described by the variance in allele frequencies between populations
– The most commonly estimated statistic is the standardized variance in allele frequency between subpopulations, FST
– Related to f, the inbreeding coefficient
Recall: under inbreeding, expected and observed heterozygosity relate to each other as:
$$H_{obs} = 2pq(1 - f) = H_{exp}(1 - f), \qquad f = \frac{H_{exp} - H_{obs}}{H_{exp}}$$
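A small numerical illustration of Wahlund's effect: take subpopulations that are each in Hardy-Weinberg proportions but differ in allele frequency, pool them, and compare the heterozygosity actually present with what the pooled allele frequency predicts. The sketch assumes equally sized subpopulations and one biallelic locus; the frequencies 0.2 and 0.8 and the function name are hypothetical.

```python
def wahlund(freqs):
    """freqs: frequency p of one allele in each equally sized H-W subpopulation."""
    k = len(freqs)
    p_bar = sum(freqs) / k
    # Heterozygotes actually present in the pooled sample (mean of 2pq within demes)
    h_obs = sum(2 * p * (1 - p) for p in freqs) / k
    # Heterozygosity expected if the pooled sample were one H-W population
    h_exp = 2 * p_bar * (1 - p_bar)
    f = (h_exp - h_obs) / h_exp
    # Standardized variance in allele frequency between subpopulations
    var_p = sum((p - p_bar) ** 2 for p in freqs) / k
    fst = var_p / (p_bar * (1 - p_bar))
    return h_obs, h_exp, f, fst


h_obs, h_exp, f, fst = wahlund([0.2, 0.8])
print(h_obs, h_exp, f, fst)
```

In this example the apparent inbreeding coefficient of the pooled sample and the standardized variance in allele frequency come out equal (both 0.36), which is the sense in which Fst describes the heterozygote deficiency caused by lumping.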
Deviations from expected heterozygosity: Partitions of F-statistics among hierarchical levels
$$F_{IS} = \frac{H_S - H_0}{H_S} \quad \text{(within-population inbreeding)}$$
$$F_{ST} = \frac{H_T - H_S}{H_T} \quad \text{(between-population variability)}$$
$$F_{IT} = \frac{H_T - H_0}{H_T} \quad \text{(total Wahlund's effect)}$$
The total Wahlund's effect ($F_{IT}$) partitioned:
$$1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$$
$F_{ST}$: Standardized variance of allele frequencies between populations
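The partition can be verified with a few lines of arithmetic. The sketch below computes the three F-statistics from heterozygosities and checks that 1 - F_IT = (1 - F_IS)(1 - F_ST). It assumes the usual reading of the symbols (H_0 observed heterozygosity, H_S expected heterozygosity within subpopulations, H_T expected heterozygosity in the total population); the numerical values and the function name are hypothetical.

```python
def f_statistics(h0, hs, ht):
    """Return (F_IS, F_ST, F_IT) from observed, within-subpopulation and total heterozygosity."""
    f_is = (hs - h0) / hs  # within-population inbreeding
    f_st = (ht - hs) / ht  # between-population variability
    f_it = (ht - h0) / ht  # total Wahlund's effect
    return f_is, f_st, f_it


f_is, f_st, f_it = f_statistics(h0=0.30, hs=0.40, ht=0.50)  # hypothetical values
print(f_is, f_st, f_it)
print(1 - f_it, (1 - f_is) * (1 - f_st))  # the two sides of the partition
```

The identity holds for any inputs because (1 - F_IS)(1 - F_ST) = (H_0 / H_S)(H_S / H_T) = H_0 / H_T.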
Gene pools are structured
• FIT = total Wahlund’s effect (individual to total)
• FIS = inbreeding effect within subpopulations (individual to subpopulation)
• FST = variance between subpopulations (subpopulation to total)
• Other levels can be examined as well
Estimating F-statistics for discrete marker data
• FSTAT: good general program
– http://www2.unil.ch/popgen/softwares/fstat.htm
• GDA: allows haploid loci, allows more than 3 levels of estimation, bootstraps of CI around estimates
– http://hydrodictyon.eeb.uconn.edu/people/plewis/software.php
• Arlequin: estimates for both discrete markers and nucleotide sequences
– http://cmpg.unibe.ch/software/arlequin35/
• R-packages:
– adegenet, pegas, ecodist
– adegenet depends upon graph, install adegenet using these lines:
    source("http://bioconductor.org/biocLite.R")
    biocLite("graph")
    install.packages("adegenet", dependencies=TRUE)
Assignment Tests: New School
• Identifying the numbers of migrants versus residents in populations
– Admixture models
• Individuals as sampling units
• Multi-locus genotypes
• Probability of exclusion
• Probability of assignment
Sources of Error
• Incorrect genotyping (technical issue)
• Poor field sampling (technical issue)
• Type A error: Incorrectly excluding the parent (population)
– Under assignment: source population as the origin
• Type E error: Failing to exclude individuals who are not the parents (who are migrants into a population)
– Assignment: non-source populations as the origin
Probability of a random match between any two genotypes in a population
$$M = \sum_{i=1}^{n} (A_iA_i)^2 + \sum_{i=1}^{n} \sum_{j=i+1}^{n} (A_iA_j)^2 = \sum_{i=1}^{n} p_i^4 + \sum_{i=1}^{n} \sum_{j=i+1}^{n} (2p_ip_j)^2$$
where $p_i$ is the frequency of the allele $A_i$
Multiply over multiple loci to get full probability, but how do you test these probabilities of matching to assign individuals to populations?
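The per-locus match probability and its product over loci can be sketched as follows. Each locus contributes the sum of squared Hardy-Weinberg genotype frequencies (p_i^4 for homozygotes, (2 p_i p_j)^2 for heterozygotes), and independent loci are multiplied. The allele frequencies and the function name are hypothetical.

```python
from itertools import combinations
from math import prod


def match_probability(freqs):
    """Probability that two random genotypes match at one locus under Hardy-Weinberg."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2 * pi * pj) ** 2 for pi, pj in combinations(freqs, 2))
    return homozygotes + heterozygotes


# Hypothetical allele frequencies at two unlinked loci
loci = [[0.5, 0.5], [0.25, 0.25, 0.5]]
per_locus = [match_probability(f) for f in loci]
overall = prod(per_locus)  # valid only if the loci are independent
print(per_locus, overall)
```

Each added locus multiplies the overall match probability by a number below one, so random matches become rare quickly as loci are added, provided the independence assumption holds.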
Assignment Tests
Geneclass II: Cornuet et al. 1999. Genetics 153: 1989-2000
STRUCTURE: Pritchard et al. 2000. Genetics 155: 945-959
CERVUS: Marshall et al. 1998. Mol. Ecol. 7: 639-655
Potential methods:
• Gene frequency estimates
• Genetic distances (genetic correlations)
– Cavalli-Sforza and Edwards chord distance
– Nei’s genetic distances
– Goldstein’s SSR size difference (δμ)²
• Bayesian probabilities
• Generate distributions of probable genotypes for each population
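As one concrete instance of the distance-based methods listed above, the sketch below computes Nei's standard genetic distance, D = -ln(I), where I = J_xy / sqrt(J_x J_y) and the J terms are sums of allele-frequency products averaged over loci. It omits the small-sample bias correction, and the allele frequencies and function name are hypothetical.

```python
from math import log, sqrt


def nei_standard_distance(pop_x, pop_y):
    """pop_x, pop_y: per-locus lists of allele frequencies (same loci, same allele order)."""
    n_loci = len(pop_x)
    jx = sum(sum(p * p for p in locus) for locus in pop_x) / n_loci
    jy = sum(sum(p * p for p in locus) for locus in pop_y) / n_loci
    jxy = sum(
        sum(px * py for px, py in zip(lx, ly)) for lx, ly in zip(pop_x, pop_y)
    ) / n_loci
    identity = jxy / sqrt(jx * jy)  # Nei's genetic identity I
    return -log(identity)


pop_a = [[0.5, 0.5]]  # hypothetical single-locus frequencies
pop_b = [[0.9, 0.1]]
d_ab = nei_standard_distance(pop_a, pop_b)
d_aa = nei_standard_distance(pop_a, pop_a)
print(d_ab, d_aa)
```

In a distance-based assignment, an individual would be assigned to the population whose frequencies give the smallest distance to its own genotype. That step is not shown here.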
Likelihoods and Bayesian Probabilities
Likelihood ratio for testing a hypothesis, given the data:
$$L(H_1, H_2 \mid D) = \frac{P(D \mid H_1)}{P(D \mid H_2)} = \frac{P(M \mid H_1)}{P(M \mid H_2)}$$
Bayesian Probabilities:
$$\Pr(Z, P \mid M) = \frac{\Pr(Z)\,\Pr(P)\,\Pr(M \mid Z, P)}{\Pr(M)}$$
For a sample of genotypes $M$ from populations $Z$ with allele frequencies $P$.
Assignment criteria: Bayesian approach with likelihoods
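A stripped-down version of the likelihood and Bayesian calculation for one individual might look like the following. It is a sketch under strong simplifying assumptions: allele frequencies in each candidate population are treated as known (programs such as STRUCTURE estimate them jointly with the assignments), loci are independent, genotypes follow Hardy-Weinberg proportions, and priors are equal. Colony names, frequencies and function names are hypothetical.

```python
def genotype_likelihood(genotype, allele_freqs):
    """P(M | population): product over loci of Hardy-Weinberg genotype frequencies.

    genotype: list of (allele, allele) tuples, one per locus
    allele_freqs: list of dicts mapping allele -> frequency, one per locus
    """
    likelihood = 1.0
    for (a, b), freqs in zip(genotype, allele_freqs):
        pa, pb = freqs.get(a, 0.0), freqs.get(b, 0.0)
        likelihood *= pa * pa if a == b else 2 * pa * pb
    return likelihood


def assign(genotype, populations, priors=None):
    """Posterior probability of each candidate population for one multilocus genotype."""
    names = list(populations)
    if priors is None:
        priors = {name: 1 / len(names) for name in names}  # equal priors (assumption)
    joint = {
        name: priors[name] * genotype_likelihood(genotype, populations[name])
        for name in names
    }
    total = sum(joint.values())  # plays the role of Pr(M)
    if total == 0:
        raise ValueError("genotype has zero likelihood in every population")
    return {name: joint[name] / total for name in names}


populations = {  # hypothetical allele frequencies at two loci
    "colony_1": [{"A": 0.8, "B": 0.2}, {"C": 0.5, "D": 0.5}],
    "colony_2": [{"A": 0.2, "B": 0.8}, {"C": 0.5, "D": 0.5}],
}
individual = [("A", "A"), ("C", "D")]

ratio = genotype_likelihood(individual, populations["colony_1"]) / genotype_likelihood(
    individual, populations["colony_2"]
)
posterior = assign(individual, populations)
print(ratio, posterior)
```

Here the first locus carries all the information: the likelihood ratio is 16 in favour of colony_1, and with equal priors that becomes a posterior of 16/17. An allele absent from a population's sample gives a likelihood of zero in this sketch, so real analyses need some adjustment for unobserved alleles.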
Correct assignment depends upon population subdivision
IAM – infinite alleles; SMM – stepwise mutation
Bayesian methods provide the greatest discriminating power!
Type A (circles) and Type E (squares) errors
Bayesian minimizes type E error, but has slightly higher type A error!
Efficiency: sample size (number of loci is fixed)
Increasing your sample above ~[10] loci does not help much!
How to determine population structure of prairie dogs?
Population genetic analysis (Jen Roach, Lisa Savage, Dan Tripp)
• Fst = 0.12
• Genetic distance related to: Colony age (area), Distance along drainages
• 13 colonies in 1997
• Assignment tests: 1/3 of individuals identified as immigrants
Y = 0-2 years, M = 4-6 years, O = 8-10 years
Roach et al. 2001, J. Mammal.
Population genetic analysis in 1997, 2000, 2001 demonstrates movement among colonies
Genetic Methods
• Genetic markers:
– 454 bp of control region of MITOCHONDRIAL genome (maternally inherited)
– 7 MICROSATELLITE loci (bi-parental inheritance, tandem repeats, highly variable)
• All individuals genotyped at all 8 loci
Microsatellite Gel for CGS14: Allele B = GT18, C = GT17, I = GT11
SSCP Gel of Mitochondrial Marker
[Figure: Levels of genetic differentiation (Fst) among six black-tailed prairie dog colonies in 1997, 2000 and five colonies in 2001. A) microsatellite data B) mitochondrial data. x-axis: year (1997, 2000, 2001); y-axis: Fst.]
• Moderate genetic differentiation (mtDNA values expected to be ~4x that of microsats)
How much gene flow?
• Assignment tests: ~20% individuals are potential immigrants

Year   Exclude all pops but source   Exclude some pops but not source   Exclude source but not other pops   Exclude all pops sampled   p-value used: 1/sample size
1997   16 (19.75%)                   51 (62.96%)                        4 (4.94%)                           10 (12.35%)                p = 1/82 = 0.012
2000   13 (9.22%)                    117 (82.98%)                       6 (4.26%)                           5 (3.55%)                  p = 1/141 = 0.007
2001   16 (12.90%)                   98 (79.03%)                        5 (4.03%)                           5 (4.03%)                  p = 1/124 = 0.008
• Exclusion tests: 7-17% individuals immigrants
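The percentages in the exclusion table follow directly from the counts, and the arithmetic can be reproduced as below. This sketch makes one assumption that the slides do not state explicitly: that the exclusion-test immigrant figure corresponds to the two categories in which the source colony is excluded. The variable names are hypothetical. Note also that the 1997 counts as listed sum to 81, while the threshold on the slide is written as 1/82; the code uses the counts as given.

```python
# Counts per year: (all pops but source excluded, some pops but not source excluded,
#                   source but not other pops excluded, all sampled pops excluded)
counts = {
    1997: (16, 51, 4, 10),
    2000: (13, 117, 6, 5),
    2001: (16, 98, 5, 5),
}

percentages = {}
immigrant_pct = {}
for year, row in counts.items():
    n = sum(row)
    percentages[year] = [round(100 * c / n, 2) for c in row]
    # Assumption: individuals whose source colony is excluded are counted as immigrants
    immigrant_pct[year] = round(100 * (row[2] + row[3]) / n, 2)

for year in counts:
    print(year, percentages[year], immigrant_pct[year])
```

Under that assumption the immigrant share comes out at roughly 17% in 1997 and about 8% in 2000 and 2001, close to the 7-17% range quoted for the exclusion tests.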
[Figure: % Correctly Assigned and % Immigrants by year. x-axis: year (1996-2002); y-axis: percent of individuals sampled (0%-90%).]
Who’s dispersing and where?
• Isolation by distance
• Mantel Test: Correlation between genetic distance and geographic distance
Males long distance dispersers
Females short distance, along drainages

Year   Comparison              r2       p
1997   Microsat Fst-drainage   -0.319   0.062
1997   Microsat-euclidean      -0.287   0.176
1997   Mt Fst-drainage         0.683    0.052
1997   Mt Fst-euclidean        0.535    0.041
2000   Microsat Fst-drainage   0.027    0.412
2000   Microsat-euclidean      0.072    0.381
2000   Mt Fst-drainage         0.633    0.031
2000   Mt Fst-euclidean        0.519    0.041
2001   Microsat Fst-drainage   -0.514   0.047
2001   Microsat-euclidean      -0.598   0.063
2001   Mt Fst-drainage         0.733    0.016
2001   Mt Fst-euclidean        0.558    0.040
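For readers unfamiliar with the Mantel test, the sketch below shows the basic permutation logic: correlate the pairwise entries of a genetic distance matrix with those of a geographic distance matrix, then permute the colony labels of one matrix to build a null distribution. It is a plain-Python, one-sided version written for illustration and is not the procedure used for the table above; the colony positions, distances and function names are hypothetical.

```python
import random
from math import sqrt


def upper_triangle(matrix, order=None):
    """Flatten the entries above the diagonal, optionally after relabelling rows/columns."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test for a positive correlation between two distance matrices."""
    rng = random.Random(seed)
    geo = upper_triangle(geographic)
    r_obs = pearson(upper_triangle(genetic), geo)
    n = len(genetic)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # relabel colonies in the genetic matrix only
        if pearson(upper_triangle(genetic, order), geo) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical data: five colonies along a drainage, genetic distance rising with distance
positions = [0, 1, 2, 4, 7]
geographic = [[abs(a - b) for b in positions] for a in positions]
genetic = [
    [0.02 * geographic[i][j] + 0.005 * ((i * j) % 3) for j in range(5)]
    for i in range(5)
]

r_obs, p_value = mantel(genetic, geographic)
print(r_obs, p_value)
```

Whole rows and columns are permuted together rather than individual matrix entries because the pairwise distances are not independent observations. With only five colonies there are just 120 distinct relabellings, so the smallest p-value such a test can reach is limited.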