


Assignment Tests and Population Genetics

Michael Antolin
Department of Biology, Colorado State University
michael.antolin@colostate.edu

EEID 2012

Goals
• Understanding some old-school population genetics using discrete markers
  – inferring IBD and ibd
• New School: Assignment Tests
  – Individuals as sampling units
  – Multi-locus genotypes
  – Probability of exclusion
  – Probability of assignment


Assumptions and needs
• Polymorphic, codominant discrete markers. >> N
  – SSR, SNP, VNTR, etc.
• Hardy-Weinberg
  – some inbreeding OK, selfing or clonal structure generally not
• Loci are independent
  – no linkage between markers
• Necessary for estimation
  – Population sample
  – Suspects (sampled individuals)
  – Samples to compare (offspring in paternity tests).
• Necessary technical issues
  – Accurate genotyping
  – Well-designed field sampling

Gene pools are structured
• N individuals
• Gene pool = 2N
• Population
  – Local population
• Deme
  – Family
• Density varies
• Mating opportunities vary


Population subdivision = population structure

• Limited migration between localities
  – Limited mating between and within populations (inbreeding)
  – Genetic drift in small populations
• Wahlund’s Effect
  – Samples of separate H-W populations treated as one population will result in an apparent deficiency of heterozygotes
• BUT: gene flow is notoriously difficult to measure
  – seldom directly observed!
  – estimated from patterns of allele frequency in populations
  – spatial patterning, temporal changes
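The Wahlund effect described above can be checked with a small numerical sketch. It assumes a single biallelic locus and equally sized subpopulations, each in Hardy-Weinberg proportions; the function names and the example allele frequencies (0.2 and 0.8) are illustrative choices, not values from the slides.

```python
def expected_heterozygosity(p):
    """Hardy-Weinberg heterozygosity 2pq for a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def wahlund_deficit(freqs):
    """Compare heterozygosity within subpopulations to that expected after lumping them.

    freqs: allele frequencies of equally sized subpopulations (an assumption).
    Returns (mean within-subpopulation H, H expected for the pooled sample, deficit).
    """
    h_within = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)          # pooled allele frequency
    h_pooled = expected_heterozygosity(p_bar)
    return h_within, h_pooled, h_pooled - h_within


# Two hypothetical subpopulations with very different allele frequencies.
h_within, h_pooled, deficit = wahlund_deficit([0.2, 0.8])
print(h_within, h_pooled, deficit)
```

In this sketch the heterozygotes actually present (the within-subpopulation mean) fall short of what the pooled allele frequency predicts, and the shortfall equals twice the variance in allele frequency among subpopulations.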

Old-school Wright Model: Gene flow and genetic drift

The island model, with migration rate m equal between all islands, and N equal in all islands.

Rate of genetic drift ~ 1/(2N) per generation


Recursion for genetic drift and inbreeding within small populations:

$$f_{t+1} = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) f_t$$

Adding migration to the equation:

$$f_{t+1} = \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) f_t\right](1 - m)^2,$$

set $f_{t+1} = f_t = F_{ST}$:

$$F_{ST} = \frac{(1 - m)^2}{2N - (2N - 1)(1 - m)^2}$$

In terms of variance between populations (ignoring terms of $m^2$ and $2m$):

$$F_{ST} \cong \frac{1}{4Nm + 1}$$
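To see how the recursion, its equilibrium, and the 1/(4Nm + 1) approximation relate, the following sketch iterates the island-model recursion numerically. The values N = 100 and m = 0.01 are hypothetical, and the function names are illustrative.

```python
def iterate_f(n, m, generations=5000, f0=0.0):
    """Iterate f_{t+1} = [1/(2N) + (1 - 1/(2N)) f_t] (1 - m)^2 from f0."""
    f = f0
    for _ in range(generations):
        f = (1.0 / (2 * n) + (1.0 - 1.0 / (2 * n)) * f) * (1.0 - m) ** 2
    return f


def fst_exact(n, m):
    """Equilibrium of the recursion: (1-m)^2 / (2N - (2N-1)(1-m)^2)."""
    return (1.0 - m) ** 2 / (2 * n - (2 * n - 1) * (1.0 - m) ** 2)


def fst_approx(n, m):
    """Approximation that ignores terms of m^2 and 2m: 1 / (4Nm + 1)."""
    return 1.0 / (4 * n * m + 1.0)


n, m = 100, 0.01          # hypothetical deme size and migration rate (Nm = 1)
f_iterated = iterate_f(n, m)
print(f_iterated, fst_exact(n, m), fst_approx(n, m))
```

With these inputs the iterated value settles on the exact equilibrium, and the approximation differs from it only in the third decimal place, which is why the simpler form is usually used when m is small.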

Estimating gene flow from Fst

$$F_{ST} \cong \frac{1}{4Nm + 1}$$

Solve for $Nm$, the effective number of migrants:

$$Nm = \frac{1 - F_{ST}}{4 F_{ST}}$$
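As an arithmetic illustration of solving for Nm, here is a minimal sketch of the two conversions. The function names are hypothetical, and the result inherits every assumption of the island model (equal N, equal m, equilibrium). The input 0.12 is the Fst reported for the prairie dog colonies later in these slides and is used here only as an example number.

```python
def nm_from_fst(fst):
    """Effective number of migrants per generation, Nm = (1 - Fst) / (4 Fst)."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("Fst must lie in (0, 1] for this formula")
    return (1.0 - fst) / (4.0 * fst)


def fst_from_nm(nm):
    """Inverse relation, Fst = 1 / (4 Nm + 1)."""
    return 1.0 / (4.0 * nm + 1.0)


nm = nm_from_fst(0.12)
print(nm)                 # a little under two migrants per generation
print(fst_from_nm(nm))    # converts back to the starting Fst
```

Because the relation is steeply non-linear at low Fst, small errors in a low Fst estimate translate into large changes in the inferred Nm.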


Wahlund’s effect and Fst

• A subdivided population that’s lumped as one will have a deficiency of heterozygotes
• This can be described by the variance in allele frequencies between populations
  – The most commonly estimated statistic is the standardized variance in allele frequency between subpopulations, FST
  – Related to f, the inbreeding coefficient

Recall: under inbreeding, expected and observed heterozygosity relate to each other as:

$$H_{obs} = 2pq\,(1 - f)$$

$$H_{obs} = (1 - f)\,H_{exp}$$

$$f = \frac{H_{exp} - H_{obs}}{H_{exp}}$$

Deviations from expected heterozygosity: Partitions of F-statistics among hierarchical levels

$$F_{IS} = \frac{H_S - H_0}{H_S} \quad \text{within-population inbreeding}$$

$$F_{ST} = \frac{H_T - H_S}{H_T} \quad \text{between-population variability}$$

$$F_{IT} = \frac{H_T - H_0}{H_T} \quad \text{total Wahlund's effect}$$

The total Wahlund’s effect ($F_{IT}$) partitioned:

$$(1 - F_{IT}) = (1 - F_{ST})(1 - F_{IS}),$$

$F_{ST}$: Standardized variance of allele frequencies between populations
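The partition can be verified numerically. The sketch below assumes the usual reading of the three heterozygosities (H_0 observed, H_S expected within subpopulations, H_T expected for the total); the input values are made up for illustration.

```python
def f_statistics(h_obs, h_s, h_t):
    """Return (F_IS, F_ST, F_IT) from three heterozygosities.

    h_obs: observed heterozygosity (H_0 in the slide notation)
    h_s:   expected heterozygosity within subpopulations (assumed meaning of H_S)
    h_t:   expected heterozygosity in the total population (assumed meaning of H_T)
    """
    f_is = (h_s - h_obs) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_obs) / h_t
    return f_is, f_st, f_it


# Hypothetical heterozygosities.
f_is, f_st, f_it = f_statistics(h_obs=0.30, h_s=0.32, h_t=0.50)
print(f_is, f_st, f_it)
print((1 - f_it), (1 - f_st) * (1 - f_is))   # the two sides of the partition
```

The identity holds for any inputs because (1 - F_IS)(1 - F_ST) = (H_0/H_S)(H_S/H_T) = H_0/H_T.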


Gene pools are structured

• FIT = total Wahlund’s effect (individual to total)

• FIS = inbreeding effect within subpopulations (individual to subpopulation)

• FST = variance between subpopulations (subpopulation to total)

• Other levels can be examined as well

Estimating F-statistics for discrete marker data

• FSTAT: good general program
  – http://www2.unil.ch/popgen/softwares/fstat.htm
• GDA: allows haploid loci, allows more than 3 levels of estimation, bootstraps of CI around estimates
  – http://hydrodictyon.eeb.uconn.edu/people/plewis/software.php
• Arlequin: estimates for both discrete markers and nucleotide sequences
  – http://cmpg.unibe.ch/software/arlequin35/
• R-packages:
  – adegenet, pegas, ecodist
  – adegenet depends upon graph, install adegenet using these lines:
    • source("http://bioconductor.org/biocLite.R")
    • biocLite("graph")
    • install.packages("adegenet", dependencies=TRUE)


Assignment Tests: New School

• Identifying the numbers of migrants versus residents in populations
  – Admixture models
• Individuals as sampling units
• Multi-locus genotypes
• Probability of exclusion
• Probability of assignment

Sources of Error

• Incorrect genotyping (technical issue)
• Poor field sampling (technical issue)
• Type A error: Incorrectly excluding the parent (population)
  – Under assignment: excluding the source population as the origin
• Type E error: Failing to exclude individuals who are not the parents (who are migrants into a population)
  – Assignment: accepting non-source populations as the origin


Probability of a random match between any two genotypes in a population

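$$M = \sum_{i=1}^{n} (A_iA_i)^2 + \sum_{i=1}^{n}\sum_{j=i+1}^{n} (A_iA_j)^2 = \sum_{i=1}^{n} (p_i)^4 + \sum_{i=1}^{n}\sum_{j=i+1}^{n} (2p_ip_j)^2$$

where $p_i$ is the frequency of the allele $A_i$, and $(A_iA_j)$ denotes the frequency of genotype $A_iA_j$.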

Multiply over multiple loci to get full probability, but how do you test these probabilities of matching to assign individuals to populations?
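The match probability and the multiplication over loci can be sketched as follows, assuming Hardy-Weinberg proportions and independent loci as stated in the assumptions slide. The function names and the allele frequencies are hypothetical.

```python
from itertools import combinations
from math import prod


def match_probability(freqs):
    """Probability that two random genotypes match at one locus.

    freqs: allele frequencies p_1..p_n at the locus (should sum to 1).
    Homozygotes contribute p_i^4, heterozygotes (2 p_i p_j)^2.
    """
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2.0 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def multilocus_match_probability(loci):
    """Product of single-locus match probabilities (assumes unlinked loci)."""
    return prod(match_probability(freqs) for freqs in loci)


# Hypothetical loci: one with two equally common alleles, one with four.
two_alleles = [0.5, 0.5]
four_alleles = [0.25, 0.25, 0.25, 0.25]
print(match_probability(two_alleles))
print(match_probability(four_alleles))
print(multilocus_match_probability([two_alleles, four_alleles]))
```

More alleles per locus and more loci both drive the match probability down, which is what gives multi-locus genotypes their discriminating power.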

Assignment Tests

Geneclass II: Cornuet et al. 1999, Genetics 153: 1989-2000
STRUCTURE: Pritchard et al. 2000. Genetics 155: 945-959
CERVUS: Marshall et al. 1998. Mol. Ecol. 7: 639-655

Potential methods:
• Gene frequency estimates
• Genetic distances (genetic correlations)
  – Cavalli-Sforza and Edwards chord distance
  – Nei’s genetic distances
  – Goldstein’s SSR size difference (δμ)²
• Bayesian probabilities
• Generate distributions of probable genotypes for each population


Likelihoods and Bayesian Probabilities

Likelihood ratio for testing a hypothesis, given the data:

$$L(H_1, H_2 \mid D) = \frac{P(D \mid H_1)}{P(D \mid H_2)} = \frac{P(M \mid H_1)}{P(M \mid H_2)}$$

Bayesian Probabilities:

$$\Pr(Z, P \mid M) = \frac{\Pr(Z)\,\Pr(P)\,\Pr(M \mid Z, P)}{\Pr(M)}$$

For a sample of genotypes $M$ from populations $Z$ with allele frequencies $P$.

Assignment criteria: Bayesian approach with likelihoods
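A deliberately simplified sketch of likelihood-based assignment follows. It treats each population's allele frequencies as known, assumes Hardy-Weinberg proportions and independent loci, and applies Bayes' rule over candidate populations with equal priors by default. It is not the algorithm used by Geneclass, STRUCTURE, or CERVUS (which also handle uncertainty in the frequencies); all names and numbers are hypothetical.

```python
from math import prod


def genotype_probability(genotype, freqs):
    """Hardy-Weinberg probability of a one-locus genotype (a, b).

    freqs: dict mapping allele name -> frequency in one population.
    """
    a, b = genotype
    pa, pb = freqs.get(a, 0.0), freqs.get(b, 0.0)
    return pa * pa if a == b else 2.0 * pa * pb


def multilocus_likelihood(genotypes, pop_freqs):
    """P(M | population): product over loci of genotype probabilities."""
    return prod(genotype_probability(g, f) for g, f in zip(genotypes, pop_freqs))


def assignment_posteriors(genotypes, populations, priors=None):
    """Posterior probability of each candidate population for one individual."""
    names = list(populations)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}   # equal priors (assumption)
    weighted = {
        name: priors[name] * multilocus_likelihood(genotypes, populations[name])
        for name in names
    }
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("genotype has zero likelihood in every population")
    return {name: w / total for name, w in weighted.items()}


# Two hypothetical populations, two loci each.
populations = {
    "A": [{"x": 0.8, "y": 0.2}, {"u": 0.6, "v": 0.4}],
    "B": [{"x": 0.3, "y": 0.7}, {"u": 0.1, "v": 0.9}],
}
individual = [("x", "x"), ("u", "v")]

lik_a = multilocus_likelihood(individual, populations["A"])
lik_b = multilocus_likelihood(individual, populations["B"])
posteriors = assignment_posteriors(individual, populations)
print(lik_a / lik_b)   # likelihood ratio L(A, B | genotype)
print(posteriors)
```

With equal priors the posterior is just each likelihood divided by their sum, so the likelihood ratio and the Bayesian assignment rank the populations identically; they differ once priors are unequal.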


Correct assignment depends upon population subdivision

IAM – infinite alleles, SMM – stepwise mutation

Bayesian methods provide the greatest discriminating power!

Type A (circles) and Type E (squares) errors

Bayesian minimizes type E error, but has slightly higher type A error!


Efficiency: sample size - number of loci is fixed:

Increasing your sample above ~[10] loci does not help much!

How to determine population structure of prairie dogs?


Population genetic analysis (Jen Roach, Lisa Savage, Dan Tripp)

• Fst = 0.12
• Genetic distance related to: Colony age (area), Distance along drainages
• 13 colonies in 1997
• Assignment tests: 1/3 of individuals identified as immigrants

Y = 0-2 years, M = 4-6 years, O = 8-10 years

Roach et al. 2001, J. Mammal.


Population genetic analysis in 1997, 2000, 2001 demonstrates movement among colonies

Genetic Methods
• Genetic markers:
  – 454 bp of control region of MITOCHONDRIAL genome (maternally inherited)
  – 7 MICROSATELLITE loci (bi-parental inheritance, tandem repeats, highly variable)
• All individuals genotyped at all 8 loci

Microsatellite Gel for CGS14: Allele B = GT18, C = GT17, I = GT11

SSCP Gel of Mitochondrial Marker


Figure: Levels of genetic differentiation (Fst) among six black-tailed prairie dog colonies in 1997, 2000 and five colonies in 2001. A) microsatellite data B) mitochondrial data. (Two bar charts of Fst by year: 1997, 2000, 2001.)

• Moderate genetic differentiation (mtDNA values expected to be ~ 4x that of microsats)

How much gene flow?
• Assignment tests: ~20% individuals are potential immigrants

        Exclude all    Exclude some    Exclude source   Exclude all    p-value used:
        pops but       pops but not    but not other    pops sampled   1/sample size
Year    source         source          pops
1997    16 (19.75%)    51 (62.96%)     4 (4.94%)        10 (12.35%)    p = 1/82 = 0.012
2000    13 (9.22%)     117 (82.98%)    6 (4.26%)        5 (3.55%)      p = 1/141 = 0.007
2001    16 (12.90%)    98 (79.03%)     5 (4.03%)        5 (4.03%)      p = 1/124 = 0.008

• Exclusion tests: 7-17% individuals immigrants

Figure: percent of individuals sampled (y-axis, 0% to 90%) by year (1996 to 2002); series: % Correctly Assigned, % Immigrants.
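The percentages in the exclusion table can be recomputed from the counts. The sketch below also sums the two categories in which the source colony is excluded; treating that sum as the "7-17% immigrants" figure is an assumption made here, not something the slides state explicitly.

```python
# Counts per year, in the column order of the table:
# (all pops but source excluded, some pops but not source excluded,
#  source excluded but not other pops, all sampled pops excluded)
counts = {
    1997: (16, 51, 4, 10),
    2000: (13, 117, 6, 5),
    2001: (16, 98, 5, 5),
}


def summarize(row):
    """Return (total individuals, per-category percentages, % with source excluded)."""
    total = sum(row)
    percents = [round(100.0 * c / total, 2) for c in row]
    # Assumption: an individual is a candidate immigrant when its source
    # colony is excluded (third and fourth categories).
    source_excluded = round(100.0 * (row[2] + row[3]) / total, 2)
    return total, percents, source_excluded


summary = {year: summarize(row) for year, row in counts.items()}
for year, (total, percents, source_excluded) in summary.items():
    print(year, total, percents, source_excluded)
```

Under that assumption the yearly values fall between roughly 8% and 17%, in line with the range on the slide. The 1997 counts sum to 81, whereas the slide's p-value uses a sample size of 82; the sketch uses the counts as given.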


Who’s dispersing and where?
• Isolation by distance
• Mantel Test: Correlation between genetic distance and geographic distance

Males long distance dispersers
Females short distance, along drainages

1997                      r2        p
Microsat Fst-drainage    -0.319    0.062
Microsat-euclidean       -0.287    0.176
Mt Fst-drainage           0.683    0.052
Mt Fst-euclidean          0.535    0.041

2000                      r2        p
Microsat Fst-drainage     0.027    0.412
Microsat-euclidean        0.072    0.381
Mt Fst-drainage           0.633    0.031
Mt Fst-euclidean          0.519    0.041

2001                      r2        p
Microsat Fst-drainage    -0.514    0.047
Microsat-euclidean       -0.598    0.063
Mt Fst-drainage           0.733    0.016
Mt Fst-euclidean          0.558    0.040
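For readers unfamiliar with the Mantel test, here is a minimal permutation sketch in plain Python. It is one common formulation (Pearson correlation of the upper triangles, one-sided p-value from permuting rows and columns of one matrix together) and is not necessarily how the values in the tables were computed; the colony positions and distances are invented.

```python
import random
from itertools import combinations
from math import sqrt


def upper_triangle(matrix, order):
    """Off-diagonal entries (i < j) after relabelling rows/columns by `order`."""
    return [matrix[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value for a positive correlation)."""
    identity = list(range(len(genetic)))
    geo = upper_triangle(geographic, identity)
    observed = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 1                      # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)        # relabel colonies in the genetic matrix
        if pearson(upper_triangle(genetic, order), geo) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Five hypothetical colonies along a line; genetic distance grows with distance.
positions = [0.0, 1.0, 2.0, 4.0, 7.0]
geographic = [[abs(a - b) for b in positions] for a in positions]
genetic = [[0.02 * d for d in row] for row in geographic]

r, p = mantel_test(genetic, geographic)
print(r, p)
```

Permuting whole rows and columns, rather than individual matrix entries, is what preserves the non-independence of pairwise distances that makes an ordinary correlation test inappropriate here.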