evolutionary genetics: part 5 coalescent simulations · 2013-01-07 · population genetics: 4...


Page 1: Evolutionary Genetics: Part 5 Coalescent simulations · 2013-01-07 · Population genetics: 4 evolutionary forces random genomic processes (mutation, duplication, recombination, gene

Evolutionary Genetics: Part 5

Coalescent simulations

S. peruvianum

S. chilense

Winter Semester 2012-2013

Prof Aurélien Tellier, FG Populationsgenetik

Page 2

Color code:

Red: important result or definition

Purple: exercise to do

Green: some bits of maths

Page 3

Population genetics: 4 evolutionary forces

• Random genomic processes (mutation, duplication, recombination, gene conversion)

• Natural selection

• Random demographic process (drift)

• Random spatial process (migration)

Together these shape molecular diversity.

Page 4

Simulating sequence data

Page 5

How to simulate?

Page 6

How to simulate?

Page 7

How to simulate?

Algorithm to generate sequence data

• Set k = n, where n is the sample size

• Draw the waiting time to the next event from an exponential distribution with rate k(k-1+θ)/2

• The next event is:

   • a coalescent event, with probability (k-1)/(k-1+θ)

   • a mutation, with probability θ/(k-1+θ)

• If a coalescent event occurs, choose a pair of lineages to coalesce; k then becomes k-1

• If a mutation event occurs, choose a lineage to mutate; k is unchanged

• Repeat until k = 1
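The steps above can be sketched in a few lines of code. This is a minimal illustration in Python (not the implementation used by ms), and it adds one assumption the slide does not state: each mutation creates a new segregating site (infinite-sites model), so the output can be written as 0/1 haplotypes. The function name `simulate_sample` and its `seed` argument are hypothetical.

```python
import random

def simulate_sample(n, theta, seed=None):
    """Simulate n sequences with the coalescent-with-mutation algorithm.

    Returns (haplotypes, t): a list of n strings of 0/1 characters
    (one column per mutation) and the total scaled time until k = 1.
    """
    rng = random.Random(seed)
    # Each lineage is the set of sampled sequences that descend from it.
    lineages = [{i} for i in range(n)]      # k = n lineages at the start
    sites = []                              # carriers of each mutation
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next event: exponential with rate k(k-1+theta)/2
        t += rng.expovariate(k * (k - 1 + theta) / 2.0)
        if rng.random() < (k - 1) / (k - 1 + theta):
            # Coalescent event: merge a random pair of lineages, k -> k-1
            i, j = rng.sample(range(k), 2)
            merged = lineages[i] | lineages[j]
            lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
            lineages.append(merged)
        else:
            # Mutation event: pick a lineage; all its descendants carry the mutation
            sites.append(set(rng.choice(lineages)))
    haplotypes = [
        "".join("1" if s in site else "0" for site in sites) for s in range(n)
    ]
    return haplotypes, t

haplotypes, tmrca = simulate_sample(n=4, theta=5.0, seed=1)
for h in haplotypes:
    print(h)
```

Running it with different seeds gives different numbers of segregating sites for the same θ, which is the stochasticity the following slides ask about.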

Page 8

Simulations 1

What is θ?

Page 9

Simulations 1

Do you see the same numbers? WHY?

Page 10

Simulations 1

Page 11

Simulations 1

pdf(file="constant_tree.pdf")

dev.off()

4 -t 5 -T > treefile.tre

Page 12

Simulations 1: neutral and constant size

Page 13

Simulations 2: neutral and expansion

t1 = 0.5 = time at which the expansion starts in the past

x = 0.1 = the population in the past is 0.1*N0

Present population size = N0

Ancestral population size = x*N0

Time t1 of expansion, in units of 4N0 generations

Do you see a problem? What is N0?
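One way to see the problem raised here: the ms command line never contains N0 itself. Times are given in units of 4N0 generations, and N0 only enters through θ = 4N0μ (with μ the mutation rate for the whole locus per generation). The same t1 = 0.5 therefore corresponds to very different numbers of generations depending on the N0 you assume. A small Python sketch, where the N0 and μ values are purely hypothetical and not taken from the course:

```python
def scaled_time_to_generations(t_scaled, n0):
    """Convert a time in units of 4*N0 generations into generations."""
    return t_scaled * 4 * n0

def theta_from_n0(n0, mu_locus):
    """theta = 4 * N0 * mu, with mu the per-locus mutation rate per generation."""
    return 4 * n0 * mu_locus

t1 = 0.5  # value used on the slide
for n0 in (10_000, 1_000_000):          # hypothetical present-day sizes
    print(n0, scaled_time_to_generations(t1, n0))
# 10000 20000.0
# 1000000 2000000.0

# The same theta = 5 is compatible with many (N0, mu) pairs (hypothetical values):
print(theta_from_n0(10_000, 1.25e-4), theta_from_n0(1_000_000, 1.25e-6))
```

To translate simulated times into generations or years, N0 (or μ) has to come from outside the simulation.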

Page 14

Simulations 2: neutral and expansion

-eN 0.5 0.1

0.5 = time at which the expansion starts in the past

0.1 = the population in the past is 0.1*N0

-eN 0.05 0.1 -T > expansion.tre


Page 15

Simulations 2: trees of expansion

pdf(file="expansion-tree.pdf")

dev.off()

expansion.tre

Page 16

Simulations 3: crash or bottleneck?

For a crash:

./ms 10 4 -t 5 -eN 0.5 5

Present population size = N0

Ancestral population size = x*N0

Time t1 of the size change, in units of 4N0 generations

Page 17

Simulations 3: crash or bottleneck?

For a bottleneck:

./ms 10 4 -t 5 -eN 0.5 0.25 -eN 0.75 2

Present population size = N0

Ancestral population size = x2*N0

Time t1

Time t2

Bottleneck population size = x1*N0

In the command: -eN t1 x1 -eN t2 x2, here t1 = 0.5, x1 = 0.25, t2 = 0.75, x2 = 2
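To check how a series of -eN switches is read, it can help to write the implied population-size history as a step function of time (going backward from the present). The sketch below is a Python illustration only; `relative_size` is a hypothetical helper, not part of ms.

```python
def relative_size(t, events):
    """Population size at time t in the past, relative to N0.

    events: list of (time, x) pairs, one per "-eN time x" switch.
    Going backward in time, the size is N0 until the first event,
    then x*N0 from each event time until the next one.
    """
    size = 1.0                       # present size is N0
    for event_time, x in sorted(events):
        if t >= event_time:
            size = x
    return size

crash = [(0.5, 5.0)]                       # -eN 0.5 5
bottleneck = [(0.5, 0.25), (0.75, 2.0)]    # -eN 0.5 0.25 -eN 0.75 2

for t in (0.1, 0.6, 1.0):
    print(t, relative_size(t, crash), relative_size(t, bottleneck))
# 0.1 1.0 1.0
# 0.6 5.0 0.25
# 1.0 5.0 2.0
```

Read forward in time, the bottleneck history is: a population of 2*N0, reduced to 0.25*N0 between t2 and t1, then back to N0 until the present.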

Page 18

Simulations 2: trees of expansion

Page 19

Exercise

Page 20

Summarize the ms output

Page 21

Exercise

Page 22

Exercise

Then save the output in a file:

> test1.out

Page 23

Exercise

Now using R

Load the file:

test <- read.table("test1.out", header=FALSE)

Then draw graphs:

pdf(file="summary_neutral_constant.pdf")

hist(test[,2], main="Theta_Pi Tajima")

hist(test[,4], main="Theta_Watterson")

hist(test[,6], main="Tajima D")

dev.off()

Then do the same for an expansion, decline or bottleneck
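For reference, the three statistics plotted above can be computed directly from a set of 0/1 haplotypes. The following is a minimal Python sketch (not the program used in the exercise) using Tajima's (1989) standard formulas; the function name `summary_stats` and the small example data are hypothetical.

```python
from math import sqrt

def summary_stats(haplotypes):
    """Return (S, theta_W, theta_pi, Tajima's D) for equal-length 0/1 strings."""
    n = len(haplotypes)
    # Number of sequences carrying the derived allele at each segregating site
    counts = [col.count("1") for col in zip(*haplotypes)]
    counts = [c for c in counts if 0 < c < n]
    S = len(counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = S / a1                                   # Watterson's estimator
    # Mean number of pairwise differences
    theta_pi = sum(c * (n - c) for c in counts) / (n * (n - 1) / 2.0)
    # Variance constants from Tajima (1989)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    var = (c1 / a1) * S + (c2 / (a1 ** 2 + a2)) * S * (S - 1)
    d = (theta_pi - theta_w) / sqrt(var) if var > 0 else float("nan")
    return S, theta_w, theta_pi, d

# Hypothetical sample: 4 sequences, 4 segregating sites
print(summary_stats(["1100", "1010", "0110", "0001"]))
```

D is positive when intermediate-frequency variants dominate (θπ > θW) and negative when rare variants dominate (θπ < θW).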

Page 24

Exercise

Page 25

Final simulations

• Use msmsplay on your computer

• The command line is similar

• You can see the site frequency spectrum directly

• Can you compare the site frequency spectrum with the values of Tajima's D?

• Let's simulate a neutral model, an expansion and a decline

• What differences do you see?
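To make the comparison concrete, the site frequency spectrum can also be tabulated by hand from 0/1 haplotypes and set against the neutral expectation, under which the expected number of sites with the derived allele in i sequences is θ/i. This is a Python sketch for illustration only; the function names and the example data are hypothetical and unrelated to msmsplay.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i-1 counts sites whose derived allele is in i sequences."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for column in zip(*haplotypes):          # one column per site
        count = column.count("1")
        if 0 < count < n:                    # ignore non-segregating sites
            sfs[count - 1] += 1
    return sfs

def expected_neutral_sfs(n, theta):
    """Expected SFS under the standard neutral model: theta / i for i = 1..n-1."""
    return [theta / i for i in range(1, n)]

sample = ["1100", "1010", "0110", "0001"]    # hypothetical data
print(site_frequency_spectrum(sample))       # [1, 3, 0]
print(expected_neutral_sfs(4, 6.0))          # [6.0, 3.0, 2.0]
```

An excess of singletons relative to θ/i goes with a negative Tajima's D, an excess of intermediate-frequency sites with a positive one.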

Page 26

Some data analysis

• Use datasets:

• Use DnaSP to calculate the usual statistics:

   • Diversity: θW, θπ

   • Site frequency spectrum

   • Tajima's D

• What do you conclude from these data?

• Do you have an idea of the past demography of these populations?

• Why do you need several independent loci?