Fixed Parameters: Population Structure, Mutation, Selection, Recombination,...
Reproductive Structure
Genealogies of non-sequenced data
Genealogies of sequenced data
Parameter Estimation
Model Testing
Coalescent Theory in Biology (www.coalescent.dk)
[Figure: example sequence data, TGTTGT CATAGTCGTTAT]
Wright-Fisher Model of Population Reproduction
Haploid Model
i. Individuals are made by sampling with replacement from the previous generation.
ii. The probability that 2 alleles have the same ancestor in the previous generation is 1/2N.
Diploid Model
Individuals are made by sampling, with replacement, one chromosome from a female and one from a male of the previous generation.
Assumptions
1. Constant population size
2. No geography
3. No Selection
4. No recombination
P(k) := P{k alleles had k distinct parents} = $(2N)_{[k]} / (2N)^k$, where $(2N)_{[k]} := 2N(2N-1)\cdots(2N-(k-1))$.
Ancestor choices:
k -> any: $(2N)^k$
k -> k: $(2N)_{[k]}$
k -> k-1: $\binom{k}{2}(2N)_{[k-1]}$
k -> j: $S_{k,j}(2N)_{[j]}$
$S_{k,j}$ is the number of ways to group k labelled objects into j groups (Stirling numbers of the second kind).
For k << 2N (more precisely $k^2 \ll 2N$):
$P(k) = (2N)_{[k]} / (2N)^k \approx 1 - \binom{k}{2}/2N \approx e^{-\binom{k}{2}/2N}$
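The exact probability that k alleles pick k distinct parents can be compared numerically with its two approximations for k << 2N. The following is a minimal Python sketch; the function names are illustrative and not part of the slides.

```python
import math


def prob_distinct_parents(k: int, two_n: int) -> float:
    """Exact P(k) = (2N)_[k] / (2N)^k: all k alleles pick different parents."""
    p = 1.0
    for i in range(k):
        # The (i+1)-th allele must avoid the i parents already chosen.
        p *= (two_n - i) / two_n
    return p


def linear_approx(k: int, two_n: int) -> float:
    """First-order approximation 1 - C(k,2)/2N."""
    return 1.0 - math.comb(k, 2) / two_n


def exp_approx(k: int, two_n: int) -> float:
    """Exponential approximation exp(-C(k,2)/2N)."""
    return math.exp(-math.comb(k, 2) / two_n)


if __name__ == "__main__":
    for k in (2, 10, 50):
        print(k, prob_distinct_parents(k, 20000),
              linear_approx(k, 20000), exp_approx(k, 20000))
```

With 2N = 20,000 the three values agree closely for small k and drift apart as k grows, which is the sense in which the approximation needs k << 2N.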
Waiting for most recent common ancestor - MRCA
Distribution of the time until 2 alleles had a common ancestor, $X_2$:
$P(X_2 > 1) = (2N-1)/2N = 1 - (1/2N)$
$P(X_2 > j) = (1 - (1/2N))^j$
$P(X_2 = j) = (1 - (1/2N))^{j-1}(1/2N)$
Mean: $E(X_2) = 2N$.
Ex.: 2N = 20,000, generation time 30 years, $E(X_2)$ = 600,000 years.
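The geometric waiting time for two alleles can be checked by direct simulation. This is a small Python sketch under the Wright-Fisher assumptions listed earlier; the function names and the seed are illustrative choices.

```python
import random


def sample_x2(two_n: int, rng: random.Random) -> int:
    """Generations back until two alleles share a parent (geometric, p = 1/2N)."""
    generations = 1
    # In each generation the two alleles pick the same parent with probability 1/2N.
    while rng.random() >= 1.0 / two_n:
        generations += 1
    return generations


def mean_x2(two_n: int, replicates: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E(X2); should be close to 2N."""
    rng = random.Random(seed)
    return sum(sample_x2(two_n, rng) for _ in range(replicates)) / replicates


def expected_years(two_n: int, generation_time_years: float) -> float:
    """E(X2) in years: 2N generations times the generation time."""
    return two_n * generation_time_years


if __name__ == "__main__":
    print(mean_x2(100, 5000))          # close to 100
    print(expected_years(20000, 30))   # 2N = 20,000 and 30-year generations
```

A small 2N is used in the simulation only to keep the run time short; the closed-form mean is used for the 2N = 20,000 example.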
10 Alleles’ Ancestry for 15 generations
Multiple and Simultaneous Coalescents
1. Simultaneous Events
2. Multifurcations
3. Underestimation of Coalescent Rates
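Discrete -> Continuous Time
Time is rescaled so that 1.0 corresponds to 2N generations: $t_c := t_d / 2N_e$ (for example, 6 generations correspond to $6/2N_e$).
$X_k$, the waiting time while there are k ancestors, is $\mathrm{Exp}\binom{k}{2}$ distributed, so $E(X_k) = 1/\binom{k}{2}$.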
The Standard Coalescent
Two independent processes:
Continuous: exponential waiting times.
Discrete: choosing pairs to coalesce.
Example with 5 alleles (1 2 3 4 5), starting from {1}{2}{3}{4}{5}:
Waiting $\mathrm{Exp}\binom{5}{2}$, coalescing 4--5, giving {1}{2}{3}{4,5}
Waiting $\mathrm{Exp}\binom{4}{2}$, coalescing 3--(4,5), giving {1}{2}{3,4,5}
Waiting $\mathrm{Exp}\binom{3}{2}$, coalescing 1--2, giving {1,2}{3,4,5}
Waiting $\mathrm{Exp}\binom{2}{2}$, coalescing (1,2)--(3,(4,5)), giving {1,2,3,4,5}
$1/\binom{k}{2} = 2/(k(k-1))$
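The two independent processes of the standard coalescent (exponential waiting times and uniformly chosen pairs) can be sketched in a few lines of Python. This is an illustrative simulation, not code from the lecture; `simulate_coalescent` and its return format are assumptions made for the example.

```python
import random


def simulate_coalescent(k: int, seed=None):
    """Simulate one coalescent genealogy for k alleles in scaled time.

    Returns a list of events (waiting_time, merged_block, blocks_after_event).
    """
    rng = random.Random(seed)
    blocks = [frozenset([i]) for i in range(1, k + 1)]
    events = []
    while len(blocks) > 1:
        n = len(blocks)
        # Continuous process: waiting time is Exp(C(n,2)) while n ancestors remain.
        wait = rng.expovariate(n * (n - 1) / 2)
        # Discrete process: every pair is equally likely to coalesce.
        i, j = rng.sample(range(n), 2)
        merged = blocks[i] | blocks[j]
        blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)]
        blocks.append(merged)
        events.append((wait, merged, list(blocks)))
    return events


if __name__ == "__main__":
    for wait, merged, blocks in simulate_coalescent(5, seed=1):
        print(round(wait, 3), sorted(merged), [sorted(b) for b in blocks])
```

Each run prints k-1 events; the last block always contains all k alleles, mirroring the sequence of partitions in the example with 5 alleles.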
Expected Height and Total Branch Length
Expected total height of tree: $H_k = 2(1 - 1/k)$
i. Infinitely many alleles find 1 ancestor in finite time.
ii. It takes less than twice as long for k alleles to find 1 ancestor as it does for 2 alleles.
Expected total branch length in tree, $L_k$:
$2(1 + 1/2 + 1/3 + \dots + 1/(k-1)) \approx 2\ln(k-1)$
Time Epoch Branch Lengths (expected values; the epoch is the number of lineages):
Epoch 2: time 1, branch length 2
Epoch 3: time 1/3, branch length 1
Epoch k: time 2/(k(k-1)), branch length 2/(k-1)
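The expected height and total branch length follow from summing the expected epoch times, E(X_j) = 2/(j(j-1)), over the epochs j = 2..k. A short Python sketch of that arithmetic (function names are illustrative):

```python
def expected_epoch_time(j: int) -> float:
    """Expected time with j lineages in scaled time: 1/C(j,2) = 2/(j(j-1))."""
    return 2.0 / (j * (j - 1))


def expected_height(k: int) -> float:
    """Closed form H_k = 2(1 - 1/k)."""
    return 2.0 * (1.0 - 1.0 / k)


def expected_total_branch_length(k: int) -> float:
    """L_k: each epoch with j lineages contributes j * E(X_j) = 2/(j-1)."""
    return sum(j * expected_epoch_time(j) for j in range(2, k + 1))


if __name__ == "__main__":
    for k in (2, 3, 10, 100):
        print(k, expected_height(k), expected_total_branch_length(k))
```

The height stays below 2 for every k, while the total branch length keeps growing, roughly logarithmically in k.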
Effective Population Size, Ne.
In an idealised Wright-Fisher model:
i. Variation decays by a factor 1-1/(2N) per generation, i.e. a fraction 1/(2N) is lost.
ii. The expected waiting time for two random alleles to find a common ancestor is 2N generations.
Factors that influence Ne:
i. Variance in offspring number. WF: 1. If the variance is higher, then the effective population size is smaller.
ii. Population size variation - example k cycle:
N1, N2,..,Nk. k/Ne = 1/N1 +..+ 1/Nk. N1 = 10, N2 = 1000 => Ne = 19.8
iii. Two sexes: Ne = 4NfNm/(Nf+Nm). I.e. Nf = 10, Nm = 1000 => Ne = 39.6, approximately 40.
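Both effective-size formulas are simple enough to evaluate directly. The sketch below is illustrative Python (the function names are not from the slides) applying the cycle formula k/Ne = 1/N1 + .. + 1/Nk and the two-sex formula Ne = 4NfNm/(Nf+Nm).

```python
def ne_cycle(sizes):
    """Effective size for a repeating cycle of population sizes (harmonic mean)."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def ne_two_sexes(n_females: float, n_males: float) -> float:
    """Effective size with unequal numbers of females and males."""
    return 4.0 * n_females * n_males / (n_females + n_males)


if __name__ == "__main__":
    print(ne_cycle([10, 1000]))      # harmonic mean of 10 and 1000
    print(ne_two_sexes(10, 1000))    # dominated by the rarer sex
```

For N1 = 10 and N2 = 1000 the cycle formula evaluates to about 19.8, and for Nf = 10, Nm = 1000 the two-sex formula gives about 39.6; in both cases the small number dominates.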
6 Realisations with 25 leaves
Observations: Variation great close to root. Trees are unbalanced.
Sampling more sequences
The probability that the ancestor of the sample of size n is in a sub-sample of size k is
$\frac{(n+1)(k-1)}{(n-1)(k+1)}$
Letting n go to infinity gives (k-1)/(k+1), i.e. even for quite small samples it is quite large.
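To see how quickly this probability approaches its limit, the formula (n+1)(k-1)/((n-1)(k+1)) can be evaluated with exact fractions. A minimal Python sketch, with illustrative function names:

```python
from fractions import Fraction


def prob_ancestor_in_subsample(n: int, k: int) -> Fraction:
    """(n+1)(k-1) / ((n-1)(k+1)) for a sub-sample of size k out of n."""
    if not 2 <= k <= n:
        raise ValueError("requires 2 <= k <= n")
    return Fraction((n + 1) * (k - 1), (n - 1) * (k + 1))


def limit_prob(k: int) -> Fraction:
    """Limit as n goes to infinity: (k-1)/(k+1)."""
    return Fraction(k - 1, k + 1)


if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        print(k, float(prob_ancestor_in_subsample(1000, k)), float(limit_prob(k)))
```

With k = 10 the limiting value is 9/11, so a sub-sample of ten sequences already contains the ancestor of an arbitrarily large sample with probability above 0.8.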
Adding Mutations
m: mutation pr. nucleotide pr. generation. L: seq. length. µ = m*L: mutation pr. allele pr. generation. 2Ne: allele number. θ := 4N*µ: mutation intensity in scaled process.
Probability for two genes being identical: P(Coalescence < Mutation) = 1/(1+θ).
Discrete time, discrete sequence (steps of 1/(2Ne) in time and 1/L along the sequence) -> Continuous time, continuous sequence.
Competing events for two lineages in the scaled process: mutation (rate θ/2), mutation (rate θ/2), coalescence (rate 1).
Note: Mutation rate and population size usually appear together as a product, making separate estimation difficult.
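The probability that two genes are identical, 1/(1+θ), is the chance that coalescence (rate 1 in scaled time) happens before a mutation on either of the two lineages (rate θ/2 each). The Python sketch below checks this by simulation; the function names, seed and replicate count are illustrative assumptions.

```python
import random


def scaled_mutation_rate(n: float, m: float, seq_length: int) -> float:
    """theta = 4 * N * mu, with mu = m * L the per-allele mutation rate."""
    return 4.0 * n * m * seq_length


def identity_probability(theta: float) -> float:
    """P(coalescence before mutation) for two genes: 1 / (1 + theta)."""
    return 1.0 / (1.0 + theta)


def simulate_identity(theta: float, replicates: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability via competing exponentials."""
    rng = random.Random(seed)
    identical = 0
    for _ in range(replicates):
        t_coalescence = rng.expovariate(1.0)
        # Two lineages mutate at rate theta/2 each, so the first mutation has rate theta.
        t_mutation = rng.expovariate(theta)
        if t_coalescence < t_mutation:
            identical += 1
    return identical / replicates


if __name__ == "__main__":
    theta = 2.12
    print(identity_probability(theta), simulate_identity(theta, 20000))
```

Because only the product enters θ, doubling N while halving m leaves every quantity above unchanged.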
Three Models of Alleles and Mutations.
Infinite Allele
i. Only identity, non-identity is determinable.
ii. A mutation creates a new type.
Infinite Site
i. Allele is represented by a line.
ii. A mutation always hits a new position.
Finite Site
i. Allele is represented by a sequence.
ii. A mutation changes the nucleotide at the chosen position.
Example sequences from the slide figure:
acgtgctt acgtgcgt acctgcat tcctgcat tcctgcat
acgtgctt acgtgcgt acctgcat tcctggct tcctgcat
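Infinite Allele Model
Sequences 1 2 3 4 5 are grouped into allele classes. Each partition is mapped to its class-size counts, written as size^(number of classes of that size):
{(1)} -> $1^1$
{(1,2)} -> $2^1$
{(1),(2)} -> $1^2$
{(1),(2,3)} -> $1^1 2^1$
{(1,2),(3),(4,5)} -> $1^1 2^2$
{(1),(2),(3),(4,5)} -> $1^3 2^1$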
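Under the infinite allele model only the grouping of sequences into allele classes is observable, so a partition can be reduced to counts of class sizes. The Python sketch below shows that reduction; the `size^count` output format is an assumed reading of the notation on the slide, and the function names are illustrative.

```python
from collections import Counter


def allele_class_counts(partition):
    """Map a partition of sequence labels to {class size: number of classes}."""
    sizes = Counter(len(block) for block in partition)
    return dict(sorted(sizes.items()))


def format_counts(counts) -> str:
    """Render counts as 'size^count' terms, e.g. '1^1 2^2'."""
    return " ".join(f"{size}^{number}" for size, number in counts.items())


if __name__ == "__main__":
    for partition in ([{1, 2}, {3}, {4, 5}], [{1}, {2}, {3}, {4, 5}]):
        print(partition, "->", format_counts(allele_class_counts(partition)))
```

Two partitions with the same counts are indistinguishable once sequence labels are ignored.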
Infinite Site Model
Final Aligned Data Set: [figure]
Labelling and unlabelling: positions and sequences
[Figure panels: ignoring mutation position; ignoring sequence label]
The forward-backward argument
$\frac{2\theta}{5(4+\theta)}$
$\frac{1}{4+\theta}$
9 coalescence events incompatible with data
4 classes of mutation events incompatible with data
Infinite Site Model: An example (Theta = 2.12)
[Figure: values shown by level: 2; 3, 2, 3; 5, 5, 4; 9, 10, 5; 19, 14; 33]
Impossible Ancestral States
Finite Site Model
Final Aligned Data Set (s marks the three variable columns):
acgtgctt
acgtgcgt
acctgcat
tcctgcat
tcctgcat
s s   s
Diploid Model with Recombination
An individual is made by:
1. Picking a random father.
2. Letting that father's chromosomes recombine to create the individual's paternal chromosome.
Similarly for the maternal chromosome.
A recombinant sequence will have two different ancestor sequences in the grandparent generation.
The Diploid Model Back in Time.
1-recombination histories I: Branch length change
1-recombination histories II: Topology change
1-recombination histories III: Same tree
1-recombination histories IV: Coalescent time (c) must be further back in time than recombination time (r).
[Figures: trees on sequences 1, 2, 3, 4 for each case]
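Recombination-Coalescence Illustration (copied from Hudson 1991)
Intensities, in the order listed in the figure:
Coales. 1, Recomb. 2
Coales. 3, Recomb. 2
Coales. 6, Recomb. 2
Coales. 3, Recomb. (2+b)
Coales. 1, Recomb. (1+b)
Coales. 0, Recomb. b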
Age to oldest most recent common ancestor
From Wiuf and Hein, 1999, Genetics
[Figure: age to oldest most recent common ancestor against scaled recombination rate; sequence scale 0 kb to 250 kb]
Number of genetic ancestors to the Human Genome
S: number of segments, E(S) = 1 + ρ (ρ: scaled recombination rate)
[Figure: sequence against time, with recombination (R) and coalescence (C) events]
Statements about number of ancestors are much harder to make.
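Simulations
A randomly picked ancestor: (ancestral material comes in batteries!)
[Figure: ancestral material along the chromosome at three zoom levels: 0 to 260 Mb (segments 0 to 52,000); a 7.5 Mb window (*35; segment labels 6890 and 8360); a 30 kb window (*250)]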
Applications to Human Genome (Wiuf and Hein, 97)
Parameters used: 4Ne = 20,000. Chromosome 1: 263 Mb, 263 cM.
Chromosome 1: 52,000 segments, 6,800 ancestors.
All chromosomes: 86,000 ancestors. Physical population: 1.3-5.0 million.
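The segment count for chromosome 1 can be roughly reproduced from the stated parameters. The sketch below is an assumption-laden Python illustration: it assumes the expected number of segments is 1 + ρ, with ρ = 4Ne * r and r the map length in Morgans (the symbol is missing in the transcript), and the function names are hypothetical.

```python
def scaled_recombination_rate(four_ne: float, map_length_cm: float) -> float:
    """rho = 4Ne * r, with r the genetic map length in Morgans (assumed definition)."""
    return four_ne * map_length_cm / 100.0


def expected_segments(four_ne: float, map_length_cm: float) -> float:
    """E(S) = 1 + rho (assumed form of the formula on the slide)."""
    return 1.0 + scaled_recombination_rate(four_ne, map_length_cm)


if __name__ == "__main__":
    # Parameters from the slide: 4Ne = 20,000 and chromosome 1 spanning 263 cM.
    print(expected_segments(20000, 263))
```

Under these assumptions the result is about 52,600 segments, the same order as the 52,000 quoted for chromosome 1.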
Ignoring recombination in phylogenetic analysis
General practice in analysis of viral evolution!
Mimics decelerations/accelerations of evolutionary rates.
No & infinite recombination implies molecular clock.
Simulated Example
[Figure panels: Recombination; Assuming No Recombination; trees on sequences 1, 2, 3, 4]
Genotype and Phenotype Covariation: Gene Mapping
Result: The Mapping Function
Decay of local dependency, Reich et al. (2001) [figure; axis: time]
Genotype --> Phenotype Function
A set of characters.
Binary decision (0,1).
Quantitative Character.
Dominant/Recessive.
Penetrance
Spurious Occurrence
Heterogeneity
Sampling Genotypes and Phenotypes