fixed parameters: population structure, mutation, selection, recombination,... reproductive...


Upload: carol-patterson

Post on 19-Jan-2018


Page 1: Fixed Parameters: Population Structure, Mutation, Selection, Recombination,... Reproductive Structure Genealogies of non-sequenced data Genealogies of

Fixed Parameters: Population Structure, Mutation, Selection, Recombination,...

Reproductive Structure

Genealogies of non-sequenced data

Genealogies of sequenced data

Parameter Estimation

Model Testing

Coalescent Theory in Biology

www.coalescent.dk

TGTTGT CATAGT CGTTAT

Page 2

Wright-Fisher Model of Population Reproduction

Haploid Model

i. Individuals are made by sampling with replacement from the previous generation.

ii. The probability that 2 alleles have the same ancestor in the previous generation is 1/(2N).

Diploid Model

Individuals are made by sampling, with replacement, one chromosome from the females and one from the males of the previous generation.

Assumptions

1. Constant population size

2. No geography

3. No Selection

4. No recombination
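The sampling rule can be checked with a small simulation. This is a minimal sketch, assuming 2N alleles per generation and uniform parent choice with replacement; the function name, the seed and the small population size are illustrative choices, not part of the slides.

```python
import random

def same_parent_frequency(two_n, trials, seed=0):
    """Estimate P(two alleles pick the same parent) under Wright-Fisher sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each allele picks its parent uniformly, with replacement,
        # among the 2N alleles of the previous generation.
        if rng.randrange(two_n) == rng.randrange(two_n):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    two_n = 20  # hypothetical small population, chosen so the event is common
    print(same_parent_frequency(two_n, 200_000), "vs", 1 / two_n)
```

The estimate should settle near 1/(2N) as the number of trials grows.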

Page 3

P(k) := P{k alleles had k distinct parents}

Ancestor choices (each of the k alleles picks a parent among 1, ..., 2N):

k -> any: $(2N)^k$

k -> k: $2N(2N-1)\cdots(2N-(k-1)) =: (2N)_{[k]}$

k -> k-1: $\binom{k}{2}(2N)_{[k-1]}$

k -> j: $S_{k,j}\,(2N)_{[j]}$

For k << 2N (k^2 < 2N):

$$P(k) = \frac{(2N)_{[k]}}{(2N)^k} \approx 1 - \binom{k}{2}\Big/ 2N \approx e^{-\binom{k}{2}/2N}$$

$S_{k,j}$: the number of ways to group k labelled objects into j groups (Stirling numbers of the second kind).
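The counting argument can be verified numerically. The sketch below assumes the reading in which k alleles choose among 2N parents, so that the counts S(k,j) times the falling factorial of 2N of order j, summed over j, give all (2N)^k choices; the helper names are hypothetical.

```python
import math
from functools import lru_cache

def falling(x, j):
    """Falling factorial x (x-1) ... (x-j+1)."""
    result = 1
    for i in range(j):
        result *= x - i
    return result

@lru_cache(maxsize=None)
def stirling2(k, j):
    """Stirling number of the second kind: partitions of k labelled objects into j groups."""
    if k == j:
        return 1
    if j == 0 or j > k:
        return 0
    return j * stirling2(k - 1, j) + stirling2(k - 1, j - 1)

def p_distinct(k, two_n):
    """Exact P(k): probability that k alleles had k distinct parents."""
    return falling(two_n, k) / two_n ** k

if __name__ == "__main__":
    k, two_n = 5, 20_000
    pairs = math.comb(k, 2)
    print("exact      ", p_distinct(k, two_n))
    print("1 - C/2N   ", 1 - pairs / two_n)
    print("exp(-C/2N) ", math.exp(-pairs / two_n))
```

For k much smaller than 2N the three numbers agree to many digits, which is what justifies the exponential approximation.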

Page 4

Waiting for most recent common ancestor - MRCA

Distribution of the time until 2 alleles had a common ancestor, X2:

$P(X_2 > 1) = (2N-1)/2N = 1 - 1/(2N)$

$P(X_2 > j) = (1 - 1/(2N))^{j}$

$P(X_2 = j) = (1 - 1/(2N))^{j-1}\,(1/(2N))$

Mean: $E(X_2) = 2N$.

Ex.: 2N = 20,000, generation time 30 years, E(X2) = 600,000 years.

[Figure: two alleles traced back j generations through a population of alleles labelled 1 to 2N.]
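The geometric waiting time can be evaluated directly. A minimal sketch, using the slide's formulas for X2 and its example values (2N = 20,000 and a 30-year generation time); the function names are illustrative.

```python
def p_x2_equals(j, two_n):
    """P(X2 = j): no common ancestor for j-1 generations, then one."""
    q = 1 / two_n
    return (1 - q) ** (j - 1) * q

def p_x2_exceeds(j, two_n):
    """P(X2 > j): no common ancestor in the first j generations."""
    return (1 - 1 / two_n) ** j

def expected_x2_years(two_n, generation_years):
    """E(X2) = 2N generations, converted to years."""
    return two_n * generation_years

if __name__ == "__main__":
    print(expected_x2_years(20_000, 30))  # 600000
```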

Page 5

10 Alleles’ Ancestry for 15 generations

Page 6

Multiple and Simultaneous Coalescents

1. Simultaneous Events

2. Multifurcations

3. Underestimation of Coalescent Rates

Page 7

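Discrete -> Continuous Time

$t_c := t_d / 2N_e$, so 1.0 in continuous time corresponds to 2N generations.

$X_k$ is $\mathrm{Exp}\left[\binom{k}{2}\right]$ distributed, with $E(X_k) = 1/\binom{k}{2}$.

[Figure: the same genealogy on two time axes, discrete (0 to 2N generations) and continuous (0.0 to 1.0), with rate labels 6/2Ne and 6.]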
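The time rescaling can be illustrated by comparing the discrete and continuous survival probabilities. This sketch assumes that k lineages coalesce with probability C(k,2)/2N per generation, which is the small-k approximation used for P(k); the function names and the chosen values are hypothetical.

```python
import math

def survival_discrete(k, two_n, t_c):
    """P(no coalescence among k lineages) after t_c * 2N generations."""
    pairs = math.comb(k, 2)
    generations = round(t_c * two_n)
    return (1 - pairs / two_n) ** generations

def survival_continuous(k, t_c):
    """P(X_k > t_c) for X_k exponential with rate C(k,2)."""
    return math.exp(-math.comb(k, 2) * t_c)

if __name__ == "__main__":
    k, two_n, t_c = 4, 20_000, 0.1
    print(survival_discrete(k, two_n, t_c), survival_continuous(k, t_c))
```

With 2N in the tens of thousands the two values differ only in the fourth or fifth decimal place.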

Page 8

The Standard Coalescent

Two independent processes:

Continuous: exponential waiting times.

Discrete: choosing pairs to coalesce.

Example with sequences 1 2 3 4 5, starting from {1}{2}{3}{4}{5}:

Waiting                 Coalescing          Resulting partition
$Exp(\binom{5}{2})$     4--5                {1}{2}{3}{4,5}
$Exp(\binom{4}{2})$     3--(4,5)            {1}{2}{3,4,5}
$Exp(\binom{3}{2})$     1--2                {1,2}{3,4,5}
$Exp(\binom{2}{2})$     (1,2)--(3,(4,5))    {1,2,3,4,5}
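The two processes can be combined into a short simulation. This is a sketch under the standard assumptions stated on the slide (exponential waiting times with rate C(n,2) while n lineages remain, and a uniformly chosen pair at each event); the data layout and names are illustrative.

```python
import random

def simulate_coalescent(k, rng):
    """Simulate one coalescent genealogy for k sequences.

    Returns (events, height), where each event is
    (waiting_time, merged_block, partition_after_event).
    """
    blocks = [frozenset([i]) for i in range(1, k + 1)]
    events = []
    height = 0.0
    while len(blocks) > 1:
        n = len(blocks)
        # Continuous part: exponential waiting time with rate C(n, 2).
        wait = rng.expovariate(n * (n - 1) / 2)
        height += wait
        # Discrete part: every pair of lineages is equally likely to coalesce.
        i, j = rng.sample(range(n), 2)
        merged = blocks[i] | blocks[j]
        blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)] + [merged]
        events.append((wait, merged, list(blocks)))
    return events, height

if __name__ == "__main__":
    events, height = simulate_coalescent(5, random.Random(1))
    for wait, merged, partition in events:
        print(f"{wait:.3f}", sorted(merged), [sorted(b) for b in partition])
    print("tree height:", height)
```

Averaging the height over many runs gives a value close to 2(1 - 1/k) in scaled time.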

Page 9

Expected Height and Total Branch Length

Expected total height of tree: $H_k = 2(1 - 1/k)$

i. Infinitely many alleles find 1 ancestor in finite time.

ii. It takes less than twice as long for k alleles to find 1 ancestor as it does for 2 alleles.

Expected total branch length in tree, $L_k$:

$L_k = 2(1 + 1/2 + 1/3 + \cdots + 1/(k-1)) \approx 2\ln(k-1)$ (leading term for large k)

Epoch (number of lineages)    Time                             Branch lengths
2                             1                                2
3                             1/3                              1
k                             $1/\binom{k}{2} = 2/(k(k-1))$    2/(k-1)
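Both expectations follow from summing over epochs: the epoch with j lineages lasts 1/C(j,2) on average and contributes j branches. A minimal sketch with exact fractions, assuming scaled time (units of 2N generations); the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def expected_height(k):
    """Sum of expected epoch times 1/C(j,2) for j = 2..k; equals 2(1 - 1/k)."""
    return sum(Fraction(1, comb(j, 2)) for j in range(2, k + 1))

def expected_total_length(k):
    """Each epoch with j lineages contributes j/C(j,2) = 2/(j-1) of branch length."""
    return sum(j * Fraction(1, comb(j, 2)) for j in range(2, k + 1))

if __name__ == "__main__":
    for k in (2, 3, 10, 100):
        print(k, float(expected_height(k)), float(expected_total_length(k)))
```

The height stays below 2 for every k, while the total branch length keeps growing roughly logarithmically.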

Page 10

Effective Population Size, Ne.

In an idealised Wright-Fisher model:

i. Variation (heterozygosity) is retained by a factor 1 - 1/(2N) per generation, i.e. a fraction 1/(2N) is lost.

ii. The expected waiting time for two random alleles to find a common ancestor is 2N generations.

Factors that influence Ne:

i. Variance in offspring number. WF: about 1. If the variance is higher, then the effective population size is smaller.

ii. Population size variation - example, a k-cycle N1, N2, ..., Nk:

k/Ne = 1/N1 + ... + 1/Nk. E.g. N1 = 10, N2 = 1000 => Ne ≈ 19.8

iii. Two sexes: Ne = 4NfNm/(Nf+Nm). E.g. Nf = 10, Nm = 1000 => Ne ≈ 40
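Both formulas are simple to evaluate. The sketch below applies the harmonic-mean formula for a cycle of population sizes and the two-sex formula to the slide's numbers; with N1 = 10 and N2 = 1000 the cycle formula evaluates to about 19.8. Function names are illustrative.

```python
def ne_cycle(sizes):
    """Effective size for a repeating cycle of sizes: k/Ne = 1/N1 + ... + 1/Nk."""
    return len(sizes) / sum(1 / n for n in sizes)

def ne_two_sexes(n_f, n_m):
    """Effective size with n_f females and n_m males: Ne = 4 Nf Nm / (Nf + Nm)."""
    return 4 * n_f * n_m / (n_f + n_m)

if __name__ == "__main__":
    print(ne_cycle([10, 1000]))    # harmonic mean, dominated by the small size
    print(ne_two_sexes(10, 1000))  # dominated by the rarer sex
```

In both cases the effective size is pulled strongly towards the smaller number.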

Page 11

6 Realisations with 25 leaves

Observations: Variation is great close to the root. Trees are unbalanced.

Page 12

Sampling more sequences

The probability that the most recent common ancestor of a sample of size n is also the most recent common ancestor of a sub-sample of size k is

$$\frac{(n+1)(k-1)}{(n-1)(k+1)}$$

Letting n go to infinity gives (k-1)/(k+1), i.e. even for quite small samples it is quite large.
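The formula (n+1)(k-1) / ((n-1)(k+1)) and its limit can be tabulated exactly. A minimal sketch; the function name is hypothetical.

```python
from fractions import Fraction

def p_shared_mrca(n, k):
    """P(the ancestor of a sample of size n is found in a sub-sample of size k)."""
    return Fraction((n + 1) * (k - 1), (n - 1) * (k + 1))

if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        # Compare a large sample with the n -> infinity limit (k-1)/(k+1).
        print(k, float(p_shared_mrca(1000, k)), float(Fraction(k - 1, k + 1)))
```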

Page 13

Adding Mutations

m: mutation pr. nucleotide pr. generation. L: seq. length.

µ = m*L: mutation pr. allele pr. generation.

2Ne: allele number.

θ := 4N*µ: mutation intensity in scaled process.

[Figure: discrete time and discrete sequence (steps of 1/(2Ne) in time and 1/L along the sequence) become continuous time and continuous sequence. In the scaled process each of two lineages mutates at rate θ/2 and the pair coalesces at rate 1.]

Probability for two genes being identical: P(Coalescence < Mutation) = 1/(1+θ).

Note: Mutation rate and population size usually appear together as a product, making separate estimation difficult.
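The identity probability is a race between competing exponential clocks. This sketch assumes the scaled process in which two lineages coalesce at rate 1 and each mutates at rate θ/2, so that coalescence wins with probability 1/(1+θ); the function names, the seed and the value θ = 2.12 are illustrative choices.

```python
import random

def identity_probability(theta):
    """P(coalescence before any mutation) for two genes."""
    return 1 / (1 + theta)

def simulate_identity(theta, trials, seed=0):
    """Monte Carlo estimate of the same probability (requires theta > 0)."""
    rng = random.Random(seed)
    identical = 0
    for _ in range(trials):
        t_coal = rng.expovariate(1.0)
        # One mutation clock per lineage, each with rate theta / 2.
        t_mut = min(rng.expovariate(theta / 2), rng.expovariate(theta / 2))
        if t_coal < t_mut:
            identical += 1
    return identical / trials

if __name__ == "__main__":
    theta = 2.12
    print(identity_probability(theta), simulate_identity(theta, 100_000))
```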

Page 14

Three Models of Alleles and Mutations.

Infinite Allele

i. Only identity, non-identity is determinable.

ii. A mutation creates a new type.

Infinite Site

i. Allele is represented by a line.

ii. A mutation always hits a new position.

Finite Site

i. Allele is represented by a sequence.

ii. A mutation changes nucleotide at chosen position.

Example sequences:

acgtgctt acgtgcgt acctgcat tcctgcat tcctgcat

acgtgctt acgtgcgt acctgcat tcctggct tcctgcat

Page 15

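Infinite Allele Model

Sequences: 1 2 3 4 5

Each partition of the sequences into alleles maps to its allele frequency spectrum, written $1^{a_1} 2^{a_2} \cdots$, where $a_j$ is the number of alleles seen j times:

{(1)} -> $1^1$

{(1,2)} -> $2^1$

{(1),(2)} -> $1^2$

{(1),(2,3)} -> $1^1 2^1$

{(1,2),(3),(4,5)} -> $1^1 2^2$

{(1),(2),(3),(4,5)} -> $1^3 2^1$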
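Under the infinite allele model only the sizes of the allele classes matter. The sketch below assumes the notation in which a partition of the sequences is summarised as size^count terms (for example 1^3 2^1 for three singletons and one pair); the function names are hypothetical.

```python
from collections import Counter

def frequency_spectrum(partition):
    """Map a partition (iterable of sets of sequence labels) to {class size: count}."""
    counts = Counter(len(block) for block in partition)
    return dict(sorted(counts.items()))

def format_spectrum(spectrum):
    """Render {size: count} in the size^count notation."""
    return " ".join(f"{size}^{count}" for size, count in sorted(spectrum.items()))

if __name__ == "__main__":
    print(format_spectrum(frequency_spectrum([{1}, {2}, {3}, {4, 5}])))
    print(format_spectrum(frequency_spectrum([{1, 2}, {3}, {4, 5}])))
```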

Page 16

Final Aligned Data Set:

Infinite Site Model

Page 17

Labelling and unlabelling: positions and sequences

Ignoring mutation position

Ignoring sequence label

[Figure: genealogies of sequences 1 to 5, shown with and without mutation positions and sequence labels.]

The forward-backward argument

$\frac{2\theta}{5(4+\theta)}$

$\frac{1}{4+\theta}$

9 coalescence events incompatible with data

4 classes of mutation events incompatible with data

Page 18

Infinite Site Model: An example

Theta = 2.12

[Figure: diagram with numeric node labels, row by row: 2 / 3 2 3 / 5 5 4 / 910 5 / 19 14 / 33]

Page 19

Impossible Ancestral States

Page 20

Finite Site Model

Final Aligned Data Set (s marks a segregating site):

acgtgctt
acgtgcgt
acctgcat
tcctgcat
tcctgcat
s s   s
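The segregating sites of the aligned data set can be found column by column. This sketch assumes the 40-character string on the slide is five aligned sequences of length 8; the function name is illustrative.

```python
def segregating_sites(sequences):
    """Return the 0-based column indices at which the aligned sequences differ."""
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) > 1]

if __name__ == "__main__":
    # Assumed split of the slide's data into five sequences of length 8.
    data = ["acgtgctt", "acgtgcgt", "acctgcat", "tcctgcat", "tcctgcat"]
    print(segregating_sites(data))  # [0, 2, 6]
```

Three columns vary, matching the three "s" markers in the data set.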

Page 21

Diploid Model with Recombination

An individual is made by:

1. The paternal chromosome is taken by picking a random father.

2. Making that father’s chromosomes recombine to create the individual’s paternal chromosome.

Similarly for maternal chromosome.

Page 22

The Diploid Model Back in Time.

A recombinant sequence will have two different ancestor sequences in the grandparent generation.

Page 23

1- recombination histories I: Branch length change

Page 24

1- recombination histories II: Topology change

Page 25

1- recombination histories III: Same tree

Page 26

1- recombination histories IV: Coalescent time must be further back in time than recombination time.

Page 27

Recombination-Coalescence Illustration

Copied from Hudson 1991

Intensities

Coales.    Recomb.
1          2
3          2
6          2
3          (2+b)
1          (1+b)

[Figure axis labels: 0, b]

Page 28

Age to oldest most recent common ancestor

From Wiuf and Hein, 1999, Genetics

[Figure: age to oldest most recent common ancestor (vertical axis) against scaled recombination rate (horizontal axis), from 0 kb to 250 kb.]

Page 29

Number of genetic ancestors to the Human Genome

S: number of segments. E(S) = 1 + ρ

[Figure: ancestry plotted with axes sequence and time; recombination events are marked R and coalescence events C.]

Statements about number of ancestors are much harder to make.

Simulations

Page 30

Applications to Human Genome (Wiuf and Hein, 97)

Parameters used: 4Ne = 20,000. Chromosome 1: 263 Mb, 263 cM.

Chromosome 1: Segments 52,000. Ancestors 6,800.

All chromosomes: Ancestors 86,000. Physical population 1.3-5.0 mill.

A randomly picked ancestor: (ancestral material comes in batteries!)

[Figure: ancestral material along chromosome 1 at three zoom levels: 0 to 260 Mb (segments 0 to 52,000), a 7.5 Mb window (*35, segments 6890 to 8360) and a 30 kb window (*250).]
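The segment count for chromosome 1 can be reproduced from the stated parameters. This is a sketch under two assumptions that are not spelled out on the slide: that the expected number of segments is E(S) = 1 + ρ, and that ρ is 4Ne times the genetic length in Morgans. The function name is hypothetical.

```python
def expected_segments(four_ne, centimorgans):
    """E(S) = 1 + rho, with rho = 4Ne * genetic length in Morgans (assumed scaling)."""
    rho = four_ne * centimorgans / 100  # 100 cM = 1 Morgan
    return 1 + rho

if __name__ == "__main__":
    # Slide parameters: 4Ne = 20,000 and chromosome 1 = 263 cM.
    print(expected_segments(20_000, 263))  # 52601.0
```

The result is of the same order as the roughly 52,000 segments reported for chromosome 1.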

Page 31

Ignoring recombination in phylogenetic analysis

Mimics decelerations/accelerations of evolutionary rates.

No & Infinite recombination implies molecular clock.

General Practice in Analysis of Viral Evolution!!!

[Figure: two trees on leaves 1 to 4, one labelled "Recombination" and one labelled "Assuming No Recombination".]

Page 32

Simulated Example

Page 33

Genotype and Phenotype Covariation: Gene Mapping

Time

Result: The Mapping Function

Reich et al. (2001)

Decay of local dependency

A set of characters.

Binary decision (0,1).

Quantitative Character.

Dominant/Recessive.

Penetrance

Spurious Occurrence

Heterogeneity

Genotype --> Phenotype Function

Sampling Genotypes and Phenotypes