666.05 -- distribution of mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 ·...
TRANSCRIPT
![Page 1: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/1.jpg)
Distribution of Mutations
Biostatistics 666Lecture 5
![Page 2: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/2.jpg)
Last Lecture:Introduction to the Coalescent
Coalescent approach• Proceed backwards through time.• Genealogy of a sample of sequences.
Infinite sites model• All mutations distinguishable.• No reverse mutation.
![Page 3: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/3.jpg)
Some key ideas …
Probability of coalescence events
Length of genealogy and its branches
Expected number of mutations
Parameter θ which combines population size and mutation rate
![Page 4: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/4.jpg)
Building Blocks…
Probability of sampling distinct ancestors for n sequences
Coalescence time t is approximately exponentially distributed
N
n
NinP
n
i
⎟⎟⎠
⎞⎜⎜⎝
⎛
−≈⎟⎠⎞
⎜⎝⎛ −=∏
−
=
211)(
1
1
![Page 5: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/5.jpg)
Some Key Results…
Coalescence Time (population size units)
Total Length (population size units)
∑−
=
=1
1
2)(n
itot i
TE
⎟⎟⎠
⎞⎜⎜⎝
⎛=
2/1)(
jTE j
![Page 6: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/6.jpg)
Some More Key Results …
Expected Number of Polymorphisms
∑∑
∑∑
−
=
−
=
−
=
−
=
==
==
1
1
1
1
1
1
1
1
/1/12)(
sample haploidan For
/1/14)(
sample diploid aFor
n
i
n
i
n
i
n
i
iiNSE
iiNSE
θµ
θµ
![Page 7: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/7.jpg)
Inferences about θ
Could be estimated from S• Divide by expected length of genealogy
Could then be used to:• Estimate N, if mutation rate µ is known• Estimate µ, if population size N is known
∑−
=
= 1
1/1
ˆn
ii
Sθ
![Page 8: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/8.jpg)
Alternative Estimator for θ …
Count pairwise differences between sequences
Compute average number of differences
∑∑= +=
−
⎟⎟⎠
⎞⎜⎜⎝
⎛=
n
i
n
ijijS
n
1 1
1
2~θ
![Page 9: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/9.jpg)
Var(θ) as a function of N
2 5 8 11 14 17 20 23 26 29 32 35 38 41 44 47 50
Sample Size
Var
ianc
e in
Est
imat
e of
The
ta
0.0
0.2
0.4
0.6
0.8
1.0
1.2
Parameters
N = 10,000 individualsµ = 10-4
θ = 4
^
![Page 10: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/10.jpg)
Today ...
More applications of the coalescent
Predicting allele frequency distributions• Using simulations
The full distribution of S• Using analytical calculations
![Page 11: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/11.jpg)
A Coalescent Simulation …
Let’s consider tracing the ancestry of 4 sequences
![Page 12: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/12.jpg)
When n = 4
coalesce torandomat sequences Pick twoondistributi lexponentia from timeSample
24
2)4(
Event CoalescentNext toTime
224
)4(
Event Coalescent ofy Probabilit
⎟⎟⎠
⎞⎜⎜⎝
⎛≈
⎟⎟⎠
⎞⎜⎜⎝
⎛≈
NT
NP
![Page 13: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/13.jpg)
Next n = 3 …
T(4)
Let’s assume that sequences 3 and 4 are selected …
Then, we repeat the process for a sample of 3 sequences
![Page 14: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/14.jpg)
Next n = 2 …
T(4)
Let’s assume that sequences 1 and 2 are selected to coalesce
Then, we repeat the process for a sample of 2 sequences
T(3)
![Page 15: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/15.jpg)
The Simulated Coalescent
T(4)
T(3)
T(2)At this point, we could
place mutations in genealogy. Most often,
these would fall in longer branches.
![Page 16: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/16.jpg)
A Coalescent Simulation …
T(4)
T(3)
T(2)Mutations in these
branches affecta pair of sequences
![Page 17: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/17.jpg)
A Coalescent Simulation …
T(4)
T(3)
T(2)Mutations in these
branches affecta single sequence
![Page 18: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/18.jpg)
Frequency Spectrum
Repeating the simulation multiple times, would give us a predicted mutation spectrum.
Frequency (out of n)
Pro
babi
lity
0.0
0.1
0.2
0.3
0.4
0.5
![Page 19: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/19.jpg)
Frequency Spectrum (n = 10)
1 2 3 4 5 6 7 8 9 10
Frequency (out of n)
Pro
porti
on o
f All
Var
iant
s
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
![Page 20: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/20.jpg)
Frequency Spectrum (n = 100)
1 4 7 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96
Frequency (out of n)
Pro
porti
on o
f All
Var
iant
s
0.00
0.05
0.10
0.15
![Page 21: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/21.jpg)
Frequency Spectrum
Constant size populationExponentially growing population
Most variants are rare• For n = 100, ~44% of variants occur < 5/100.• For n = 10, ~35% of variants observed once.
![Page 22: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/22.jpg)
Mutation Spectrum
Depends on genealogy• Population Size• Population Growth• Population Subdivision
Does not depend on• Mutation rate!
![Page 23: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/23.jpg)
Deviations from Neutral Spectrum
When would you expect deviations from the spectra we described?
What would you expect for …• A rapidly growing population?• A population whose size is decreasing?
Why?
![Page 24: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/24.jpg)
Effect of Polymorphism Type
Data from Cargill et al, 1999
![Page 25: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/25.jpg)
Number of Mutations
Can be derived from coalescent tree• What are the key features?
Analytical results possible• Trace back in time until MRCA, tracking
mutation events
![Page 26: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/26.jpg)
Sample of Two Sequences
Track coalescences and mutations• Probability of a coalescent event?
• Depends on population size …• Probability of a mutation?
• Depends on mutation rate …
Proceed backwards until either occurs…• Conditional probability for each outcome?
![Page 27: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/27.jpg)
Two Identical Sequences
θ
µ
+=
+=
+≈
11
22/12/1
)0 is (2
NN
PPPSP
mutCA
CA
![Page 28: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/28.jpg)
Full distribution of S…
Probability that first j events are mutations…
⎟⎠⎞
⎜⎝⎛
+⎟⎠⎞
⎜⎝⎛
+=
θθθ
11
1)(2
j
jP
![Page 29: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/29.jpg)
Example…
2 sequencesPopulation size N = 25,000Mutation rate µ = 10-5
Probability of 0, 1, 2, 3… mutations
![Page 30: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/30.jpg)
And for multiple sequences…
Describe number of mutations until the next coalescence event
Proceed back in time, until:• One of n sequences mutates…• A coalescent event occurs…
• Then track mutations in (n-1) sequences
![Page 31: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/31.jpg)
Formulae …
11
1
22
22
22
)(−+
−⎟⎠⎞
⎜⎝⎛
−+=
⎟⎟⎠
⎞⎜⎜⎝
⎛
+
⎟⎟⎠
⎞⎜⎜⎝
⎛
⎟⎟⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜⎜⎜
⎝
⎛
⎟⎟⎠
⎞⎜⎜⎝
⎛
+
=n
nn
N
n
n
N
n
N
n
n
njQj
j
n θθθ
µµ
µ
∑=
− −=j
innn iQijPjP
01 )()()(
![Page 32: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/32.jpg)
Example…
3 sequencesPopulation size N = 25,000Mutation rate µ = 10-5
Probability of 0, 1, 2, 3… mutations
![Page 33: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/33.jpg)
Number of Mutations
0 1 2 3 4 5 6 7 8 9 10+
Number of Mutations
Pro
babi
lity
0.00
0.05
0.10
0.15
0.20
N = 10,000n = 10µ = 2 * 10-5
Approximately 2kb of sequence,sequenced in 10 individuals
![Page 34: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/34.jpg)
So far …
One homogeneous population• Coalescence times• Number of mutations
• Expectation• Distribution
• Spectrum of mutations
Several assumptions, including …• No recombination
![Page 35: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/35.jpg)
Recombination …
No recombination• Single genealogy
Free recombination• Two independent genealogies• Same population history
Intermediate case• Correlated genealogies
![Page 36: 666.05 -- Distribution of Mutationscsg.sph.umich.edu/abecasis/class/666.05.pdf · 2005-01-20 · Microsoft PowerPoint - 666.05 -- Distribution of Mutations Author: Goncalo Created](https://reader034.vdocument.in/reader034/viewer/2022042923/5f7303977cea67481a38e91a/html5/thumbnails/36.jpg)
Recommended Reading
Richard R. Hudson (1990) Gene genealogies and the coalescent process
Oxford Surveys in Evolutionary Biology, Vol. 7.D. Futuyma and J. Antonovics (Eds). Oxford University Press, New York.