Cryptic Variation in the Human mutation rate

Alan Hodgkinson

Adam Eyre-Walker, Manolis Ladoukakis

Variation in the mutation rate:

• Between different chromosomes

• Between regions on chromosomes

• Neighbouring nucleotides

Simple context effects:

Hwang and Green (2004) PNAS 101: 13994-14001

Cryptic Variation:

• Remote context:

AGTCGGTTACCGTGACGTTGAACGTGT

• Degenerate context:

AGTCGGTTACCGTGYSRGYGAACGTGT

• No context / Complex context
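
To make the degenerate context above concrete: in IUPAC nucleotide notation Y stands for C or T, S for C or G, and R for A or G, so a motif such as YSRGYG stands for a whole family of concrete sequences. The following is a minimal sketch only; the helper name `matches_degenerate` and the example sequences are illustrative assumptions, not part of the talk.

```python
# IUPAC nucleotide codes (subset): each symbol maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "N": "ACGT",
}

def matches_degenerate(motif: str, seq: str) -> bool:
    """Return True if seq is one concrete instance of the degenerate motif."""
    if len(motif) != len(seq):
        return False
    return all(base in IUPAC[code] for code, base in zip(motif, seq))

# Hypothetical examples: CCAGTG fits YSRGYG, ACGTTG does not (A is not C/T).
fits = matches_degenerate("YSRGYG", "CCAGTG")
does_not_fit = matches_degenerate("YSRGYG", "ACGTTG")
```

A degenerate context of this kind would be missed by a model that only conditions on exact neighbouring bases, which is why it counts as cryptic here.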

Our approach to the problem

• Search for SNPs in human sequences that also have a SNP in the orthologous position in chimp.

[Figure: aligned Human and Chimp sequences with a SNP at the same position]

Do we see more coincident SNPs than expected by chance?

The method

• Extract all human SNPs from dbSNP and construct a BLAST database on a chromosome-by-chromosome basis.

• Extract all chimp SNPs from dbSNP with 50 bp either side of the SNP.

• BLAST chimp SNPs against the human database.

• Extract results above a certain level of homology where there is a SNP on both sequences, and reduce to 40 bp either side of the central position.

• Repeat the analysis both including and excluding CpG effects.
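
The filtering and trimming step in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `Hit` record, the ungapped alignment, and the 0.9 identity threshold are all hypothetical (the slides only say "a certain level of homology"); only the 50 bp and 40 bp flank sizes come from the talk.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FLANK_IN = 50   # bp extracted either side of each chimp SNP (from the slides)
FLANK_OUT = 40  # bp kept either side of the central position (from the slides)

@dataclass
class Hit:
    """Hypothetical record for one ungapped hit of a chimp SNP against human."""
    chimp_seq: str        # 101 bp chimp sequence, chimp SNP at index FLANK_IN
    human_seq: str        # aligned human sequence of the same length
    human_snp_index: int  # index of a human SNP within the alignment

def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which the two sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def trim_hit(hit: Hit, min_identity: float = 0.9) -> Optional[Tuple[str, str, int]]:
    """Return (chimp_81bp, human_81bp, human_snp_offset), or None if rejected."""
    full = 2 * FLANK_IN + 1
    if len(hit.chimp_seq) != full or len(hit.human_seq) != full:
        return None
    if identity(hit.chimp_seq, hit.human_seq) < min_identity:
        return None  # assumed homology cut-off
    lo, hi = FLANK_IN - FLANK_OUT, FLANK_IN + FLANK_OUT + 1
    if not lo <= hit.human_snp_index < hi:
        return None  # human SNP falls outside the 81 bp window
    offset = hit.human_snp_index - FLANK_IN  # 0 means a coincident SNP
    return hit.chimp_seq[lo:hi], hit.human_seq[lo:hi], offset
```

An offset of 0 marks a coincident SNP; any other offset is a human SNP elsewhere in the same 81 bp alignment.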

Results

• ~1.5 million chimp SNPs.

• ~310,000 81bp alignments containing a human and chimp SNP.

• Observe the number of coincident SNPs.

• Calculate the expected number, taking into account the effects of neighbouring nucleotides.
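
The slides do not give the formula for the expected number, so the following is only one simple way to frame it. It assumes each alignment holds exactly one human SNP, and that a per-site relative mutability (for example from a neighbouring-nucleotide model) is available; the function names and weights are hypothetical.

```python
from typing import List, Sequence, Tuple

def coincidence_probability(weights: Sequence[float], centre: int) -> float:
    """Chance that a single human SNP lands on the centre site,
    given per-site relative mutabilities."""
    return weights[centre] / sum(weights)

def expected_coincident(alignments: List[Tuple[Sequence[float], int]]) -> float:
    """Sum the per-alignment coincidence probabilities."""
    return sum(coincidence_probability(w, c) for w, c in alignments)

# With equal mutability at all 81 sites the chance is 1/81 per alignment.
uniform = [1.0] * 81
p_uniform = coincidence_probability(uniform, 40)

# A hypothetical hypermutable centre site (10x) raises the chance to 10/90.
hot = [1.0] * 81
hot[40] = 10.0
p_hot = coincidence_probability(hot, 40)
```

The point of weighting by context is that sites which are already known to be hypermutable, such as CpGs, raise the expected number without any cryptic variation being involved.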

Results

         Obs     Exp    Ratio
All      11571   6592   1.76 (1.72, 1.79)
No-CpG   5028    2533   1.98 (1.93, 2.04)
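
As a quick arithmetic check, the ratios in the table follow directly from the observed and expected counts. The sketch below only reproduces the point estimates; how the intervals in parentheses were obtained is not stated on the slide, so they are not recomputed here.

```python
# Observed and expected coincident SNP counts, copied from the table above.
results = {"All": (11571, 6592), "No-CpG": (5028, 2533)}

ratios = {label: obs / exp for label, (obs, exp) in results.items()}
excess = {label: obs - exp for label, (obs, exp) in results.items()}

for label in results:
    print(f"{label}: ratio {ratios[label]:.2f}, excess {excess[label]}")
```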

Results

C/T G/A C/A G/T C/G A/T

C/T 1.91 1.04 1.19 1.21 0.96

G/A 1.83 1.24 1.02 1.14 1.40

C/A 1.23 1.08 4.81 1.28 1.39

G/T 1.15 1.38 4.95 1.27 0.77

C/G 1.09 1.14 1.24 1.40 2.79

A/T 0.94 1.06 1.79 0.99 15.43

Alternative Explanations

• Bias in the Method

• Selection

• Ancestral Polymorphism

• Paralogous SNPs

Methodological Bias

• Simulated data with same density of human and chimp SNPs as dbSNP under different divergence and mutation patterns.

• Method worked well under realistic conditions.

Methodological Bias

All sites (H&G):

Div   Obs    Exp    Ratio   95% CI
0     839    812    1.033   (0.963, 1.103)
1     2419   2316   1.040   (1.003, 1.086)
2     681    685    0.995   (0.920, 1.069)

Non-CpG sites (H&G):

Div   Obs    Exp    Ratio   95% CI
0     401    428    0.936   (0.844, 1.028)
1     1182   1228   0.963   (0.908, 1.018)
2     374    400    0.935   (0.840, 1.030)

Selection

• Areas of low SNP density result in clustering:

[Figure: SNPs along the Human and Chimp sequences]

Apparent excess of coincident SNPs

• No clustering observed [figure].

Ancestral Polymorphism

• SNP inherited from the common ancestor of chimp and human:

[Figure: a T/A polymorphism in the Common Ancestor passed down to both Human and Chimp]

Increase in coincident SNPs

Ancestral Polymorphism

• Expect observed/expected ratio to be same for all transitions:

C/T G/A C/A G/T C/G A/T

C/T 1.91 1.04 1.19 1.21 0.96

G/A 1.83 1.24 1.02 1.14 1.40

C/A 1.23 1.08 4.81 1.28 1.39

G/T 1.15 1.38 4.95 1.27 0.77

C/G 1.09 1.14 1.24 1.40 2.79

A/T 0.94 1.06 1.79 0.99 15.43

Ancestral Polymorphism

• Repeated initial analysis with macaque data.

• Humans and macaques split ~23-24 million years ago, so we expect there to be no shared polymorphisms.

         Obs   Exp   Ratio
All      77    47    1.64 (1.27, 2.00)
No-CpG   34    23    1.51 (1.001, 2.02)

Paralogous SNPs

• The excess of coincident SNPs could be a consequence of artifactual SNPs, called as a result of substitutions in paralogous regions.

• Musumeci et al. (2010): 8.32% of human variation in dbSNP may be due to paralogy.

AGCTGCACGT Y CGGCATCCAA   SNP
AGCTGCACGT T CGGCATCCAA   Chromosome 1
AGCTGCACGT A CGGCATCCAA   Chromosome 7

Artifactual SNP

Paralogous SNPs

AGCTGCACGT (T/A) CGGCATCCAA
AGCTGCACGT   T   CGGCATCCAA

AGCTGCACGT (T/A) CGGCATCCAA
AGCTGCACGT   T   CGGCATCCAA
AGCTGCACGT   A   CGGCATCCAA

3.6% of coincident SNPs are possibly a consequence of paralogous sequences

Alternative Explanations

• Bias in the method

• Selection

• Ancestral Polymorphism

• Paralogous SNPs

Cryptic variation in the mutation rate

Context Analysis

• 4517 sequences containing non-CpG coincident SNPs flanked by 200bp.

• Tabulate triplet frequencies at each position in surrounding sequences.

• Test whether the proportions of triplets observed at each position are significantly different from the proportions in the sequences as a whole.
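
The tabulate-and-test steps above might look like the sketch below. The slides do not say which statistical test was used; a Pearson chi-square statistic against the background triplet proportions is assumed here, and all function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

def triplet_counts_at(seqs: Iterable[str], pos: int) -> Counter:
    """Count the triplets starting at index pos across all sequences."""
    return Counter(s[pos:pos + 3] for s in seqs if len(s) >= pos + 3)

def background_counts(seqs: Iterable[str]) -> Counter:
    """Count every overlapping triplet in the sequences as a whole."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - 2):
            counts[s[i:i + 3]] += 1
    return counts

def chi_square(observed: Counter, background: Counter) -> float:
    """Pearson chi-square of observed counts against background proportions."""
    n = sum(observed.values())
    total = sum(background.values())
    stat = 0.0
    for triplet, bg_count in background.items():
        expected = n * bg_count / total
        stat += (observed.get(triplet, 0) - expected) ** 2 / expected
    return stat

# Toy input only; the talk used 4517 sequences with 200 bp flanks.
toy: List[str] = ["ACGTACGT", "ACGTACGT"]
stat_at_0 = chi_square(triplet_counts_at(toy, 0), background_counts(toy))
```

With one test per position across the flanking sequence, some correction for multiple testing would normally be needed before calling any position significant.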

Context Analysis

• Coincident SNP in central position:

No obvious context surrounding coincident SNPs

Genomic Distribution

• Tallied the number of coincident SNPs per Mb:

- 3.91 coincident SNPs per Mb.

- 1.68 non-CpG coincident SNPs per Mb.

• If randomly distributed, expect a Poisson distribution with $\mu = \sigma^2 = 3.91$.

• Observed $\sigma^2 = 13.27$ (p<0.001), so sampling variance explains approximately 30% of the total variance.
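
The "approximately 30%" figure can be checked from the two numbers on the slide. Under a Poisson model the variance equals the mean, so the share of the observed variance attributable to sampling is the mean divided by the observed variance. The variable names below are illustrative.

```python
mean_per_mb = 3.91         # coincident SNPs per Mb (from the slide)
observed_variance = 13.27  # observed variance across windows (from the slide)

# Poisson sampling alone would give a variance equal to the mean.
poisson_variance = mean_per_mb

sampling_fraction = poisson_variance / observed_variance
dispersion_index = observed_variance / mean_per_mb  # 1.0 for a Poisson

print(f"sampling share of variance: {sampling_fraction:.1%}")
print(f"variance-to-mean ratio: {dispersion_index:.2f}")
```

A variance-to-mean ratio well above 1 is what the summary later calls over-dispersion.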

Genomic Distribution

Feature                   r        r2       p
SNP density               0.256    0.0655   <0.001**
Distance to Telomere      -0.022   0.0004   0.226
Distance to Centromere    0.011    0.0001   0.565
Recombination Rate        0.107    0.0114   <0.001**
Nucleosome Association    0.004    0.0000   0.832
Gene Density              -0.022   0.0004   0.230
GC content                -0.006   0.0000   0.741

Genomic Distribution

• SNP densities must drive coincident SNP densities to a certain extent as approximately half of coincident SNPs are created by chance alone.

• Recombination rate positively correlated with SNP density (r = 0.242, p<0.001).

• Partial correlation between recombination rate and coincident SNP density, controlling for SNP density: r = 0.048, p=0.011**.

• SNP density explains 6.5% of the variance in coincident SNPs; recombination rate explains 0.2%.
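
The partial correlation quoted above is consistent with the standard first-order formula applied to the three pairwise correlations given in the talk (0.107, 0.256 and 0.242). The sketch below assumes that formula is what was used; the function name is illustrative.

```python
from math import sqrt

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# x = coincident SNP density, y = recombination rate, z = SNP density.
r_partial = partial_corr(r_xy=0.107, r_xz=0.256, r_yz=0.242)
variance_explained = r_partial ** 2

print(f"partial r = {r_partial:.3f}, r^2 = {variance_explained:.4f}")
```

Squaring the partial correlation gives roughly 0.002, which matches the 0.2% of variance attributed to recombination rate.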

Genomic Distribution

Feature                   r        r2       p
Coincident SNP Density    0.256    0.0655   <0.001**
Distance to Telomere      -0.171   0.0292   <0.001**
Distance to Centromere    -0.047   0.0022   0.012**
Recombination Rate        0.234    0.0548   <0.001**
Nucleosome Association    0.187    0.0350   <0.001**
Gene Density              0.064    0.0041   0.001**
GC content                0.184    0.0339   <0.001**

Quantification

• Use Log-normal distribution of relative mutation rates due to cryptic variation.

• Model the number of coincident SNPs under the effects of cryptic variation.

• Incorporate effects of divergence.

What level of variation in the log-normal distribution explains our results?

Log-normal model

Fastest 5% of sites mutate ~16.4 times faster than slowest 5% of sites.
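
The slide does not report the fitted log-normal parameter, nor whether "fastest 5%" refers to percentile cut-offs or to mean rates within each tail. Purely as an illustration, the sketch below takes the percentile reading: for a log-normal with log-scale standard deviation sigma, the ratio of the 95th to the 5th percentile is exp(2 · z · sigma), where z is the 95th percentile of the standard normal. The value of sigma it returns is an assumption-laden back-calculation, not a result from the talk.

```python
from math import exp, log
from statistics import NormalDist

# 95th percentile of the standard normal (about 1.645).
Z95 = NormalDist().inv_cdf(0.95)

def quantile_ratio(sigma: float) -> float:
    """95th / 5th percentile of a log-normal with log-scale sd sigma."""
    return exp(2 * Z95 * sigma)

def sigma_for_ratio(ratio: float) -> float:
    """Invert quantile_ratio: the sigma that yields the given ratio."""
    return log(ratio) / (2 * Z95)

# Under the percentile reading only, a 16.4-fold ratio implies sigma near 0.85.
sigma = sigma_for_ratio(16.4)
```

If the statement instead compares mean rates within the two tails, the implied sigma would be smaller, so this number should be read as an upper-end illustration only.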

Summary

• Cryptic variation in the mutation rate.

• No obvious context surrounding coincident SNPs.

• Variation is truly cryptic.

• Genomic distribution of coincident SNPs is over-dispersed

• Variation in mutation rate is substantial.

Acknowledgments

• BBSRC

• People: Adam Eyre-Walker, Manolis Ladoukakis