Introduction to Haplotype Estimation
Stat/Biostat 550
The Haplotype Problem
• Suppose we genotype individuals at a number of tightly linked SNPs.
A C G C C T T T G C G C
G A A C C C C C A G G C
• What do the types on the two chromosomes look like?
Haplotypes: who cares?
Many people, for many different reasons…
• LD mapping: increase power?
• LD mapping: decrease genotyping?
• Evolutionary studies: selection, recombination, gene conversion, population structure, …
The Haplotype Problem – potential solutions
• Molecular methods
• Collect family data
• Statistical methods for population data
The Simplest Case
• What do the types on the two chromosomes look like?
The Next Simplest Case
• What do the types on the two chromosomes look like?
The first difficult case…
• What do the types on the two chromosomes look like?
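The figures for these cases are not part of this transcript. Assuming the progression runs from genotypes with no heterozygous sites, through one heterozygous site, to two or more, the ambiguity can be made concrete by enumerating every haplotype pair compatible with a genotype. The sketch below is illustrative only: the 0/1/2 genotype coding (copies of one allele at each SNP) and the function name `compatible_pairs` are assumptions, not notation from the slides.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs consistent with a genotype.

    genotype: tuple with entries 0, 1 or 2 (copies of allele "1" at each SNP).
    Haplotypes are tuples of 0/1.
    """
    het_sites = [j for j, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites are forced: 0 -> allele 0, 2 -> allele 1.
        h1 = [g // 2 for g in genotype]
        # Heterozygous sites are free: try every assignment on the first haplotype.
        for j, b in zip(het_sites, bits):
            h1[j] = b
        # The second haplotype carries whatever the first one does not.
        h2 = [g - a for g, a in zip(genotype, h1)]
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


print(len(compatible_pairs((0, 2, 0))))   # no heterozygous sites
print(len(compatible_pairs((0, 1, 2))))   # one heterozygous site
print(compatible_pairs((1, 1)))           # two heterozygous sites
```

Under this coding a genotype with k ≥ 1 heterozygous sites has 2^(k-1) compatible pairs, so two heterozygous sites is the first case in which the genotype alone cannot decide between resolutions.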
Clark’s Method (1990)
• Idea: use information obtained from other individuals in the population to determine the most probable haplotype pair.
Is it this configuration?
…or this one?
This one is more probable.
[Figures: candidate haplotype configurations for individuals 1, 2 and 3.]
Clark’s Method (Clark, 1990)
• Identify the unambiguous individuals.
• Make a list of “known” haplotypes.
• Go through list, and see whether ambiguous individuals can be made up from a “known” haplotype plus another “complementary” haplotype. If so, add the complementary haplotype to the list of “known” haplotypes.
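The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Clark's original implementation: genotypes are coded 0/1/2 per SNP, an individual counts as unambiguous when it has at most one heterozygous site, and the names `complement` and `clark` are hypothetical.

```python
def complement(genotype, hap):
    """Haplotype that pairs with `hap` to give `genotype`, or None if impossible."""
    comp = tuple(g - h for g, h in zip(genotype, hap))
    return comp if all(c in (0, 1) for c in comp) else None


def clark(genotypes):
    """Return (resolved, known); resolved maps individual index -> haplotype pair."""
    known, resolved = [], {}
    # Steps 1 and 2: unambiguous individuals seed the list of "known" haplotypes.
    for i, g in enumerate(genotypes):
        if sum(x == 1 for x in g) <= 1:
            h1 = tuple(x // 2 for x in g)
            h2 = complement(g, h1)
            resolved[i] = (h1, h2)
            for h in (h1, h2):
                if h not in known:
                    known.append(h)
    # Step 3: keep passing over the ambiguous individuals until nothing changes.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for h in known:
                c = complement(g, h)
                if c is not None:
                    # First exact match wins; the complement becomes "known".
                    resolved[i] = (h, c)
                    if c not in known:
                        known.append(c)
                    progress = True
                    break
    return resolved, known


genotypes = [(0, 0, 0), (2, 2, 0), (1, 1, 0), (1, 1, 1)]
resolved, known = clark(genotypes)
print(resolved[2], resolved[3])
```

Individuals left out of `resolved` when the loop stops are the ones the method cannot phase.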
Clark’s Method
[Figure: individuals 1, 2 and 3 shown alongside the list of known haplotypes.]
Clark’s Method: Problem 1
[Figure: individuals 1, 2 and 3 shown alongside the list of known haplotypes.]
Answer depends on the order in which the list is considered…
… and frequency information is ignored.
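A small made-up example shows the order dependence. The genotype and haplotypes below are invented for illustration (0/1/2 coding per SNP, hypothetical helper `first_resolution`); the slide's own figure is not reproduced here.

```python
def first_resolution(genotype, known):
    """Resolve `genotype` using the first compatible haplotype in `known`."""
    for h in known:
        comp = tuple(g - a for g, a in zip(genotype, h))
        if all(c in (0, 1) for c in comp):
            return h, comp
    return None  # no exact match: the individual stays unresolved


genotype = (1, 1, 1)                  # heterozygous at all three SNPs
order_a = [(0, 0, 0), (1, 1, 0)]      # same two known haplotypes...
order_b = [(1, 1, 0), (0, 0, 0)]      # ...visited in the opposite order

res_a = first_resolution(genotype, order_a)
res_b = first_resolution(genotype, order_b)
print(res_a)
print(res_b)
```

Both answers are compatible with the genotype, and the rule has no way to prefer the one built from the more common haplotype.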
Clark’s Method: Problem 2
[Figure: individuals 1, 2 and 3 shown alongside the list of known haplotypes.]
Algorithm can fail to resolve all haplotypes…
… because it looks only for exact matches.
Clark’s Algorithm: Summary
• Results may depend on the order in which individuals are considered.
• Frequency information is ignored.
• May fail to resolve all haplotypes.
• Fails to assess uncertainty.
• Looks only for exact matches.
• Fast and intuitive(?).
Maximum Likelihood (EM Algorithm)
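• Idea: find haplotype frequencies (f1,…,fN) to maximise probability of observed genotype data (g1,…,gn).
$$\Pr(g_i \mid f_1,\dots,f_N) = \sum_{\{h_1,h_2 \,:\, h_1 \oplus h_2 = g_i\}} f_{h_1} f_{h_2}$$
$$\Pr(g_1,\dots,g_n \mid f_1,\dots,f_N) = \prod_{i=1}^{n} \Pr(g_i \mid f_1,\dots,f_N)$$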
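A minimal EM sketch for this likelihood follows. It assumes biallelic SNPs with genotypes coded 0/1/2, sums over ordered haplotype pairs so that each pair contributes the product of its two frequencies, and starts from uniform frequencies; the function names, the start point and the fixed iteration count are illustrative choices, not part of the slides.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """All ordered haplotype pairs (h1, h2) that add up sitewise to `genotype`."""
    het = [j for j, g in enumerate(genotype) if g == 1]
    out = []
    for bits in product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]
        for j, b in zip(het, bits):
            h1[j] = b
        h2 = tuple(g - a for g, a in zip(genotype, h1))
        out.append((tuple(h1), h2))
    return out


def em_haplotype_freqs(genotypes, n_iter=100):
    pairs = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for ps in pairs for p in ps for h in p})
    freq = {h: 1.0 / len(haps) for h in haps}   # uniform start (arbitrary choice)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for ps in pairs:
            # E-step: posterior weight of each compatible pair given current freqs.
            w = [freq[h1] * freq[h2] for h1, h2 in ps]
            total = sum(w)
            for (h1, h2), wi in zip(ps, w):
                counts[h1] += wi / total
                counts[h2] += wi / total
        # M-step: expected haplotype counts divided by the 2n chromosomes.
        freq = {h: counts[h] / (2 * len(genotypes)) for h in haps}
    return freq


genotypes = [(0, 0), (0, 0), (2, 2), (2, 2), (1, 1)]
freq = em_haplotype_freqs(genotypes)
print(freq)
```

In this toy data set the double heterozygote is pulled towards the two haplotypes already seen in the homozygotes, which is the frequency information Clark's rule ignores.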
Bayesian version
Modify Clark’s algorithm:
• Replace single pass through data with an iterative scheme.
• Allow for uncertainty in resolution.
• Use frequency information.
Resulting “naïve Gibbs sampler” produces results similar to EM (Stephens, Smith and Donnelly 2001).
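One way to picture such a sampler is sketched below. It is a simplified illustration, not the published algorithm: each individual in turn is removed from the current list of haplotypes and re-resolved at random, with each compatible pair weighted by the product of (count elsewhere + pseudo-count) for its two haplotypes. The weighting, the `pseudo` parameter, the 0/1/2 genotype coding and the function names are all assumptions made for this sketch.

```python
import random
from collections import Counter
from itertools import product


def compatible_pairs(genotype):
    """All ordered haplotype pairs (h1, h2) that add up sitewise to `genotype`."""
    het = [j for j, g in enumerate(genotype) if g == 1]
    out = []
    for bits in product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]
        for j, b in zip(het, bits):
            h1[j] = b
        h2 = tuple(g - a for g, a in zip(genotype, h1))
        out.append((tuple(h1), h2))
    return out


def naive_gibbs(genotypes, n_sweeps=200, pseudo=0.1, seed=0):
    """Return, per individual, the fraction of sweeps spent in each resolution."""
    rng = random.Random(seed)
    options = [compatible_pairs(g) for g in genotypes]
    # Start from an arbitrary compatible resolution for every individual.
    current = [rng.choice(ps) for ps in options]
    counts = Counter(h for pair in current for h in pair)
    tallies = [Counter() for _ in genotypes]
    for _ in range(n_sweeps):
        for i, ps in enumerate(options):
            # Take individual i's two haplotypes out of the current list.
            counts.subtract(current[i])
            # Pairs built from haplotypes that are common elsewhere get more weight.
            w = [(counts[h1] + pseudo) * (counts[h2] + pseudo) for h1, h2 in ps]
            current[i] = rng.choices(ps, weights=w)[0]
            counts.update(current[i])
            tallies[i][tuple(sorted(current[i]))] += 1
    return [{k: v / n_sweeps for k, v in t.items()} for t in tallies]


genotypes = [(0, 0)] * 3 + [(2, 2)] * 3 + [(1, 1)]
post = naive_gibbs(genotypes)
print(post[6])
```

The returned fractions give a rough measure of how uncertain each individual's resolution is, something a single pass of Clark's rule cannot report.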
Example
[Figure: list of known haplotypes and three candidate resolutions of an ambiguous individual.]
• One haplotype matches 1 known, the other does not match any: assigned moderate probability.
• One haplotype matches 3 known, the other does not match any: assigned higher probability.
• Neither haplotype matches any known: assigned low probability.
Problems with EM/naïve Gibbs
• Potentially (very) large number of parameters to estimate, leading to inaccurate estimates.
• Can be time-consuming for large problems.
• Can “converge” to poor local optima (alleviated by multiple runs).
Further modification
• Take into account “near misses”, as well as exact matches.
(PHASE v1.0: Stephens, Smith and Donnelly 2001)
Example
[Figure: list of known haplotypes and three candidate resolutions of an ambiguous individual.]
• One haplotype matches 1 known; the other differs by 2 from 3 known.
• One haplotype matches 3 known; the other differs by 2 from 1 known.
• One haplotype differs by 1 from 3 known; the other differs by 1 from 1 known.
How to balance these possibilities?
The key question
• What is the conditional distribution of the next haplotype, given a set of known haplotypes?
Example
[Figure: a set of known haplotypes.]
Given the above haplotypes, what would you expect the next haplotype to look like?
Qualitative answer
• The next haplotype will likely differ by a small number of mutations (possibly 0 mutations) from a (randomly-chosen) existing haplotype.
• Use theory (Ewens sampling formula; coalescent theory) to roughly quantify the distribution of the “small number”.
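One way to write this qualitative answer down is the approximate conditional distribution used in Stephens, Smith and Donnelly (2001). The notation is introduced here and does not appear on the slides: H is the set of known haplotypes, r is its size, r_α is the number of copies of haplotype α in H, θ is a scaled mutation rate, and P is a mutation matrix whose (α, h) entry is the probability that one mutation turns α into h.

```latex
\pi(h \mid H) \;=\; \sum_{\alpha} \sum_{s=0}^{\infty}
  \frac{r_\alpha}{r}
  \left(\frac{\theta}{r+\theta}\right)^{s}
  \frac{r}{r+\theta}\,
  \left(P^{s}\right)_{\alpha h}
```

Read left to right: pick an existing haplotype α with probability proportional to its count, draw the number of mutations s from a geometric distribution that favours small s more strongly as r grows, then apply s mutations to α.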
Comparisons on simulated data
Problems
• Time-consuming for large problems.
• Can “converge” to poor local optima.
• Ignores recombination (decay of LD with distance).
• How should uncertainty in haplotype estimates be treated?
… to be continued.