Lecture 10: Linkage Analysis III
![Page 1: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/1.jpg)
Lecture 10: Linkage Analysis III
Date: 9/26/02
Revisit segregation ratio distortion.
Haplotype coding
Three point analysis
Multipoint analysis
![Page 2: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/2.jpg)
Additive Segregation Ratio Distortion
Systematic genotype classification error occurs.
Power and estimates of recombination fraction are unaffected by additive distortion in the backcross configuration.
Estimates of recombination fraction are not affected for F2, but the false positive rate increases.
![Page 3: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/3.jpg)
Additive Segregation - Backcross
Suppose the frequency of genotype Aa is increased because a fraction u of aa genotypes are misclassified.
Similarly, assume the frequency of genotype Bb is independently increased by fraction v.
We need to recalculate the expected frequencies under the new model with additional parameters u and v.
![Page 4: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/4.jpg)
Additive Segregation – Backcross (contd)
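| Genotype | Expected Frequency | Expected Frequency with Distortion |
|---|---|---|
| AaBb | $0.5(1-\theta)$ | $0.5(1-\theta) + u/2 + v/2$ |
| Aabb | $0.5\theta$ | $0.5\theta + u/2 - v/2$ |
| aaBb | $0.5\theta$ | $0.5\theta - u/2 + v/2$ |
| aabb | $0.5(1-\theta)$ | $0.5(1-\theta) - u/2 - v/2$ |
| Total: Aa | $0.5$ | $0.5 + u$ |
| Total: aa | $0.5$ | $0.5 - u$ |
| Total: Bb | $0.5$ | $0.5 + v$ |
| Total: bb | $0.5$ | $0.5 - v$ |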
![Page 5: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/5.jpg)
Additive Segregation – Backcross (contd)
The number of unknown parameters equals the number of degrees of freedom.
Use Bailey’s method to find the MLEs of the parameters ($\theta$, $u$, $v$).
![Page 6: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/6.jpg)
Bailey’s Method
Set the expected frequencies equal to the observed proportions and solve the system of equations for the unknown parameters. These are the MLEs.
Example: Suppose you observe 5 successes from a Binomial(10, p) distribution. Then $\hat{p}_{\text{MLE}} = 5/10 = 0.5$.
![Page 7: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/7.jpg)
Additive Segregation – Backcross (contd)
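What do you notice about the MLE for the recombination fraction?
Is the MLE for the recombination fraction biased?
Let $f_{11}, f_{12}, f_{21}, f_{22}$ be the observed counts of AaBb, Aabb, aaBb, and aabb, and let $N$ be their total. Equating the expected frequencies with distortion to the observed proportions and solving gives:

$$\hat{\theta} = \frac{f_{12} + f_{21}}{N}, \qquad \hat{u} = \frac{f_{11} + f_{12} - f_{21} - f_{22}}{2N}, \qquad \hat{v} = \frac{f_{11} + f_{21} - f_{12} - f_{22}}{2N}$$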
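The following is a minimal sketch of Bailey's method for the additive backcross model, assuming the counts are ordered AaBb, Aabb, aaBb, aabb and that the expected frequencies are 0.5(1-θ) ± u/2 ± v/2 and 0.5θ ± u/2 ∓ v/2 as tabulated earlier. The function names and the counts are hypothetical and serve only as an illustration.

```python
def additive_backcross_expected(theta, u, v):
    """Expected frequencies (AaBb, Aabb, aaBb, aabb) under additive distortion."""
    return (
        0.5 * (1 - theta) + u / 2 + v / 2,
        0.5 * theta + u / 2 - v / 2,
        0.5 * theta - u / 2 + v / 2,
        0.5 * (1 - theta) - u / 2 - v / 2,
    )


def additive_backcross_mle(f11, f12, f21, f22):
    """Solve expected = observed for (theta, u, v), i.e. Bailey's method."""
    n = f11 + f12 + f21 + f22
    theta = (f12 + f21) / n                  # proportion in the recombinant classes
    u = (f11 + f12 - f21 - f22) / (2 * n)    # excess of Aa over 1/2
    v = (f11 + f21 - f12 - f22) / (2 * n)    # excess of Bb over 1/2
    return theta, u, v


counts = (50, 12, 8, 30)  # hypothetical counts, not data from the lecture
theta_hat, u_hat, v_hat = additive_backcross_mle(*counts)
fitted = additive_backcross_expected(theta_hat, u_hat, v_hat)
print(theta_hat, u_hat, v_hat)
print(fitted)
```

Because the model has as many parameters as degrees of freedom, the fitted frequencies reproduce the observed proportions exactly, and the estimate of θ does not involve u or v.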
![Page 8: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/8.jpg)
Additive Segregation – F2-CC
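| Genotype | Expected Frequency | Additive Distortion |
|---|---|---|
| AABB | $0.25(1-\theta)^2$ | $u/3 + v/3$ |
| AABb | $0.5\theta(1-\theta)$ | $u/3 - v/3$ |
| AAbb | $0.25\theta^2$ | $u/3$ |
| AaBB | $0.5\theta(1-\theta)$ | $-u/3 + v/3$ |
| AaBb | $0.5(1-2\theta+2\theta^2)$ | $-u/3 - v/3$ |
| Aabb | $0.5\theta(1-\theta)$ | $-u/3$ |
| aaBB | $0.25\theta^2$ | $v/3$ |
| aaBb | $0.5\theta(1-\theta)$ | $-v/3$ |
| aabb | $0.25(1-\theta)^2$ | $0$ |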
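As a quick consistency check on the F2 table, the sketch below assumes the standard two-locus F2 frequencies in terms of θ and the additive terms listed per genotype. It confirms that the undistorted frequencies sum to one and that the additive terms sum to zero, so the distorted frequencies still form a distribution. The function names are hypothetical.

```python
def f2_expected(theta):
    """Two-locus F2 genotype frequencies without distortion."""
    r, s = theta, 1 - theta
    return {
        "AABB": 0.25 * s * s, "AABb": 0.5 * r * s, "AAbb": 0.25 * r * r,
        "AaBB": 0.5 * r * s, "AaBb": 0.5 * (1 - 2 * r + 2 * r * r), "Aabb": 0.5 * r * s,
        "aaBB": 0.25 * r * r, "aaBb": 0.5 * r * s, "aabb": 0.25 * s * s,
    }


def f2_additive_terms(u, v):
    """Additive distortion term for each genotype, as tabulated."""
    return {
        "AABB": u / 3 + v / 3, "AABb": u / 3 - v / 3, "AAbb": u / 3,
        "AaBB": -u / 3 + v / 3, "AaBb": -u / 3 - v / 3, "Aabb": -u / 3,
        "aaBB": v / 3, "aaBb": -v / 3, "aabb": 0.0,
    }


base = f2_expected(0.2)               # theta = 0.2 is an arbitrary example value
terms = f2_additive_terms(0.06, 0.03)  # u and v are arbitrary example values
distorted = {g: base[g] + terms[g] for g in base}
print(sum(base.values()), sum(terms.values()), sum(distorted.values()))
```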
![Page 9: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/9.jpg)
Penetrance Distortion - Backcross
Selection, penetrance, and linkage to selected markers can all result in penetrance distortion, so it is quite common.
Suppose $(100 \times u)\%$ of the aa genotypes are misclassified as Aa. Similarly, and independently, assume that $(100 \times v)\%$ of the bb genotypes are misclassified as Bb.
![Page 10: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/10.jpg)
Penetrance Distortion - Backcross
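Gen. Expected Frequency
AaBb:

$$\begin{aligned}
& P(\text{AaBb}) + P(\text{scored as Aa} \mid \text{aaBb})\,P(\text{aaBb}) + P(\text{scored as Bb} \mid \text{Aabb})\,P(\text{Aabb}) + P(\text{scored as AaBb} \mid \text{aabb})\,P(\text{aabb}) \\
&= 0.5(1-\theta) + 0.5u\theta + 0.5v\theta + 0.5uv(1-\theta) \\
&= 0.5\left[\theta(u+v) + (1-\theta)(1+uv)\right]
\end{aligned}$$

Aabb
aaBb
aabb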
![Page 11: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/11.jpg)
Penetrance Distortion - Backcross
Is the estimate for recombination fraction biased?
The power to detect linkage is decreased.
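With $f_{11}, f_{12}, f_{21}, f_{22}$ the observed counts of AaBb, Aabb, aaBb, and aabb, and $N$ their total:

$$\hat{u} = \frac{f_{11} + f_{12} - f_{21} - f_{22}}{N}, \qquad \hat{v} = \frac{f_{11} + f_{21} - f_{12} - f_{22}}{N}, \qquad \hat{\theta} = 1 - \frac{2 f_{22}}{N(1-\hat{u})(1-\hat{v})}$$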
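The sketch below works through the penetrance-distortion backcross model under the stated assumptions: aa is scored as Aa with probability u, bb is scored as Bb with probability v, and the two errors are independent. The frequencies for the Aabb, aaBb, and aabb classes are derived here from that same misclassification rule rather than taken from the slide, and all function names are hypothetical. A round trip from parameters to expected counts and back recovers the parameters.

```python
def penetrance_backcross_expected(theta, u, v):
    """Frequencies of the scored classes (AaBb, Aabb, aaBb, aabb)."""
    true_nonrec = 0.5 * (1 - theta)   # true AaBb and true aabb
    true_rec = 0.5 * theta            # true Aabb and true aaBb
    p11 = true_nonrec + u * true_rec + v * true_rec + u * v * true_nonrec
    p12 = (1 - v) * (true_rec + u * true_nonrec)
    p21 = (1 - u) * (true_rec + v * true_nonrec)
    p22 = (1 - u) * (1 - v) * true_nonrec
    return p11, p12, p21, p22


def penetrance_backcross_mle(f11, f12, f21, f22):
    """Solve expected = observed for (theta, u, v) under penetrance distortion."""
    n = f11 + f12 + f21 + f22
    u = (f11 + f12 - f21 - f22) / n
    v = (f11 + f21 - f12 - f22) / n
    theta = 1 - 2 * f22 / (n * (1 - u) * (1 - v))
    return theta, u, v


# Hypothetical parameter values, used only to illustrate the round trip.
expected = penetrance_backcross_expected(theta=0.2, u=0.1, v=0.2)
counts = [1000 * p for p in expected]
theta_hat, u_hat, v_hat = penetrance_backcross_mle(*counts)
print(expected)
print(theta_hat, u_hat, v_hat)
```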
![Page 12: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/12.jpg)
Cost of Assuming Non-Distortion Model
The estimate for recombination fraction is biased. By how much?
$$\text{Bias} = E(\hat{\theta}) - \theta$$
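One way to put numbers on this bias is sketched below. It assumes the penetrance-distortion backcross model (aa scored as Aa with probability u, bb scored as Bb with probability v, independently) and assumes that the no-distortion estimate is the observed proportion of recombinant classes. Both the function names and the chosen parameter values are illustrative.

```python
def naive_theta_expectation(theta, u, v):
    """Expected proportion of scored recombinants (Aabb + aaBb) under distortion."""
    p12 = 0.5 * (1 - v) * (theta + u * (1 - theta))
    p21 = 0.5 * (1 - u) * (theta + v * (1 - theta))
    return p12 + p21


def naive_bias(theta, u, v):
    """Bias of the no-distortion estimate: E(theta_hat) - theta."""
    return naive_theta_expectation(theta, u, v) - theta


for theta in (0.1, 0.2, 0.3):
    row = [round(naive_bias(theta, d, d), 4) for d in (0.0, 0.1, 0.2, 0.3)]
    print(theta, row)
```

With equal distortion d at both loci the expression reduces to d(1 - 2θ) - d²(1 - θ), so under these assumptions the bias vanishes only when d = 0.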
![Page 13: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/13.jpg)
Overall Impact of Segregation Distortion
[Figure: bias in the estimated recombination fraction (vertical axis, -0.3 to 0.5) plotted against distortion u = v (horizontal axis, -0.4 to 0.4), with one curve each for recombination fractions 0.1, 0.2, and 0.3.]
![Page 14: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/14.jpg)
First Project
This slide marks the end of the material that will be needed to complete the first project.
![Page 15: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/15.jpg)
Linkage Analysis for Multiple Loci
The haplotype is the sequence of alleles along one of the chromosomes in an individual.
In multipoint linkage analysis we are concerned not with the allele at each locus but with its parental origin.
![Page 16: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/16.jpg)
Recoding Haplotypes
Suppose there are k loci.
Recode each haplotype as a string of k-1 0’s and 1’s.
If the ith position is 0, it indicates the (i+1)th locus is not recombinant with respect to the ith locus.
If the ith position is 1, it indicates the (i+1)th locus is recombinant with respect to the ith locus.
![Page 17: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/17.jpg)
Recoding Haplotypes (contd)
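In the Picture column, x marks a recombinant interval.

| Haplotype ABC (recoded) | Recombinant on AB | Recombinant on BC | Recombinant on AC | Picture |
|---|---|---|---|---|
| 00 | no | no | no | ABC |
| 01 | no | yes | yes | ABxC |
| 10 | yes | no | yes | AxBC |
| 11 | yes | yes | no | AxBxC |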
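The recoding rule can be sketched in a few lines. This illustration assumes each haplotype is given as a string of parental-origin labels, one per locus in map order; the labels `P` and `M` and the function names are hypothetical. Two loci are recombinant when the intervals between them contain an odd number of 1's.

```python
def recode_haplotype(origins):
    """Turn k parental-origin labels into a (k-1)-character 0/1 code.

    Position i is '1' when locus i+1 has a different parental origin
    than locus i, i.e. the interval between them is recombinant.
    """
    return "".join("1" if a != b else "0" for a, b in zip(origins, origins[1:]))


def recombinant_between(code, i, j):
    """True when loci i and j (0-based, i < j) are recombinant."""
    return sum(int(c) for c in code[i:j]) % 2 == 1


for origins in ("PPP", "PPM", "PMM", "PMP"):
    code = recode_haplotype(origins)
    print(origins, code,
          recombinant_between(code, 0, 1),   # AB
          recombinant_between(code, 1, 2),   # BC
          recombinant_between(code, 0, 2))   # AC
```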
![Page 18: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/18.jpg)
Recoding Haplotypes (contd)
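In the haplotype column, x marks a recombinant interval.

| Haplotype | Code |
|---|---|
| ABxCxD | 011 |
| AxBCxD | 101 |
| ABCD | 000 |
| AxBxCD | 110 |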
![Page 19: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/19.jpg)
Recoded Haplotypes and Recombination Fractions
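Let $p_{00}, p_{01}, p_{10}, p_{11}$ be the probabilities of the four recoded haplotype classes.

$$\begin{aligned}
\theta_{AB} &= p_{10} + p_{11} \\
\theta_{BC} &= p_{01} + p_{11} \\
\theta_{AC} &= p_{01} + p_{10} \\
1 &= p_{00} + p_{01} + p_{10} + p_{11}
\end{aligned}$$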
![Page 20: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/20.jpg)
Sample Problem
Calculate the probabilities of the four haplotype classes (i.e. 00, 10, 01, 11) when $\theta_{AB} = 0.1$ and $\theta_{BC} = 0.2$ and $\theta_{AC}$ is unknown. Assume the Sturt map function with L = 1.
![Page 21: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/21.jpg)
Plan of Attack
1. Transform recombination fractions to genetic map units using the inverse map function.
2. Sum the genetic map units to obtain length of AC interval.
3. Calculate the recombination fraction between AC using the map function.
4. Solve the set of simultaneous equations for the haplotype frequencies.
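The four steps above can be sketched as follows. The sketch assumes the Sturt map function in the form θ = ½[1 - (1 - m/L)e^(-m(2L-1)/L)] for m < L, inverts it numerically by bisection (a choice made here for simplicity, not necessarily how the lecture computed it), and solves the three-point equations in closed form. All function names are hypothetical.

```python
import math


def sturt(m, L=1.0):
    """Sturt map function: map distance m (Morgans) -> recombination fraction."""
    if m >= L:
        return 0.5
    return 0.5 * (1.0 - (1.0 - m / L) * math.exp(-m * (2.0 * L - 1.0) / L))


def sturt_inverse(theta, L=1.0, tol=1e-12):
    """Invert sturt() by bisection on [0, L], where it is increasing."""
    lo, hi = 0.0, L
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sturt(mid, L) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def three_point_haplotype_probs(theta_ab, theta_bc, theta_ac):
    """Solve theta_AB = p10 + p11, theta_BC = p01 + p11, theta_AC = p01 + p10."""
    p11 = 0.5 * (theta_ab + theta_bc - theta_ac)
    p10 = theta_ab - p11
    p01 = theta_bc - p11
    p00 = 1.0 - p01 - p10 - p11
    return {"00": p00, "01": p01, "10": p10, "11": p11}


m_ab = sturt_inverse(0.1)        # step 1
m_bc = sturt_inverse(0.2)
m_ac = m_ab + m_bc               # step 2
theta_ac = sturt(m_ac)           # step 3
probs = three_point_haplotype_probs(0.1, 0.2, theta_ac)  # step 4
print(round(m_ab, 3), round(m_bc, 3), round(m_ac, 3), round(theta_ac, 3))
print({k: round(p, 4) for k, p in probs.items()})
```

Small differences from hand-rounded values are expected, since the sketch carries full precision through every step.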
![Page 22: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/22.jpg)
Step 1
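The Sturt map function is

$$\theta = \frac{1}{2}\left[1 - \left(1 - \frac{m}{L}\right)e^{-m(2L-1)/L}\right]$$

Inverting it with $L = 1$ gives

$$m_{AB} = 0.108, \qquad m_{BC} = 0.238$$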
![Page 23: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/23.jpg)
Step 2
$$m_{AC} = m_{AB} + m_{BC} = 0.108 + 0.238 = 0.346$$
![Page 24: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/24.jpg)
Step 3
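$$\theta_{AC} = \frac{1}{2}\left[1 - \left(1 - \frac{m_{AC}}{L}\right)e^{-m_{AC}(2L-1)/L}\right] = \frac{1}{2}\left[1 - (1 - 0.346)\,e^{-0.346}\right] = 0.269$$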
![Page 25: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/25.jpg)
Step 4
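$$\begin{aligned}
\theta_{AB} &= 0.1 = p_{10} + p_{11} \\
\theta_{BC} &= 0.2 = p_{01} + p_{11} \\
\theta_{AC} &= 0.269 = p_{01} + p_{10} \\
1 &= p_{00} + p_{01} + p_{10} + p_{11}
\end{aligned}$$

$$p_{00} = 0.7155, \qquad p_{10} = 0.0845, \qquad p_{01} = 0.1845, \qquad p_{11} = 0.0155$$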
![Page 26: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/26.jpg)
Phase Known Three Point Analysis
When all gametes in sample are fully informative, then the likelihood is simple.
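With $f_i$ the observed count and $p_i$ the probability of haplotype class $i$:

$$l = \sum_{i=1}^{4} f_i \log p_i$$

$$l(\theta_{AB}, \theta_{BC}, \theta_{AC}) = l(\theta_{AB}, \theta_{BC}, c), \qquad c = \frac{\theta_{AB} + \theta_{BC} - \theta_{AC}}{2\,\theta_{AB}\,\theta_{BC}}$$

How would you test for interference?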
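One possible answer is a likelihood ratio test of c = 1, sketched below. It assumes phase-known counts for the classes 00, 01, 10, 11, uses the observed proportions as the unrestricted MLEs, and uses the product form implied by no interference as the null model. The counts and function names are hypothetical, and the statistic would be compared with a chi-square distribution with 1 degree of freedom.

```python
import math


def loglik(counts, probs):
    """Multinomial log-likelihood (constant term omitted)."""
    return sum(f * math.log(p) for f, p in zip(counts, probs) if f > 0)


def interference_lrt(f00, f01, f10, f11):
    """Return (LRT statistic for c = 1, estimated coincidence c)."""
    counts = (f00, f01, f10, f11)
    n = sum(counts)
    full = tuple(f / n for f in counts)        # unrestricted MLEs
    theta_ab = (f10 + f11) / n                 # recombinants on AB
    theta_bc = (f01 + f11) / n                 # recombinants on BC
    null = (                                   # no interference: c = 1
        (1 - theta_ab) * (1 - theta_bc),
        (1 - theta_ab) * theta_bc,
        theta_ab * (1 - theta_bc),
        theta_ab * theta_bc,
    )
    stat = 2.0 * (loglik(counts, full) - loglik(counts, null))
    c_hat = (f11 / n) / (theta_ab * theta_bc)
    return stat, c_hat


stat, c_hat = interference_lrt(715, 185, 85, 15)  # hypothetical counts
print(stat, c_hat)
```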
![Page 27: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/27.jpg)
Multipoint Analysis – A Difficulty
Suppose there are k loci.
How many haplotypes are possible?
How many recombination fractions are there?
![Page 28: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/28.jpg)
Recombination Value
Definition: The recombination value of a set of intervals is the probability of an odd number of crossovers occurring in the intervals.
How many sets of intervals are there?
![Page 29: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/29.jpg)
Sample Problem – Four Point Analysis
Suppose loci A, B, C, and D are in syntenic order and $\theta_{AB} = 0.1$, $\theta_{BC} = 0.2$, and $\theta_{CD} = 0.3$.
What are the probabilities of the haplotype classes given the Kosambi map function?

$$\theta = \frac{1}{2} \cdot \frac{e^{4m} - 1}{e^{4m} + 1}$$
![Page 30: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/30.jpg)
The Linear Equations
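Each recombination value is the sum of the haplotype class probabilities with an odd number of recombinant intervals in the set:

$$\begin{aligned}
1 &= p_{000} + p_{001} + p_{010} + p_{011} + p_{100} + p_{101} + p_{110} + p_{111} \\
\theta_{AD} &= p_{001} + p_{010} + p_{100} + p_{111} \\
\theta_{AC} &= p_{010} + p_{011} + p_{100} + p_{101} \\
\theta_{AB,CD} &= p_{001} + p_{011} + p_{100} + p_{110} \\
\theta_{AB} &= p_{100} + p_{101} + p_{110} + p_{111} \\
\theta_{BD} &= p_{001} + p_{010} + p_{101} + p_{110} \\
\theta_{BC} &= p_{010} + p_{011} + p_{110} + p_{111} \\
\theta_{CD} &= p_{001} + p_{011} + p_{101} + p_{111}
\end{aligned}$$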
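A sketch of the four-point calculation follows. It rests on one assumption that should be stated loudly: the recombination value of any set of intervals, including the non-adjacent pair AB and CD, is taken to be the Kosambi map function applied to the total map length of the set. The linear equations are then solved by a parity (sign) transform, which is a convenience chosen here and not necessarily the lecture's method. Function names are hypothetical.

```python
import math
from itertools import product


def kosambi(m):
    """Kosambi map function: map distance m (Morgans) -> recombination fraction."""
    return 0.5 * math.tanh(2.0 * m)


def kosambi_inverse(theta):
    """Inverse Kosambi map function: recombination fraction -> map distance."""
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


def haplotype_probs(rec_value):
    """Solve the linear equations for the haplotype class probabilities.

    rec_value maps each non-empty interval set, written as a 0/1 tuple,
    to its recombination value (probability of an odd crossover count).
    """
    k = len(next(iter(rec_value)))
    classes = list(product((0, 1), repeat=k))
    probs = {}
    for h in classes:
        total = 0.0
        for s in classes:
            theta_s = rec_value[s] if any(s) else 0.0
            odd = sum(a & b for a, b in zip(h, s)) % 2
            weight = 1.0 - 2.0 * theta_s
            total += -weight if odd else weight
        probs[h] = total / len(classes)
    return probs


# Map lengths of AB, BC, CD from the given recombination fractions.
lengths = [kosambi_inverse(t) for t in (0.1, 0.2, 0.3)]
# ASSUMPTION: recombination value of a set = kosambi(total length of the set).
rec_value = {
    s: kosambi(sum(m for m, flag in zip(lengths, s) if flag))
    for s in product((0, 1), repeat=3) if any(s)
}
probs = haplotype_probs(rec_value)
for h, p in sorted(probs.items()):
    print("".join(map(str, h)), round(p, 4))
```

Under that assumption the solved values are not guaranteed to be non-negative. For these inputs the 111 class comes out slightly below zero, which is the kind of behavior that makes a map function multilocus-infeasible.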
![Page 31: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/31.jpg)
Multipoint Likelihood
Can be written in terms of the $2^{k-1}-1$ recombination values or haplotype frequencies.
Can be reparameterized as $k-1$ recombination fractions and $2^{k-1}-k$ interference parameters. Then tests for interference are possible.
An alternative is to assume a map function with possibly unknown parameters, which constrains the gamete probabilities as functions of the $k-1$ recombination fractions.
![Page 32: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/32.jpg)
Multilocus-Infeasible Map Functions
Kosambi, Carter-Falconer, and Felsenstein map functions are multilocus-infeasible because they can produce negative gametic frequencies.
The Morgan, Haldane, Sturt and generalized map functions are multilocus-feasible.
Haldane is most often used for its simplicity except when linkage is tight, e.g. m << 0.5.
![Page 33: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/33.jpg)
Map Building
How many possible orders are there for k loci?
10 loci can be ordered in over 1 million ways.
The solution is to generate a small number of probable orders and then analyze these few in depth.
![Page 34: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/34.jpg)
Stepwise Approximate Ordering
Use likelihood analysis to order a few markers, say l.
Add each additional marker one at a time by considering all l+1 positions for it (the l-1 gaps between ordered markers plus the two ends). Choose the location that results in the highest likelihood.
Number of likelihood evaluations: 3+4+5...+k = (k-2)(k+3)/2.
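To see how much work the stepwise approach saves, the sketch below compares the number of distinct orders with the stepwise evaluation count. It assumes an order and its reverse are the same map, so k loci have k!/2 orders, and it takes the stepwise count from the formula above. The function names are hypothetical.

```python
import math


def n_orders(k):
    """Distinct orders of k loci, treating an order and its reverse as one."""
    return math.factorial(k) // 2


def stepwise_evaluations(k):
    """Likelihood evaluations for stepwise ordering: 3 + 4 + ... + k."""
    return (k - 2) * (k + 3) // 2


for k in (5, 10, 15):
    print(k, n_orders(k), stepwise_evaluations(k))
```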
![Page 35: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/35.jpg)
Pairwise Approximate Ordering
Two point linkage analysis on all pairs of loci to obtain a recombination fraction estimate.
Multidimensional scaling analyses (multivariate exploratory analysis) to find approximate orders.
![Page 36: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/36.jpg)
Final Step – Perfecting Order
Test the likelihood of various reorderings of neighboring groups of loci.
If a tested order has higher likelihood, keep it.
etc...
![Page 37: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/37.jpg)
Disease Mapping
Condition on an ordering of all markers except the disease locus.
Calculate a multilocus log-likelihood for each possible position x of the disease locus; call this $l_x$.
Calculate the location score $2(l_x - l)$ at point x, where $l$ is the log-likelihood with the disease locus unlinked to the other markers.
![Page 38: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/38.jpg)
Disease Mapping
Can also calculate multipoint LOD scores by dividing location scores by $2\ln(10)$.
Plot location score or multipoint LOD score by position x. The peak is the likely position of the disease locus, and if the peak exceeds some cut-off criterion, linkage to that region is significant.
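The conversion between the two scales is a one-liner, sketched here with hypothetical natural-log likelihoods at a few candidate positions. The positions, log-likelihood values, and function names are made up for illustration.

```python
import math


def location_score(loglik_x, loglik_unlinked):
    """Location score 2(l_x - l) from natural-log likelihoods."""
    return 2.0 * (loglik_x - loglik_unlinked)


def multipoint_lod(score):
    """Multipoint LOD score: location score divided by 2 ln(10)."""
    return score / (2.0 * math.log(10.0))


loglik_unlinked = -250.0  # hypothetical
loglik_by_position = {0.00: -248.1, 0.05: -244.6, 0.10: -243.1, 0.15: -245.9}
lod_by_position = {
    x: multipoint_lod(location_score(l_x, loglik_unlinked))
    for x, l_x in loglik_by_position.items()
}
peak_position = max(lod_by_position, key=lod_by_position.get)
print(peak_position, round(lod_by_position[peak_position], 2))
```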
![Page 39: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/39.jpg)
Multipoint vs. Single Point Disease Mapping
Multipoint analysis uses information from every sampled individual, even those who may be homozygous at a single marker.
A single marker can only provide information about crossovers on one side of the disease gene.
The more markers, the sharper the peak.
The disease gene is ultimately mapped to the smallest interval where there is no observed crossover between marker and disease gene in the entire sample.
![Page 40: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/40.jpg)
Sample Size
Assuming no interference, crossovers occur along the chromosome as a Poisson process with rate 1 per Morgan, so the distance between adjacent crossovers is exponentially distributed with mean 1 Morgan.
Sample n individuals and the rate becomes n per Morgan.
Therefore, the expected distance to the nearest crossover on either side of the disease locus is 1/n Morgans.
The interval containing the disease gene has a length that follows a gamma distribution with mean 2/n.
Example: You want to localize the disease gene to 1 cM = 1/100 M. Therefore, you need n > 200.
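The arithmetic behind the example can be sketched as below, assuming the expected interval length is 2/n Morgans as stated. The function names are hypothetical, and the calculation concerns only the mean interval length, not the spread of the gamma distribution.

```python
import math


def expected_interval_length(n):
    """Mean length (Morgans) of the crossover-free interval around the locus."""
    return 2.0 / n


def sample_size_for(target_morgans):
    """Smallest n whose expected interval length is at most the target."""
    return math.ceil(2.0 / target_morgans)


for target_cm in (5.0, 1.0, 0.5):
    n = sample_size_for(target_cm / 100.0)
    print(target_cm, n, expected_interval_length(n) * 100.0)
```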
![Page 41: Lecture 10: Linkage Analysis III](https://reader034.vdocument.in/reader034/viewer/2022051516/56814521550346895db1e34b/html5/thumbnails/41.jpg)
Summary
Modeling of segregation distortion and the impact on linkage analysis.
Haplotype coding.
The use of map functions.
Overview of likelihood formulation for multipoint analysis.