![Page 1: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/1.jpg)
Finding Bit Patterns
Applying haplotype models to association study design
Natalie Castellana, Kedar Dhamdhere, Russell Schwartz
August 16, 2005
![Page 2: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/2.jpg)
Problem: Applying haplotype models
Input:
10000010100010010
00010100101101001
01101101001000010
10101011111000010
Output: a set of recurring patterns of the form (start column, end column, pattern), for example (14,17,“0010”)
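To make the input/output contract concrete, here is a minimal sketch that scans fixed-width windows and counts how many rows share each one. The function name `recurring_patterns` and its parameters `width` and `min_count` are illustrative assumptions; this brute-force scan is not the method developed in the slides.

```python
from collections import Counter

def recurring_patterns(rows, width, min_count=2):
    """Return {(start, end, pattern): count} for windows seen in >= min_count rows.

    Columns are 1-indexed and inclusive, matching the
    (start column, end column, pattern) form used in the slides.
    """
    n_cols = len(rows[0])
    counts = Counter()
    for row in rows:
        for start in range(n_cols - width + 1):
            key = (start + 1, start + width, row[start:start + width])
            counts[key] += 1
    return {key: c for key, c in counts.items() if c >= min_count}

rows = [
    "10000010100010010",
    "00010100101101001",
    "01101101001000010",
    "10101011111000010",
]
found = recurring_patterns(rows, width=4)
print(found[(14, 17, "0010")])  # number of rows sharing this pattern
```

On the four input rows above, the pattern (14,17,“0010”) is shared by rows 1, 3, and 4.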
![Page 3: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/3.jpg)
Background
Major allele
Minor allele
SNP
Haplotype
Association test: Given that this sample has haplotype 1101, does it have the disease?
1000011010110100000010
![Page 4: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/4.jpg)
Genetic Variation
…1110101…
…1000011…
Mutation:
…1000001…
Recombination:
…1110011…
…1000101…
…1001001…
Because of recombination, similar genetic variation can be found within closely linked regions.
![Page 5: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/5.jpg)
Data Sets
Download from HapMap.org or generate using MS → Apply Disease Model → Apply Haplotype Model → Perform Association Tests
Input:
1001001010110
1001001110100
0110010110100
1000111010010
Cases and controls:
10010011101
10010010101
10001110100
01100101101
![Page 6: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/6.jpg)
Testing individual SNPs
Go through each SNP and determine which SNPs accurately predict which samples have the disease and which do not.
Case: 0 0 1 1 0 1 0 1
Case: 0 1 0 1 0 0 0 0
Case: 0 0 1 1 1 0 0 0
Control: 0 0 0 0 1 0 1 0
Control: 0 0 1 0 0 1 1 0
Control: 1 1 1 0 0 0 0 1
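The per-SNP scan on this slide can be sketched as follows. This is a simplified illustration, assuming each SNP is scored by the fraction of samples it classifies correctly when one allele is treated as the risk allele. The function name `snp_accuracy` is hypothetical, and the score is not the association statistic used in the study.

```python
def snp_accuracy(cases, controls):
    """For each SNP column, the fraction of samples that SNP alone classifies correctly."""
    n_snps = len(cases[0])
    total = len(cases) + len(controls)
    scores = []
    for j in range(n_snps):
        # Samples consistent with "allele 1 means case, allele 0 means control".
        hits = sum(r[j] == "1" for r in cases) + sum(r[j] == "0" for r in controls)
        # Either allele may be the risk allele, so take the better orientation.
        scores.append(max(hits, total - hits) / total)
    return scores

# The case and control rows from the slide, written as bit strings.
cases = ["00110101", "01010000", "00111000"]
controls = ["00001010", "00100110", "11100001"]

scores = snp_accuracy(cases, controls)
best = max(range(len(scores)), key=scores.__getitem__)
print(best + 1, scores[best])  # 1-indexed SNP position and its score
```

In the slide's example, the fourth SNP is 1 in every case and 0 in every control, so it separates the two groups perfectly.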
![Page 7: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/7.jpg)
Haplotype block method
Instead of looking at each individual SNP, we can look at groups of contiguous SNPs.
1101000000…11…
1101100100…01…
0111000000…10…
1101100100…00…
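A small sketch of the block idea, assuming the block boundaries are already known and shared by all rows. The boundaries used here (columns 1 to 4 and 5 to 10 of the visible prefix) are hypothetical, chosen only for illustration; the slides do not state them, and `block_haplotypes` is an illustrative name.

```python
from collections import Counter

def block_haplotypes(rows, boundaries):
    """Tally the distinct haplotypes observed inside each block.

    boundaries is a list of (start, end) pairs, 0-indexed with an exclusive end.
    """
    return [Counter(row[s:e] for row in rows) for s, e in boundaries]

# The first ten visible columns of the four sequences on the slide.
rows = ["1101000000", "1101100100", "0111000000", "1101100100"]
blocks = block_haplotypes(rows, [(0, 4), (4, 10)])
for counts in blocks:
    print(dict(counts))
```

Each block's haplotype then plays the role that a single SNP allele plays in the per-SNP test.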
![Page 8: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/8.jpg)
Haplotype motif method
Treats each sequence as a concatenation of segments (like the block method), but does not require segment boundaries to be conserved across sequences.
1101000000…
1100100100…
0111000000…
1101100111…
![Page 9: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/9.jpg)
Approximation Algorithm
General idea:
10000100…………………………………
00011100…………………………………
11011110…………………………………
01010110…………………………………
c c c cc c c c
Pick the best partition, minimizing the number of motifs needed to explain all the data.
![Page 10: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/10.jpg)
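Finding Motifs
[Figure: a window of c columns over the row 0 1 1 0 1 0 0 1 1 0 0 0 1 1 0 0 1, with a tree branching on 0 and 1 whose leaves run from 000…000 and 000..100 through 111…111]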
![Page 11: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/11.jpg)
Problems
Really, really, really slow
Took over a week to partition our biggest data set.
Added a ‘max leaves explored’ feature.
Useless for larger c.
![Page 12: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/12.jpg)
Real Data
[Figure: correct inferences (0 to 1) versus penetrance parameter p (55 to 100). Series: single SNP, Bounded Block, 4-gamete Block, Bounded Block htSNP, 4-gamete Block htSNP, Motif htSNP, Motif Approx]
![Page 13: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/13.jpg)
Simulated Data
[Figure: correct inferences (0 to 1) versus penetrance parameter p (50 to 100). Series: single SNP, Bounded Block, 4-gamete Block, Bounded Block htSNP, 4-gamete Block htSNP, Motif htSNP, Motif Approx]
![Page 14: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/14.jpg)
False Positives
[Figure: false positive rate (0 to 0.35) versus LOD cutoff (3 down to 1). Series: single SNP, Bounded Block, 4-gamete Block, Bounded Block htSNP, 4-gamete Block htSNP, Motif htSNP, Motif Approx, Expectation]
![Page 15: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/15.jpg)
General Linear Program
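Objective function: minimize x + y + z
Constraints:
x + y <= 2
x + 2z <= 5
0 <= x <= 3
0 <= y <= Inf
-Inf <= z <= 0
In matrix form, the first two constraints are:
$$\begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \le \begin{pmatrix} 2 \\ 5 \end{pmatrix}$$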
![Page 16: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/16.jpg)
A Linear Program
Input: A matrix with M rows and N columns
Output: The minimum number of motifs.
![Page 17: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/17.jpg)
Variables
X’s: each x corresponds to a motif
Define a motif by a tuple:
(start column, end column, string pattern)
Y’s: each y corresponds to a row partition
Define a row partition by a set of motifs:
{(1,e1,“…”),(e1+1,e2,“…”),...,(en+1,N,“…”)}
![Page 18: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/18.jpg)
Constraints
Exactly one partition must be chosen per row.
If a motif used in a row partition is not chosen, then the row partition may not be chosen.
Minimize the sum of all X’s.
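Written out as a program, the three statements above might look like the following. The notation is an assumption introduced here, not taken from the slides: $x_m$ is the variable for motif $m$, $y_p$ is the variable for row partition $p$, and $P_i$ is the set of partitions that apply to row $i$. The 0/1 restriction on the variables is also an assumption; a linear program would relax it to the interval $[0, 1]$.

```latex
\begin{aligned}
\min \quad & \sum_{m} x_m \\
\text{s.t.} \quad & \sum_{p \in P_i} y_p = 1
  && \text{for each row } i, \\
& x_m - \sum_{p \in P_i \,:\, m \in p} y_p \ge 0
  && \text{for each row } i \text{ and each motif } m \text{ appearing in row } i, \\
& x_m,\; y_p \in \{0, 1\}.
\end{aligned}
```

Because exactly one partition is chosen per row, the second family of constraints forces $x_m = 1$ for every motif in the chosen partition, so the objective counts the distinct motifs in use.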
![Page 19: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/19.jpg)
Example
10001101
X’s: (1,1,“1”),(1,2,“10”),(1,3,“100”), etc.
Y’s: (1,1,“1”),(2,8,“0001101”)
(1,2,“10”),(3,3,“0”),(4,8,“01101”)
![Page 20: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/20.jpg)
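Constraint Matrix (1)
Exactly one row partition must be chosen per row.
The columns up to (1,2,“10”) are all X’s; the columns from Y_1 onward are all Y’s.

| | (1,1,“1”) | (1,1,“0”) | … | (1,2,“10”) | Y_1 | Y_2 | … | |
|---|---|---|---|---|---|---|---|---|
| Row 1 | 0 | 0 | … | 0 | 1 | 1 | … | = 1 |
| Row 2 | 0 | 0 | … | 0 | 0 | 0 | … | = 1 |
| Row 3 | 0 | 0 | … | 0 | 1 | 1 | … | = 1 |
| … | | | | | | | | … |
| Row M | 0 | 0 | | 0 | | | | = 1 |

Y_1 := (1,1,“1”),(2,8,“0001101”)
Y_2 := (1,2,“10”),(3,3,“0”),(4,8,“01101”)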
![Page 21: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/21.jpg)
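Constraint Matrix (2)
If a motif used in a row partition is not chosen, then the row partition may not be chosen.
The columns up to (1,2,“10”) are all X’s; the columns from Y_1 onward are all Y’s. Each line of the table is one motif of row i.

| Row i | (1,1,“1”) | (1,1,“0”) | … | (1,2,“10”) | Y_1 | Y_2 | … | |
|---|---|---|---|---|---|---|---|---|
| (1,1,“1”) | 1 | 0 | … | 0 | -1 | 0 | … | >= 0 |
| (1,2,“10”) | 0 | 0 | … | 1 | 0 | -1 | … | >= 0 |
| (1,3,“100”) | 0 | 0 | … | 0 | 0 | 0 | … | >= 0 |
| … | … | … | … | … | … | … | … | … |
| (8,8,“1”) | 0 | 0 | … | 0 | 0 | 0 | | >= 0 |

Y_1 := (1,1,“1”),(2,8,“0001101”)
Y_2 := (1,2,“10”),(3,3,“0”),(4,8,“01101”)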
![Page 22: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/22.jpg)
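Constraint Matrix
Columns 1 to K are the x’s; columns K+1 to K+P are the y’s.

Constraint 1 (one line per row 1, …, M):

| Row | x’s (1 … K) | y’s (K+1 … K+P) | |
|---|---|---|---|
| 1 | 0 0 0 0 0 … 0 0 0 0 | 1 1 1 0 0 0 0 … 0 | == 1 |
| 2 | 0 0 0 0 0 … 0 0 0 0 | 1 0 0 1 1 1 0 … 0 | == 1 |
| … | | | |
| M | 0 0 0 0 0 … 0 0 0 0 | 0 0 1 0 0 0 1 … 1 | == 1 |

Constraint 2 (shown for row 1, one line per motif 1, …, K_1; repeated for each row up to M):

| Motif | x’s (1 … K) | y’s (K+1 … K+P) | |
|---|---|---|---|
| 1 | 1 0 0 0 0 … 0 0 0 0 | -1 0 0 0 … 0 0 | >= 0 |
| 2 | 0 1 0 0 0 … 0 0 0 0 | -1 -1 0 0 … -1 0 | >= 0 |
| … | | | |
| K_1 | 0 0 1 0 0 … 0 0 0 0 | 0 0 0 0 … 0 0 | >= 0 |

Where K is the number of unique motifs, K_i is the number of motifs appearing in row i, and P is the number of unique partitions.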
![Page 23: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/23.jpg)
Problems
Each row has N(N+1)/2 motifs. So there will be a polynomial number of X’s. Good!
Each row can be partitioned in 2^(N-1) ways. So there will be an exponential number of Y’s. Bad!
Solution: column generation
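The two counts above can be checked by brute force on the example row 10001101 (N = 8). This is a small sketch for illustration; the function names `motifs` and `partitions` are not from the slides.

```python
from itertools import combinations

def motifs(row):
    """All contiguous substrings as (start column, end column, pattern), 1-indexed inclusive."""
    n = len(row)
    return [(s + 1, e, row[s:e]) for s in range(n) for e in range(s + 1, n + 1)]

def partitions(row):
    """All ways to cut the row into contiguous motifs (each gap between columns is cut or not)."""
    n = len(row)
    result = []
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            result.append(tuple((s + 1, e, row[s:e]) for s, e in zip(bounds, bounds[1:])))
    return result

row = "10001101"
n = len(row)
all_motifs = motifs(row)
all_partitions = partitions(row)
print(len(all_motifs), n * (n + 1) // 2)    # motif count against N(N+1)/2
print(len(all_partitions), 2 ** (n - 1))    # partition count against 2^(N-1)
```

Already at N = 8 there are more partitions than motifs, and the gap doubles with every added column, which is why the Y’s cannot all be enumerated up front.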
![Page 24: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/24.jpg)
Column generation
We find the optimal solution to the problem which contains all X’s and only some of the Y’s.
Then we see if adding any Y’s would improve the solution.
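One common way to run that check is a pricing step, sketched below under assumptions the slides do not spell out. Suppose the restricted linear program has been solved and its dual values assign a nonnegative weight to each motif constraint of a row. The most promising unseen Y for that row is then the partition with the smallest total motif weight, and it is worth adding when that total is below the dual value of the row's "exactly one partition" constraint (sign conventions vary by solver). Finding that partition is a shortest-path problem over column boundaries. The name `cheapest_partition` and the `weights` dictionary are hypothetical.

```python
def cheapest_partition(row, weights, default=1.0):
    """Minimum-total-weight partition of `row` into contiguous motifs.

    weights maps (start column, end column, pattern) to a nonnegative number
    (hypothetical dual values); motifs missing from it cost `default`.
    Columns are 1-indexed and inclusive.
    """
    n = len(row)
    best = [0.0] + [float("inf")] * n   # best[e]: cheapest cover of columns 1..e
    back = [None] * (n + 1)             # back[e]: (previous boundary, last motif)
    for e in range(1, n + 1):
        for s in range(e):
            motif = (s + 1, e, row[s:e])
            cost = best[s] + weights.get(motif, default)
            if cost < best[e]:
                best[e], back[e] = cost, (s, motif)
    parts, e = [], n
    while e > 0:                        # walk the back pointers to recover the motifs
        s, motif = back[e]
        parts.append(motif)
        e = s
    return best[n], parts[::-1]

# Hypothetical weights: three motifs are free, every other motif costs 1.
weights = {(1, 2, "10"): 0.0, (3, 3, "0"): 0.0, (4, 8, "01101"): 0.0}
cost, parts = cheapest_partition("10001101", weights)
print(cost, parts)
```

The dynamic program examines only N(N+1)/2 motifs per row, so the exponential set of Y’s is never enumerated.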
![Page 25: Finding Bit Patterns Applying haplotype models to association study design Natalie Castellana Kedar Dhamdhere Russell Schwartz August 16, 2005](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5e5503460f94a3de88/html5/thumbnails/25.jpg)
Where are we now?
Where are we going?