Giuseppe Lancia, University of Udine
The phasing of heterozygous traits: Algorithms and Complexity
- The genomic age has allowed us to look at ourselves in a detailed, comparative way
- All humans are >99% identical at the genome level
- Small changes in a genome can make a big difference in how we look and who we are
What makes us different from each other?
The answer is
POLYMORPHISMS
This is true for humans
as well as for other species
Polymorphisms are features existing in different “flavours”, that make us all look (and be) different
Examples can be eye color, blood type, hair, etc.
In fact, polymorphisms in the way we look (phenotypes) are determined by polymorphisms in our genome
For a given polymorphism, say eye color, the possible forms are called alleles
We all inherit two alleles (paternal and maternal)
If they are identical, we are HOMOZYGOUS; if they are different, HETEROZYGOUS
[Figure: inheritance diagrams (mother, father, child) for a dominant and a recessive allele, showing homozygous and heterozygous children; in the final diagrams some alleles are unknown ("??")]
Single Nucleotide Polymorphisms
At DNA level, a polymorphism is a sequence of nucleotides varying in a population.
The shortest possible sequence has only 1 nucleotide, hence
Single Nucleotide Polymorphism (SNP)
atcggattagttagggcacaggacggac
atcggattagttagggcacaggacgtac
atcggcttagttagggcacaggacgtac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacgtac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacgtac
atcggattagttagggcacaggacgtac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacggac
atcggattagttagggcacaggacggac
- SNPs are the predominant form of human variation
- Used for drug design, disease studies, forensics, evolutionary studies...
- On average one every 1,000 bases
atcggcttagttagggcacaggacgtac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacgtac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacgtac
atcggattagttagggcacaggacgtac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacggac
atcggattagttagggcacaggacggac
atcggattagttagggcacaggacggac
atcggattagttagggcacaggacgtac
HAPLOTYPE: chromosome content at SNP sites
GENOTYPE: “union” of 2 haplotypes

haplotype pair    genotype
ag at             {a}{g,t}
ct ag             {a,c}{g,t}
ct cg             {c}{g,t}
at at             {a}{t}
ag cg             {a,c}{g}
ag cg             {a,c}{g}
ag ag             {a}{g}
CHANGE OF SYMBOLS: each SNP takes only two values in a population (a fact from biology).
Call them 0 and 1. Also, call 2 the fact that a site is heterozygous
HAPLOTYPE: string over {0, 1}
GENOTYPE: string over {0, 1, 2}, where 0={0}, 1={1}, 2={0,1}

haplotype pair    genotype
10 11             12
01 10             22
01 00             02
11 11             11
10 00             20
10 00             20
10 10             10
ALGEBRA OF HAPLOTYPES:
Homozygous sites:                0 + 0 = 0    1 + 1 = 1
Heterozygous (ambiguous) sites:  0 + 1 = 1 + 0 = 2

haplotype pair    genotype
10 11             12
01 10             22
01 00             02
11 00             22
00 10             20
10 10             10
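The site-wise sum above can be written down directly. The following is a minimal sketch, assuming haplotypes and genotypes are stored as Python strings; the function name `combine` is only illustrative and does not come from the slides.

```python
def combine(h1: str, h2: str) -> str:
    """Site-wise sum of two haplotypes (strings over 0/1) giving a genotype (string over 0/1/2)."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    # Equal alleles give a homozygous site (0 or 1); different alleles give 2.
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))


# The haplotype pairs from the slide and the genotypes they produce.
for h1, h2 in [("10", "11"), ("01", "10"), ("01", "00"), ("11", "00"), ("00", "10"), ("10", "10")]:
    print(h1, "+", h2, "=", combine(h1, h2))
```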
Phasing the alleles
The genotype 12202 can be phased in four ways:
11101 + 10000
11100 + 10001
11001 + 10100
11000 + 10101
For k heterozygous (ambiguous) sites, there are 2^(k-1) possible phasings
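To see where the count 2^(k-1) comes from, the phasings of a genotype can be enumerated. This is a small sketch under the string encoding used in the slides; the name `phasings` is hypothetical. Fixing the first heterozygous site of the first haplotype to 1 avoids listing each unordered pair twice.

```python
from itertools import product


def phasings(g: str):
    """Return every unordered pair (h1, h2) of haplotypes with h1 + h2 = g."""
    het = [i for i, c in enumerate(g) if c == "2"]
    if not het:
        return [(g, g)]  # no ambiguous site: the only phasing is g + g
    pairs = []
    # h1 always gets 1 at the first heterozygous site; the other k-1 sites are free.
    for bits in product("10", repeat=len(het) - 1):
        h1, h2 = list(g), list(g)
        for i, b in zip(het, ("1",) + bits):
            h1[i] = b
            h2[i] = "0" if b == "1" else "1"
        pairs.append(("".join(h1), "".join(h2)))
    return pairs


for h1, h2 in phasings("12202"):
    print(h1, "+", h2)
```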
THE PHASING (or HAPLOTYPING) PROBLEM
It is too expensive to determine haplotypes directly
Much cheaper to determine genotypes, and then infer haplotypes in silico:
Given genotypes of k individuals, determine the phasings of all heterozygous sites.
This yields a set H of (at most) 2k haplotypes. H is a resolution of G.
The input is GENOTYPE data
INPUT: G = { 11221, 22221, 11011, 21221, 00011 }
OUTPUT: H = { 11011, 11101, 00011, 01101 }

genotype   resolution
11221      11011 + 11101
22221      00011 + 11101
11011      11011 + 11011
21221      11011 + 01101
00011      00011 + 00011

Each genotype is resolved by two haplotypes
We will define some objectives for H
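Whether a set H really resolves G can be checked mechanically. The sketch below assumes the string encoding of the slides; `combine` and `resolution` are illustrative names, and the search simply tries all pairs from H.

```python
def combine(h1, h2):
    """Site-wise sum of two haplotypes: equal alleles stay, different alleles become 2."""
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))


def resolution(G, H):
    """Map each genotype in G to one pair from H that sums to it (None if no pair does)."""
    H = sorted(H)
    found = {}
    for g in G:
        found[g] = next(
            ((a, b) for i, a in enumerate(H) for b in H[i:] if combine(a, b) == g),
            None,
        )
    return found


G = ["11221", "22221", "11011", "21221", "00011"]
H = ["11011", "11101", "00011", "01101"]
for g, pair in resolution(G, H).items():
    print(g, "=", pair)
```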
OBJECTIVES
- without objectives/constraints, the haplotyping problem would be (mathematically) trivial
  E.g., always put 0 above and 1 below:
  22021 = 00001 + 11011
  12022 = 10000 + 11011
- the objectives/constraints must be “driven by biology”
OBJECTIVES
1°) Clark’s inference rule
2°) (parsimony): minimize |H|
3°) Perfect Phylogeny
4°) Disease Association
1st Obj: Clark’s rule
Inference Rule
for a compatible pair h, g:
  1011001011 +    known haplotype h
  1101001110 =    new (derived) haplotype h’
  1221001212      known (ambiguous) genotype g
We write h + h’ = g
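The inference rule can be sketched as a single function. This assumes the 0/1/2 string encoding of the slides; the name `infer` is hypothetical. It returns the derived haplotype h’ when h is compatible with g, and `None` otherwise.

```python
def infer(h: str, g: str):
    """Clark's inference rule: return h' with h + h' = g, or None if h is incompatible with g."""
    if len(h) != len(g):
        raise ValueError("haplotype and genotype must have the same length")
    derived = []
    for a, b in zip(h, g):
        if b == "2":
            derived.append("1" if a == "0" else "0")  # heterozygous site: h' carries the other allele
        elif a == b:
            derived.append(a)                          # homozygous site: h' must agree with h
        else:
            return None                                # h contradicts g at a homozygous site
    return "".join(derived)


print(infer("1011001011", "1221001212"))
```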
1st Objective (Clark, 1990)
1. Start with H = “bootstrap” haplotypes
2. while Clark’s rule applies to a pair (h, g) in H x G
3.    apply the rule to any such (h, g) obtaining h’
4.    set H = H + {h’} and G = G - {g}
5. end while
If, at end, G is empty, SUCCESS, otherwise FAILURE
Step 3 is non-deterministic: the algorithm could end without explaining all genotypes even if an explanation was possible.
The number of genotypes solved depends on order of application.

Example: H = { 0000, 1000 }, G = { 2200, 1122 }
- 0000 + 1100 = 2200, then 1100 + 1111 = 1122: SUCCESS
- 1000 + 0100 = 2200: FAILURE (can’t resolve 1122)

OBJ: find the order of rule applications that leaves the fewest elements in G
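The dependence on the order of application can be reproduced with a few lines of code. This is a sketch of the loop above, not Clark's original implementation; the names `infer` and `clark` and the `pick` parameter (which applicable pair to use at each step) are assumptions made for the illustration.

```python
def infer(h, g):
    """Clark's rule: return h' with h + h' = g, or None if h is incompatible with g."""
    derived = []
    for a, b in zip(h, g):
        if b == "2":
            derived.append("1" if a == "0" else "0")
        elif a == b:
            derived.append(a)
        else:
            return None
    return "".join(derived)


def clark(H, G, pick=0):
    """Apply Clark's rule until it no longer applies; return (haplotypes, unresolved genotypes)."""
    H, G = list(H), list(G)
    while True:
        options = [(h, g) for h in H for g in G if infer(h, g) is not None]
        if not options:
            return H, G
        # Step 3 is non-deterministic: `pick` chooses among the applicable pairs.
        h, g = options[min(pick, len(options) - 1)]
        h_new = infer(h, g)
        if h_new not in H:
            H.append(h_new)
        G.remove(g)


for pick in (0, 1):
    H, left = clark(["0000", "1000"], ["2200", "1122"], pick=pick)
    print("SUCCESS" if not left else "FAILURE", H, left)
```

With the slide's example, the first choice resolves 2200 from 0000 and succeeds, while the second resolves it from 1000 and leaves 1122 unexplained.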
The problem was studied by Gusfield (ISMB 2000, and Journal of Comp. Biol., 2001)
- the problem is APX-hard
- it corresponds to finding the largest forest in a graph with haplotypes as nodes and arcs for possible derivations
- solved via an ILP of exponential size (practical for small real instances)
2nd Obj: Max Parsimony
- Clark conjectured that the solution (when found) uses the min # of haplotypes
- this is clearly false
- a solution with few haplotypes is biologically relevant (as we all descend from a small set of ancestors)
011101
111111
011000
010001
010011
111111
022
222
012
221
011111 022211
012022
012
222
2nd Objective (parsimony): minimize |H|
1. The problem is APX-Hard
Reduction from VERTEX-COVER
Example graph G = (V,E) with V = { A, B, C, D, E } and E = { AB, BC, AE, DE, AD }
One column per vertex, plus an extra column *

        A B C D E *
One row per edge:
  AB    2 2 1 1 1 2
  BC    1 2 2 1 1 2
  AE    2 1 1 1 2 2
  DE    1 1 1 2 2 2
  AD    2 1 1 2 1 2
One row per vertex:
  A     0 1 1 1 1 0
  B     1 0 1 1 1 0
  C     1 1 0 1 1 0
  D     1 1 1 0 1 0
  E     1 1 1 1 0 0
Extra rows for the node cover { A, B, E }:
  A’    0 1 1 1 1 1
  B’    1 0 1 1 1 1
  E’    1 1 1 1 0 1

G = (V,E) has a node cover X of size k if and only if there is a set H of |V| + k haplotypes that explain all genotypes
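The construction can be checked on the example graph. The sketch below rebuilds the rows shown on the slide and tests whether the haplotypes obtained from a vertex set explain every edge genotype. All function names are hypothetical, and treating the vertex rows as the |V| fixed haplotypes is an assumption read off the slide's matrix.

```python
def combine(h1, h2):
    """Site-wise sum of two haplotypes: equal alleles stay, different alleles become 2."""
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))


def build_instance(vertices, edges):
    """Build the rows of the reduction: one column per vertex plus a final '*' column."""
    col = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    vertex_rows, edge_rows = {}, {}
    for v in vertices:
        row = ["1"] * n + ["0"]
        row[col[v]] = "0"
        vertex_rows[v] = "".join(row)
    for u, v in edges:
        row = ["1"] * n + ["2"]
        row[col[u]] = row[col[v]] = "2"
        edge_rows[u + v] = "".join(row)
    return vertex_rows, edge_rows


def haplotypes_from_cover(vertex_rows, cover):
    """|V| vertex haplotypes plus one primed copy (last column set to 1) per vertex in the cover."""
    H = set(vertex_rows.values())
    H |= {vertex_rows[v][:-1] + "1" for v in cover}
    return H


def explains_all(H, genotypes):
    """True if every genotype is the sum of two haplotypes from H."""
    return all(any(combine(a, b) == g for a in H for b in H) for g in genotypes)


vertex_rows, edge_rows = build_instance("ABCDE", [("A", "B"), ("B", "C"), ("A", "E"), ("D", "E"), ("A", "D")])
H = haplotypes_from_cover(vertex_rows, {"A", "B", "E"})
print(len(H), explains_all(H, edge_rows.values()))
```

A set that is not a node cover, such as { A, B }, leaves the edge DE unexplained, which is the direction of the equivalence the example illustrates.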
A basic ILP formulation
Expand your input G in all possible ways:
  220: 010 + 100, 000 + 110
  120: 100 + 110
  022: 000 + 011, 001 + 010
Variables: x_h for each haplotype h, and y_{h1,h2} for each pair h1, h2 of the expansion
The resulting Integer Program (IP1): [formula slide, not captured in the transcript]
Other ILP formulations are possible, e.g. POLY-SIZE ILP formulations
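The expansion step of the basic formulation can be sketched as follows. Since the integer program itself is not in the transcript, the example does not reproduce IP1: it uses plain brute force over the expanded pairs, in place of an ILP solver, to find a smallest haplotype set for the three genotypes of the slide. The names `expand` and `min_haplotypes` are hypothetical.

```python
from itertools import product


def expand(g):
    """All unordered haplotype pairs that sum to genotype g."""
    het = [i for i, c in enumerate(g) if c == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(g), list(g)
        for i, b in zip(het, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def min_haplotypes(G):
    """Brute force: pick one pair per genotype so that the number of distinct haplotypes is minimal."""
    best = None
    for choice in product(*(expand(g) for g in G)):
        used = {h for pair in choice for h in pair}
        if best is None or len(used) < len(best):
            best = used
    return sorted(best)


G = ["220", "120", "022"]
for g in G:
    print(g, expand(g))
print(min_haplotypes(G))
```

The number of pairs grows exponentially with the number of heterozygous sites, which is why the expansion-based formulation has exponential size.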
3rd Obj: Perfect Phylogeny
- Parsimony does not take into account mutations/evolution of haplotypes
- parsimony is very reliable on “small” haplotype blocks
- when haplotypes are large (span several SNPs), we should consider evolutionary events and recombination
- the cleanest model for evolution is the perfect phylogeny
3rd objective is based on perfect phylogeny
- A phylogeny explains a set of binary features (e.g. flies, has fur…) with a tree
- Leaf nodes are labeled with species
- Each feature labels an edge leading to a subtree that possesses it
[Figure: tree with edges labeled "has 2 legs", "has tail", "flies"]
But… a new species may come along so that no perfect phylogeny is possible…
Theorem: such a matrix has a p.p. iff it does not contain a 4x2 minor with rows 00, 10, 01, 11

                two legs   tail   flies
Human              1        0       0
Mouse              0        1       0
Spider             0        0       0
Eagle              1        0       1
Mickey mouse       1        1       0

(Mickey mouse is the new species: once it is added, the columns "two legs" and "tail" contain all of 00, 10, 01, 11)
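The condition of the theorem is easy to test by looking at every pair of columns. The following is a minimal sketch, assuming the matrix is given as a list of 0/1 strings, one per row; the function name is illustrative.

```python
from itertools import combinations


def has_perfect_phylogeny(rows):
    """True iff no pair of columns of the binary matrix shows all four patterns 00, 01, 10, 11."""
    n = len(rows[0])
    for i, j in combinations(range(n), 2):
        patterns = {(r[i], r[j]) for r in rows}
        if len(patterns) == 4:  # forbidden 4x2 minor found
            return False
    return True


# Columns: two legs, tail, flies
species = {"Human": "100", "Mouse": "010", "Spider": "000", "Eagle": "101"}
print(has_perfect_phylogeny(list(species.values())))
species["Mickey mouse"] = "110"
print(has_perfect_phylogeny(list(species.values())))
```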
![Page 76: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/76.jpg)
We can consider each SNP as a binary feature
Objective:Objective: We want the solution to admit a perfect phylogeny
(Rationale : we assume haplotypes have evolved independently along a tree)
![Page 77: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/77.jpg)
We can consider each SNP as a binary feature
Objective:Objective: We want the solution to admit a perfect phylogeny
(Rationale : we assume haplotypes have evolved independently along a tree)
0 1 2 02 1 0 22 0 2 0
![Page 78: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/78.jpg)
We can consider each SNP as a binary feature
Objective:Objective: We want the solution to admit a perfect phylogeny
(Rationale : we assume haplotypes have evolved independently along a tree)
0 1 0 00 1 1 01 1 0 10 1 0 01 0 0 00 0 1 0
0 1 2 02 1 0 22 0 2 0
![Page 79: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/79.jpg)
Genotypes:
0 1 2 0
2 1 0 2
2 0 2 0
Haplotypes:
0 1 0 0
0 1 1 0
1 1 0 1
0 1 0 0
1 0 0 0
0 0 1 0
NOT a perfect phylogeny solution!
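Why this solution fails can be verified directly. The sketch below assumes the usual encoding (0 and 1 for homozygous sites, 2 for a heterozygous site) and that consecutive haplotypes on the slide form the pair for each genotype; the helper names `genotype_of` and `conflicting_columns` are hypothetical. It first confirms that each pair really resolves its genotype, then looks for a pair of SNPs in which all four combinations 00, 01, 10, 11 appear.

```python
from itertools import combinations


def genotype_of(h1, h2):
    """Combine two haplotypes: equal alleles keep their value, different ones give 2."""
    return tuple(a if a == b else 2 for a, b in zip(h1, h2))


def conflicting_columns(haplotypes):
    """Return the first column pair showing all of 00, 01, 10, 11, or None."""
    for i, j in combinations(range(len(haplotypes[0])), 2):
        if len({(h[i], h[j]) for h in haplotypes}) == 4:
            return (i, j)
    return None


genotypes = [(0, 1, 2, 0), (2, 1, 0, 2), (2, 0, 2, 0)]
pairs = [
    ((0, 1, 0, 0), (0, 1, 1, 0)),
    ((1, 1, 0, 1), (0, 1, 0, 0)),
    ((1, 0, 0, 0), (0, 0, 1, 0)),
]

# Every pair must resolve its genotype before the phylogeny question even arises.
resolves = all(genotype_of(h1, h2) == g for (h1, h2), g in zip(pairs, genotypes))
haplotypes = [h for pair in pairs for h in pair]

print(resolves)                         # True
print(conflicting_columns(haplotypes))  # (0, 1)
```

The conflict sits in the first two SNPs (0-based columns 0 and 1), where the haplotypes show 01, 11, 10 and 00.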
![Page 80: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/80.jpg)
Genotypes:
0 1 2 0
0 1 0 2
0 0 0 2
![Page 81: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/81.jpg)
Genotypes:
0 1 2 0
0 1 0 2
0 0 0 2
Haplotypes:
0 1 0 0
0 1 1 0
0 1 0 0
1 1 0 1
0 0 0 0
0 0 0 1
A perfect phylogeny
![Page 82: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/82.jpg)
Theorem: The Perfect Phylogeny Haplotyping problem is solvable in polynomial time
![Page 83: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/83.jpg)
Algorithms are of combinatorial nature
- There is a graph whose nodes are the SNPs (columns) and whose edges are of two types (forced and free)
- forced edges connect pairs of SNPs that must be phased in the same way
  (a pair of heterozygous sites 22 is phased either as 00 + 11 or as 01 + 10)
- a more involved traversal of the graph decides how to phase the free SNPs
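To make the problem statement concrete, here is a brute-force sketch. It is explicitly not the polynomial graph-based algorithm mentioned above: it enumerates every phasing of every genotype (exponential in the number of heterozygous sites) and keeps the first combination that passes the four-gamete test. The function names and the toy input `["22", "20", "02"]` are assumptions chosen for illustration; the second input is the genotype set from the earlier slide.

```python
from itertools import combinations, product


def phasings(genotype):
    """Yield each unordered haplotype pair resolving a genotype string over 0/1/2."""
    het = [k for k, c in enumerate(genotype) if c == "2"]
    free = het[1:]  # the first heterozygous site of h1 is fixed to '0' to avoid mirror pairs
    for bits in product("01", repeat=len(free)):
        assign = dict(zip(free, bits))
        if het:
            assign[het[0]] = "0"
        h1, h2 = list(genotype), list(genotype)
        for k in het:
            h1[k] = assign[k]
            h2[k] = "1" if assign[k] == "0" else "0"
        yield "".join(h1), "".join(h2)


def four_gamete_ok(haplotypes):
    """True iff no pair of columns shows all of 00, 01, 10, 11."""
    for i, j in combinations(range(len(haplotypes[0])), 2):
        if len({(h[i], h[j]) for h in haplotypes}) == 4:
            return False
    return True


def brute_force_pph(genotypes):
    """Return one phasing admitting a perfect phylogeny, or None if there is none."""
    for choice in product(*(list(phasings(g)) for g in genotypes)):
        if four_gamete_ok([h for pair in choice for h in pair]):
            return list(choice)
    return None


# The 22 genotype must be phased as 01 + 10; 00 + 11 would create all four gametes.
print(brute_force_pph(["22", "20", "02"]))
# [('01', '10'), ('00', '10'), ('00', '01')]

# The genotypes 0120, 2102, 2020 admit no perfect phylogeny solution at all.
print(brute_force_pph(["0120", "2102", "2020"]))  # None
```

The toy input shows the two options for a 22 pair in action: the homozygous context supplied by the other two genotypes rules out 00 + 11 and leaves 01 + 10 as the only choice.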
![Page 84: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/84.jpg)
4th Obj: Disease Association
![Page 85: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/85.jpg)
Some diseases may be due to a gene which has “faulty” configurations
RECESSIVE DISEASE (e.g. cystic fibrosis, sickle cell anemia): to be diseased one must have both copies faulty. With only one faulty copy, one is a carrier of the disease
DOMINANT DISEASE (e.g. Huntington’s disease, Marfan’s syndrome): to be diseased it is enough to have one faulty copy
Two individuals, one healthy and the other diseased, may have the same genotype.
The explanation of the disease lies in a difference in their haplotypes
![Page 86: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/86.jpg)
INPUT: GD = {11221, 21221, 02011}, GH = {11221, 02201, 00011}
(figure: the six individuals, each labeled with its genotype)
![Page 87: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/87.jpg)
INPUT: GD = {11221, 21221, 02011}, GH = {11221, 02201, 00011}
Phasing shown for each genotype:
11221 = 11011 + 11101
21221 = 11011 + 01101
02011 = 01011 + 00011
02201 = 01101 + 00001
00011 = 00011 + 00011
11221 = 11001 + 11111
OUTPUT: H = {11011, 01011, 00001, 11111, 11101, 00011, 01101}
H contains a subset HD such that each diseased individual has at least one haplotype in HD and each healthy individual has none
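The condition on HD can be checked with a few lines of code. The sketch below is only an illustration under stated assumptions: the individual labels (`D1`, `H1`, ...) are hypothetical, the haplotype pairs are the ones shown on the slide, and the assignment of the two 11221 phasings to the diseased and the healthy individual is an assumption (it is the only assignment under which a valid HD exists). The candidate HD is taken to be every haplotype carried by a diseased individual and by no healthy one.

```python
# Hypothetical labels; haplotype pairs as shown on the slide.
diseased = {
    "D1 (11221)": ("11011", "11101"),  # assumed to be the diseased 11221
    "D2 (21221)": ("11011", "01101"),
    "D3 (02011)": ("01011", "00011"),
}
healthy = {
    "H1 (11221)": ("11001", "11111"),  # assumed to be the healthy 11221
    "H2 (02201)": ("01101", "00001"),
    "H3 (00011)": ("00011", "00011"),
}


def candidate_disease_haplotypes(diseased, healthy):
    """Haplotypes seen in some diseased individual and in no healthy one."""
    healthy_haps = {h for pair in healthy.values() for h in pair}
    diseased_haps = {h for pair in diseased.values() for h in pair}
    return diseased_haps - healthy_haps


def is_valid(hd, diseased, healthy):
    """Each diseased individual has at least one haplotype in hd, each healthy one none."""
    every_diseased_hit = all(any(h in hd for h in pair) for pair in diseased.values())
    no_healthy_hit = all(h not in hd for pair in healthy.values() for h in pair)
    return every_diseased_hit and no_healthy_hit


hd = candidate_disease_haplotypes(diseased, healthy)
print(sorted(hd))                       # ['01011', '11011', '11101']
print(is_valid(hd, diseased, healthy))  # True
```

Swapping the two phasings of 11221 makes the check fail, because the haplotype 11011 would then be carried by a healthy individual while being the only candidate left for the individual with genotype 21221.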
![Page 88: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/88.jpg)
![Page 89: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/89.jpg)
Theorem 1 is proved via a reduction from 3-SAT
Theorem 2 has a mathematical proof (coloring argument) with little relation to biology: there is an R (depending on the input) such that a haplotype is healthy if the sum of its bits is congruent to R modulo 3
This means the model must be refined!
![Page 90: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/90.jpg)
![Page 91: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/91.jpg)
Summary:
- in silico haplotyping is needed for economic reasons
- several objectives, all biologically driven
- nice combinatorial problems (mostly arising from the binary nature of SNPs)
- these problems are technology-dependent and may become obsolete (hopefully after we have retired)
![Page 92: Giuseppe Lancia University of Udine The phasing of heterozygous traits: Algorithms and Complexity](https://reader034.vdocument.in/reader034/viewer/2022051820/56649e4a5503460f94b3e6a1/html5/thumbnails/92.jpg)
Thanks