Posted on 19-Dec-2015
Complexity and Approximation of the Minimum Recombinant Haplotype Configuration Problem
Authors: Lan Liu, Xi Chen, Jing Xiao & Tao Jiang
Outline
Introduction and problem definition
Deciding the complexity of binary-tree-MRHC
Approximation of MRHC with missing data
Approximation of MRHC without missing data
Approximation of bounded MRHC
Conclusion
Introduction
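[Figure: a genotype drawn as a pair of alleles (1 or 2) at each locus, with one haplotype and one locus labelled.]
The order of the two alleles at a heterozygous locus is recorded by its PS (parental source) value: 2 1 has PS value 1, and 1 2 has PS value 0.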
Basic concepts
Mendelian law: one haplotype comes from the mother and the other comes from the father.
Example: Mendelian experiment
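Notations and recombinants
[Figure: two father/mother/child trios over four loci; a marked switch point denotes a recombinant.
Left: father 1111 / 2222, mother 2222 / 2222, child 1111 / 2222: 0 recombinants.
Right: father 1111 / 2222, mother 2222 / 2222, child 1122 / 2222: 1 recombinant, because the child's paternal haplotype 1122 switches from the father's haplotype 1111 to his haplotype 2222 between loci 2 and 3.]
Genotype versus haplotype configuration: the genotype 1122 / 2222 (read column-wise as unordered allele pairs) also has the haplotype configuration 1222 / 2122.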
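Counting the recombinants on one transmitted haplotype, once the configurations are fixed, can be sketched as a two-state dynamic program over the loci. This is a minimal illustration, assuming haplotypes are equal-length strings of allele characters; the function name is hypothetical and not taken from the slides.

```python
import math


def count_recombinants(parent, child_hap):
    """Fewest recombinants needed for child_hap to be inherited from a parent
    whose two haplotypes are parent = (h0, h1), all equal-length strings.

    cost[s] is the fewest switches for the loci seen so far, given that the
    current locus was copied from parental haplotype s. Returns math.inf if
    some allele of child_hap is absent from the parent at that locus, i.e.
    the configuration violates Mendelian law.
    """
    haps = parent
    cost = [0 if h[0] == child_hap[0] else math.inf for h in haps]
    for i in range(1, len(child_hap)):
        # Stay on haplotype s for free, or switch from the other one at cost 1.
        cost = [
            min(cost[s], cost[1 - s] + 1) if haps[s][i] == child_hap[i] else math.inf
            for s in (0, 1)
        ]
    return min(cost)
```

On the two trios in the figure, `count_recombinants(("1111", "2222"), "1111")` needs no switch, while `count_recombinants(("1111", "2222"), "1122")` needs one.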
Pedigree
[Figure: An example: the British Royal Family. The pedigree spans Elizabeth II and Prince Philip, their children Prince Charles, Princess Anne, Prince Andrew and Prince Edward with their spouses, and their grandchildren.]
Haplotype reconstruction
- Haplotypes: useful, but expensive to obtain
- Genotypes: cheaper to obtain
[Figure: a mother (M) and child (C) genotyped at two loci, shown under two alternative haplotype configurations, (a) and (b).]
Goal: reconstruct haplotypes from genotypes.
Problem definition: the MRHC problem
Given a pedigree and the genotype of each member, find a haplotype configuration for each member that obeys Mendelian law, such that the total number of recombinants is minimized.
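For very small pedigrees the MRHC objective can be evaluated by exhaustive search. The sketch below handles a single father-mother-child trio; the genotype format (a list of unordered allele pairs, one per locus) and all function names are hypothetical. It enumerates every configuration and every inheritance vector, so it runs in exponential time and only illustrates the problem statement; it is not the authors' method.

```python
from itertools import product


def phasings(genotype):
    """All ordered (paternal, maternal) haplotype pairs for a genotype given
    as a list of unordered allele pairs, one pair per locus."""
    per_locus = [sorted({(a, b), (b, a)}) for a, b in genotype]
    for choice in product(*per_locus):
        yield ("".join(str(a) for a, _ in choice),
               "".join(str(b) for _, b in choice))


def recombinants(parent, hap):
    """Fewest switches over all inheritance vectors (which parental haplotype
    each locus was copied from) that reproduce hap; None if none does."""
    best = None
    for src in product((0, 1), repeat=len(hap)):
        if all(parent[s][i] == hap[i] for i, s in enumerate(src)):
            switches = sum(src[i] != src[i + 1] for i in range(len(src) - 1))
            best = switches if best is None else min(best, switches)
    return best


def mrhc_trio(father_g, mother_g, child_g):
    """Exhaustive MRHC for one trio: minimum total recombinants, or None if
    no configuration obeys Mendelian law."""
    best = None
    for f, m, (pat, mat) in product(phasings(father_g), phasings(mother_g),
                                    phasings(child_g)):
        r_f, r_m = recombinants(f, pat), recombinants(m, mat)
        if r_f is None or r_m is None:
            continue  # this configuration violates Mendelian law
        total = r_f + r_m
        best = total if best is None else min(best, total)
    return best
```

With genotypes only, the right-hand trio of the recombinant example needs no recombinant at all: phasing the father as 1122 / 2211 lets the child inherit 1122 intact, which is why MRHC optimizes over all configurations rather than fixing one.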
Problem definition: variants of MRHC
- Tree-MRHC: the pedigree has no mating loop
- Binary-tree-MRHC: each member has at most one mate and at most one child
- 2-locus-MRHC: 2 loci
- 2-locus-MRHC*: 2 loci, with missing data
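Whether a pedigree qualifies for Tree-MRHC can be checked with a cycle test on the pedigree graph. This sketch assumes the usual graph model (one node per member, one node per mating pair, edges parent to mating node to child), in which a mating loop is a cycle; the input format and function name are hypothetical.

```python
def is_tree_pedigree(parents):
    """Return True if the pedigree has no mating loop.

    parents: dict mapping each non-founder to its (father, mother).
    Cycle detection uses union-find over the undirected pedigree graph.
    """
    root = {}

    def find(x):
        root.setdefault(x, x)
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    # A set removes duplicate parent-to-mating edges shared by siblings.
    edges = set()
    for child, (father, mother) in parents.items():
        mating = ("mating", father, mother)
        edges |= {(father, mating), (mother, mating), (mating, child)}

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge closes a cycle: a mating loop
        root[ru] = rv
    return True
```

Siblings and half-siblings keep the graph acyclic, whereas a mating between two siblings closes a loop through their parents' mating node.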
Previous work
Known hardness results for Mendelian law checking:
- Pedigrees with mating loops, multi-allelic loci: NP-hard [AHI+03]
- Pedigrees with mating loops, bi-allelic loci: P [AHI+03]
- Pedigrees without mating loops: P [AHI+03]
Known hardness results for MRHC:
- 2-locus-MRHC: NP-hard [LJ03]
- Tree-MRHC with a bounded number of members: P [LJ03]
- Tree-MRHC with a bounded number of loci: P [DLJ03]
- Tree-MRHC: NP-hard [DLJ03]
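Our hardness and approximation results
- Binary-tree-MRHC: NP-hard.
- 2-locus-MRHC*: cannot be approximated within any ratio f(n) unless P = NP; the lower bound holds even for 2-locus-MRHC*(4,1).
- Binary-tree-MRHC*: cannot be approximated within any ratio f(n) unless P = NP; the lower bound holds even for Binary-tree-MRHC*(1,1).
- 2-locus-MRHC: cannot be approximated within any constant ratio, assuming P ≠ NP and the Unique Games Conjecture [Khot02]; the lower bound holds even for 2-locus-MRHC(16,15). Upper bound: O(√(log n)).
- Tree-MRHC: cannot be approximated within any constant ratio, assuming P ≠ NP and the Unique Games Conjecture; the lower bound holds even for Tree-MRHC(1,u) and Tree-MRHC(u,1).
Here f(n) is any polynomial-time computable function, and X(a,b) denotes problem X restricted to pedigrees in which each member has at most a mates and at most b children.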
Deciding the complexity of binary-tree-MRHC
A verifier for ≠3SAT (1)
Given a truth assignment for the literals of a 3CNF formula, the verifier performs:
- consistency checking for each variable
- satisfiability checking for each clause
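The verifier's two stages can be sketched as follows. The sketch assumes ≠3SAT is the not-all-equal variant of 3SAT (a clause passes when its three literals are neither all true nor all false); the flag `nae=False` switches to the ordinary 3SAT test. The input format and all names are hypothetical.

```python
def verify_assignment(clauses, occ_values, nae=True):
    """Two-stage verifier for a truth assignment given per literal occurrence.

    clauses:    list of 3-tuples of nonzero ints; v is variable v, -v its negation.
    occ_values: list of 3-tuples of bools, the value given to each occurrence.
    nae:        if True (assumed reading of ≠3SAT), a clause passes when its
                literals are not all equal; if False, when at least one is true.
    """
    # Stage 1: consistency checking for each variable.
    var_value = {}
    for clause, values in zip(clauses, occ_values):
        for lit, val in zip(clause, values):
            implied = val if lit > 0 else not val
            if var_value.setdefault(abs(lit), implied) != implied:
                return False  # two occurrences disagree on this variable
    # Stage 2: satisfiability checking for each clause.
    if nae:
        return all(len(set(values)) == 2 for values in occ_values)
    return all(any(values) for values in occ_values)
```

Splitting the checks this way mirrors the two parts of the reduction pedigree: one part enforces per-variable consistency, the other per-clause satisfiability.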
Binary-tree-MRHC is NP-hard
[Figure: (A) the genotype of a child C, heterozygous at both loci; (B) its two possible haplotype configurations; panels (a), (b) and (c) show mother M and child C under different configurations.]
Key observation: the child C can check whether the mother M has a certain haplotype configuration.
Binary-tree-MRHC is NP-hard (continued)
[Figure: The pedigree, with members O1, O2, A_1, ..., A_{t+3m}, B_1, ..., B_{t+3m}, M_1, ..., M_{t+3m} and C_1, ..., C_{t+3m}. Part 1 (indices 1 to t; #recombinants ≥ 0) performs the consistency checking, and Part 2 (indices t+1 to t+3m; #recombinants ≥ #clauses) performs the satisfiability checking.]
≠3SAT is satisfiable if and only if OPT(MRHC) = #clauses.
Approximation of MRHC with missing data
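Inapproximability of 2-locus-MRHC*
Definition: a minimization problem R cannot be approximated if there is no polynomial-time approximation algorithm with ratio f(n) unless P = NP, where f(n) is any polynomial-time computable function.
Fact: if it is NP-hard to decide whether OPT(R) = 0, then R cannot be approximated unless P = NP, because an f(n)-approximation must return a solution of cost at most f(n) · 0 = 0 whenever OPT(R) = 0, and would therefore decide that question.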
Inapproximability of 2-locus-MRHC* (continued)
[Figure: (A) gadget for a variable x; (B) gadget for a clause over x, y and z; * marks missing data. The two haplotype configurations of a doubly heterozygous genotype encode True and False.]
Reduce 3SAT to 2-locus-MRHC*: 3SAT is satisfiable if and only if OPT(2-locus-MRHC*) = 0.
Hence 2-locus-MRHC* cannot be approximated unless P = NP.
Approximation of MRHC without missing data
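Upper Bound of 2-locus-MRHC
Main idea: use a Boolean variable to capture each member's haplotype configuration, and use clauses to capture the recombinants.
[Figure: an example with members A and B; the two configurations of a doubly heterozygous two-locus genotype are labelled True and False, and a recombinant is captured by a 2CNF clause over A and B.]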
Upper Bound of 2-locus-MRHC: the reduction from 2-locus-MRHC to Min 2CNF Deletion
[Table: for each combination of the genotype of the mother (variable A), the genotype of the father (variable B) and the genotype of the child (variable C), the 2CNF constraints over A, B and C whose violated clauses count the recombinants.]
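The main idea of the reduction can be checked on one trio in which the mother A, the father B and the child C are all heterozygous at both loci. This sketch uses a hypothetical encoding that is an assumption, not taken from the table: True stands for the configuration 11/22 and False for 12/21. Under it, the fewest recombinants equal the number of violated clauses of four 2CNF clauses over A, B and C.

```python
from itertools import product


# Hypothetical encoding (an assumption of this sketch): a doubly heterozygous
# two-locus genotype has two configurations up to swapping its haplotypes;
# True stands for 11/22 and False for 12/21.
def config(value):
    return ("11", "22") if value else ("12", "21")


def recombinant(parent_value, hap):
    """0 if the transmitted two-locus haplotype copies one parental haplotype,
    1 otherwise (a doubly heterozygous parent can transmit any haplotype)."""
    return 0 if hap in config(parent_value) else 1


def trio_recombinants(a, b, c):
    """Fewest recombinants for doubly heterozygous mother A, father B and
    child C; either of the child's haplotypes may be the maternal one."""
    h0, h1 = config(c)
    return min(recombinant(a, h0) + recombinant(b, h1),
               recombinant(a, h1) + recombinant(b, h0))


def violated_clauses(a, b, c):
    """Violated clauses of (A or not C)(not A or C)(B or not C)(not B or C)."""
    clauses = [a or not c, (not a) or c, b or not c, (not b) or c]
    return sum(1 for ok in clauses if not ok)


# Under this encoding, recombinants equal violated clauses for all 8 assignments.
assert all(trio_recombinants(a, b, c) == violated_clauses(a, b, c)
           for a, b, c in product((False, True), repeat=3))
```

Each parent whose configuration differs from the child's costs exactly one recombinant and violates exactly one clause of its pair, which is why minimizing recombinants becomes minimizing deleted clauses.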
Upper Bound of 2-locus-MRHC
Recently, Agarwal et al. [STOC05] presented an O(√(log n)) randomized approximation algorithm for Min 2CNF Deletion.
Since the reduction turns recombinants into deleted clauses, 2-locus-MRHC also has an O(√(log n)) approximation algorithm.
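To make the target problem concrete, here is an exact solver for Min 2CNF Deletion on tiny instances. It is plain exhaustive search, exponential in the number of variables, and is not the Agarwal et al. algorithm; the clause format and function name are hypothetical.

```python
from itertools import product


def min_2cnf_deletion(num_vars, clauses):
    """Exact Min 2CNF Deletion by exhaustive search (tiny inputs only).

    clauses: list of 2-tuples of nonzero ints; v is variable v, -v its negation.
    The fewest clauses whose deletion leaves a satisfiable formula equals the
    minimum, over all assignments, of the number of violated clauses.
    """
    def holds(lit, bits):
        value = bits[abs(lit) - 1]
        return value if lit > 0 else not value

    best = len(clauses)
    for bits in product((False, True), repeat=num_vars):
        violated = sum(1 for a, b in clauses
                       if not (holds(a, bits) or holds(b, bits)))
        best = min(best, violated)
    return best
```

For example, the four clauses over x1 and x2 covering every sign combination force exactly one deletion, since each assignment falsifies exactly one of them.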
Approximation of bounded MRHC
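Approximation Hardness of bounded MRHC
Bounding the number of mates and the number of children of each member, written (#mates, #children), the lower bounds still hold for:
- 2-locus-MRHC: (16,15)
- 2-locus-MRHC*: (4,1)
- Tree-MRHC: (u,1) or (1,u)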
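The (#mates, #children) parameters of a given pedigree can be read off directly. This is a small sketch under a hypothetical input format (each non-founder mapped to its father and mother); the function name is illustrative only.

```python
from collections import defaultdict


def mate_child_bounds(parents):
    """Return (max #mates, max #children) over all members of a pedigree.

    parents: dict mapping each non-founder to its (father, mother).
    """
    mates = defaultdict(set)
    children = defaultdict(int)
    for child, (father, mother) in parents.items():
        mates[father].add(mother)
        mates[mother].add(father)
        children[father] += 1
        children[mother] += 1
    return (max((len(m) for m in mates.values()), default=0),
            max(children.values(), default=0))
```

A single trio gives (1, 1), the bound under which the Binary-tree-MRHC*(1,1) lower bound is stated; a member with two mates and three children in total raises the pair to (2, 3).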
Conclusion
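Our hardness and approximation results
- Binary-tree-MRHC: NP-hard.
- 2-locus-MRHC*: cannot be approximated within any ratio f(n) unless P = NP; the lower bound holds even for 2-locus-MRHC*(4,1).
- Binary-tree-MRHC*: cannot be approximated within any ratio f(n) unless P = NP; the lower bound holds even for Binary-tree-MRHC*(1,1).
- 2-locus-MRHC: cannot be approximated within any constant ratio, assuming P ≠ NP and the Unique Games Conjecture; the lower bound holds even for 2-locus-MRHC(16,15). Upper bound: O(√(log n)).
- Tree-MRHC: cannot be approximated within any constant ratio, assuming P ≠ NP and the Unique Games Conjecture; the lower bound holds even for Tree-MRHC(1,u) and Tree-MRHC(u,1).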
Thanks for your time and attention!