Complexity and Approximation of the Minimum Recombinant Haplotype Configuration Problem Authors: Lan Liu, Xi Chen, Jing Xiao & Tao Jiang

Post on 19-Dec-2015


Outline

Introduction and problem definition

Deciding the complexity of binary-tree-MRHC

Approximation of MRHC with missing data

Approximation of MRHC without missing data

Approximation of bounded MRHC

Conclusion


Introduction

Basic concepts: genotype, haplotype, locus

Mendelian law: one haplotype comes from the mother and the other comes from the father.

[Figure: Mendelian experiment example showing genotypes and haplotypes over two loci; the allele order 2 1 is labeled PS value = 1 and the order 1 2 is labeled PS value = 0.]


Notations and Recombinant

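[Figure: two father-mother-child examples over four loci. One parent carries haplotypes 1111 and 2222, the other 2222 and 2222. A child with haplotypes 1111 and 2222 has 0 recombinants; a child with haplotypes 1122 and 2222 has 1 recombinant. A further panel shows a genotype next to a haplotype configuration for it.]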
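The recombinant counts in the two examples can be reproduced with a small sketch. It assumes a haplotype is a string of alleles, one character per locus, and that a recombinant is a switch between the parent's two haplotypes while copying. The function name `min_recombinants` is hypothetical and is not taken from the slides.

```python
def min_recombinants(parent_haplotypes, inherited):
    """Fewest recombinants needed to copy `inherited` from the parent's two
    haplotypes, or None if no copy is possible (Mendelian law is violated).

    Hypothetical helper for illustration only.
    """
    INF = float("inf")
    # cost[s]: fewest switches so far if the current locus is copied from haplotype s
    cost = [0, 0]
    for locus, allele in enumerate(inherited):
        matches = [parent_haplotypes[s][locus] == allele for s in (0, 1)]
        cost = [
            min(cost[s], cost[1 - s] + 1) if matches[s] else INF
            for s in (0, 1)
        ]
    best = min(cost)
    return None if best == INF else best


parent = ("1111", "2222")
print(min_recombinants(parent, "1111"))  # 0
print(min_recombinants(parent, "1122"))  # 1
```

The two printed values match the "0 recombinant" and "1 recombinant" labels of the figure.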


Pedigree

An example: British Royal Family

[Figure: pedigree rooted at Elizabeth II of the United Kingdom and Prince Philip, Duke of Edinburgh. Their children and mates: Prince Charles, Prince of Wales (Diana, Princess of Wales; Camilla, Duchess of Cornwall); Princess Anne, Princess Royal (Captain Mark Phillips; Commander Timothy Laurence); Prince Andrew, Duke of York (Sarah Margaret Ferguson); Prince Edward, Earl of Wessex (Sophie Rhys-Jones). Grandchildren: Prince William of Wales, Prince Henry of Wales, Peter Phillips, Zara Phillips, Princess Beatrice of York, Princess Eugenie of York, Lady Louise Windsor.]


Haplotype Reconstruction

- Haplotype: useful, expensive
- Genotype: cheaper

Reconstruct haplotypes from genotypes

[Figure: a mother (M) and child (C) with genotype 1 2 at both loci, followed by two candidate haplotype configurations, (a) and (b).]


Problem Definition

MRHC problem: Given a pedigree and the genotype information for each member, find a haplotype configuration for each member that obeys the Mendelian law such that the number of recombinants is minimized.
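To make the objective concrete, here is a brute-force sketch for the smallest interesting pedigree, a nuclear family (two parents and their children). It assumes a genotype is a list of unordered allele pairs, one per locus, with no missing data. All names (`phasings`, `min_switches`, `mrhc_nuclear_family`) are hypothetical, and the exhaustive search is exponential in the number of loci, so it only illustrates the definition.

```python
from itertools import product


def phasings(genotype):
    """Yield every haplotype pair (h1, h2) consistent with the genotype."""
    options = [{(a, b), (b, a)} for a, b in genotype]
    for choice in product(*options):
        yield tuple(c[0] for c in choice), tuple(c[1] for c in choice)


def min_switches(parent_pair, inherited):
    """Fewest recombinants to copy `inherited` from parent_pair, or None."""
    INF = float("inf")
    cost = [0, 0]
    for locus, allele in enumerate(inherited):
        ok = [parent_pair[s][locus] == allele for s in (0, 1)]
        cost = [min(cost[s], cost[1 - s] + 1) if ok[s] else INF for s in (0, 1)]
    best = min(cost)
    return None if best == INF else best


def mrhc_nuclear_family(father_g, mother_g, children_g):
    """Minimum total recombinants over all haplotype configurations,
    or None if the genotypes violate Mendelian law."""
    best = None
    for f in phasings(father_g):
        for m in phasings(mother_g):
            total = 0
            for child_g in children_g:
                costs = []
                # Each phasing fixes which child haplotype is paternal.
                for paternal, maternal in phasings(child_g):
                    a = min_switches(f, paternal)
                    b = min_switches(m, maternal)
                    if a is not None and b is not None:
                        costs.append(a + b)
                if not costs:
                    total = None
                    break
                total += min(costs)
            if total is not None and (best is None or total < best):
                best = total
    return best


father = [(1, 2), (1, 2)]
mother = [(3, 3), (3, 3)]
child1 = [(1, 3), (1, 3)]   # inherits 1 1 from the father
child2 = [(1, 3), (2, 3)]   # inherits 1 2 from the father
print(mrhc_nuclear_family(father, mother, [child1, child2]))  # 1
```

Whichever way the father is phased, one of the two children needs a recombinant, so the optimum is 1; with only `child1` the optimum is 0.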


Problem Definition

Variants of MRHC
- Tree-MRHC: no mating loop
- Binary-tree-MRHC: 1 mate, 1 child
- 2-locus-MRHC: 2 loci
- 2-locus-MRHC*: 2 loci with missing data


Previous Work

The known hardness results for Mendelian law checking

Loop?   Multi-allelic?   Hardness
Yes     Yes              NP-hard [AHI+03]
Yes     No               P [AHI+03]
No      -                P [AHI+03]

The known hardness results for MRHC

Problem                            Hardness
2-locus-MRHC                       NP-hard [LJ03]
Tree-MRHC with bounded #members    P [LJ03]
Tree-MRHC with bounded #loci       P [DLJ03]
Tree-MRHC                          NP-hard [DLJ03]


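Our hardness and approximation results

- Binary-tree-MRHC. Hardness: NP-hard.
- 2-locus-MRHC*. Lower bound of approx. ratio: any f(n). Assumption: P≠NP. The lower bound holds for 2-locus-MRHC*(4,1).
- Binary-tree-MRHC*. Lower bound of approx. ratio: any f(n). Assumption: P≠NP. The lower bound holds for Binary-tree-MRHC*(1,1).
- 2-locus-MRHC. Lower bound of approx. ratio: any constant. Assumption: P≠NP, the Unique Games Conjecture [Khot02]. Upper bound of approx. ratio: O(√(log n)). The lower bound holds for 2-locus-MRHC(16,15).
- Tree-MRHC. Lower bound of approx. ratio: any constant. Assumption: P≠NP, the Unique Games Conjecture. The lower bound holds for Tree-MRHC(1,u) and Tree-MRHC(u,1).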



A verifier for ≠3SAT (1)

Given a truth assignment for literals in a 3CNF formula:
- Consistency checking for each variable
- Satisfiability checking for each clause
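The two checks can be sketched as a small verifier. The sketch makes two assumptions that are not stated on the slide: that ≠3SAT denotes the not-all-equal variant of 3SAT (every clause needs at least one true and at least one false literal), and that literals are encoded DIMACS-style as nonzero integers, with -v meaning the negation of variable v. The function name `verify` and the `not_all_equal` flag are hypothetical.

```python
def verify(clauses, literal_values, not_all_equal=True):
    """Check a truth assignment given per literal occurrence.

    clauses:        list of 3-tuples of nonzero ints (-v means NOT v)
    literal_values: list of 3-tuples of bools, one value per literal occurrence
    not_all_equal:  ASSUMPTION that "≠3SAT" is not-all-equal 3SAT;
                    set to False for plain 3SAT
    """
    # Consistency checking: all occurrences of a variable must agree.
    variable_value = {}
    for clause, values in zip(clauses, literal_values):
        for literal, value in zip(clause, values):
            implied = value if literal > 0 else not value
            if variable_value.setdefault(abs(literal), implied) != implied:
                return False

    # Satisfiability checking: each clause needs a true literal
    # (and, in the not-all-equal variant, also a false one).
    for values in literal_values:
        if not any(values):
            return False
        if not_all_equal and all(values):
            return False
    return True


clauses = [(1, 2, 3), (-1, 2, -3)]
# x1 = True, x2 = False, x3 = False
print(verify(clauses, [(True, False, False), (False, False, True)]))  # True
```

In the reduction on the following slides, these two checks correspond to the two parts of the constructed pedigree.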


Binary-tree-MRHC is NP-hard

[Figure: (A) C's genotype, 1 2 at both loci; (B) its two haplotype configurations; (a), (b), (c) mother-child pairs (M, C) under different haplotype configurations.]

C can check whether M has a certain haplotype configuration.


Binary-tree-MRHC is NP-hard

The pedigree

[Figure: the constructed pedigree, with members O1, O2, A1 to At+3m, B1 to Bt+3m, M1 to Mt+3m, and C1 to Ct+3m. Part 1 performs consistency checking (#recombinants ≥ 0); Part 2 performs satisfiability checking (#recombinants ≥ #clauses).]

≠3SAT is satisfiable ⇔ OPT(MRHC) = #clauses


Inapproximability of 2-locus-MRHC*

Definition: A minimization problem R cannot be approximated if
- there is no approximation algorithm for R with ratio f(n) unless P=NP, where
- f(n) is any polynomial-time computable function.

Fact: If it is NP-hard to decide whether OPT(R)=0, R cannot be approximated unless P=NP.
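The fact follows from a short argument, sketched below. The symbols A (a hypothetical polynomial-time approximation algorithm with ratio f(n)) and I (an instance of size n) are notation introduced for this sketch and do not appear on the slide.

```latex
\begin{align*}
&\text{Suppose } A(I) \le f(n)\cdot \mathrm{OPT}_R(I) \text{ for every instance } I \text{ of size } n.\\
&\mathrm{OPT}_R(I) = 0 \;\Longrightarrow\; 0 \le A(I) \le f(n)\cdot 0 = 0 \;\Longrightarrow\; A(I) = 0,\\
&\mathrm{OPT}_R(I) > 0 \;\Longrightarrow\; A(I) \ge \mathrm{OPT}_R(I) > 0.
\end{align*}
```

So A(I) = 0 exactly when OPT(R) = 0 on I, which means A would decide an NP-hard question in polynomial time, and that implies P=NP.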


Inapproximability of 2-locus-MRHC*

Reduce 3SAT to 2-locus-MRHC*

[Figure: (A) gadget for variable x, whose two haplotype configurations encode True and False; (B) gadget for a clause over x, y, z, with missing data marked by *.]

3SAT is satisfiable ⇔ OPT(2-locus-MRHC*) = 0

2-locus-MRHC* cannot be approximated unless P=NP.


Upper Bound of 2-locus-MRHC

Main idea: use a Boolean variable to capture the configuration; use clauses to capture the recombinants.

An example

[Figure: a pedigree fragment with members A and B; the two haplotype configurations of a member are labeled False and True, and a 2CNF clause over A and B captures a recombinant.]


Upper Bound of 2-locus-MRHC

The reduction from 2-locus-MRHC to Min 2CNF Deletion

[Table: columns are Genotype of the Mother (A), Genotype of the Father (B), Genotype of the Child (C), and 2CNF Constraint. Each row lists one combination of two-locus genotypes for the three members and the 2CNF clauses over the Boolean variables A, B, and C that it contributes.]
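For reference, Min 2CNF Deletion asks for the smallest number of clauses whose removal makes a 2CNF formula satisfiable, which is the same as the smallest number of clauses left violated by any assignment. The sketch below evaluates that objective by exhaustive search. It is an exponential-time illustration of the target problem only, not the approximation algorithm cited in this talk. The name `min_2cnf_deletion` and the integer literal encoding (-v for the negation of variable v) are assumptions made for the example.

```python
from itertools import product


def min_2cnf_deletion(num_vars, clauses):
    """Smallest number of clauses violated by any truth assignment.

    clauses: list of pairs of nonzero ints; variable v is 1..num_vars and
    -v is its negation. A unit clause (a) can be written as (a, a), and a
    weighted clause can be repeated in the list.
    """
    best = len(clauses)
    for bits in product([False, True], repeat=num_vars):

        def holds(literal):
            value = bits[abs(literal) - 1]
            return value if literal > 0 else not value

        violated = sum(1 for a, b in clauses if not (holds(a) or holds(b)))
        best = min(best, violated)
    return best


# Every assignment of (x1, x2) violates exactly one of these four clauses.
print(min_2cnf_deletion(2, [(1, 2), (-1, 2), (1, -2), (-1, -2)]))  # 1
```

Under the reduction, each Boolean variable stands for a member's haplotype configuration and each violated clause stands for a recombinant, so the value returned here plays the role of the recombinant count.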


Upper Bound of 2-locus-MRHC

Recently, Agarwal et al. [STOC05] presented an O(√(log n)) randomized approximation algorithm for Min 2CNF Deletion.

2-locus-MRHC has an O(√(log n)) approximation algorithm.


Approximation Hardness of bounded MRHC

Bound #mates and #children
- 2-locus-MRHC: (16,15)
- 2-locus-MRHC*: (4,1)
- Tree-MRHC: (u,1) or (1,u)


Conclusion

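Our hardness and approximation results

- Binary-tree-MRHC. Hardness: NP-hard.
- 2-locus-MRHC*. Lower bound of approx. ratio: any f(n). Assumption: P≠NP. The lower bound holds for 2-locus-MRHC*(4,1).
- Binary-tree-MRHC*. Lower bound of approx. ratio: any f(n). Assumption: P≠NP. The lower bound holds for Binary-tree-MRHC*(1,1).
- 2-locus-MRHC. Lower bound of approx. ratio: any constant. Assumption: P≠NP, the Unique Games Conjecture. Upper bound of approx. ratio: O(√(log n)). The lower bound holds for 2-locus-MRHC(16,15).
- Tree-MRHC. Lower bound of approx. ratio: any constant. Assumption: P≠NP, the Unique Games Conjecture. The lower bound holds for Tree-MRHC(1,u) and Tree-MRHC(u,1).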


Thanks for your time and attention!