another problem with variants on either side of p vs. np...

60
Another problem with variants on either side of P vs. NP divide Leonid Chindelevitch 28 March 2019 Theory Seminar, Spring 2019, Simon Fraser University

Upload: others

Post on 10-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Another problem with variants on either side

of P vs. NP divide

Leonid Chindelevitch

28 March 2019

Theory Seminar, Spring 2019, Simon Fraser University

Page 2: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Background

Page 3: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

From mice to men through genome rearrangements

Mouse X-Chromosome

Human X-Chromosome 1

Page 4: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Ancestral Reconstruction

M3

A B

M1

C D

M2

Input: Tree and genomes A,B,C ,D

Output: Ancestral genomes M1,M2,M3

2

Page 5: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

The Median of Three

A

B C

M

Input: Genomes A,B,C

Output: Genome M (the median, AKA the lowest common ancestor)

which minimizes

d(A,M) + d(B,M) + d(C ,M)

3

Page 6: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Genome Elements

Adjacencies: {ah, bh}, {bt , ct}; telomeres: at , ch

4

Page 7: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Genome Representations

atah btbh ctch

at ah bt bh ct ch

at 1 0 0 0 0 0

ah 0 0 0 1 0 0

bt 0 0 0 0 1 0

bh 0 1 0 0 0 0

ct 0 0 1 0 0 0

ch 0 0 0 0 0 1

This is a genome matrix.

Genome matrices can be represented by involutions: (ah bh)(bt ct).

5

Page 8: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Genome Representations

atah btbh ctch

at ah bt bh ct ch

at 1 0 0 0 0 0

ah 0 0 0 1 0 0

bt 0 0 0 0 1 0

bh 0 1 0 0 0 0

ct 0 0 1 0 0 0

ch 0 0 0 0 0 1

This is a genome matrix.

Genome matrices can be represented by involutions: (ah bh)(bt ct).

5

Page 9: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Genome Representations

atah btbh ctch

at ah bt bh ct ch

at 1 0 0 0 0 0

ah 0 0 0 1 0 0

bt 0 0 0 0 1 0

bh 0 1 0 0 0 0

ct 0 0 1 0 0 0

ch 0 0 0 0 0 1

This is a genome matrix.

Genome matrices can be represented by involutions: (ah bh)(bt ct).

5

Page 10: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Properties of Genome Matrices

• binary matrices that satisfy A = AT = A−1

• even dimension n (but we can relax this assumption)

orthogonal

binary symmetric

6

Page 11: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Rank Distance

The rank distance between two genome matrices is the rank of their

difference

d(A,B) = r(A− B)

Properties

• d(A,B) ≥ 0; d(A,B) = 0 if and only if A = B

• d(A,B) = d(B,A)

• d(A,C ) ≤ d(A,B) + d(B,C )

This is a metric on the space of genome matrices (and matrices in

general).

7

Page 12: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Rank Distance

The rank distance between two genome matrices is the rank of their

difference

d(A,B) = r(A− B)

Properties

• d(A,B) ≥ 0; d(A,B) = 0 if and only if A = B

• d(A,B) = d(B,A)

• d(A,C ) ≤ d(A,B) + d(B,C )

This is a metric on the space of genome matrices (and matrices in

general).

7

Page 13: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Equivalence of Rank Distance and the Cayley Distance

Lemma

Consider permutations matrices P,Q, with permutation representations

π, τ ∈ Sn, respectively. Then

d(P,Q) = ||τπ−1||

where || · || is the minimum number of cycles in a 2-cycle decomposition.

Remark

|| · || is a metric on permutations, also referred to as the Cayley distance.

Remark

We may as well work with involutions in Sn instead of genome matrices.

8

Page 14: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Equivalence of Rank Distance and the Cayley Distance

Lemma

Consider permutations matrices P,Q, with permutation representations

π, τ ∈ Sn, respectively. Then

d(P,Q) = ||τπ−1||

where || · || is the minimum number of cycles in a 2-cycle decomposition.

Remark

|| · || is a metric on permutations, also referred to as the Cayley distance.

Remark

We may as well work with involutions in Sn instead of genome matrices.

8

Page 15: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Equivalence of Rank Distance and the Cayley Distance

Lemma

Consider permutations matrices P,Q, with permutation representations

π, τ ∈ Sn, respectively. Then

d(P,Q) = ||τπ−1||

where || · || is the minimum number of cycles in a 2-cycle decomposition.

Remark

|| · || is a metric on permutations, also referred to as the Cayley distance.

Remark

We may as well work with involutions in Sn instead of genome matrices.

8

Page 16: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

The Rank Median Problem

A

B C

M

Input: Genome Matrices A,B,C

Output: Matrix M (the median) which minimizes

s(M;A,B,C ) = d(A,M) + d(B,M) + d(C ,M)

What kind of matrix should M be?

9

Page 17: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

The Rank Median Problem

A

B C

M

Input: Genome Matrices A,B,C

Output: Matrix M (the median) which minimizes

s(M;A,B,C ) = d(A,M) + d(B,M) + d(C ,M)

What kind of matrix should M be?

9

Page 18: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Types of medians

[0 1 0 0

1 0 0 0

0 0 0 1

0 0 1 0

][0 0 0 1

0 0 1 0

0 1 0 0

1 0 0 0

][0 0 1 0

0 0 0 1

1 0 0 0

0 1 0 0

]

Generalized median: minimizer of d(A,M) + d(B,M) + d(C ,M) over all

real valued matrices[− 1

212

12

12

12

− 12

12

12

12

12

− 12

12

12

12

12

− 12

]

Genome median: minimizer of d(A,M) + d(B,M) + d(C ,M) over all

genome matrices[0 0 1 0

0 0 0 1

1 0 0 0

0 1 0 0

]

10

Page 19: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Types of medians

[0 1 0 0

1 0 0 0

0 0 0 1

0 0 1 0

][0 0 0 1

0 0 1 0

0 1 0 0

1 0 0 0

][0 0 1 0

0 0 0 1

1 0 0 0

0 1 0 0

]

Generalized median: minimizer of d(A,M) + d(B,M) + d(C ,M) over all

real valued matrices[− 1

212

12

12

12

− 12

12

12

12

12

− 12

12

12

12

12

− 12

]

Genome median: minimizer of d(A,M) + d(B,M) + d(C ,M) over all

genome matrices[0 0 1 0

0 0 0 1

1 0 0 0

0 1 0 0

]

10

Page 20: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Types of medians

[0 1 0 0

1 0 0 0

0 0 0 1

0 0 1 0

][0 0 0 1

0 0 1 0

0 1 0 0

1 0 0 0

][0 0 1 0

0 0 0 1

1 0 0 0

0 1 0 0

]

Generalized median: minimizer of d(A,M) + d(B,M) + d(C ,M) over all

real valued matrices[− 1

212

12

12

12

− 12

12

12

12

12

− 12

12

12

12

12

− 12

]

Genome median: minimizer of d(A,M) + d(B,M) + d(C ,M) over all

genome matrices[0 0 1 0

0 0 0 1

1 0 0 0

0 1 0 0

]

10

Page 21: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

P, NP, and NP-hard

11

Page 22: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Problems with variants on both sides of the P vs. NP divide

Problem type P variant NP-hard variant

Cover Edge cover Vertex cover

Satisfiability 2-CNF-SAT 3-CNF-SAT

Graph mapping Graph isomorphism Subgraph isomorphism

Optimization Linear programming Integer programming

Median-of-three Generalized median Genome median

12

Page 23: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

NP-hard

NP-hard is the set of problems which are ”at least as hard as hardest

problems in NP”.

i.e. there is a polynomial time reduction from any problem L ∈ NP to

H ∈ NP-hard.

H problem solver

solution to I ′

instance I ′of Hfast L-to-H converter

fast H-to-L converter

instance I of L

solution to I

L problem solver

13

Page 24: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

APX-hard

APX is the set of problems which have polynomial time constant-factor

approximation algorithms.

APX-hard is the set of problems where there exists a polynomial time

approximation scheme reduction from any problem L ∈ APX to any

problem H ∈ APX-hard.

H approximater

ε-approximation of I ′

instance I ′of Hfast L-to-H converter

fast H-to-L converter

instance I of L

βε-approximation of I

L approximater

14

Page 25: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Computational Complexity

15

Page 26: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

The Generalized Median problem

Page 27: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Properties of the Median

• Lower Bound

d(M,A) + d(M,B) + d(M,C ) ≥ d(A,B) + d(B,C ) + d(C ,A)

2:= β

• At least one of the “corners” (input genomes) is a 43 approximation

of the median

• The lower bound is achieved if and only if

d(M,A) =d(A,B) + d(C ,A)− d(B,C )

2

and likewise for d(M,B) and d(M,C ).

• Not every A,B,C can achieve the lower bound β, e.g.:

A =

(−1 0

0 −1

),B =

(0 0

0 0

),C =

(1 0

0 1

).

16

Page 28: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Approximating Matrix Medians

• Interesting Property

Theorem

For any three n× n matrices A, B, and C there is a median M satisfying:

for all vectors v ∈ Rn such that Av = Bv = Cv , we have Mv = Av .

• We define the invariant α := dim({v |Av = Bv = Cv}).

• For permutations, this can be computed in O(n) time via graph

union.

• Can we say the same if we have Av = Bv? We don’t know [yes for

orthogonal A,B,C ].

• However, we can act on this idea.

17

Page 29: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Subspace decomposition

18

Page 30: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Approximation Algorithm

V1

S1

P1

V2

S2

P2

V3

S3

P3

V4

S4

P4

V5

S5

P5

Subspaces

Orthonormal Bases

Projection Matrices

Median Candidates

MA = AP1 + AP2 + BP3 + AP4 + AP5

MB = BP1 + BP2 + BP3 + AP4 + BP5

MC = CP1 + BP2 + CP3 + CP4 + CP5

• 43 approximation factor for genome matrices

• if V5 = {0} then each candidate is a median (its score is β)

• In general, dim(V5) := 2δ, where δ := α + β − n is called the

“deficiency” of the triplet A,B,C .

19

Page 31: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Some recently proven theorems

MI := AP1 + AP2 + BP3 + AP4 + P5

Theorem: MI is a median for any genomic inputs A,B,C .

Theorem: MI = I + ([AV1,AV2,BV3,AV4]− V14)(V T14V14)−1V T

14.

Corollary: It is possible to compute MI in O(nω) time, where ω is

Strassen’s exponent, in exact or floating-point arithmetic.

Theorem: The matrix MI is always symmetric and orthogonal for

genomic inputs A,B,C .

Special case: If A = I , then δ = 0, so MA = MB = MC = MI and each

one has a score of β.

20

Page 32: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

An even faster, O(n2), algorithm when δ = 0

Theorem: If a matrix M satisfies

d(A,M) + d(M,B) = d(A,B),

then there exists a projection matrix P such that

M = A + P(B − A).

• We can ignore the condition that P is a projection matrix.

• This yields the system

M = A + P(B − A) = B + Q(C − B) = C + R(A− C ),

from which we eliminate M and any redundancies.

• It splits into n linear systems with the same left-hand side.

• If A,B,C are permutations, δ = 0, each equation has 2 variables;

the Aspvall-Shiloach algorithm solves such systems in O(n) time.

21

Page 33: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Rarity of the special case δ = 0

Theorem: The fraction of triples with δ = 0 goes to 0 as n→∞.

Proof: This follows directly from a result in analytic combinatorics.

0.5

0.6

0.7

0.8

4 6 8 10

n

Fra

ctio

n

Upper bounds and exact fractions of triples with deficiency 0

22

Page 34: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Challenges with computing V5

Observation: A basis for the space im(A− B) ∩ im(B − C ) can be

computed in O(n log n).

Proof sketch: Let P,Q be the cycle partitions of A−1B,C−1B.

Create a multigraph G with vertices P ∪ Q and edges for all i ∈ [n].

Each parallel edge i , j gives a vector ei − ej ∈ im(A− B) ∩ im(B − C ).

Removing those to get a connected graph G ′ whose cycle basis B can be

computed from a spanning tree.

Since G ′ is bipartite, each cycle C ∈ B gives rise to the vector

χ(C+)− χ(C−) ∈ im(A− B) ∩ im(B − C ).

Difficulty: V5 = im(A− B) ∩ im(B − C ) ∩ im(C − A) may have no nice

basis; this construction fails when generalized to hypergraphs.

23

Page 35: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

A quartic algorithm for orthogonal matrices

while d(A,B) + d(B,C ) > d(A,C )

find u ∈ im(A− B) ∩ im(B − C );

B ←(I − 2

uuT

uTu

)B.

Remark

The transformation which multiplies a matrix on the left by I − 2 uuT

uT uis

called a Householder reflection, and is frequently used in numerical

analysis.

24

Page 36: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Complexity of Rank Median Problems

PGeneralized Rank

Median Problem

NP

NP-hardGenome Rank

Median Problem

????

25

Page 37: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

The Genome Median Problem

Page 38: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Genome Rank Median Problem is NP-hard and APX-hard

Theorem

The genome rank median problem of three genomes (GMP) is NP-hard

and APX-hard.

Proof.

By reduction from the breakpoint graph decomposition problem

(BGD).

26

Page 39: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Breakpoint Graph Decomposition Problem

Objective (NP-hard): find a maximum alternating cycle decomposition Cof a balanced bicolored graph G .

Objective (APX-hard): find an alternating cycle decomposition C of a

balanced bicolored graph G which minimizes |B| − |C|.

x

v

w

s

y

t

z

27

Page 40: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

BGD Reduction Plan

GMP problem solver

solution to I ′

instance I ′of GMPfast BGD-to-GMP converter

fast GMP-to-BGD converter

instance G of BGD

max cycle decomp C

BGD problem solver

28

Page 41: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Transforming BGD into GMP

x

v

w

s

y

t

z

−x

−v

−w

−s

−y

−t

−z

G −G

x

v ′

w

s

y

t

z

−x

−v ′

−w

−s

−y

−t

−z

v ′′ −v ′′

29

Page 42: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Transforming BGD into GMP

x

v

w

s

y

t

z

−x

−v

−w

−s

−y

−t

−z

G −G

x

v ′

w

s

y

t

z

−x

−v ′

−w

−s

−y

−t

−z

v ′′ −v ′′

29

Page 43: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Transforming BGD into GMP

x

v ′

w

s

y

t

z

−x

−v ′

−w

−s

−y

−t

−z

v ′′ −v ′′

π1 = id

π2 = (v ′ v ′′)(−v ′ − v ′′)

π3 = (v ′ x s w)(t y v ′′ z)(−w − s − x − v ′)(−z − v ′′ − y − t)

30

Page 44: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Aside: Canonical medians

A canonical median mc is a median of π1, π2, and π3 which contains

cycles only from π2.

π1 = id

π2 = (v ′ v ′′)(−v ′ − v ′′)

π3 = (v ′ x s w)(t y v ′′ z)(−w − s − x − v ′)(−z − v ′′ − y − t)

mc = (v ′ v ′′)

Lemma

Medians of π1, π2, π3 can be transformed into canonical medians in

polynomial time.

Lemma

Canonical medians are in bijection to maximum cycle decompositions of

G .

31

Page 45: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Aside: Canonical medians

mc = m(−m)

x

v

w

s

y

t

z

−x

−v

−w

−s

−y

−t

−z

G −G

m ⇔ max cycle decomp C −m ⇔ max cycle decomp -C

32

Page 46: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Transforming BGD into GMP

Γ = (v ′ − v ′)(v ′′ − v ′′)(s − s)(t − t)(w −w)(x − x)(z − z)(y − y).

π1 = id

π2 = (v ′ v ′′)(−v ′ − v ′′)

π3 = (v ′ x s w)(t y v ′′ z)(−w − s − x − v ′)(−z − v ′′ − y − t)

Γ

π1Γ = Γ

π2Γ = (v ′ − v ′′)(−v ′ v ′′)(s − s)(t − t)(w − w)(x − x)(z − z)(y − y)

π3Γ = (v ′ − x)(x − s) . . .

π1Γ, π2Γ, π3Γ are involutions, i.e. they are an instance of GMP, with

genome rank median m′Γ.

33

Page 47: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Transforming BGD into GMP

Γ = (v ′ − v ′)(v ′′ − v ′′)(s − s)(t − t)(w −w)(x − x)(z − z)(y − y).

π1 = id

π2 = (v ′ v ′′)(−v ′ − v ′′)

π3 = (v ′ x s w)(t y v ′′ z)(−w − s − x − v ′)(−z − v ′′ − y − t)

Γ

π1Γ = Γ

π2Γ = (v ′ − v ′′)(−v ′ v ′′)(s − s)(t − t)(w − w)(x − x)(z − z)(y − y)

π3Γ = (v ′ − x)(x − s) . . .

π1Γ, π2Γ, π3Γ are involutions, i.e. they are an instance of GMP, with

genome rank median m′Γ.

33

Page 48: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Transforming BGD into GMP

Γ = (v ′ − v ′)(v ′′ − v ′′)(s − s)(t − t)(w −w)(x − x)(z − z)(y − y).

π1 = id

π2 = (v ′ v ′′)(−v ′ − v ′′)

π3 = (v ′ x s w)(t y v ′′ z)(−w − s − x − v ′)(−z − v ′′ − y − t)

Γ

π1Γ = Γ

π2Γ = (v ′ − v ′′)(−v ′ v ′′)(s − s)(t − t)(w − w)(x − x)(z − z)(y − y)

π3Γ = (v ′ − x)(x − s) . . .

π1Γ, π2Γ, π3Γ are involutions, i.e. they are an instance of GMP, with

genome rank median m′Γ.

33

Page 49: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Transforming BGD into GMP

Γ = (v ′ − v ′)(v ′′ − v ′′)(s − s)(t − t)(w −w)(x − x)(z − z)(y − y).

π1 = id

π2 = (v ′ v ′′)(−v ′ − v ′′)

π3 = (v ′ x s w)(t y v ′′ z)(−w − s − x − v ′)(−z − v ′′ − y − t)

Γ

π1Γ = Γ

π2Γ = (v ′ − v ′′)(−v ′ v ′′)(s − s)(t − t)(w − w)(x − x)(z − z)(y − y)

π3Γ = (v ′ − x)(x − s) . . .

π1Γ, π2Γ, π3Γ are involutions, i.e. they are an instance of GMP, with

genome rank median m′Γ.

33

Page 50: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Transforming BGD into GMP

Proposition

The rank distance is right-multiplication invariant; that is, for

σ, π, τ ∈ Sn,

d(σ, π) = d(στ, πτ)

Corollary

s(m′Γ;π1Γ, π2Γ, π3Γ) = s(m′;π1, π2, π3)

Corollary

m′Γ is a genome median of π1Γ, π2Γ, π3Γ if and only if m′ is a

permutation median of π1, π2, π3.

34

Page 51: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Transforming BGD into GMP

Proposition

The rank distance is right-multiplication invariant; that is, for

σ, π, τ ∈ Sn,

d(σ, π) = d(στ, πτ)

Corollary

s(m′Γ;π1Γ, π2Γ, π3Γ) = s(m′;π1, π2, π3)

Corollary

m′Γ is a genome median of π1Γ, π2Γ, π3Γ if and only if m′ is a

permutation median of π1, π2, π3.

34

Page 52: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Transforming BGD into GMP

Proposition

The rank distance is right-multiplication invariant; that is, for

σ, π, τ ∈ Sn,

d(σ, π) = d(στ, πτ)

Corollary

s(m′Γ;π1Γ, π2Γ, π3Γ) = s(m′;π1, π2, π3)

Corollary

m′Γ is a genome median of π1Γ, π2Γ, π3Γ if and only if m′ is a

permutation median of π1, π2, π3.

34

Page 53: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

NP-hardness proof sketch

Theorem

The genome rank median problem of three genomes is NP-hard.

Proof.

instance G of BGD

BGD problem solver

GMP problem solver

m′Γ

π1Γ, π2Γ, π3Γ

Γ

Γ

m′

mc = m(−m)

mMax cycle decomp. C

H = G ∪ −G

π1, π2, π3

Canonical Machine

35

Page 54: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

NP-hardness proof sketch

Theorem

The genome rank median problem of three genomes is NP-hard.

Proof.

instance G of BGD

BGD problem solver

GMP problem solver

m′Γ

π1Γ, π2Γ, π3Γ

Γ

Γ

m′

mc = m(−m)

mMax cycle decomp. C

H = G ∪ −G

π1, π2, π3

Canonical Machine

35

Page 55: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

APX-hardness proof sketch

Theorem

The genome rank median problem of three genomes is APX-hard.

Proof.

instance G of BGD

BGD approximater

GMP ε-approximater

m′Γ

π1Γ, π2Γ, π3Γ

Γ

Γ

m′

mc = m(−m)

m8ε-approximation C

H = G ∪ −G

π1, π2, π3

Canonical Machine

36

Page 56: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

APX-hardness proof sketch

Theorem

The genome rank median problem of three genomes is APX-hard.

Proof.

instance G of BGD

BGD approximater

GMP ε-approximater

m′Γ

π1Γ, π2Γ, π3Γ

Γ

Γ

m′

mc = m(−m)

m8ε-approximation C

H = G ∪ −G

π1, π2, π3

Canonical Machine

36

Page 57: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Conclusion and open problems

• We have a general O(nω+1) algorithm for orthogonal matrices.

• We have a specialized O(nω) algorithm for symmetric orthogonal

matrices.

• We have a O(n2) algorithm for permutations with δ = 0.

• What properties of input matrices are inherited by medians?

• Partial answer: we know that not all generalized medians are

symmetric or orthogonal!

• Can we use convex optimization to find better approximations?

What is the best possible ratio for approximating the genome

median problem?

• Is there a fast exponential or sub-exponential algorithm for solving

this problem?

37

Page 58: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Thank you for your attention!

Please contact me at [email protected].

Page 59: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

Acknowledgments

Joao Meidanis

Cedric Chauve

Pedro Feijao

Sean La

39

Page 60: Another problem with variants on either side of P vs. NP divideskoroth/cstheorysem/files/slides/03_28_2019/... · 2019. 9. 20. · Problems with variants on both sides of the P vs

References

• J. P. Pereira Zanetti, P. Biller, J. Meidanis. Median Approximations

for Genomes Modeled as Matrices. Bulletin of Math Biology, 78(4),

2016.

• L. Chindelevitch and J. Meidanis. On the Rank-Distance Median of

3 Permutations. Proc. 15th RECOMB Comparative Genomics

Satellite Workshop. LNCS, vol. 10562, pp. 256-276. Springer,

Heidelberg (2017). Journal version in BMC Bioinformatics.

• L. Chindelevitch, S. La and J. Meidanis. A cubic algorithm for the

generalized rank median of three genomes. Proc. 16th RECOMB

Comparative Genomics Satellite Workshop. LNCS, vol. 11183, pp.

3-27 (2018). Journal version in BMC Algorithms for Molecular

Biology.

• R. Sarkis, S. La, P. Feijao, L. Chindelevitch, H. Hatami. Computing

the Cayley median for permutations and the rank median for

genomes is NP-hard. [Submitted to WABI 2019]

40