a linear-time algorithm for computing inversion distance between signed permutations with an...
Post on 22-Dec-2015
213 views
TRANSCRIPT
A Linear-Time Algorithm for Computing Inversion Distance between signed
Permutations with an experimental Study
David Bader, Bernard Moret, Mi Yan
Presented by Anat Heilper
Motivation
Evolutionary process on many single-chromosome organisms, consists mostly of inversions.
Due to that, phylogenies are reconstructed based on gene order, using inversion distance as a measure of evolutionary distance between 2 genomes.
Model assumptions
Each genome is an ordering (circular/linear) of a fixed set of genes {g1,g2,…gn}.
Each gene is given with an orientation: positive(gi), or negative(-gi).
Inversion
Let G be the genome with the signed ordering(linear or circular) g1, g2,…gn.
Inversion(i, j), i j: g1, g2,…, gi-1, gi, gi+1,…,gj, gj+1,…, gn
g1, g2,…, gi-1, -gj, -gj-1,…, -gi , gj+1,…,gn
What happens if j<i?
Circular case: rotate the circular ordering until the proper relationship between the indices is received.
Linear case: not applicable.
Inversion
Inversion Distance
Minimum number of inversions needed to transform one permutation into the other.
Computing Inversion Distance between unsigned permutations is NP-hard.
Inversion Distance Sorting
Computing the shortest sequence of inversions can be also treated as a sorting problem.
Also Known as “sorting by reversals”.
Previous work:Hannenhalli and Pevsner, 1995 – first polynomial-time algorithm.
Berman and Hannenhalli, 1996 – runs in O(n(n)), where (n) is the inverse Ackerman function
Bader, Moret, and Yan, 2000 – runs in linear-time algorithm.
What we shall see today?
Present a simple and practical, linear time algorithm to compute the connected components of the overlap graph.
Present experimental evidence that this algorithm is really efficient.
Transformation
Given a signed permutation of {1…n}, we transform it into an unsigned permutation of {0… 2n+1}:
Substitute each positive element x by the ordered pair (2x-1, 2x).
Substitute each negative element –x by the ordered pair (2x,2x-1).
(0) = 0, (2n+1) = 2n+1.
Example
+3 +9 -7 +5 -10 +8 +4 -6 +11 +2 +1
5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
X (2x-1, 2x)-X(2x,2x-1)(0) = 0, (2n+1) = 2n+1
AssumptionsBoth permutations have been turned into unsigned permutations.
Both unsigned permutations are transformed so that the first permutation becomes the identity permutation (0,1,..2n,2n+1).
This is why this problem can be viewed as sorting: find the number of inversions needed to transform the given permutation into the identity permutation.
Cycle graph We represent the unsigned permutation by an edge-colored graph called the “cycle graph”.
Properties of the graph:2n+2 vertices.
For each i, 0 i n, there’s a gray edge between vertices (2i) and (2i+1).
There is a black edge between vertices 2i and 2i+1.
Cycle graph
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
i
(i)
The resulting graph consists of disjoint cycles in which edges alternate colors.
Overlapping edges2 gray edges ((i) , (j)) and ((k), (t)) overlap if the 2 intervals [i,j] and [k,t] overlap but neither one contains the other.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
i
(i)
Overlapping cycles2 cycles C1 and C2 overlap if there exist overlapping gray edges e1 C1 and e2C2.
Extent of a cycle C: is the interval [C.B, C.E] where C.B = min {i| i C}, B stands for the beginning of the cycle,
and C.E = max{i| i C}, E stands for the ending of the cycle.
Extent of a set of cycles {c1…ck} is [B,E] where B = min {Ci.B| 0 i k} and E = max{Ci.E| 0 i k}.
Overlap Graph of permutation One vertex for each cycle in the cycle graph.
An edge between any 2 vertices that correspond to overlapping cycles.
Overlap graph
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
i
(i)
a
b
c
d
e f
One vertex for each cycle in the cycle graph.
Overlap grapha
b
c
d
e f
a fb d c
e
An edge between any 2 vertices that correspond to overlapping cycles.
By now we know how to:Transform a signed permutation to an unsigned permutation.
Create an overlap graph from the cycle graph.
Compute the minimum number of inversion and inversion sequence from the overlap graph.
The bottleneck of the algorithm is building the overlap graph(o(n^2)).
How can we improve the algorithm?Build a much smaller graph which captures the same information as the overlap graph.
Building the overlap forest
Creating the overlap forest in two scans of the permutation:
First scan: Build a trivial forest F0 in which each node is its own forest.
Second scan: Iterative refinement of the forest, at each iteration we expand the domain where we search for overlapping cycles with one node. (2n+2 iterations overall).
Active tree: a tree rooted at f is active at stage j whenever j lies properly within the extent of f; the extent of the active trees is stores in a stack
Let Fj-1 be the forest constructed from processing the elements 0 through j-1. Let f be the cycle containing element j of the permutation. Build Fj from Fj-1 as follows:
If j is the beginning of its own cycle f then it is the root of a single node tree.
Otherwise if cycle f overlaps with cycle g, add an edge (g,f) and compute the combined extent of g and of the tree rooted at f.
Building the overlap forest – refinement stage
Building the overlap graph1) Scan the permutation, label each position i with
C[i].B, and set up [C[i].B, C[i].E].2) Initialize empty stack.3) For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].b) Extent C[i].
While(top.B > C[i].B){Extent.B min{extent.B, top.B}Extent.E max{ extent.E, top.E}parent[top.B] C[i].BPop top
End while}Top.B min{extent.B, top.B}Top.E max{extent.E, top.E}
c) if (i == top.E) then pop top.
i is the beginning of its own cycle.
Top is the top
element of the stack
The cycle i belongs to intersection with
another cycle
The tree at the top of the stack isn’t active anymore
Example of building an overlap forest
i
(i)
C[i].B
C[i].E
C[i]
First stage: setup a
b
c
d
e f
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 13 13 11 11 15 15 17 17 11 11 13 13 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 13 13 11 11 15 15 17 17 11 11 13 13 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
i
(i)
C[i].B
C[i].E
C[i]
i = 0:
a)i = C[0].B push A
b)extent A
while(0=top.B>C[i].B=0)?NO
C[0].B 0
C[0].E21
c) if(0==21)?NO
A
A
extent
topFor i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
A
i
(i)
C[i].B
C[i].E
C[i]
A
extent
topi = 1:
a)1=i C[1].B=0
b) extent A
while(0=top.B>C[i].B=0)
C[1].B 0
C[1].E21
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 13 13 11 11 15 15 17 17 11 11 13 13 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
B
A
i
(i)
C[i].B
C[i].E
C[i]
B
extent
i = 2:
a)i = C[2].B push B
b) extent B
while(2=top.B>C[i].B=2)
C[2].B 2
C[2].E13
i = 3:
very similar to i = 2.
top
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 13 13 11 11 15 15 17 17 11 11 13 13 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
C
B
A
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
C
extent
i = 4:
a)i = C[4].B push C
b) extent C
while(4=top.B>C[i].B=4)
C[4].B 4
C[4].E11
i = 5:
very similar to the case of i = 4.
i
(i)
C[i].B
C[i].E
C[i]
top
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 13 13 11 11 15 15 17 17 11 11 13 13 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
D
C
B
A
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
D
extent
i = 6:
a)i = C[6].B push D
b) extent D
while(6=top.B>C[i].B=6)
C[6].B 6
C[6].E15
i = 7:
very similar to the case of i = 6.
i
(i)
C[i].B
C[i].E
C[i]
top
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 13 13 11 11 15 15 17 17 11 11 13 13 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
E
D
C
B
A
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
E
extent
i = 8:
a)i = C[8].B push E
b) extent E
while(8=top.B>C[i].B=8)
C[8].B 8
C[8].E17
i = 9:
very similar to the case of i = 8.
i
(i)
C[i].B
C[i].E
C[i]
top
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 13 13 11 11 15 15 17 17 11 11 13 13 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
E
D
C
B
A
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
C
extent
i = 10:
a)i C[10].B
b) extent C
while(8=top.B>C[i].B=4)
extent.B min{4,8}
extent.Emax{11, 17}
parent[ 8] 4
pop top
while(top.B=6>C[i].B =4)
extent.B min{4,6}
extent.Emax{17,15}
parent[ 6] 4
pop top
i
(i)
C[i].B
C[i].E
C[i]
top
D
C
B
A
top
c
e
c
e d
[4,17]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 13 13 11 11 15 15 17 17 11 11 13 13 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
C
B
A
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
E
extent
i = 10 continue:
while(top.B = 4>C[i].B =4)?NO
top.B min{4,4}
top.Emax{17,11}
c) )if (10 == 17)? no
i
(i)
C[i].B
C[i].E
C[i]
top
c
e d
[4,17]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 13 13 17 17 15 15 17 17 17 17 13 13 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
C
B
A
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
C
extent
i = 11 :
a)11=i C[11].B=4
b)extent C
while(4=top.B>C[i].B= 4)
top.bmin{4,4}
top.Emax{17,17}
c) )if (11 == 17)? NO
i
(i)
C[i].B
C[i].E
C[i]
top
c
e d
[4,17]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 13 13 17 17 15 15 17 17 17 17 13 13 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
C
B
A
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
B
extent
i
(i)
C[i].B
C[i].E
C[i]
top
c
e d[2,17]
i = 12 :
a)12=i C[12].B=2
b)extent B
while(4=top.B>C[i].B= 2)
Extent.B min{2, 4}
extent.E max{13, 17}
Parent[4]2
Pop top
while(2=top.B>C[i].B=2)?NO
top.B min{2,2}
top.E max{17,13}
c) if(12==17)?NO
b
B
A
top
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 17 17 17 17 15 15 17 17 17 17 17 17 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
B
extent
i
(i)
C[i].B
C[i].E
C[i]
c
e d
i = 13 :
a)13=i C[13].B=2?NO
b)extent B
while(2=top.B>C[i].B= 2)
top.B min{2,2}
top.E max{17,17}
c) if(13==17)?NO
bB
A
top
[2,17]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 17 17 17 17 15 15 17 17 17 17 17 17 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
D
extent
i
(i)
C[i].B
C[i].E
C[i]
c
e d
i = 14 :
a)14=i C[14].B=6?NO
b)extent D
while(6=top.B>C[i].B= 6)?NO
top.B min{6,2}
top.E max{15,17}
c) if(14==17)?NO
i = 15..16:
similar to i=14
i=17: pop top
bB
A
top
[2,17]
Atop
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 17 17 17 17 15 15 17 17 17 17 17 17 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
i
(i)
C[i].B
C[i].E
C[i]
i = 18:
a)i = C[18].B push F
b)extent F
while(18=top.B>C[i].B=18)?NO
C[18].B 18
C[18].E23
c) if(18==23)?NO
i=19:
very similar to i=18
F
extent
c
e d
bF
A
top
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 17 17 17 17 15 15 17 17 17 17 17 17 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].
b) Extent C[i].While (top.B > C[i].B){
Extent.B min{extent.B, top.B}
Extent.E max{ extent.E, top.E}
parent[top.B] C[i].B
Pop top
End while }
Top.B min{extent.B, top.B}
Top.E max{extent.E, top.E}
c) If(i==top.E) then pop top.
i
(i)
C[i].B
C[i].E
C[i]
i = 20:
a)i C[20].B
b)extent A
while(18=top.B>C[i].B=0)
extent.B min{0,18}
extent.Emax{21, 23}
parent[18] 0
Pop top
i=21:
i==top.Epop top
A
extent
c
e d
bF
A
top
F
A
Atop
top
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
0 0 2 2 4 4 6 6 8 8 4 4 2 2 6 6 8 8 18 18 0 0 18 18
21 21 17 17 17 17 15 15 17 17 17 17 17 17 15 15 17 17 23 23 21 21 23 23
A A B B C C D D E E C C B B D D E E F F A A F F
Lemma 1At iteration i of step (3) of the algorithm, if the tree rooted at top is active and i lies on cycle f and we have f.B < top.B, h In the tree rooted at top such that h overlaps with f.Proof: top is active it must have been pushed onto the stack before the current iteration (top.B <i) and we didn’t reach the end of top’s extent yet (i < top.E). i must be contained in top’s extent (top.B<i<top.E). Since i lies on the cycle f that begins before top (f.B<top.B), there must be an edge from cycle f that overlaps with top
Theorem
The algorithm produces a forest
in which each tree is composed of
exactly those nodes that form
a connected component.
Proof
after each iteration, the trees in the forest correspond exactly to the connected components determined by the permutation values scanned up to that point (induction on the number applications of step 3 of the algorithm:Base: each tree of F0 has a single node and no two nodes belong to the same connected component.Step: assume invariant after (i-1) iterations and let i lie on cycle f. We prove that the nodes of the tree containing i form the same set as the nodes of the connected component containing i (other connected components are unaffected and so still obey the invariant).
Proof A node in the tree containing i must be in the same connected component as I.
i = f.B: then, nothing changes in the overlap graph ( and thus in the connected components);
from step(3): the forest remains unchanged, so that the invariant is preserved.
i>f.B: ( top,f) will be added to the forest whenever f.B<top.B holds. (top,f) will join the sub tree rooted at f with that rooted at top into a single sub tree.
f.B<top.B h in the tree rooted at top such that h and f overlap (lemma 1) (h,f) must belong to the overlap graph (h,f) connecting the connected component containing f with that containing top merging them into a single connected component the invariant is preserved.
Whenever (j,i) and (k,l) with j<k<i<l, are gray edges on cycles f and h, respectively, then edge (f,h) must belong to the overlap graph. In such a case, our algorithm ensures that edge (h,f) belong to the overlap forest.
Proof A node in the same connected component as I must be in the tree containing i.
j k i l
h
f
Complexity:1) Setup and Initialize empty stack.2) For i 0 to 2n+1
a) If (i == C[i].B) then push C[i].b) Extent C[i].
While(top.B > C[i].B){Extent.B min{extent.B, top.B}Extent.E max{ extent.E, top.E}parent[top.B] C[i].BPop top
End while} Top.B min{extent.B, top.B} Top.E max{extent.E, top.E}
c) if (i == top.E) then pop top.
Linear time
Each cycle is inserted exactly once to the
stack, and is poped after adding an edge to the graph, or after passing the end of it. Thus also
linear time
Every step of the algorithm takes
linear time,so the entire algorithm,
runs in worst case linear time.
Experimental setupSigned permutations of length 10, 20, 40, 60, 80, 160, 320, and 640:
For each length, 10 groups of 3 signed permutations were generated from the identity permutation using Nadeau and Taylor’s model
5 evolutionary rates were used: 4, 16, 64, 236, and 1024 inversions per edge.
For each length, 10 groups of 3 permutations were also generated and used as an extreme test case.
For each of these test suites, the 3 distances among the 3 genomes in each group were computed 20,000 times in tight loop. An average and standard deviation were computed over the 10 groups.
Computed inversion distance are expected to be at most twice the evolutionary rate, since there are 2 edges between each pair of genomes.
Experimental setup
Run time of connected components computation as a function of the permutation size
Total run time as a function of the permutation size
Inversion distance as function of the permutation size
Speed comparison of Bader linear time algorithm and that of the UF approach(only connected components)
Speed comparison of Bader’s linear-time algorithm and that of the UF algorithm
summaryPresented a simple, practical, linear time algorithm for computing inversion distance between two signed permutations, with a detailed experimental study.