genome rearrangement

Genome Rearrangement By Ghada Badr Part I


Post on 20-Jan-2016

Page 1: Genome Rearrangement

Genome Rearrangement

By Ghada Badr

Part I

Page 2: Genome Rearrangement

2

Genome, chromosome, gene, gene order

• The entire complement of genetic material carried by an individual is called the genome.
• Each genome contains one or more DNA molecules, one per chromosome.

Page 4: Genome Rearrangement

4

• A gene is a segment of DNA sequence with a specific function

Genome, chromosome, gene, gene order

Page 5: Genome Rearrangement

5

• Genes can be ordered by their DNA sequence location.
• DNA consists of two complementary strands twisted around each other to form a right-handed double helix.
• A sign (+/-) is usually used to indicate on which strand a gene is located.

[Figure: genes A to F placed on the two complementary strands (5’ to 3’ and 3’ to 5’) of a DNA double helix.]

Gene order: A -B C D -E F

Genome, chromosome, gene, gene order

Page 6: Genome Rearrangement

6

[Figure: example chromosomes carrying genes A to F and H to K.]

The DNA molecule (chromosome) may be circular or linear

Genome, chromosome, gene, gene order

Page 7: Genome Rearrangement

7

• The genome is structurally specific to each species, and it changes only slowly over time. Therefore genome comparison among different species can provide us with much evidence about evolution.

• Genome rearrangements are an important aspect of the evolution of species. Even when the gene content of two genomes is almost identical, gene order can be quite different.

Genome Rearrangement

A -B C D F-E

B -E F-D CA

Genome 1

Genome 2

Page 8: Genome Rearrangement

8

Genome Rearrangement

Gene order analysis on a set of organisms is a powerful technique for genomic comparison and phylogenetic inference.

Page 9: Genome Rearrangement

9

General definition of the problem: Given a set of genomes and a set of possible evolutionary events (operations), find a shortest sequence of events transforming (sorting) those genomes into one another.

Genome Rearrangement

The diversity of the problem comes from what "genome" means and which events are allowed.

Since these events are rare, scenarios that minimize their number are more likely to be close to reality. Many models have been proposed.

Page 10: Genome Rearrangement

10

Genes (or blocks of contiguous genes) are a good example of homologous markers: segments of genomes that can be found in several species.

The simplest possible model is:

The order of genes in each genome is known,

All the genomes share the same set of genes,

All genomes contain a single copy of each gene, and

All genomes consist of a single chromosome.

Genome Models

Page 11: Genome Rearrangement

11

Genomes can be modeled by permutations: each gene is assigned a unique number and is found exactly once in the genome.

Genome Models

Signed Permutation: Each gene may be assigned + or - sign to indicate the strand it resides on.

Unsigned Permutation: If the corresponding strand is unknown.

Page 12: Genome Rearrangement

12

Genes (markers) are represented by integers 1, 2, ..., n, with a + or - sign to indicate the strand they lie on.

The order and orientation of the genes of one genome in relation to the other is represented by a signed permutation $\pi$.

$\pi = (\pi_1\ \pi_2 \dots \pi_{n-1}\ \pi_n)$ is of size $n$ over $\{-n, \dots, -1, 1, \dots, n\}$, such that for each $i$ from 1 to $n$, exactly one of $i$ and $-i$ appears.

Permutations
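The encoding described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names `encode` and `is_signed_permutation` and the convention of writing a gene order as a whitespace-separated string are assumptions made for the example.

```python
def encode(gene_order, reference):
    """Turn a gene order such as "A -B C D -E F" into signed integers.

    `reference` lists the gene names in the order of the base genome,
    so its first gene becomes 1, the second 2, and so on (hypothetical
    convention chosen for this sketch).
    """
    index = {gene: i + 1 for i, gene in enumerate(reference)}
    signed = []
    for token in gene_order.split():
        sign = -1 if token.startswith("-") else 1
        signed.append(sign * index[token.lstrip("-")])
    return signed


def is_signed_permutation(values):
    """True if exactly one of i and -i appears for every i in 1..n."""
    n = len(values)
    return sorted(abs(v) for v in values) == list(range(1, n + 1))


print(encode("A -B C D -E F", "ABCDEF"))
print(is_signed_permutation([1, -2, 3, 4, -5, 6]))
```

The validity check only looks at absolute values, which is enough because a value and its negation share the same absolute value and therefore cannot both appear.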

Page 13: Genome Rearrangement

13

The identity permutation is $\iota_n = (1, 2, 3, \dots, n)$.

When multiple genomes with the same gene content are compared, one of them is chosen as a base (reference), i.e., represented as $\iota_n$, and identical genes in all other genomes are given the same integer values.

Permutations

Identity permutation:

Page 14: Genome Rearrangement

14

Sorting a permutation $\pi$ means applying operations to $\pi$ in order to transform it into $\iota_n$.

If $\pi_1 = \pi_2$, we say that $\pi_1$ is sorted with respect to $\pi_2$.

If $\pi_1 \neq \pi_2$, we say that $\pi_1$ is unsorted with respect to $\pi_2$.

Permutations

Sorted/unsorted permutation:

Page 15: Genome Rearrangement

15

Permutations

Fruit Fly

Mosquito

Silkworm

Locust

Tick

Centipede

Example: Mitochondrial Genomes of 6 Arthropoda

$\pi_1$ = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
$\pi_2$ = (1, 2, 3, 4, 5, 6, 8, 7, 9, -10, 11, 12, 13, 14, 15, 16, 17)
$\pi_3$ = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13, 15, 16, 17)
$\pi_4$ = (1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
$\pi_5$ = (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 13, 14, 15, 16, 17)
$\pi_6$ = (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 16, 13, 14, 15, 17)

Page 17: Genome Rearrangement

17

$\pi$ is linear when it represents a linear chromosome, or circular when it represents a circular chromosome.

When $\pi = (\pi_1\ \pi_2 \dots \pi_{n-1}\ \pi_n)$ is circular, let $\pi' = (-\pi_n\ -\pi_{n-1} \dots -\pi_2\ -\pi_1)$. All permutations obtained by shifts on $\pi$ or $\pi'$,

$shift(\pi, i) = (\pi_{n-i+1}\ \pi_{n-i+2} \dots \pi_{n-1}\ \pi_n\ \pi_1 \dots \pi_{n-i})$,

are equivalent. Example: (-3, 2, 1, -4) and (-1, -2, 3, 4).

Permutations

Linear and circular permutation:
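The equivalence of circular permutations can be checked mechanically. The sketch below is illustrative only: the helper names `reflect`, `shifts` and `circular_equivalent` are made up for this example, and permutations are assumed to be given as tuples of signed integers.

```python
def reflect(pi):
    """Read the circular chromosome on the other strand:
    reverse the order and flip every sign."""
    return tuple(-x for x in reversed(pi))


def shifts(pi):
    """All rotations of pi (every possible starting gene)."""
    pi = tuple(pi)
    return {pi[i:] + pi[:i] for i in range(len(pi))}


def circular_equivalent(a, b):
    """True if b is a shift of a or of the reflection of a."""
    return tuple(b) in shifts(a) | shifts(reflect(a))


print(circular_equivalent((-3, 2, 1, -4), (-1, -2, 3, 4)))
```

For the example on the slide, the reflection of (-3, 2, 1, -4) is (4, -1, -2, 3), and rotating it by one position gives (-1, -2, 3, 4).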

Page 18: Genome Rearrangement

18

Permutations

For a given permutation $\pi = (\pi_1\ \pi_2 \dots \pi_{n-1}\ \pi_n)$, there is a point between each pair of consecutive values $\pi_i$ and $\pi_{i+1}$ in $\pi$.

If $\pi$ is linear: there are two additional points, one before $\pi_1$ and one after $\pi_n$.

If $\pi$ is circular: there is one additional point between $\pi_n$ and $\pi_1$.

$pts(\pi) = n+1$ if $\pi$ is linear, and $pts(\pi) = n$ if $\pi$ is circular.

Points in permutations

Page 19: Genome Rearrangement

19

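Permutations

For a given $\pi = (\pi_1\ \pi_2 \dots \pi_{n-1}\ \pi_n)$:

If $\pi$ is linear: a linear extension of $\pi$ is $\pi' = (0, \pi_1\ \pi_2 \dots \pi_{n-1}\ \pi_n, n+1)$

If $\pi$ is circular (shifted so that $\pi_n = n$): a linear extension of $\pi$ is $\pi' = (0, \pi_1\ \pi_2 \dots \pi_{n-1}, n)$

Linear extension of a permutation: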

Page 20: Genome Rearrangement

20

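Now: we want to compare our genomes.

Permutations

Example: $\pi$ = (4, 8, 9, 7, 6, 5, 1, 3, 2), with linear extension $\pi'$ = (0, 4, 8, 9, 7, 6, 5, 1, 3, 2, 10).

Marking each point with a dot: $\pi'$ = (0 . 4 . 8 . 9 . 7 . 6 . 5 . 1 . 3 . 2 . 10)

Then $pts(\pi) = 10$.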
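The linear extension and the point count for this example can be reproduced with a short sketch. The function names and the `circular` flag are assumptions made for illustration; for the circular case the code assumes the permutation has already been shifted so that its last value is n.

```python
def linear_extension(pi, circular=False):
    """Frame pi with 0 (and n+1 for a linear chromosome)."""
    pi = list(pi)
    n = len(pi)
    if circular:
        # Assumption: pi was shifted beforehand so that it ends with n.
        if pi[-1] != n:
            raise ValueError("circular permutation must end with n")
        return [0] + pi
    return [0] + pi + [n + 1]


def pts(pi, circular=False):
    """Number of points: n + 1 if linear, n if circular."""
    return len(pi) if circular else len(pi) + 1


pi = [4, 8, 9, 7, 6, 5, 1, 3, 2]
print(linear_extension(pi))
print(pts(pi))
```

In both cases the number of points equals the number of consecutive pairs in the linear extension.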

Page 21: Genome Rearrangement

21

Problem: Given two genomes, how do we measure their similarity and/or distance?

Permutations - similarity/distance

A related problem: Given two permutations, how do we measure their similarity and/or distance?

Page 22: Genome Rearrangement

22

A distance measure should be a metric on the set of genomes.

A metric $d$ on a set $S$ ($d: S \times S \to \mathbb{R}$) satisfies the following three axioms:

Permutations - similarity/distance

1. Positivity: for all $s, t$ in $S$, $d(s,t) \geq 0$, and $d(s,t) = 0$ iff $s = t$.

2. Symmetry: for all $s, t$ in $S$, $d(s,t) = d(t,s)$.

3. Triangular inequality: for all $s, t, u$ in $S$, $d(s,u) \leq d(s,t) + d(t,u)$.

Page 23: Genome Rearrangement

23

Many measures of similarity between permutations have been used in computational biology.

The first measures used (they will be useful later on) are:

Breakpoints (introduced by Sankoff and Blanchette (1997))

Common intervals

Permutations - similarity/distance

Page 24: Genome Rearrangement

24

Permutations-distance - Breakpoints

When analyzing $\pi_1$ with respect to $\pi_2$, each point in $\pi_1$ can be an adjacency or a breakpoint.

A point (pair of consecutive values) $(x, y)$ in $\pi_1$ is an adjacency between $\pi_1$ and $\pi_2$ when either $(x, y)$ or $(-y, -x)$ is consecutive in $\pi_2$.

If $\pi_1$ is linear: there is an adjacency before its first value if that value is also the first value in $\pi_2$, and an adjacency after its last value if that value is also the last value in $\pi_2$.

If $\pi_1$ is circular: we assume that its last value is also the last value in $\pi_2$, and the point between its last and first values is an adjacency if its first value is also the first value in $\pi_2$.

Page 25: Genome Rearrangement

25

Permutations-distance - Breakpoints

Breakpoint distance counts the lost adjacencies between genomes.

The breakpoint distance between $\pi_1$ and $\pi_2$ is:

$brp(\pi_1) = pts(\pi_1) - adj(\pi_1)$, where $pts(\pi_1)$ is the number of points in $\pi_1$ and $adj(\pi_1)$ is the number of adjacencies.

• If $\pi_1$ is sorted ($\pi_1 = \pi_2$): $\pi_1$ has only adjacencies and no breakpoints ($brp(\pi_1) = 0$).

• If $\pi_1$ is unsorted ($\pi_1 \neq \pi_2$): $\pi_1$ has at least one breakpoint ($brp(\pi_1) > 0$).

Page 26: Genome Rearrangement

26

Permutations-distance - Breakpoints

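Back to our example: $\pi$ = (4, 8, 9, 7, 6, 5, 1, 3, 2), $\pi'$ = (0, 4, 8, 9, 7, 6, 5, 1, 3, 2, 10)

$\pi'$ = (0 . 4 . 8 . 9 . 7 . 6 . 5 . 1 . 3 . 2 . 10), so $pts(\pi) = 10$. What are the adjacencies, and what is $brp(\pi)$? Compare with the extended identity $\iota_n$ = (0 . 1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10).

Adjacencies: (8,9), (7,6), (6,5), (3,2), so $adj(\pi) = 4$ and $brp(\pi) = pts(\pi) - adj(\pi) = 10 - 4 = 6$.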
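The count in this example can be reproduced with a short sketch. Note the assumption: the example permutation is unsigned, so the code below treats two consecutive values as an adjacency whenever they differ by exactly one, in either order. A signed version would have to apply the stricter signed definition of adjacency. The function name is hypothetical.

```python
def breakpoint_distance(pi):
    """Breakpoint distance of an unsigned linear permutation
    with respect to the identity, plus the list of adjacencies."""
    ext = [0] + list(pi) + [len(pi) + 1]      # linear extension
    points = list(zip(ext, ext[1:]))          # consecutive pairs
    # Unsigned adjacency: the two values are neighbours in the identity.
    adjacencies = [(a, b) for a, b in points if abs(a - b) == 1]
    return len(points) - len(adjacencies), adjacencies


brp, adj = breakpoint_distance([4, 8, 9, 7, 6, 5, 1, 3, 2])
print(brp, adj)
```

The identity itself yields a distance of zero, since every point of its linear extension is an adjacency.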

Page 27: Genome Rearrangement

27

Permutations-distance - Breakpoints

Breakpoint distance is based on the notion of conserved adjacencies and can be defined on a set of more than two genomes.

It is easy to compute.

However, it fails to capture more global relations between genomes.

The first generalization of adjacencies is the notion of common intervals.

Page 28: Genome Rearrangement

28

Common intervals: subsets of genes that appear consecutively together in two or more genomes, where the genes are the same in each interval but may not be in the same order or orientation.

Permutations-distance - Common Intervals

$\pi_1$ = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
$\pi_2$ = (1, 2, 3, 4, 5, 6, 8, 7, 9, -10, 11, 12, 13, 14, 15, 16, 17)
$\pi_3$ = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13, 15, 16, 17)
$\pi_4$ = (1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
$\pi_5$ = (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 13, 14, 15, 16, 17)
$\pi_6$ = (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 16, 13, 14, 15, 17)

Example (circular chromosomes)

If we compare the first 4 species: they share 6 adjacencies: {1,2}, {2,3}, {11,12}, {15,16}, {16,17}, {17,1}.

If we compare all 6 species: they share only 1 adjacency: {17,1}.
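The shared adjacencies quoted above can be recomputed from the six permutations. The sketch below is an illustration under stated assumptions: the genomes are treated as signed and circular, a pair (a, b) is considered the same adjacency as (-b, -a), and the names `GENOMES`, `adjacencies` and `shared_adjacencies` are invented for the example.

```python
GENOMES = [
    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17),
    (1, 2, 3, 4, 5, 6, 8, 7, 9, -10, 11, 12, 13, 14, 15, 16, 17),
    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13, 15, 16, 17),
    (1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17),
    (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 13, 14, 15, 16, 17),
    (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 16, 13, 14, 15, 17),
]


def adjacencies(pi):
    """Signed adjacencies of a circular permutation, in canonical form.

    (a, b) read on one strand is (-b, -a) on the other, so the smaller
    of the two tuples is kept as the representative.
    """
    pairs = zip(pi, pi[1:] + pi[:1])          # includes (last, first)
    return {min((a, b), (-b, -a)) for a, b in pairs}


def shared_adjacencies(genomes):
    """Adjacencies present in every genome, shown as unsigned pairs."""
    common = set.intersection(*(adjacencies(g) for g in genomes))
    return {tuple(sorted((abs(a), abs(b)))) for a, b in common}


print(sorted(shared_adjacencies(GENOMES[:4])))
print(sorted(shared_adjacencies(GENOMES)))
```

The sign matters here: {9,10} is not shared by the first four genomes because gene 10 is inverted in the second one.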

Page 29: Genome Rearrangement

29

Permutations-distance - Common Intervals

The six permutations are very similar.

The genes in the interval [1,12] form a common interval of all six genomes, as do the genes in the intervals [3,6], [6,9], [9,11], and [12,17].
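Whether a range of genes is a common interval can be tested directly. The following is a simple sketch, not an efficient algorithm: it ignores signs, scans every window of the right size, and, as a simplifying assumption, only looks at windows that do not wrap around the end of the permutation (which is enough for the intervals listed here). The names are hypothetical.

```python
GENOMES = [
    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17),
    (1, 2, 3, 4, 5, 6, 8, 7, 9, -10, 11, 12, 13, 14, 15, 16, 17),
    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13, 15, 16, 17),
    (1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17),
    (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 13, 14, 15, 16, 17),
    (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 16, 13, 14, 15, 17),
]


def is_common_interval(lo, hi, genomes):
    """True if genes lo..hi occupy consecutive positions in every genome,
    in any order and with any orientation."""
    wanted = set(range(lo, hi + 1))
    size = len(wanted)
    for genome in genomes:
        genes = [abs(x) for x in genome]      # orientation is ignored
        windows = (set(genes[i:i + size])
                   for i in range(len(genes) - size + 1))
        if not any(w == wanted for w in windows):
            return False
    return True


for lo, hi in [(1, 12), (3, 6), (6, 9), (9, 11), (12, 17), (1, 2)]:
    print((lo, hi), is_common_interval(lo, hi, GENOMES))
```

The last pair, [1,2], is included as a counterexample: gene 2 has moved away from gene 1 in the fifth and sixth genomes.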

Page 30: Genome Rearrangement

30

We can use common intervals as a measure of similarity between species.

Permutations-distance - Common Intervals

Disadvantage: None of these measures reflects rearrangement operations or explains what happened to the genome over time.

Page 31: Genome Rearrangement

31

Back to our original problem: Given a set of genomes and a set of possible evolutionary events (operations), find a shortest sequence of events transforming those genomes into one another.

Rearrangement operations (events)

What are the rearrangement events (operations)?

These events (operations) can be applied to a single gene or to a group of genes (an interval).

Page 32: Genome Rearrangement

32

Rearrangement operations

Fruit Fly

Mosquito

Silkworm

Locust

Tick

Centipede

Example: Mitochondrial Genomes of 6 Arthropoda

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Page 33: Genome Rearrangement

33

Rearrangement operations affect gene order and gene content. There are various types:

In the case of single-chromosome genomes:
• Inversions
• Transpositions
• Reverse transpositions
• Gene duplications
• Gene loss

In the case of multiple-chromosome genomes we add:
• Translocations
• Fusions
• Fissions
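The three order-changing operations on a single chromosome can be sketched on signed permutations stored as Python lists. This is an illustration under assumptions: the function names and the half-open index convention (`pi[i:j]` is the affected block, `k` is the original position before which the block is reinserted, with `i <= j <= k`) are choices made for this example, not notation from the slides.

```python
def inversion(pi, i, j):
    """Reverse the block pi[i:j] and flip the sign of each gene in it."""
    return pi[:i] + [-x for x in reversed(pi[i:j])] + pi[j:]


def transposition(pi, i, j, k):
    """Move the block pi[i:j] to just before original position k."""
    return pi[:i] + pi[j:k] + pi[i:j] + pi[k:]


def reverse_transposition(pi, i, j, k):
    """Move the block pi[i:j] to just before position k and invert it."""
    return pi[:i] + pi[j:k] + [-x for x in reversed(pi[i:j])] + pi[k:]


identity = list(range(1, 18))

# Inverting the single gene 10 only changes its sign.
print(inversion(identity, 9, 10))

# Moving gene 2 behind gene 11 with inversion gives the fifth
# permutation of the arthropod example.
pi5 = reverse_transposition(identity, 1, 2, 11)
print(pi5)

# Moving the block 13 14 15 behind gene 16 gives the sixth one.
print(transposition(pi5, 12, 15, 16))
```

Applied to the arthropod example, one reverse transposition followed by one transposition turns the first permutation into the sixth.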

Rearrangement Operations

Page 34: Genome Rearrangement

34

Rearrangement Operations - Single Chro.

Inversion

Page 37: Genome Rearrangement

37

Rearrangement Operations - Single Chro.

Fruit Fly

Mosquito

Silkworm

Locust

Tick

Centipede

An inversion.

Example: Mitochondrial Genomes of 6 Arthropoda

Page 38: Genome Rearrangement

38

Rearrangement Operations - Single Chro.

Transposition

Page 41: Genome Rearrangement

41

Rearrangement Operations - Single Chro.

Fruit Fly

Mosquito

Silkworm

Locust

Tick

Centipede

Example: Mitochondrial Genomes of 6 Arthropoda

A transposition

Page 42: Genome Rearrangement

42

Rearrangement Operations - Single Chro.

Reverse Transposition

Page 45: Genome Rearrangement

45

Rearrangement Operations - Single Chro.

Fruit Fly

Mosquito

Silkworm

Locust

Tick

Centipede

Example: Mitochondrial Genomes of 6 Arthropoda

A reverse transposition

Page 46: Genome Rearrangement

46

Rearrangement Operations - Multiple Chro.

Translocation

Page 52: Genome Rearrangement

52

Rearrangement Operations - Multiple Chro.

Fusion

Fission

Page 58: Genome Rearrangement

58

[Source: Linda Ashworth, LLNL]

DOE Human Genome Program Report

From 24 chromosomes

To 21 chromosomes

Rearrangement Operations - Multiple Chro.

Page 59: Genome Rearrangement

59

Rearrangement Problems

Back to our original problem: Given a set of genomes and a set of possible evolutionary events (operations), find a shortest sequence of events transforming those genomes into one another.

Any set of operations yields a distance between genomes, by counting the minimum number of operations needed to transform one genome into the other.

Page 60: Genome Rearrangement

60

Rearrangement Problems

Back to our original problem: Given a set of genomes and a set of possible evolutionary events (operations), find a shortest sequence of events transforming those genomes into one another.

Two classical problems:

• Computing the distance $d(\pi)$.
• Computing one optimal sorting sequence of events.

Page 61: Genome Rearrangement

61

• Given a permutation $\pi$, calculate the reversal distance $d(\pi)$ and find one optimal sequence of reversals sorting $\pi$.

Assumptions:

Only reversals are allowed.

No duplication in genes.

Genomes are unichromosomal.

Reversal Distance - Sorting by Reversals

Page 62: Genome Rearrangement

62

Reversal Distance - Sorting by Reversals

A reversal is represented as a set of genes appearing together in the given genome.

Page 68: Genome Rearrangement

68

Reversal Distance - Sorting by Reversals

This approach is symmetric

Page 69: Genome Rearrangement

69

Reversal Distance - Sorting by Reversals

Vertices: all permutations of size n = 3.

Edges: $\pi_1$ and $\pi_2$ are connected by an edge if the reversal distance $d(\pi_1, \pi_2) = 1$.

Reversal graph for n = 3

Page 70: Genome Rearrangement

70

Reversal Distance - Sorting by Reversals

Reversal graph for n = 3

Reversal distance $d(\pi_i, \pi_k)$ = length of the shortest path between $v_i$ and $v_k$.

Page 71: Genome Rearrangement

71

Reversal Distance - Sorting by Reversals

Reversal graph for n = 3

The graph is huge: $|V| = n! \cdot 2^n$.

An exhaustive graph search is therefore not feasible for realistic values of n.
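For very small n the reversal graph can still be explored exhaustively, which makes the size argument concrete. The sketch below is a plain breadth-first search over signed permutations, written for illustration only; it is not the polynomial-time method discussed later, and the function names are assumptions.

```python
from collections import deque


def reversals(pi):
    """Yield every permutation reachable from pi by one reversal."""
    n = len(pi)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield pi[:i] + tuple(-x for x in reversed(pi[i:j])) + pi[j:]


def reversal_distances(n):
    """Breadth-first search from the identity over all signed
    permutations of size n; returns {permutation: distance}."""
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in reversals(current):
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


dist = reversal_distances(3)
print(len(dist))              # number of vertices for n = 3
print(dist[(1, -2, 3)])
print(max(dist.values()))
```

Because every reversal is its own inverse, the distance from the identity to a permutation equals the distance needed to sort that permutation. The dictionary grows as n!·2^n, so this approach stops being practical after a handful of genes.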

Page 72: Genome Rearrangement

72

The classical approach for solving these two problems in polynomial time was developed by Hannenhalli and Pevzner (1995).

The reversal distance can be computed in $O(n)$ time, as shown by Bader et al. (2000).

The fastest algorithm to find an optimal sorting sequence runs in subquadratic time (below $O(n^2)$), by Tannier et al. (2007).

Most approaches are based on a special structure called the breakpoint graph.

Reversal Distance - Sorting by Reversals

Page 73: Genome Rearrangement

73

Breakpoint Graph: edges are black or gray.

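Given $\pi = (\pi_1\ \pi_2 \dots \pi_{n-1}\ \pi_n)$:

If $\pi$ is linear: we add the values 0 and n+1, which represent the extremities of the chromosome, obtaining:

$\pi = (0, \pi_1\ \pi_2 \dots \pi_{n-1}\ \pi_n, n+1)$

If $\pi$ is circular: assume $\pi_n = n$ and add only the value 0, obtaining:

$\pi = (0, \pi_1\ \pi_2 \dots \pi_{n-1}, n)$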

Reversal Distance - Sorting by Reversals

Page 74: Genome Rearrangement

74

Black edges: link each pair of consecutive values in $\pi$ by a horizontal line (a point in $\pi$).

Gray edges: link the extremities of black edges such that the values will be in order.

Graph: a collection of cycles, in which black and gray edges alternate.

Trivial cycle: one black and one gray edge (an adjacency).

Long cycle: four or more edges ($\geq 2$ breakpoints).
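Counting the cycles of a breakpoint graph can be sketched as follows for a linear signed permutation. The encoding used here is an assumption brought in for the example, since the slides show the graph only as a picture: each gene x is split into two extremities (2x-1, 2x if x is positive, swapped if negative), the sequence is framed by 0 and 2n+1, black edges join neighbouring extremities of different genes, and gray edges join 2k with 2k+1.

```python
def count_cycles(pi):
    """Number of alternating black/gray cycles in the breakpoint graph
    of a linear signed permutation (doubled-vertex encoding)."""
    n = len(pi)
    seq = [0]
    for x in pi:
        # A positive gene is read tail then head, a negative one reversed.
        seq += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    seq.append(2 * n + 1)

    # Black edges: one per point of the extended permutation.
    black = {}
    for i in range(0, len(seq), 2):
        a, b = seq[i], seq[i + 1]
        black[a], black[b] = b, a

    def gray(v):
        # Gray edges join 2k and 2k+1, i.e. the order in the identity.
        return v + 1 if v % 2 == 0 else v - 1

    seen, cycles = set(), 0
    for v in seq:
        if v in seen:
            continue
        cycles += 1
        while v not in seen:          # follow black, then gray, edges
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray(w)
    return cycles


print(count_cycles([1, 2, 3, 4]))
print(count_cycles([1, -2, 3]))
print(count_cycles([-3, 2, 1, -4]))
```

For a sorted permutation every cycle is trivial, so the count equals the number of points, n + 1.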

Reversal Distance - Sorting by Reversals

Page 75: Genome Rearrangement

75

Reversal Distance - Sorting by Reversals

[Figure: breakpoint graph of an example permutation, and the same graph when sorted.]

Page 83: Genome Rearrangement

83

Linear and circular permutations are different in breakpoint graph construction.

Same analyses.

Reversal Distance - Sorting by Reversals

[Figure: linear and circular breakpoint graphs for $\pi = (-3, 2, 1, -4)$.]

Page 86: Genome Rearrangement

86

If $\pi$ is sorted:

• Only adjacencies, no breakpoints.

• The breakpoint graph is a collection of trivial cycles.

• The number of cycles in the sorted graph is $cyc(\pi) = pts(\pi)$.

Reversal Distance - Sorting by Reversals

[Figure: breakpoint graphs of $\pi = (-3, 2, 1, -4)$ and of the sorted permutation.]

Page 87: Genome Rearrangement

87

If $\pi$ is unsorted:

• At least one breakpoint, at least one long cycle.

• The number of cycles $cyc(\pi)$ is at most $pts(\pi) - 1$.

Reversal Distance - Sorting by Reversals

[Figure: breakpoint graphs of $\pi = (-3, 2, 1, -4)$ and of the sorted permutation.]

Observation: To sort a permutation $\pi$, we would like to increase the number of cycles in its breakpoint graph.

Page 88: Genome Rearrangement

88

The effects of a reversal on the breakpoint graph of $\pi$:

Reversal Distance - Sorting by Reversals

• Split reversal: the number of cycles increases by one ($cyc \to cyc + 1$).

• Neutral reversal: the number of cycles is unchanged ($cyc \to cyc$).

• Joint reversal: the number of cycles decreases by one ($cyc \to cyc - 1$).

Page 90: Genome Rearrangement

90

Observation: To sort $\pi$, we must maximize the number of split reversals in the sorting sequence $s$.

Reversal Distance - Sorting by Reversals

If $s$ has only split reversals, what will the reversal distance $d(\pi)$ be? (Hint: in terms of $pts(\pi)$ and $cyc(\pi)$.)

$d(\pi) = pts(\pi) - cyc(\pi)$

Are we done?
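The answer to the hint can be spelled out as a short derivation. It only combines two facts stated earlier, that a sorted permutation has as many cycles as points and that each split reversal adds exactly one cycle; the symbol $k$ for the number of reversals in $s$ is introduced here for the derivation.

```latex
\begin{align*}
\text{cycles after } k \text{ split reversals} &= cyc(\pi) + k \\
\text{cycles when sorted} &= pts(\pi) \\
cyc(\pi) + k = pts(\pi) \;&\Longrightarrow\; k = pts(\pi) - cyc(\pi)
\end{align*}
```

Since no reversal can add more than one cycle, $pts(\pi) - cyc(\pi)$ is also a lower bound on the length of any sorting sequence.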

Page 91: Genome Rearrangement

91

Reversal Distance - Sorting by Reversals

$d(\pi) \geq pts(\pi) - cyc(\pi)$

A split reversal does not always exist.

For example, if all black edges in the graph have the same direction.

In this case, we need to add some joint and/or neutral reversals in the sorting sequence s.

Page 92: Genome Rearrangement

92

Reversal Distance - Sorting by Reversals

• It is always possible to calculate the number of non-split reversals in a sorting sequence.

• These are the non-split reversals needed to sort the hard components of the graph that have no orientation: the unoriented components.

• An unoriented component can form a hurdle, counted by $hrd(\pi)$, or, in the harder case, the graph can be a fortress, indicated by $frt(\pi)$.

• Hurdles are very rare, and fortresses are even rarer, in permutations that represent real genomes.

• In practice, split reversals are sufficient to sort the permutation.

Page 93: Genome Rearrangement

93

Reversal Distance - Sorting by Reversals

Can we choose any split reversal? No, only safe reversals.

Safe reversal: a split reversal that does not produce hurdles.

[Figure: an unsafe reversal and a safe reversal.]

There is always a safe reversal for any oriented $\pi$.

Page 94: Genome Rearrangement

94

Reversal Distance - Sorting by Reversals

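The final formula for the reversal distance $d$ is:

$d(\pi) = pts(\pi) - cyc(\pi) + hrd(\pi) + frt(\pi)$

Where:
• $hrd(\pi)$ is the number of hurdles in the breakpoint graph of $\pi$.
• $frt(\pi) = 1$ if $\pi$ is a fortress, and 0 otherwise.
• $pts(\pi) = n+1$ if $\pi$ is linear, and $n$ if $\pi$ is circular.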

Page 95: Genome Rearrangement

95

Reversal Distance - Sorting by Reversals

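Algorithm: Get an optimal sorting sequence $s$ that sorts $\pi$
Input: A signed permutation $\pi$.
Output: An optimal sequence of reversals sorting $\pi$.

Construct the breakpoint graph of $\pi$.
$s \leftarrow$ [empty]
If $frt(\pi) = 1$ then
    choose a reversal $\rho$ to eliminate the fortress
    $\pi \leftarrow \pi \cdot \rho$ [apply the reversal to $\pi$]
    $s \leftarrow s \cdot \rho$ [concatenate the reversal to $s$]
End if
While there are hurdles in $\pi$ do
    choose a reversal $\rho$ to eliminate a hurdle
    $\pi \leftarrow \pi \cdot \rho$ [apply the reversal to $\pi$]
    $s \leftarrow s \cdot \rho$ [concatenate the reversal to $s$]
End while
While $\pi$ is not sorted do
    choose a safe split reversal $\rho$
    $\pi \leftarrow \pi \cdot \rho$ [apply the reversal to $\pi$]
    $s \leftarrow s \cdot \rho$ [concatenate the reversal to $s$]
End while
return $s$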

Page 98: Genome Rearrangement

98

Reversal Distance - Sorting by Reversals

Complexity: $O(n^5)$

Tools: GRIMM & GRAPPA

Page 99: Genome Rearrangement

99

Reversal Distance - Sorting by Reversals

We can have more than one optimal solution

Page 100: Genome Rearrangement

100

Conclusions

• Represented linear and circular genomes as permutations in our simple model.

• Described the first measures of similarity between permutations, breakpoints and common intervals --> these have no direct biological interpretation.

• Used genome rearrangement events to describe similarity/distance between genomes --> this has more biological meaning.

• Described in detail one distance measure (reversal distance) and the events (reversals) used to sort genomes.

Page 101: Genome Rearrangement

101

Thank you

Questions?

Next Lecture?