Inferring Phylogeny using Permutation Patterns on Genomic Data

Md Enamul Karim (1), Laxmi Parida (2), Arun Lakhotia (1)

(1) University of Louisiana at Lafayette
(2) IBM T. J. Watson Research Center

Post on 22-Dec-2015



Phylogeny

Reconstruction of the evolutionary relationship of a collection of organisms, usually in the form of a tree.


Phylogenetic data

• Behavioral, morphological, metabolic, etc.
• Molecular data: sequence data, gene-order data, etc.


Why gene order data?

• Low error rate.
• Rearrangements are rare evolutionary events and are unlikely to cause "silent" changes; they can help infer evolutionary history spanning millions of years.


Genome rearrangements

Original genome: 1 2 3 4 5 6 7 8 9 10

• Inversion: 1 2 3 -8 -7 -6 -5 -4 9 10
• Transposition: 1 2 3 9 4 5 6 7 8 10
• Inverted transposition: 1 2 3 9 -8 -7 -6 -5 -4 10
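The three rearrangements can be sketched on signed permutations as below. This is an illustrative sketch only: the function names and the half-open index convention are assumptions made for this example, not something taken from the slides.

```python
from typing import List

Genome = List[int]  # signed permutation; the sign encodes gene orientation


def inversion(g: Genome, i: int, j: int) -> Genome:
    """Reverse the segment g[i:j] and flip the sign of every gene in it."""
    return g[:i] + [-x for x in reversed(g[i:j])] + g[j:]


def transposition(g: Genome, i: int, j: int, k: int) -> Genome:
    """Move the segment g[i:j] to just before old position k (requires k >= j)."""
    return g[:i] + g[j:k] + g[i:j] + g[k:]


def inverted_transposition(g: Genome, i: int, j: int, k: int) -> Genome:
    """Like transposition, but the moved segment is also inverted."""
    return g[:i] + g[j:k] + [-x for x in reversed(g[i:j])] + g[k:]


original = list(range(1, 11))  # 1 2 3 4 5 6 7 8 9 10

# Each call acts on genes 4..8 (indices 3..7), matching the slide's examples.
print(inversion(original, 3, 8))
print(transposition(original, 3, 8, 9))
print(inverted_transposition(original, 3, 8, 9))
```

All three calls reproduce the rearranged genomes listed on the slide.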


Breakpoint distance

The breakpoint distance is the number of adjacencies present in one genome but not in the other.

1 2 3 4 5 6 7 8 9 10

1 -3 -2 4 5 9 6 7 8 10

For some datasets, a close-to-linear relationship between the breakpoints and evolutionary events may exist.

Can be used for building phylogeny (Blanchette et al.).
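As a rough illustration, the breakpoint count for the two genomes on this slide can be computed as follows. The sketch assumes linear genomes with no extra framing genes at the ends, and it treats the adjacency (a, b) as identical to (-b, -a), since reading a segment on the opposite strand does not break it. The function names are hypothetical.

```python
from typing import List, Set, Tuple


def adjacencies(genome: List[int]) -> Set[Tuple[int, int]]:
    """Signed adjacencies of a linear genome in canonical form.

    (a, b) and (-b, -a) describe the same adjacency, so the smaller
    of the two tuples is stored.
    """
    return {min((a, b), (-b, -a)) for a, b in zip(genome, genome[1:])}


def breakpoints(g1: List[int], g2: List[int]) -> int:
    """Number of adjacencies of g1 that are absent from g2."""
    return len(adjacencies(g1) - adjacencies(g2))


g1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
g2 = [1, -3, -2, 4, 5, 9, 6, 7, 8, 10]

print(breakpoints(g1, g2))  # 5
```

The five lost adjacencies are (1,2), (3,4), (5,6), (8,9) and (9,10); the adjacency (2,3) survives because -3 -2 is the same pair read in reverse.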


Limitations of breakpoint distance

• The number of breakpoints created by a certain number of inversions may vary.
• Also, transpositions generally create more breakpoints than inversions.
• Computing the breakpoint phylogeny is NP-hard.


MPBE (Maximum Parsimony on Binary Encoding)

• A heuristic for the breakpoint phylogeny (Cosner et al.).
• All ordered pairs of signed genes appearing consecutively are coded as binary features.
• Exponential time complexity, but much faster than BPAnalysis.
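The encoding step alone might look like the sketch below; the maximum-parsimony search that follows it is not shown. It assumes linear genomes and identifies (a, b) with (-b, -a), which may differ in detail from the encoding Cosner et al. use. The genome names G1 and G2 and all function names are placeholders.

```python
from typing import Dict, List, Set, Tuple

Adjacency = Tuple[int, int]


def adjacencies(genome: List[int]) -> Set[Adjacency]:
    """Consecutive signed gene pairs, with (a, b) and (-b, -a) merged."""
    return {min((a, b), (-b, -a)) for a, b in zip(genome, genome[1:])}


def binary_encoding(
    genomes: Dict[str, List[int]]
) -> Tuple[List[Adjacency], Dict[str, List[int]]]:
    """One binary feature per adjacency seen in any genome.

    Returns the ordered feature list and, per genome, a 0/1 row saying
    which adjacencies that genome contains.
    """
    adj = {name: adjacencies(g) for name, g in genomes.items()}
    features = sorted(set().union(*adj.values()))
    rows = {name: [int(f in adj[name]) for f in features] for name in genomes}
    return features, rows


# Placeholder genomes (the two from the breakpoint slide).
genomes = {
    "G1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "G2": [1, -3, -2, 4, 5, 9, 6, 7, 8, 10],
}
features, rows = binary_encoding(genomes)
for name, row in rows.items():
    print(name, row)
```

Each row can then be treated as a character sequence for a parsimony program.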


Limitations

May fail to find feasible solutions to the breakpoint phylogeny problem.


Observation: The closer the evolutionary history of two genomes, the more permutations (at different granularities) they have in common.

1 2 3 4 5 6 7 8 9 10

1 2 3 -8 -7 -6 -5 -4 9 10

1 8 -3 -2 -7 -6 -5 -4 9 10


Maximal pi-pattern (Eres et al.)

Matches permutations at different granularity.

Polynomial time complexity.


pi-pattern

A pi-pattern is a pattern that occurs, in some permutation, at least k times.

Example: For S = acbcab and k = 2, all pi-patterns are: ac, bc, abc, abcc.

For instance, abc occurs three times in S, as acb, bca and cab.
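The example can be checked with a brute-force enumeration such as the one below. This is not the polynomial-time algorithm of Eres et al.; it simply tries every window. A pattern is represented by its characters in sorted order, and the minimum pattern length of 2 is an assumption inferred from the slide's list, which contains no single characters.

```python
from collections import defaultdict
from typing import Dict, List


def pi_patterns(s: str, k: int, min_len: int = 2) -> Dict[str, List[int]]:
    """Brute force: map each pi-pattern to the start positions where it occurs.

    Two windows count as occurrences of the same pattern when they hold the
    same multiset of characters, i.e. when their sorted contents are equal.
    """
    occurrences = defaultdict(list)
    for length in range(min_len, len(s) + 1):
        for start in range(len(s) - length + 1):
            key = "".join(sorted(s[start:start + length]))
            occurrences[key].append(start)
    # Keep only patterns that reach the quorum k.
    return {p: pos for p, pos in occurrences.items() if len(pos) >= k}


print(pi_patterns("acbcab", 2))
# {'ac': [0, 3], 'bc': [1, 2], 'abc': [0, 2, 3], 'abcc': [0, 1]}
```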


Cover

P1 covers P2 =>

• Every occurrence of P1 contains an occurrence of P2
• Every occurrence of P2 lies within an occurrence of P1

Example: In S = acbcab, abc covers ac.


Maximal pi-pattern

A pi-pattern that is not covered by any other pi-pattern.

Example: In S = acbcab

• pi-patterns: ac, bc, abc, abcc
• Maximal pi-patterns: abc, abcc (abc is not covered by abcc)
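The cover relation and the maximality filter can be sketched directly from the two conditions on the Cover slide. The code below is a brute-force illustration under the same assumptions as before (sorted-string pattern keys, minimum length 2); the helper names are hypothetical and this is not the algorithm of Eres et al.

```python
from collections import defaultdict
from typing import Dict, List


def pi_patterns(s: str, k: int, min_len: int = 2) -> Dict[str, List[int]]:
    """Brute-force pi-pattern enumeration: pattern -> start positions."""
    occ = defaultdict(list)
    for length in range(min_len, len(s) + 1):
        for start in range(len(s) - length + 1):
            occ["".join(sorted(s[start:start + length]))].append(start)
    return {p: pos for p, pos in occ.items() if len(pos) >= k}


def covers(p1: str, p2: str, pats: Dict[str, List[int]]) -> bool:
    """True if p1 covers p2 according to the two conditions on the slide."""
    def inside(q: int, p: int) -> bool:
        # occurrence of p2 at q lies within the occurrence of p1 at p
        return p <= q and q + len(p2) <= p + len(p1)

    every_p2_within_p1 = all(any(inside(q, p) for p in pats[p1]) for q in pats[p2])
    every_p1_has_p2 = all(any(inside(q, p) for q in pats[p2]) for p in pats[p1])
    return every_p2_within_p1 and every_p1_has_p2


def maximal_pi_patterns(s: str, k: int) -> List[str]:
    """pi-patterns that no strictly longer pi-pattern covers."""
    pats = pi_patterns(s, k)
    return [
        p2 for p2 in pats
        if not any(len(p1) > len(p2) and covers(p1, p2, pats) for p1 in pats)
    ]


print(maximal_pi_patterns("acbcab", 2))  # ['abc', 'abcc']
```

Here abc survives because its occurrence at position 3 (cab) does not fit inside either occurrence of abcc, while ac and bc are both covered.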


Results


Phylogeny for simulated evolution on synthetic data

[Figure: phylogeny with leaves labeled a through p]


12 genera of Campanulaceae and the outgroup tobacco


Tree1: MPBE tree


Tree2: Neighbor joining tree (using a few different distances)

[Figure: tree with leaves in the order Tra, Sym, Cam, Ade, Wah, Mer, Leg, Asy, Tri, Cod, Cya, Pla, Tob]


Tree3: Neighbor joining tree using permutation patterns

[Figure: tree with leaves in the order Tra, Sym, Cam, Ade, Wah, Mer, Asy, Leg, Tri, Cod, Cya, Pla, Tob]

• 167 maximal pi-patterns (from 10769 pi-patterns) are used as binary features.
• XOR distance measure.
• A distance/similarity matrix is created to find the neighbor joining tree.
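A minimal sketch of the distance step, assuming each genome has already been reduced to a 0/1 vector with one entry per maximal pi-pattern. The XOR distance is read here as the count of positions where two vectors differ (the Hamming distance), which is an interpretation of the slide. The taxon names X, Y, Z and their vectors are made-up toy data, and the neighbor joining step itself is not shown.

```python
from typing import Dict, List, Tuple


def xor_distance(u: List[int], v: List[int]) -> int:
    """Number of binary features on which the two vectors disagree."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return sum(a ^ b for a, b in zip(u, v))


def distance_matrix(
    features: Dict[str, List[int]]
) -> Tuple[List[str], List[List[int]]]:
    """Pairwise XOR distances, with rows and columns in sorted name order."""
    names = sorted(features)
    matrix = [[xor_distance(features[a], features[b]) for b in names] for a in names]
    return names, matrix


# Toy data only: 1 means the maximal pi-pattern occurs in that genome.
toy = {
    "X": [1, 1, 0, 1, 0],
    "Y": [1, 0, 0, 1, 1],
    "Z": [0, 0, 1, 0, 1],
}
names, matrix = distance_matrix(toy)
print(names)
for row in matrix:
    print(row)
```

The resulting symmetric matrix is the input a neighbor joining implementation would take.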


Tree3 vs Tree2


Conclusion

• Permutation patterns may preserve more evolutionary information.
• Evolutionary events could be counted within permuted segments to develop a hybrid scheme.
• Current approaches remain unable to handle unequal gene content, which could be solved using maximal pi-patterns.