parametric sequence alignment

34
Outline Introduction An Upper Bound A Lower Bound Parametric Sequence Alignment Cynthia Vinzant University of California, Berkeley September 19, 2008 Cynthia Vinzant Parametric Sequence Alignment

Upload: others

Post on 10-Dec-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

Parametric Sequence Alignment

Cynthia Vinzant

University of California, Berkeley

September 19, 2008

Cynthia Vinzant Parametric Sequence Alignment

Page 2: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

IntroductionSequence Alignment

An Upper Boundα− β planeproof√

n conjecture

A Lower BoundConstructionExample for q = 4

Cynthia Vinzant Parametric Sequence Alignment

Page 3: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

Sequence Alignment

Sequence Alignment

We have two sequences (representing species) and would like somemeasure their similarity.

To align, you insert spaces, to form new sequences.

Example: One alignment of ACTAG and CAGAA is

− A − C A T G

C A G A − − A

Cynthia Vinzant Parametric Sequence Alignment

Page 4: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

Sequence Alignment

Sequence Alignment

We have two sequences (representing species) and would like somemeasure their similarity.

To align, you insert spaces, to form new sequences.

Example: One alignment of ACTAG and CAGAA is

− A − C A T G

C A G A − − A

Cynthia Vinzant Parametric Sequence Alignment

Page 5: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

Sequence Alignment

Sequence Alignment

We have two sequences (representing species) and would like somemeasure their similarity.

To align, you insert spaces, to form new sequences.

Example: One alignment of ACTAG and CAGAA is

− A − C A T G

C A G A − − A

Cynthia Vinzant Parametric Sequence Alignment

Page 6: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

Sequence Alignment

Alignment Graphs

One way of representing an alignment of sequences of length n isas a path through a n × n grid.

A

A

G

A

C

A C A T G

@@

@@

@@

�− A − C A T G

C A G A − − A

Cynthia Vinzant Parametric Sequence Alignment

Page 7: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

Sequence Alignment

Alignment Summaries

Every alignment has an alignment summary, (w , x , y), where

I w = # of matches

I x = # of mismatches

I y = # of spaces

Example:

− A − C A T G

C A G A − − A→ (1,2,2)

**NOTE: w + x + y = n (= length of sequences)

Cynthia Vinzant Parametric Sequence Alignment

Page 8: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

Sequence Alignment

Alignment Summaries

Every alignment has an alignment summary, (w , x , y), where

I w = # of matches

I x = # of mismatches

I y = # of spaces

Example:

− A − C A T G

C A G A − − A→ (1,2,2)

**NOTE: w + x + y = n (= length of sequences)

Cynthia Vinzant Parametric Sequence Alignment

Page 9: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

Sequence Alignment

Alignment Summaries

Every alignment has an alignment summary, (w , x , y), where

I w = # of matches

I x = # of mismatches

I y = # of spaces

Example:

− A − C A T G

C A G A − − A→ (1,2,2)

**NOTE: w + x + y = n (= length of sequences)

Cynthia Vinzant Parametric Sequence Alignment

Page 10: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

Sequence Alignment

Optimal Alignments of ACATG, CAGAA

Γ4 :ACATG−−−−−−−−−−CAGAA

Γ3 :ACATG−−−CA−GAA

Γ2 :ACATG−−CAGAA

Γ1 :ACATGCAGAA

Alignments Alignment Summaries

(0, 0, 5)

(3, 0, 2)

(2, 2, 1)

(0, 5, 0)

Cynthia Vinzant Parametric Sequence Alignment

Page 11: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

How many optimal alignment summaries are there?

Theorem (Gusfield, 1994)

For any alphabet Σ, fΣ(n) = O(n2/3), that is, there is a constant cs.t.

fΣ(n) ≤ c · n2/3

Theorem (Fernandez-Baca et. al., 2002 )

fΣ(n) ≤ 3

(2π)2/3n2/3 + O(n1/3 log n)

Cynthia Vinzant Parametric Sequence Alignment

Page 12: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

How many optimal alignment summaries are there?

Theorem (Gusfield, 1994)

For any alphabet Σ, fΣ(n) = O(n2/3), that is, there is a constant cs.t.

fΣ(n) ≤ c · n2/3

Theorem (Fernandez-Baca et. al., 2002 )

fΣ(n) ≤ 3

(2π)2/3n2/3 + O(n1/3 log n)

Cynthia Vinzant Parametric Sequence Alignment

Page 13: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Observations

1) Boundary lines are of the form

{(α, β) : scoreα,β(Γ1) = scoreα,β(Γ2)}.

2) All boundary lines pass through the point (−1,−1).

3) No boundary lines intersect the ray β = 0, α > 0.

Conclusion: All boundary lines must (uniquely) intersect thenon-negative β-axis.

Cynthia Vinzant Parametric Sequence Alignment

Page 14: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Observations

1) Boundary lines are of the form

{(α, β) : scoreα,β(Γ1) = scoreα,β(Γ2)}.

2) All boundary lines pass through the point (−1,−1).

3) No boundary lines intersect the ray β = 0, α > 0.

Conclusion: All boundary lines must (uniquely) intersect thenon-negative β-axis.

Cynthia Vinzant Parametric Sequence Alignment

Page 15: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Observations

1) Boundary lines are of the form

{(α, β) : scoreα,β(Γ1) = scoreα,β(Γ2)}.

2) All boundary lines pass through the point (−1,−1).

3) No boundary lines intersect the ray β = 0, α > 0.

Conclusion: All boundary lines must (uniquely) intersect thenon-negative β-axis.

Cynthia Vinzant Parametric Sequence Alignment

Page 16: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Observations

1) Boundary lines are of the form

{(α, β) : scoreα,β(Γ1) = scoreα,β(Γ2)}.

2) All boundary lines pass through the point (−1,−1).

3) No boundary lines intersect the ray β = 0, α > 0.

Conclusion: All boundary lines must (uniquely) intersect thenon-negative β-axis.

Cynthia Vinzant Parametric Sequence Alignment

Page 17: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Simplification

Now we only need to know how many optimality regions there areon the non-negative β-axis.

Boundary lines (now just points) look like

{β : score(0,β)(Γ1) = score(0,β)(Γ2)},

meaning

w1 − βy1 = w2 − βy2 ⇒ β =w2 − w1

y2 − y1

Cynthia Vinzant Parametric Sequence Alignment

Page 18: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Simplification

Now we only need to know how many optimality regions there areon the non-negative β-axis.

Boundary lines (now just points) look like

{β : score(0,β)(Γ1) = score(0,β)(Γ2)},

meaning

w1 − βy1 = w2 − βy2 ⇒ β =w2 − w1

y2 − y1

Cynthia Vinzant Parametric Sequence Alignment

Page 19: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Simplification

Now we only need to know how many optimality regions there areon the non-negative β-axis.

Boundary lines (now just points) look like

{β : score(0,β)(Γ1) = score(0,β)(Γ2)},

meaning

w1 − βy1 = w2 − βy2 ⇒ β =w2 − w1

y2 − y1

Cynthia Vinzant Parametric Sequence Alignment

Page 20: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Constraints

We only need to look at (w , y)-plane.

I Suppose we have vertices (w1, y1), (w2, y2), . . . , (wm, ym) ofan alignment polytope for sequences of length n.(Want to know: How big can m be in terms of n?)

I Note:m∑

i=1

∆wi ≤ n andm∑

i=1

∆yi ≤ n

I To maximize m, we need the most distinct ∆wi∆yi

, with∆wi + ∆yi small.

Cynthia Vinzant Parametric Sequence Alignment

Page 21: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Constraints

We only need to look at (w , y)-plane.

I Suppose we have vertices (w1, y1), (w2, y2), . . . , (wm, ym) ofan alignment polytope for sequences of length n.(Want to know: How big can m be in terms of n?)

I Note:m∑

i=1

∆wi ≤ n andm∑

i=1

∆yi ≤ n

I To maximize m, we need the most distinct ∆wi∆yi

, with∆wi + ∆yi small.

Cynthia Vinzant Parametric Sequence Alignment

Page 22: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Constraints

We only need to look at (w , y)-plane.

I Suppose we have vertices (w1, y1), (w2, y2), . . . , (wm, ym) ofan alignment polytope for sequences of length n.(Want to know: How big can m be in terms of n?)

I Note:m∑

i=1

∆wi ≤ n andm∑

i=1

∆yi ≤ n

I To maximize m, we need the most distinct ∆wi∆yi

, with∆wi + ∆yi small.

Cynthia Vinzant Parametric Sequence Alignment

Page 23: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Given that∑

i (∆wi + ∆yi ) ≤ 2n, how many different slopes wiyi

can there be?

Define Fr = { ab s.t. a + b = r and a, b relatively prime}

Taking our ∆wi∆yi

fromq⋃

r=1Fr gives us

m =

q∑r=1

|Fr | and n =

q∑r=1

r · |Fr |

⇒m ≈ q2 and n ≈ q3

Cynthia Vinzant Parametric Sequence Alignment

Page 24: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Given that∑

i (∆wi + ∆yi ) ≤ 2n, how many different slopes wiyi

can there be?

Define Fr = { ab s.t. a + b = r and a, b relatively prime}

Taking our ∆wi∆yi

fromq⋃

r=1Fr gives us

m =

q∑r=1

|Fr | and n =

q∑r=1

r · |Fr |

⇒m ≈ q2 and n ≈ q3

Cynthia Vinzant Parametric Sequence Alignment

Page 25: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Given that∑

i (∆wi + ∆yi ) ≤ 2n, how many different slopes wiyi

can there be?

Define Fr = { ab s.t. a + b = r and a, b relatively prime}

Taking our ∆wi∆yi

fromq⋃

r=1Fr gives us

m =

q∑r=1

|Fr | and n =

q∑r=1

r · |Fr |

⇒m ≈ q2 and n ≈ q3

Cynthia Vinzant Parametric Sequence Alignment

Page 26: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Given that∑

i (∆wi + ∆yi ) ≤ 2n, how many different slopes wiyi

can there be?

Define Fr = { ab s.t. a + b = r and a, b relatively prime}

Taking our ∆wi∆yi

fromq⋃

r=1Fr gives us

m =

q∑r=1

|Fr | and n =

q∑r=1

r · |Fr |

⇒m ≈ q2 and n ≈ q3

Cynthia Vinzant Parametric Sequence Alignment

Page 27: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Can the bound be improved for finite alphabets?

Fernandez-Baca et. al. (2002) showed

I c · n2/3 ≤ fΣ(n) for Σ infinite (by constructing sequences thatattained n2/3 optimal alignments).

I c ·√

n ≤ f{0,1}(n).

I E (g(σ1, σ2) ≈√

n.

Conjecture: f{0,1}(n) = Θ(√

n). That is, there are constants, c ,Cso that

c ·√

n ≤ f{0,1}(n) = C ·√

n

Cynthia Vinzant Parametric Sequence Alignment

Page 28: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

α− β planeproof√

n conjecture

Can the bound be improved for finite alphabets?

Fernandez-Baca et. al. (2002) showed

I c · n2/3 ≤ fΣ(n) for Σ infinite (by constructing sequences thatattained n2/3 optimal alignments).

I c ·√

n ≤ f{0,1}(n).

I E (g(σ1, σ2) ≈√

n.

Conjecture: f{0,1}(n) = Θ(√

n). That is, there are constants, c ,Cso that

c ·√

n ≤ f{0,1}(n) = C ·√

n

Cynthia Vinzant Parametric Sequence Alignment

Page 29: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

ConstructionExample for q = 4

The bound is tight!

Claim: f{0,1}(n) = Θ(n2/3).

So our goal is to construct two sequences σ1, σ2 that have n2/3

optimal alignments.

Cynthia Vinzant Parametric Sequence Alignment

Page 30: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

ConstructionExample for q = 4

Define

F r = {ab≤ 1 s.t. a, b relatively prime, and a + b = r},

and let { a1b1, a2

b2, . . . , am

bm} be the elements of

q⋃r=1

F r .

First sequence, σ1 =

0b1+a11b1−a10b2+a2 . . . 0bm+am1bm−am 0bm−am1bm+am . . . 0b1−a11b1+a1

Second sequence, σ2 = 1(Σ2bi )0(Σ2bi ).

Cynthia Vinzant Parametric Sequence Alignment

Page 31: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

ConstructionExample for q = 4

Define

F r = {ab≤ 1 s.t. a, b relatively prime, and a + b = r},

and let { a1b1, a2

b2, . . . , am

bm} be the elements of

q⋃r=1

F r .

First sequence, σ1 =

0b1+a11b1−a10b2+a2 . . . 0bm+am1bm−am 0bm−am1bm+am . . . 0b1−a11b1+a1

Second sequence, σ2 = 1(Σ2bi )0(Σ2bi ).

Cynthia Vinzant Parametric Sequence Alignment

Page 32: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

ConstructionExample for q = 4

Define

F r = {ab≤ 1 s.t. a, b relatively prime, and a + b = r},

and let { a1b1, a2

b2, . . . , am

bm} be the elements of

q⋃r=1

F r .

First sequence, σ1 =

0b1+a11b1−a10b2+a2 . . . 0bm+am1bm−am 0bm−am1bm+am . . . 0b1−a11b1+a1

Second sequence, σ2 = 1(Σ2bi )0(Σ2bi ).

Cynthia Vinzant Parametric Sequence Alignment

Page 33: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

ConstructionExample for q = 4

1

1

1

1

0

0

1

1

1

0

1

1

0

0

1

0

0

0

1

1

0

0

0

0

1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@@

@

@@

@@@

@@

@@

@@

qqqq

Cynthia Vinzant Parametric Sequence Alignment

Page 34: Parametric Sequence Alignment

OutlineIntroduction

An Upper BoundA Lower Bound

ConstructionExample for q = 4

Some open questions:

1) What is E (g(σ1, σ2))? Θ(√

n)?

2) Pachter and Sturmfels (year) showed that for d-parametermodels, fΣ(n) = O(nd(d−1)/(d+1)). Is this also tight?

Cynthia Vinzant Parametric Sequence Alignment