bioinformatics bio314madhusudhan/bio314_2017/bio_314_lecture_1… · r. durbin, s. r. eddy, a....


Page 1: Bioinformatics Bio314madhusudhan/Bio314_2017/Bio_314_lecture_1… · R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins

Bioinformatics Bio314

Sequence alignments

- dynamic programming

- local; global; different gap penalties

Substitution matrices

- Construction; different types

Multiple sequence alignments

Phylogeny and clustering

Markov models

- Hidden Markov model algorithms

Motif finding

Heuristic Alignments

Structural biology/bioinformatics

- Secondary structure prediction

- Neural networks

- 3D structure modeling

- Structural analysis

- Molecular dynamics simulations

- drug design

Next generation sequencing


Bio314 Evaluation/assessment

Mid-sem exam 30%

End-sem exam 30%

Quizzes 15%

Assignments 25%


Reference material

R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge University Press, 1999), ISBN 0521629713

Arthur Lesk, Introduction to Bioinformatics (Oxford University Press), ISBN-13: 978-0199208043

David Mount, Bioinformatics: Sequence and Genome Analysis (CSHL press, 2004), ISBN-13: 978-0879697129

Other books (names will be mentioned later in class).

Several research papers + web sites + web servers.


What is Bioinformatics?

Representation

Scoring

Sampling (Optimization)


The central dogma

source: wikipedia


Evolution at the DNA level

[Figure: a DNA sequence, AGCCTAGACAGTTAAG---AGACGGTTA, annotated with the events that change it: inversion, translocation, duplication, insertion, and mutation]

Sequence Alignments

Important in

- Deducing evolutionary relationships

- Function annotation

- Identifying important regions


Aligning pairs of sequences

Given two strings

X = X1,X2,X3….XM

Y = Y1,Y2,Y3….YN

An alignment is an assignment of gaps to positions 0,….M in X and 0,….N in Y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.


Why sequence alignments?

Sequence similarity often implies a structural and functional relationship

New sequences emerge by

- insertion/deletion

- substitutions

Similarity between 2 sequences is assessed by an alignment


References

R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge University Press, 1999), ISBN 0521629713

Arthur Lesk, Introduction to Bioinformatics (Oxford University Press), ISBN-13: 978-0199208043

Papers/articles: will be distributed at appropriate times


Alignment examples

HBA_HUMAN  GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL
HBB_HUMAN  GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL

HBA_HUMAN  GSAQVKGHGKKVADALTNAVAHV---D--DMPNALSALSDLHAHKL
LGB2_LUPLU NNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKG

HBA_HUMAN  GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSD----LHAHKL
F11G11.2   GSGYLVGDSLTFVDLL--VAQHTADLLAANAALLDEFPQFKAHQE

Legend: matches, mismatches, gaps (insertions/deletions)

The alignment scheme

Representation

Scoring

- substitution scores

- gap penalties

sampling/optimization

- Enumerate all possible alignments?

- dynamic programming


Alignment by enumeration

Too many to compute/enumerate!

GAATC      GAATC-     GAAT-C
CATAC      CA-TAC     C-ATAC

GAAT-C     -GAAT-C    GA-ATC
CA-TAC     C-A-TAC    CATA-C
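To see how fast the number of candidate alignments grows, here is a minimal counting sketch. It assumes that every alignment column is one of three kinds (residue over residue, residue over gap, gap over residue) and that alignments differing only in the order of adjacent gaps are counted separately, as in the examples above. The function name `count_alignments` is made up for this illustration.

```python
def count_alignments(m: int, n: int) -> int:
    """Count gapped alignments of two sequences of lengths m and n."""
    # counts[i][j] = number of alignments of the first i residues of X
    # with the first j residues of Y; an empty prefix aligns in one way.
    counts = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            counts[i][j] = (
                counts[i - 1][j - 1]  # last column: residue over residue
                + counts[i - 1][j]    # last column: residue of X over a gap
                + counts[i][j - 1]    # last column: gap over a residue of Y
            )
    return counts[m][n]


if __name__ == "__main__":
    for length in (5, 10, 20, 50):
        print(length, count_alignments(length, length))
```

Under these assumptions, two sequences of length 5 such as GAATC and CATAC already have 1683 alignments, and the count keeps growing roughly geometrically with length.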


Substitution scores

S(a,b) = Score for substituting residue ‘a’ by residue ‘b’

S(a,b) for proteins is a 20 X 20 matrix

How do we get the values of this matrix?

- Popular matrices = PAM250, BLOSUM62, BLOSUM50...

$$S(a,b) = \log\left(\frac{p_{ab}}{q_a\, q_b}\right)$$
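The log-odds formula can be tried out with a few numbers. In this sketch, p_ab is read as the probability of seeing residues a and b aligned in related sequences, and q_a, q_b as their background frequencies. The frequencies below are invented for illustration and are not taken from PAM or BLOSUM data.

```python
import math


def log_odds_score(p_ab: float, q_a: float, q_b: float) -> float:
    """S(a,b) = log(p_ab / (q_a * q_b)), using the natural logarithm."""
    return math.log(p_ab / (q_a * q_b))


if __name__ == "__main__":
    # Hypothetical frequencies: both residues have background frequency 0.1.
    q_a = q_b = 0.1
    # Pair seen twice as often as chance (0.1 * 0.1 = 0.01): positive score.
    print(log_odds_score(0.02, q_a, q_b))
    # Pair seen half as often as chance: negative score.
    print(log_odds_score(0.005, q_a, q_b))
```

A pair observed more often than chance gets a positive score and a pair observed less often gets a negative one. Published matrices additionally scale and round these values.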


Gap penalties

Gaps are penalized

Linear gap penalty: all gaps are penalized with equal penalty d

Affine gap penalty:

- Distinguish between opening and extension of gaps

- penalty for gap opening larger than for gap extension

HBA_HUMAN  GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSD----LHAHKL
F11G11.2   GSGYLVGDSLTFVDLL--VAQHTADLLAANAALLDEFPQFKAHQE

Dynamic Programming

Optimization problem: Find optimal alignment by maximizing score

Optimal alignments are obtained from optimal sub-alignments

The alignment of

X1,X2......Xi

Y1,Y2......Yj

can be constructed from the 3 sub-alignments:

X1,X2......Xi-1

Y1,Y2......Yj-1

X1,X2......Xi-1

Y1,Y2......Yj

X1,X2......Xi

Y1,Y2......Yj-1

Global and local alignments

2 flavours of dynamic programming

- Global: Needleman-Wunsch [1970], Peter Sellers [1974]

LGPSTKDFGKISESREFDN
LNQLERSFGKINMRLEDA

- Local: Smith-Waterman [1981]

-------FGKI--------
-------FGKI-------

Overhangs are not penalized.

Global dynamic programming

Align the 2 sequences X and Y

X: HEAGAWGHEE
Y: PAWHEAE

BLOSUM50 substitution matrix

     H    E    A    G    A    W    G    H    E    E
P   -2   -1   -1   -2   -1   -4   -2   -2   -1   -1
A   -2   -1    5    0    5   -3    0   -2   -1   -1
W   -3   -3   -3   -3   -3   15   -3   -3   -3   -3
H   10    0   -2   -2   -2   -3   -2   10    0    0
E    0    6   -1   -3   -1   -3   -3    0    6    6
A   -2   -1    5    0    5   -3    0   -2   -1   -1
E    0    6   -1   -3   -1   -3   -3    0    6    6

Linear gap penalty: d = 8, i.e. s(-,a) = s(a,-) = -8

A compact representation

[Figure: a matrix with X X X X . . . along one axis and Y Y Y . . . along the other]

XXXXXXXXXXXXX--
YYYYYYYYYYYYYYY

XXXX--XXXXXXXXX-
YYYYYYYYYY--YYYY

All possible alignments can be enumerated in this matrix representation

The alignment scheme

Representation

Scoring

- substitution scores

- gap penalties

sampling/optimization

- dynamic programming

  - Compute scoring matrix

  - Recurrence formula

  - Traceback

Global dynamic programming: recurrence relationship

[Diagram: the alignment of X1,X2......Xi with Y1,Y2......Yj, scored F(i,j), is reached in one of three ways: from F(i-1,j-1) by adding S(Xi,Yj), from F(i,j-1) by subtracting d, or from F(i-1,j) by subtracting d]

$$F(i,j) = \max \begin{cases} F(i-1,j-1) + S(X_i,Y_j) \\ F(i,j-1) - d \\ F(i-1,j) - d \end{cases}$$

Base conditions:

$$F(i,0) = -i \cdot d$$

$$F(0,j) = -j \cdot d$$
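The recurrence and base conditions translate almost line by line into code. The sketch below is one possible rendering, not the course's reference implementation. It hard-codes only the BLOSUM50 entries needed for the lecture example (X = HEAGAWGHEE, Y = PAWHEAE) and uses d = 8. The names `s`, `nw_matrix`, `COLS` and `BLOSUM50_ROWS` are chosen for this illustration.

```python
X, Y = "HEAGAWGHEE", "PAWHEAE"
D = 8  # linear gap penalty d

# BLOSUM50 entries for the residues in the example only.
# Columns follow COLS (residues of X); keys are residues of Y.
COLS = "HEAGW"
BLOSUM50_ROWS = {
    "P": [-2, -1, -1, -2, -4],
    "A": [-2, -1, 5, 0, -3],
    "W": [-3, -3, -3, -3, 15],
    "H": [10, 0, -2, -2, -3],
    "E": [0, 6, -1, -3, -3],
}


def s(a: str, b: str) -> int:
    """Substitution score for residue a (from X) against residue b (from Y)."""
    return BLOSUM50_ROWS[b][COLS.index(a)]


def nw_matrix(x: str, y: str, d: int) -> list:
    """Fill F so that F[i][j] is the best score of x[:i] aligned with y[:j]."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = -i * d  # base condition F(i,0) = -i*d
    for j in range(1, n + 1):
        F[0][j] = -j * d  # base condition F(0,j) = -j*d
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # align x_i with y_j
                F[i][j - 1] - d,                          # y_j against a gap
                F[i - 1][j] - d,                          # x_i against a gap
            )
    return F


if __name__ == "__main__":
    F = nw_matrix(X, Y, D)
    for j in range(len(Y) + 1):  # print one row per residue of Y
        print(" ".join(f"{F[i][j]:4d}" for i in range(len(X) + 1)))
```

Here F is indexed as F[i][j] with i running over X and j over Y, so each printed row corresponds to one residue of Y.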


Global dynamic programming: Scoring Matrix computation

Columns are indexed by i = 0..10 (sequence X), rows by j = 0..7 (sequence Y).

          H    E    A    G    A    W    G    H    E    E
     0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80
P   -8   -2   -9  -17  -25  -33  -41  -49  -57  -65  -73
A  -16  -10   -3   -4  -12  -20  -28  -36  -44  -52  -60
W  -24  -18  -11   -6   -7  -15   -5  -13  -21  -29  -37
H  -32  -14  -18  -13   -8   -9  -13   -7   -3  -11  -19
E  -40  -22   -8  -16  -16   -9  -12  -15   -7    3   -5
A  -48  -30  -16   -3  -11  -11  -12  -12  -15   -5    2
E  -56  -38  -24  -11   -6  -12  -14  -15  -12   -9    1

F(i,j) = max of

  F(i-1,j-1) + s(xi,yj)
  F(i,j-1) + s(-,yj)
  F(i-1,j) + s(xi,-)

F(1,1) = max of

  F(0,0) + s(H,P) = 0 + (-2) = -2
  F(1,0) + s(-,P) = -8 + (-8) = -16
  F(0,1) + s(H,-) = -8 + (-8) = -16

so F(1,1) = -2.

F(2,1) = max of

  F(1,0) + s(E,P) = -8 + (-1) = -9
  F(2,0) + s(-,P) = -16 + (-8) = -24
  F(1,1) + s(E,-) = -2 + (-8) = -10

so F(2,1) = -9.

Global dynamic programming: Traceback

The traceback starts at the last cell, F(10,7) = 1, and follows the choices that produced each value back to F(0,0) = 0. The values visited are:

1, -5, 3, -3, -13, -5, -20, -25, -17, -16, -8, 0

Resulting alignment:

HEAGAWGHE-E
--P-AW-HEAE
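A traceback can be sketched by re-checking, at each cell, which of the three cases produced the stored value. The code below is a minimal illustration under these assumptions: the same BLOSUM50 excerpt and d = 8 as the lecture example, and ties resolved in favour of the diagonal move, which is a choice made here. The names `s` and `needleman_wunsch` are chosen for this sketch.

```python
COLS = "HEAGW"  # residues of X that occur in the example
BLOSUM50_ROWS = {  # keys are residues of Y; values follow COLS
    "P": [-2, -1, -1, -2, -4],
    "A": [-2, -1, 5, 0, -3],
    "W": [-3, -3, -3, -3, 15],
    "H": [10, 0, -2, -2, -3],
    "E": [0, 6, -1, -3, -3],
}


def s(a: str, b: str) -> int:
    """Substitution score for residue a (from X) against residue b (from Y)."""
    return BLOSUM50_ROWS[b][COLS.index(a)]


def needleman_wunsch(x: str, y: str, d: int):
    """Return (score, aligned_x, aligned_y) for a global alignment."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = -i * d
    for j in range(1, n + 1):
        F[0][j] = -j * d
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)

    # Traceback: start at the last cell and walk back to F[0][0].
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])  # x_i aligned with y_j
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-")       # x_i against a gap
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])       # y_j against a gap
            j -= 1
    return F[m][n], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("HEAGAWGHEE", "PAWHEAE", 8)
    print(score)
    print(top)
    print(bottom)
```

With the diagonal-first tie rule this reproduces the alignment on the slide. A different tie rule can return another alignment with the same optimal score.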


Dynamic programming: computational complexity

• F-matrix computation:
  – We store (n+1) x (m+1) numbers
  – Each number involves a constant number of computations: three sums and a maximum

• Traceback:
  – O(m+n) time

• Complexity of the algorithm:
  – O(mn) time
  – O(mn) memory

For sequences of comparable lengths, the dynamic programming algorithm is O(n^2), which is computationally tractable
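As a rough feel for these numbers, the sketch below counts the cells of the F-matrix and estimates its memory footprint. The 4 bytes per cell is an assumption made for this illustration, and `dp_cost` is a made-up helper name.

```python
def dp_cost(m: int, n: int, bytes_per_cell: int = 4):
    """Return (cells, bytes) for an (m+1) x (n+1) score matrix."""
    cells = (m + 1) * (n + 1)  # one entry per pair of prefixes
    return cells, cells * bytes_per_cell


if __name__ == "__main__":
    for m, n in ((10, 7), (1000, 1000), (100000, 100000)):
        cells, size = dp_cost(m, n)
        print(f"m={m} n={n}: {cells} cells, about {size / 1e6:.3f} MB")
```

The lecture example (m = 10, n = 7) needs only 88 cells, while the quadratic growth becomes noticeable for sequences with many thousands of residues.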


Local alignments

Global alignment

- Effective when the two sequences have detectable sequence similarity over their entire lengths

- Shortcoming: it can be inaccurate when we are only interested in finding the best alignment between subsequences

Local alignment

- Effective when the two sequences have detectable sequence similarity over only part of their lengths

Problem: Given two sequences X and Y, find subsequences α and β of X and Y, respectively, whose alignment score is maximum over all pairs of subsequences from X and Y

Local dynamic programming: recurrence relationship

$$F(i,j) = \max \begin{cases} 0 \\ F(i-1,j-1) + S(X_i,Y_j) \\ F(i,j-1) - d \\ F(i-1,j) - d \end{cases}$$

Base conditions:

$$F(i,0) = 0$$

$$F(0,j) = 0$$

Smith, T. F. and Waterman, M. S. 1981. Journal of Molecular Biology 147:195-197

Local dynamic programming

          H    E    A    G    A    W    G    H    E    E
     0    0    0    0    0    0    0    0    0    0    0
P    0
A    0
W    0
H    0
E    0
A    0
E    0

Smith-Waterman algorithm

          H    E    A    G    A    W    G    H    E    E
     0    0    0    0    0    0    0    0    0    0    0
P    0    0    0    0    0    0    0    0    0    0    0
A    0    0    0    5    0    5    0    0    0    0    0
W    0    0    0    0    2    0   20   12    4    0    0
H    0   10    2    0    0    0   12   18   22   14    6
E    0    2   16    8    0    0    4   10   18   28   20
A    0    0    8   21   13    5    0    4   10   20   27
E    0    0    6   13   18   12    4    0    4   16   26

Local dynamic programming: Traceback

The traceback starts at the highest value in the matrix, 28, and stops at the first 0. The values visited are:

28, 22, 12, 20, 5, 0

Resulting local alignment:

AWGHE
AW-HE
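The local variant differs from the global one in three places: the extra 0 in the maximum, the zero base conditions, and a traceback that starts at the highest cell and stops at the first 0. Here is a minimal sketch under the same assumptions as the lecture example (BLOSUM50 excerpt, d = 8). The names `s` and `smith_waterman` are chosen for this illustration.

```python
COLS = "HEAGW"  # residues of X that occur in the example
BLOSUM50_ROWS = {  # keys are residues of Y; values follow COLS
    "P": [-2, -1, -1, -2, -4],
    "A": [-2, -1, 5, 0, -3],
    "W": [-3, -3, -3, -3, 15],
    "H": [10, 0, -2, -2, -3],
    "E": [0, 6, -1, -3, -3],
}


def s(a: str, b: str) -> int:
    """Substitution score for residue a (from X) against residue b (from Y)."""
    return BLOSUM50_ROWS[b][COLS.index(a)]


def smith_waterman(x: str, y: str, d: int):
    """Return (best_score, aligned_x, aligned_y) for a local alignment."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]  # F(i,0) = F(0,j) = 0
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(0,  # start a fresh local alignment here
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j

    # Traceback: start at the highest cell and stop at the first 0.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("HEAGAWGHEE", "PAWHEAE", 8)
    print(score)
    print(top)
    print(bottom)
```

On the lecture example this finds the segment AWGHE aligned with AW-HE, with the unaligned overhangs on both sides left unpenalized.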


Global vs Local dynamic programming

                          Global                                       Local

Recurrence relationship   Similar in both cases                        Similar in both cases

Traceback                 Start with the last value                    Start with the highest value

Overhangs                 Penalized                                    Not penalized

Output                    Optimal alignment between whole sequences    Optimal alignment between sub-sequences

Sensitivity               Detecting sequences with high similarity     Better at detecting remote similarities

Computational complexity  O(n*n) in time and memory                    O(n*n) in time and memory


Affine gap penalties

Gap penalty: G = U + k*(L-1)

- U = gap open penalty (charged for the first position of a gap); k = gap extension penalty (charged for each further position); L = gap length

Base conditions (penalties, subtracted from the score):

        H     E     A     G     A     W     G     H     E     E
  0     U     U+k   U+2k  U+3k  U+4k  U+5k  U+6k  U+7k  U+8k  U+9k
P U
A U+k
W U+2k
H U+3k
E U+4k
A U+5k
E U+6k
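The two gap models can be compared directly with a few lines of code. This sketch assumes the convention implied by the base conditions on this slide and by the recurrences that follow: the first position of a gap costs U and every further position costs k. The values U = 12 and k = 2 are invented for illustration, and the function names are made up.

```python
def linear_gap_penalty(length: int, d: int) -> int:
    """Every gap position costs the same penalty d."""
    return d * length


def affine_gap_penalty(length: int, U: int, k: int) -> int:
    """Opening a gap costs U; each additional position costs k."""
    if length == 0:
        return 0
    return U + k * (length - 1)


if __name__ == "__main__":
    # Hypothetical parameters: d = 8 (linear), U = 12 and k = 2 (affine).
    for length in (1, 2, 4, 8):
        print(length,
              linear_gap_penalty(length, 8),
              affine_gap_penalty(length, 12, 2))
```

With these numbers, one gap of length 4 costs 18 under the affine model, while four separate gaps of length 1 cost 48. Contiguous insertions or deletions are therefore favoured over scattered ones.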


Affine gap penalties: recurrence relationships

Match: Xi is aligned with Yj

X1 X2 . . . . . Xi-1 Xi
Y1 Y2 . . . . . Yj-1 Yj

F(i,j) = V(i-1,j-1) + S(Xi,Yj)

Gap opening: Xi is aligned with a gap that starts here

X1 X2 . . . . . Xi-1 Xi
Y1 Y2 . . . . . Yj   -

G(i,j) = V(i-1,j) - U

Gap extension: Xi extends a gap that is already open

X1 X2 . . . . . Xi-1 Xi
Y1 Y2 . . . . . -    -

G(i,j) = G(i-1,j) - k*1

Affine gap penalties: recurrence relationships

Match: Xi is aligned with Yj

X1 X2 . . . . . Xi-1 Xi
Y1 Y2 . . . . . Yj-1 Yj

F(i,j) = V(i-1,j-1) + S(Xi,Yj)

Gap opening: Yj is aligned with a gap that starts here

X1 X2 . . . . . Xi   -
Y1 Y2 . . . . . Yj-1 Yj

H(i,j) = V(i,j-1) - U

Gap extension: Yj extends a gap that is already open

X1 X2 . . . . . -    -
Y1 Y2 . . . . . Yj-1 Yj

H(i,j) = H(i,j-1) - k*1

Affine gap penalties: recurrence relationships

$$V(i,j) = \max\{F(i,j),\ G(i,j),\ H(i,j)\}$$

$$F(i,j) = V(i-1,j-1) + S(X_i,Y_j)$$

$$G(i,j) = \max \begin{cases} V(i-1,j) - U \\ G(i-1,j) - k \end{cases}$$

$$H(i,j) = \max \begin{cases} V(i,j-1) - U \\ H(i,j-1) - k \end{cases}$$
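The four recurrences can be combined into a score-only global alignment. The sketch below is one reading of the slides rather than a reference implementation. It assumes the first gap position costs U and each further one costs k, uses minus infinity for states that cannot occur on the borders, and reuses the BLOSUM50 excerpt from the earlier example. The names `s` and `affine_global_score` are chosen for this illustration.

```python
NEG_INF = float("-inf")

COLS = "HEAGW"  # residues of X that occur in the example
BLOSUM50_ROWS = {  # keys are residues of Y; values follow COLS
    "P": [-2, -1, -1, -2, -4],
    "A": [-2, -1, 5, 0, -3],
    "W": [-3, -3, -3, -3, 15],
    "H": [10, 0, -2, -2, -3],
    "E": [0, 6, -1, -3, -3],
}


def s(a: str, b: str) -> int:
    """Substitution score for residue a (from X) against residue b (from Y)."""
    return BLOSUM50_ROWS[b][COLS.index(a)]


def affine_global_score(x: str, y: str, U: int, k: int):
    """Best global alignment score with gap open U and gap extension k."""
    m, n = len(x), len(y)
    V = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # best score overall
    G = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ends with x_i over a gap
    H = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ends with a gap over y_j
    V[0][0] = 0
    for i in range(1, m + 1):  # leading gap of length i in Y
        G[i][0] = V[i][0] = -(U + (i - 1) * k)
    for j in range(1, n + 1):  # leading gap of length j in X
        H[0][j] = V[0][j] = -(U + (j - 1) * k)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F = V[i - 1][j - 1] + s(x[i - 1], y[j - 1])      # match state
            G[i][j] = max(V[i - 1][j] - U, G[i - 1][j] - k)  # open or extend
            H[i][j] = max(V[i][j - 1] - U, H[i][j - 1] - k)  # open or extend
            V[i][j] = max(F, G[i][j], H[i][j])
    return V[m][n]


if __name__ == "__main__":
    # With U = k = 8 the affine model reduces to the linear penalty d = 8.
    print(affine_global_score("HEAGAWGHEE", "PAWHEAE", 8, 8))
    # Hypothetical affine parameters, for comparison.
    print(affine_global_score("HEAGAWGHEE", "PAWHEAE", 12, 2))
```

Setting U = k makes every gap position cost the same, so the result then matches the linear-penalty score of 1 from the global example. Recovering the alignment itself would additionally need a traceback across the three matrices.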