formulationandanalysisofpatternsinascorematrixforglobal...

9
Research Article FormulationandAnalysisofPatternsinaScoreMatrixforGlobal Sequence Alignment James Owusu Asare, 1 Justice Kwame Appati , 2 and Kwaku Darkwah 1 1 Department of Mathematics, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana 2 Department of Computer Science, University of Ghana, Legon, Ghana Correspondence should be addressed to Justice Kwame Appati; [email protected] Received 15 April 2020; Accepted 16 May 2020; Published 1 June 2020 Academic Editor: Aloys Krieg Copyright © 2020 James Owusu Asare et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Global sequence alignment is one of the most basic pairwise sequence alignment procedures used in molecular biology to understand the similarity that arises among the structure, function, or evolutionary relationship between two nucleotide se- quences. e general algorithm associated with global sequence alignment is the dynamic programming algorithm of Needleman and Wunsch. In this paper, patterns are exploited in the score matrix of the Needleman–Wunsch algorithm. With the help of some examples, the general patterns realized are formulated as new a priori propositions and corollaries that are established for both equal and unequal length comparisons of any two arbitrary sequences. 1. Introduction Sequence alignment is the matching of strings or sequences of characters to identify patterns that may lead to informed structural or functional relationships between the strings or sequences matched. Varying situational problems seem to require the use of sequence alignment and examples abound from computer science to molecular biology, where se- quences are usually aligned to make more meaning out of them. Bare [1] has discussed how researchers have over the years considered genetic sequences as strings of characters instead of focusing on their intrinsic properties to be able to compute the similarity among two or more related se- quences. Global and local sequence alignment are used for matching two sequences, and they are categorized as pair- wise sequence alignment or multiple sequence alignment when any alignment matches more than two sequences [2–4]. e Needleman–Wunsch algorithm [5–7] and the Smith–Waterman algorithm [8] are the general algorithms associated with pairwise global and local alignment, re- spectively. ese two algorithms utilize a procedure called the dynamic programming approach. Dynamic program- ming as coined by Bellman in the 1940s is simply the process of solving a bigger problem by finding optimal solutions to its smaller nested problems [9–11]. us, to tackle a problem in the context of dynamic programming, it must possess the notion of recurrence. In [12, 13], an algorithm is said to be a well-defined computational task that accepts input and produces output values after following through a systematic method. Mathematical theory is thus a prerequisite behind the designing of functional programs [14, 15], and the al- gorithm design specializes in solving such problems. Global sequence alignment is mentioned as one of the vast dynamic programming applications in practical problems [16–18]. More recently, Ouzounis and Baichoo [19] have stated that even though pairwise sequence alignment has been dealt with over time, concerns still remain in resolving exact evolutionary distances that demand very specific estimates. ey have suggested the existence of theoretical relation- ships within alignments, algorithms, and data that are yet to be found. Motivated by the display of position-dependent arrays for affine gaps using the Needleman–Wunsch algo- rithm in [20, 21], we consider the more basic linear gap penalties using arbitrary sequences and proceed to find patterns in the score matrix which were absent in the presentation. is approach was taken because it is found Hindawi International Journal of Mathematics and Mathematical Sciences Volume 2020, Article ID 3858057, 9 pages https://doi.org/10.1155/2020/3858057

Upload: others

Post on 19-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: FormulationandAnalysisofPatternsinaScoreMatrixforGlobal ...downloads.hindawi.com/journals/ijmms/2020/3858057.pdf · wise sequence alignment or multiple sequence alignment when any

Research ArticleFormulation and Analysis of Patterns in a Score Matrix for GlobalSequence Alignment

James Owusu Asare1 Justice Kwame Appati 2 and Kwaku Darkwah1

1Department of Mathematics Kwame Nkrumah University of Science and Technology Kumasi Ghana2Department of Computer Science University of Ghana Legon Ghana

Correspondence should be addressed to Justice Kwame Appati jkappatiugedugh

Received 15 April 2020 Accepted 16 May 2020 Published 1 June 2020

Academic Editor Aloys Krieg

Copyright copy 2020 James Owusu Asare et al )is is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

Global sequence alignment is one of the most basic pairwise sequence alignment procedures used in molecular biology tounderstand the similarity that arises among the structure function or evolutionary relationship between two nucleotide se-quences )e general algorithm associated with global sequence alignment is the dynamic programming algorithm of NeedlemanandWunsch In this paper patterns are exploited in the score matrix of the NeedlemanndashWunsch algorithmWith the help of someexamples the general patterns realized are formulated as new a priori propositions and corollaries that are established for bothequal and unequal length comparisons of any two arbitrary sequences

1 Introduction

Sequence alignment is the matching of strings or sequencesof characters to identify patterns that may lead to informedstructural or functional relationships between the strings orsequences matched Varying situational problems seem torequire the use of sequence alignment and examples aboundfrom computer science to molecular biology where se-quences are usually aligned to make more meaning out ofthem Bare [1] has discussed how researchers have over theyears considered genetic sequences as strings of charactersinstead of focusing on their intrinsic properties to be able tocompute the similarity among two or more related se-quences Global and local sequence alignment are used formatching two sequences and they are categorized as pair-wise sequence alignment or multiple sequence alignmentwhen any alignment matches more than two sequences[2ndash4] )e NeedlemanndashWunsch algorithm [5ndash7] and theSmithndashWaterman algorithm [8] are the general algorithmsassociated with pairwise global and local alignment re-spectively )ese two algorithms utilize a procedure calledthe dynamic programming approach Dynamic program-ming as coined by Bellman in the 1940s is simply the process

of solving a bigger problem by finding optimal solutions toits smaller nested problems [9ndash11])us to tackle a problemin the context of dynamic programming it must possess thenotion of recurrence In [12 13] an algorithm is said to be awell-defined computational task that accepts input andproduces output values after following through a systematicmethod Mathematical theory is thus a prerequisite behindthe designing of functional programs [14 15] and the al-gorithm design specializes in solving such problems Globalsequence alignment is mentioned as one of the vast dynamicprogramming applications in practical problems [16ndash18]

More recently Ouzounis and Baichoo [19] have statedthat even though pairwise sequence alignment has been dealtwith over time concerns still remain in resolving exactevolutionary distances that demand very specific estimates)ey have suggested the existence of theoretical relation-ships within alignments algorithms and data that are yet tobe found Motivated by the display of position-dependentarrays for affine gaps using the NeedlemanndashWunsch algo-rithm in [20 21] we consider the more basic linear gappenalties using arbitrary sequences and proceed to findpatterns in the score matrix which were absent in thepresentation )is approach was taken because it is found

HindawiInternational Journal of Mathematics and Mathematical SciencesVolume 2020 Article ID 3858057 9 pageshttpsdoiorg10115520203858057

missing in the available stream of literature although it canbe likened to the edit graph illustration seen in [1] To thebest of our knowledge there has never been any theoreticalexposition of this kind for the most basic and very funda-mental concept of constant linear gap penalties whichconstitute an implicit part of affine gap penalties )us wefocus on finding patterns in global sequence alignment usingconstant linear gap penalties and attempt the display ofcompleted score matrices of the NeedlemanndashWunsch al-gorithm with distinct positional arrays that can be inspectedWe give confirmation of this by basic proofs and suggesthow predictions can be made To offer the reader the muchneeded convenience concepts that are deemed useful arerecalled in brief with corresponding references of compre-hensive works that give more in-depth information

2 NeedlemanndashWunsch Algorithm

Let the recursive formulation for the NeedlemanndashWunschalgorithm be

(1) Initialization

Let P(0 0) 0Pm(i 0) Pm(i minus 1 0) + gap penalty foralli 1 μ1andPm(0 j) Pm(0 j minus1) + gap penalty forallj 1 μ2where P(0 0) is the initialization pivot for the scorematrix Pm(i 0) is the initial row pivot for the scorematrix and Pm(0 j) is the initial column pivot forthe score matrix

(2) Cell Box Calculation

Let

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

(1)

where Pm(i j) is the pivot for each cell box calculationPD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j)) is the diagonalvalue of a cell box PR(i j) Pm(i minus 1 j) + gap penalty is theright value of a cell box PL(i j) Pm(i j minus 1) + gap penaltyis the left value of a cell box and σ(α(i) β(j)) is the score foraligning the sequence characters of α(i) and β(j)

Specifically the procedure for the NeedlemanndashWunschalgorithm follows the dynamic programming approach ofthe score matrix traceback and alignment as outlined in thesubsequent sections

21 Score Matrix )e score matrix is a tabular box con-structed to keep count of score results )e score matrix forthe NeedlemanndashWunsch algorithm begins with an initiali-zation process and ends with the calculation of cell boxes

211 Initialization A sequence matrix is created withμ1 + 1 columns and μ2 + 1 rows in order for the initialmatrix gap to be aligned where μ1 and μ2 are the lengths

of arbitrary sequences langαrang and langβrang respectively )eletters of the sequence α fill in the horizontal axis andsimilarly the characters of langβrang fill in the vertical axis ofthe sequence matrix created Before the scoring beginsfrom the upper left corner of the initialized matrix to thelower right corner of the matrix the value P(0 0) 0 isassigned to the intersection of the first row and the firstcolumn of the matrix (ie the initial gap) )e reason forthe gap penalty for an alignment is because of thepossibility of mutation which may insert or delete astring character from one of the sequences Arrows thatpoint in the direction of positional movement (diagonaland left or right) are placed in each cell box of the matrixand only terminate when all the cell boxes are completelyfilled

212 Calculation of Cell Boxes in the Score Matrix

(1) Fill the initialized gap values first on both the hor-izontal axis and the vertical axis with a definedconstant gap value score

(2) Follow with the calculation of each cell box havingthe three position-dependent arrays (leftbesiderightbottom or diagonal)

(3) Allow only a matchmismatch value for a ldquodiagonalrdquoposition and allow the ldquobottomrdquo or ldquobesiderdquo posi-tions to take linear gap values only

(4) For each computed cell box values find the maxi-mum score and let that be the pivot

(5) )e pivot of a computed cell box directly affects thenext cell boxes in the row or the column

22 Traceback A simpler score matrix table that containsonly the pivot of each cell box calculation is constructedfrom the original position-dependent arrays of the scorematrix table Arrow pointers are used to direct a path fromthe highest score or an optimal value in the matrix (whichactually occurs at the lower right end corner of the matrix)and traced back to the next biggest value of the predecessorsuntil we reach the intersection of the first row and thecolumn with the initial gap

23 Alignment

231 Alignment Generation from a Traceback To write thesequence characters that appear from the optimal alignmentpath of the traceback stage the following steps are followed

(1) When the arrow is diagonal write both characters inthe alignment

(2) When the arrow is vertical write the correspondinghorizontal character and in place of the verticalcharacter leave a gap

(3) When the arrow is horizontal write the corre-sponding vertical character and in place of thehorizontal character leave a gap

2 International Journal of Mathematics and Mathematical Sciences

)us for both the vertical and horizontal arrow posi-tions one character and a gap are written for the alignmentwhere the gap explicitly replaces a character position in thealignment An alignment can only be inferred as the best ifthe optimal value from the score matrix table corresponds tothe alignment score calculation (based on the scoringscheme defined)

232 0e Problem of Aligning Any Two Sequences )eproblem of aligning any two sequences can be simplified asdiscovering the optimal means of aligning any two arbitrarysequences say langαrang and langβrang such that the character ldquominusrdquo notedas a gap is filled into both langαrang and langβrang or either of themwhere

(1) Any single character in langαrang matches a single char-acter in langβrang or a gap

(2) )e final sum of scores from the scoring functionover the aligned pairs and the gap penalties as givenby the function of gap penalty is maximized

233 Letter Choice for Arbitrary Sequences )e letterchoice of this study shall be that of DNA considered as astring of four characters of adeninemdashA guaninemdashGcytosinemdashC and thyminemdashT

3 Scheme for Scoring

Definition 1 )e constant gap penalty v(k) g times k whereg is an assigned constant and k is the sequence length countof i 1 μ1 for the score matrix row and j 1 μ2 forthe score matrix column)is is the penalty awarded to gapsand is also known as the linear gap function

Definition 2 )e affine gap penalty is the penalty awarded togaps where a greatest consecutive sequence of k gaps is givenas v(k) q + gk where q is the penalty charged for openingthe gap and g is the penalty charged for extending it

)us more formally we define

(1) A constant linear gap as

v(k) gk (2)

(2) An affine gap as

v(k) q + gk kge 1

0 k 01113896 (3)

Altschulrsquos theory assign positive scores to an identityand conserved replacements and assign negative scores toless likely replacements

Remark 1 Needleman and Wunsch use the ldquoidentity ma-trixrdquo in scoring with 1 for a match and 0 for a mismatchNeedlemanndashWunschrsquos score was criticised for not reflecting

observations from the nature because purine-purine orpyrimidine-pyrimidine is less prone to be mutated incomparison with mutations of purine-pyrimidine Becausethere is no definite defined score for DNA alignment thechoice of scoring for this study is +5 for a match minus1 for amismatch and minus2 for a gap as proposed in [18] following theabove theory

More formally let

σ(α(i) β(j)) +5 for α(i) β(j) charactermatch

minus1 for α(i) β(j) charactermismatch1113896

(4)

foralli 1 μ1 andforallj 1 μ2Also let the linear gap penalty v(k) gk where g is a

constant of ldquominus2rdquo and k is the sequence length count of i

1 μ1 for the score matrix row and j 1 μ2 for thescore matrix column

4 Results and Discussion

41 Pattern Investigations in the Score Matrix of Needle-manndashWunschAlgorithm )e sequences α and β that will beused are arbitrary sample sequences that were chosen basedon consideration of the following

(1) μ1 μ2 (equal character length of sequences)(2) μ1 ne μ2 (unequal character length of sequences)

411 Equal Character Length (μ1 μ2 μ)

Example 1 Let langαrang CTTGA and langβrang CTAGA Becausethe length of the sequence characters of langαrang and langβrang isμ1 μ2 5 respectively we align α and β in a score matrixfollowing the above formulation

(1) Initialization P(0 0) 0 for score matrix initiali-zation and then for the initial row pivot Pm(i 0)

Pm(i minus 1 0) + gap penalty where forallk i we find thegap penalty as follows

for i 1 Pm(1 0) P(0 0) + gap penalty

⟹ 0 +(minus2) minus2

for i 2 Pm(2 0) Pm(1 0) + gap penalty

⟹ minus 2 +(minus2) minus4

for i 3 Pm(3 0) Pm(2 0) + gap penalty

⟹ minus 4 +(minus2) minus6

for i 4 Pm(4 0) Pm(3 0) + gap penalty

⟹ minus 6 +(minus2) minus8

for i 5 Pm(5 0) Pm(4 0) + gap penalty

⟹ minus 8 +(minus2) minus10

(5)

and for the initial column pivot Pm(0 j) Pm(0 j minus 1) +gap penalty forallk j the gap penalty is obtained as follows

International Journal of Mathematics and Mathematical Sciences 3

for j 1 Pm(0 1) P(0 0) + gap penalty

⟹ 0 +(minus2) minus2

for j 2 Pm(0 2) Pm(0 1) + gap penalty

⟹ minus 2 +(minus2) minus4

for j 3 Pm(0 3) Pm(0 2) + gap penalty

⟹ minus 4 +(minus2) minus6

for j 4 Pm(0 4) Pm(0 3) + gap penalty

⟹ minus 6 +(minus2) minus8

for j 5 Pm(0 5) Pm(0 4) + gap penalty

⟹ minus 8 +(minus2) minus10

(6)

Before following with further calculation some iden-tification is placed on each cell box for easy noting asshown in Table 1

(2) Cell box calculations

Pm(1 1) max

PD(1 1) P(0 0) + σ(C C) 0 + 5 5

PR(1 1) Pm(0 1) + gap minus2 +(minus2) minus4

PL(1 1) Pm(1 0) + gap minus2 +(minus2) minus4

⎧⎪⎪⎨

⎪⎪⎩

(7)

Remark 2 σ(C C) in PD(1 1) was ldquo5rdquo because that is theassigned score for matched characters Pm(1 1) maxPD(1 1) PR(1 1) PL(1 1)1113864 1113865 max 5 minus4 minus41113858 1113859 5

Pm(1 2) max

PD(1 2) Pm(0 1) + σ(C T) minus2 +(minus1) minus3

PR(1 2) Pm(0 2) + gap minus4 +(minus2) minus6

PL(1 2) Pm(1 1) + gap 5 +(minus2) 3

⎧⎪⎪⎨

⎪⎪⎩

(8)

Remark 3 σ(C T) in PD(1 2) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(12)

max PD1113864 (12)PR(12)PL(12) max minus3 minus6 31113858 1113859 3

Pm(1 3) max

PD(1 3) Pm(0 2) + σ(C A) minus4 +(minus1) minus5

PR(1 3) Pm(0 3) + gap minus6 +(minus2) minus8

PL(1 3) Pm(1 2) + gap 3 +(minus2) 1

⎧⎪⎪⎨

⎪⎪⎩

(9)

Remark 4 σ(C A) in PD(1 3) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(1 3)

max PD1113864 (1 3) PR(1 3) PL(1 3) max minus5 minus8 11113858 1113859 1Consequently the recursion follows similarly until all

the cell boxes are filled In the event of a pivot tie in a cell boxone tie value is picked For the tabular display these no-tations are used interchangeably PD(i j) is the same as DVPL(i j) is the same as LV PR(i j) is the same as RV andPm(i j) is the same as the pivot

412 Pattern Results for Equal Character Length Withreference to Tables 1 and 2 and Figure 1

(1) )e filled-in cell boxes for column 1 ie cell boxes(i vi xi xvi and xxi) have values coinciding withthat of row 1 ie cell boxes (i ii iii iv and v)

(2) )e leading value (pivot) of each cell box in both row1 and column 1 remains the same

(3) )e bottom value in column 1 switches to becomethe beside value in row 1 and vice versa

(4) For each 3 pointed arrow intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box

(5) For each cell box the value of the preceding bottomvalue is less than the value of the immediate nextadjacent diagonal value

413 Traceback and Alignment for Equal Character Length)e traceback and alignment stages for equal characterlength of sequences are respectively shown in this section

Figure 2 shows the pivot of each cell box calculation ofthe score matrix table of CTTGA and CTAGA in Figure 1)us Figure 2 is a simpler score matrix table constructedfrom the original score matrix table of CTTGA and CTAGAin Figure 1 )e arrow pointers direct the path from theoptimal value and traceback to the initialization value ofzero Based on the traceback and the diagonal direction ofthe arrow pointers the alignment is written as

CTTGA

CTAGA(10)

To confirm the correctness of the alignment done wecheck using calculations Recall the scoring schemematch+5 mismatch minus1 and gap minus2 We have fouralignment matches C minus C T minus T G minus G and A minus A and onemismatched alignment T minus A Hence

5 + 5 minus 1 + 5 + 5 19 (11)

which is the same as the optimal value from the score matrixtable )e alignment is thus optimal

414 Unequal Character Length (μ1 ne μ2)

Example 2 Suppose langαrang AGCTG and langβrang TCAG then tofill in the score matrix values of the cell boxes the prior-stated recursive formulation is used )is results in Figures 3and 4

Table 1 Identifying cell boxes of CTTGA and CTAGA

C T T G A0 minus2 minus4 minus6 minus8 minus10

C minus2 i ii iii iv vT minus4 vi vii viii ix xA minus6 xi xii xiii xiv xvG minus8 xvi xvii xviii xix xxA minus10 xxi xxii xxiii xxiv xxv

4 International Journal of Mathematics and Mathematical Sciences

415 Pattern Results for Unequal Character Length

(1) )e filled-in cell boxes for column 1 ie(i vi xi and xvi) cell boxes have consistent valueswith that of row 1 ie (i ii iii and iv) up untilwhere they terminate at (v) and at the inconsistent(iii)

(2) Except for where an inconsistency is noted at (v) and(iii) the pivot values for each cell box of row 1 andcolumn 1 remain the same

(3) For each 3 pointed arrows intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box

GTC T A

C

T

A

A

G

0

9

6

ndash2 ndash4

3 1

3

ndash3

ndash4

ndash6

ndash6 ndash8 ndash10

ndash15

ndash1

8 11

ndash8

ndash10

ndash2

10 4

1 7

8

1476 12

ndash3 1254 19

Figure 2 Traceback in the score matrix of CTTGA and CTAGA

Table 2 Filled-in table of CTTGA and CTAGAi DV 5 LV minus4 RV minus4 Pivot 5ii DV minus3 LV 3 RV minus6 Pivot 3iii DV minus5 LV 1 RV minus8 Pivot 1iv DV minus7 LV minus1 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV 3 Pivot 3vii DV 10 LV 1 RV 1 Pivot 10viii DV 8 LV 8 RV minus1 Pivot 8ix DV 0 LV 6 RV minus3 Pivot 6x DV minus2 LV 4 RV minus5 Pivot 4xi DV minus5 LV minus8 RV 1 Pivot 1xii DV 2 LV minus1 RV 8 Pivot 8xiii DV 9 LV 6 RV 6 Pivot 9xiv DV 7 LV 7 RV 4 Pivot 7xv DV 11 LV 5 RV 2 Pivot 11xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 0 LV minus3 RV 6 Pivot 6xviii DV 7 LV 4 RV 7 Pivot 7xix DV 14 LV 5 RV 5 Pivot 14xx DV 6 LV 12 RV 9 Pivot 12xxi DV minus9 LV minus12 RV minus3 Pivot minus3xxii DV minus2 LV minus5 RV 4 Pivot 4xxiii DV 5 LV 2 RV 5 Pivot 5xxiv DV 6 LV 3 RV 12 Pivot 12xxv DV 19 LV 10 RV 10 Pivot 19

GTC T A

C

T

A

A

G

0 ndash2 ndash4

ndash4

ndash6

ndash6 ndash8 ndash10

ndash8

ndash10

ndash25 ndash4 ndash3 ndash6 ndash5 ndash8 ndash7 ndash10 ndash12ndash9

ndash4 ndash3ndash113

3ndash3 10 1 8 ndash1 0 ndash3 ndash2 ndash54

4

6

6

8

98

1

1 2

ndash6

ndash5

ndash8 ndash1ndash1ndash7

ndash10 ndash3ndash3

66 6

7

77

7

5

5

11

912

12 1954

ndash9

ndash12 ndash5ndash2 4 5 5

2 3

6

1010

140

2

Figure 1 Score matrix of the cell boxes of CTTGA and CTAGA

T

C

A

G

A G C T G

0

ndash2

ndash2

ndash4ndash4

ndash4 ndash4ndash4

ndash4

ndash4

ndash4

ndash6

ndash6

ndash6

ndash6

ndash8

ndash8

ndash10

ndash8

ndash8

ndash10

ndash10 ndash9ndash1 ndash12ndash1 ndash3

ndash3ndash3

ndash5

ndash5ndash5

ndash5

ndash5

ndash2 ndash22

2

ndash2

ndash2

ndash2

ndash2ndash2

ndash5

ndash7

ndash7

ndash7 ndash1

ndash1 ndash1

ndash1 ndash1

ndash1

ndash6

6 6

ndash3

ndash3

ndash3

ndash3

ndash3ndash3 ndash3

ndash3

0

0

04

11

Figure 3 Score matrix of AGCTG and TCAG

T

T

C

C

A

A

G

G G

0 ndash10

ndash8

ndash8

ndash6

ndash6

ndash4

ndash4

ndash2

ndash2

i ii iii iv v

vi vii vii ix x

x xii xiii xiv xv

xvi xvii xviii xix xx

Figure 4 Labelling of the cell boxes of AGCTG and TCAG

International Journal of Mathematics and Mathematical Sciences 5

(4) For each cell box the value of the preceding bottomvalue is seen to be always less than the value of theimmediate next adjacent diagonal value

416 Traceback and Alignment for Unequal CharacterLength )e traceback and alignment stages for the exampleon unequal character length of sequences are respectivelyshown below Figure 5 shows the pivot of each cell boxcalculation of the score matrix table of AGCTG and TCAGof Table 3 Based on the traceback and the direction of thearrow pointers the alignment is written as

AGCTG

T minus CAG(12)

We check the correctness of the alignment done usingcalculations Recall the scoring scheme match+5 mis-match minus1 and gap minus2 We have two alignment matchesG minus G and C minus C two alignment mismatches A T and T Aand one gapped alignment G minus Hence

minus1 minus 2 + 5 minus 1 + 5 6 (13)

which is the same as the optimal value from the score matrixtable )e alignment is thus optimal

42 Propositions and Proofs for Equal Sequence Length)e following propositions are a priori results deduced fromthe pattern results of equal character length of sequences

Definition 3 Let the linear gap penalty be v(k) gk whereg is a constant of minus2 and k is the sequence length count ofk 1 μ1 for the score matrix row and k 1 μ2 forthe score matrix column

Definition 4 Define P(0 0) 0 to be the initialization pivotfor the score matrix

Definition 5 Define Pm(i 0) Pm(i minus 1 0) + gap penaltyforalli 1 μ1 to be the initial row pivot for the score matrix

Definition 6 Define Pm(0 j) Pm(0 j minus 1) + gap penaltyforallj 1 μ2 to be the initial column pivot for the scorematrix

Definition 7 Define

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

(14)

foralli 1 μ1 and forallj 1 μ2 where Pm(i j) is the pivotfor each cell box calculation PD(i j) Pm(i minus 1 j minus 1) +

σ(α(i) β(j)) is the diagonal value of a cell box PR(i j)

Pm(i minus 1 j)+ gap penalty is the right value of a cell boxPL(i j) Pm(i j minus 1)+ gap penalty is the left value of a cellbox and σ(α(i) β(j)) is the score for aligning the sequencecharacters of α(i) and β(j)

Definition 8 Define

σ(α(i) β(j)) +5 for α(i) β(j) charactermatch

minus1 for α(i) β(j) charactermismatch1113896

(15)foralli 1 μ1 and forallj 1 μ2

Proposition 2 Filled-in cell boxes for Pm(1 j)forallj

1 μ2 are analogous to filled-in cell boxes for Pm(i 1)foralli

1 μ1 for equal length comparison of sequences

Proof By Definition 7 we have

Pm(1 j) max

PD(1 j) Pm(0 j minus 1) + σ(α(1) β(j))

PR(1 j) Pm(0 j) + gap

PL(1 j) Pm(1 j minus 1) + gap

⎧⎪⎪⎨

⎪⎪⎩

Pm(i 1) max

PD(i 1) Pm(i minus 1 0) + σ(α(i) β(1))

PR(i 1) Pm(i minus 1 1) + gap

PL(i 1) Pm(i 0) + gap

⎧⎪⎪⎨

⎪⎪⎩

(16)

TCA G G

T

C

A

G

0

0

0

ndash2 ndash4

ndash3 ndash5

ndash3

ndash3

ndash4

ndash6

ndash6 ndash8 ndash10

ndash1ndash1

ndash1

ndash1 ndash1

ndash8

ndash2

ndash2 ndash2

1 1

2

246 6

Figure 5 Traceback in the score matrix of AGCTG and TCAG

Table 3 Filled-in table of AGCTG and TCAGi DV minus1 LV minus4 RV minus4 Pivot minus1ii DV minus3 LV minus3 RV minus6 Pivot minus3iii DV minus5 LV minus5 RV minus8 Pivot minus5iv DV minus1 LV minus7 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV minus3 Pivot minus3vii DV minus2 LV minus5 RV minus5 Pivot minus2viii DV 2 LV minus4 RV minus7 Pivot 2ix DV minus6 LV 0 RV minus3 Pivot 0x DV minus2 LV minus2 RV minus5 Pivot minus2xi DV 1 LV minus8 RV minus5 Pivot 1xii DV minus4 LV minus1 RV minus4 Pivot minus1xiii DV minus3 LV minus3 RV 0 Pivot 0xiv DV 1 LV minus2 RV minus2 Pivot minus2xv DV minus1 LV minus1 RV minus4 Pivot minus1xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 6 LV minus3 RV minus3 Pivot 6xviii DV minus2 LV 4 RV minus2 Pivot 4xix DV minus1 LV 2 RV minus1 Pivot 2xx DV 6 LV 0 RV minus3 Pivot 6

6 International Journal of Mathematics and Mathematical Sciences

foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6

Pm(i 0) Pm(i minus 1 0) + gap

⟹PL(i 1) Pm(i 0) + gap

Pm(i minus 1 0) + gap( 1113857 + gap

(17)

also

Pm(0 j) Pm(0 j minus 1) + gap

⟹PR(1 j) Pm(0 j) + gap

Pm(0 j minus 1) + gap( 1113857 + gap

(18)

Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4Pm(i 0) Pm(0 j)

(19)

Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof

Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)

Pm(i minus 1 1) + gap and by induction let i 1 j then

PL(1 j) Pm(1 0) + gap

P(0 0) + gap + gap

P(0 0) + 2 gaps

PR(i 1) Pm(0 1) + gap

P(0 0) + gap + gap

P(0 0) + 2gaps

(20)

Assume Pm(k 0) Pm(0 k) then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4PR(i 1) PL(1 j) where i j

(21)

which completes the second part of our proofRecall

PL(i 1) Pm(i minus 1 0) + 2 gaps

PR(1 j) Pm(0 j minus 1) + 2 gaps

PL(i 1) PR(1 j)was established

(22)

It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)

Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))

For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)

and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us

σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)

Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)

Pm(0 j minus 1)

Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa

Proof From Proposition 2 and Definitions 5ndash7 it is clearthat

PL(i 1) PR(1 j)

PR(i 1) PL(1 j)

where i j

(24)

Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences

Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2

Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1

(25)

and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved

Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7

International Journal of Mathematics and Mathematical Sciences 7

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j) max

PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))

PR(i + 1 j) Pm(i j) + gap penalty

PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i j + 1) max

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))

PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j + 1) max

PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))

PR(i + 1 j + 1) Pm(i j + 1) + gap penalty

PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

PR(i + 1 j) Pm(i j) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

(26)

Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write

PR(i j) Pm(i minus 1 j) + gap penalty

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)

Weare left to show that gap penalty lt σ(α(i) β(j+ 1))

Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match

For any k count forallk 1 μ the linear gap penalty gk

decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us

gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score

match diagonal score

foralli j 1 μ

(28)

Hence PR(i j)ltPD(i j + 1)foralli j 1 μ

43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences

Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1

j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof

Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof

8 International Journal of Mathematics and Mathematical Sciences

5 Conclusion

In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs

Data Availability

No data were used to support this study

Conflicts of Interest

)e authors declare that there are no conflicts of interestregarding the publication of this paper

References

[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005

[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003

[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007

[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005

[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970

[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008

[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005

[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981

[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004

[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002

[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957

[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004

[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018

[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication

company using prometheerdquo Applied Informatics vol 5 no 12018

[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019

[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010

[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003

[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012

[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017

[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009

[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986

International Journal of Mathematics and Mathematical Sciences 9

Page 2: FormulationandAnalysisofPatternsinaScoreMatrixforGlobal ...downloads.hindawi.com/journals/ijmms/2020/3858057.pdf · wise sequence alignment or multiple sequence alignment when any

missing in the available stream of literature although it canbe likened to the edit graph illustration seen in [1] To thebest of our knowledge there has never been any theoreticalexposition of this kind for the most basic and very funda-mental concept of constant linear gap penalties whichconstitute an implicit part of affine gap penalties )us wefocus on finding patterns in global sequence alignment usingconstant linear gap penalties and attempt the display ofcompleted score matrices of the NeedlemanndashWunsch al-gorithm with distinct positional arrays that can be inspectedWe give confirmation of this by basic proofs and suggesthow predictions can be made To offer the reader the muchneeded convenience concepts that are deemed useful arerecalled in brief with corresponding references of compre-hensive works that give more in-depth information

2 NeedlemanndashWunsch Algorithm

Let the recursive formulation for the NeedlemanndashWunschalgorithm be

(1) Initialization

Let P(0 0) 0Pm(i 0) Pm(i minus 1 0) + gap penalty foralli 1 μ1andPm(0 j) Pm(0 j minus1) + gap penalty forallj 1 μ2where P(0 0) is the initialization pivot for the scorematrix Pm(i 0) is the initial row pivot for the scorematrix and Pm(0 j) is the initial column pivot forthe score matrix

(2) Cell Box Calculation

Let

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

(1)

where Pm(i j) is the pivot for each cell box calculationPD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j)) is the diagonalvalue of a cell box PR(i j) Pm(i minus 1 j) + gap penalty is theright value of a cell box PL(i j) Pm(i j minus 1) + gap penaltyis the left value of a cell box and σ(α(i) β(j)) is the score foraligning the sequence characters of α(i) and β(j)

Specifically the procedure for the NeedlemanndashWunschalgorithm follows the dynamic programming approach ofthe score matrix traceback and alignment as outlined in thesubsequent sections

21 Score Matrix )e score matrix is a tabular box con-structed to keep count of score results )e score matrix forthe NeedlemanndashWunsch algorithm begins with an initiali-zation process and ends with the calculation of cell boxes

211 Initialization A sequence matrix is created withμ1 + 1 columns and μ2 + 1 rows in order for the initialmatrix gap to be aligned where μ1 and μ2 are the lengths

of arbitrary sequences langαrang and langβrang respectively )eletters of the sequence α fill in the horizontal axis andsimilarly the characters of langβrang fill in the vertical axis ofthe sequence matrix created Before the scoring beginsfrom the upper left corner of the initialized matrix to thelower right corner of the matrix the value P(0 0) 0 isassigned to the intersection of the first row and the firstcolumn of the matrix (ie the initial gap) )e reason forthe gap penalty for an alignment is because of thepossibility of mutation which may insert or delete astring character from one of the sequences Arrows thatpoint in the direction of positional movement (diagonaland left or right) are placed in each cell box of the matrixand only terminate when all the cell boxes are completelyfilled

212 Calculation of Cell Boxes in the Score Matrix

(1) Fill the initialized gap values first on both the hor-izontal axis and the vertical axis with a definedconstant gap value score

(2) Follow with the calculation of each cell box havingthe three position-dependent arrays (leftbesiderightbottom or diagonal)

(3) Allow only a matchmismatch value for a ldquodiagonalrdquoposition and allow the ldquobottomrdquo or ldquobesiderdquo posi-tions to take linear gap values only

(4) For each computed cell box values find the maxi-mum score and let that be the pivot

(5) )e pivot of a computed cell box directly affects thenext cell boxes in the row or the column

22 Traceback A simpler score matrix table that containsonly the pivot of each cell box calculation is constructedfrom the original position-dependent arrays of the scorematrix table Arrow pointers are used to direct a path fromthe highest score or an optimal value in the matrix (whichactually occurs at the lower right end corner of the matrix)and traced back to the next biggest value of the predecessorsuntil we reach the intersection of the first row and thecolumn with the initial gap

23 Alignment

231 Alignment Generation from a Traceback To write thesequence characters that appear from the optimal alignmentpath of the traceback stage the following steps are followed

(1) When the arrow is diagonal write both characters inthe alignment

(2) When the arrow is vertical write the correspondinghorizontal character and in place of the verticalcharacter leave a gap

(3) When the arrow is horizontal write the corre-sponding vertical character and in place of thehorizontal character leave a gap

2 International Journal of Mathematics and Mathematical Sciences

)us for both the vertical and horizontal arrow posi-tions one character and a gap are written for the alignmentwhere the gap explicitly replaces a character position in thealignment An alignment can only be inferred as the best ifthe optimal value from the score matrix table corresponds tothe alignment score calculation (based on the scoringscheme defined)

232 0e Problem of Aligning Any Two Sequences )eproblem of aligning any two sequences can be simplified asdiscovering the optimal means of aligning any two arbitrarysequences say langαrang and langβrang such that the character ldquominusrdquo notedas a gap is filled into both langαrang and langβrang or either of themwhere

(1) Any single character in langαrang matches a single char-acter in langβrang or a gap

(2) )e final sum of scores from the scoring functionover the aligned pairs and the gap penalties as givenby the function of gap penalty is maximized

233 Letter Choice for Arbitrary Sequences )e letterchoice of this study shall be that of DNA considered as astring of four characters of adeninemdashA guaninemdashGcytosinemdashC and thyminemdashT

3 Scheme for Scoring

Definition 1 )e constant gap penalty v(k) g times k whereg is an assigned constant and k is the sequence length countof i 1 μ1 for the score matrix row and j 1 μ2 forthe score matrix column)is is the penalty awarded to gapsand is also known as the linear gap function

Definition 2 )e affine gap penalty is the penalty awarded togaps where a greatest consecutive sequence of k gaps is givenas v(k) q + gk where q is the penalty charged for openingthe gap and g is the penalty charged for extending it

)us more formally we define

(1) A constant linear gap as

v(k) gk (2)

(2) An affine gap as

v(k) q + gk kge 1

0 k 01113896 (3)

Altschulrsquos theory assign positive scores to an identityand conserved replacements and assign negative scores toless likely replacements

Remark 1 Needleman and Wunsch use the ldquoidentity ma-trixrdquo in scoring with 1 for a match and 0 for a mismatchNeedlemanndashWunschrsquos score was criticised for not reflecting

observations from the nature because purine-purine orpyrimidine-pyrimidine is less prone to be mutated incomparison with mutations of purine-pyrimidine Becausethere is no definite defined score for DNA alignment thechoice of scoring for this study is +5 for a match minus1 for amismatch and minus2 for a gap as proposed in [18] following theabove theory

More formally let

σ(α(i) β(j)) +5 for α(i) β(j) charactermatch

minus1 for α(i) β(j) charactermismatch1113896

(4)

foralli 1 μ1 andforallj 1 μ2Also let the linear gap penalty v(k) gk where g is a

constant of ldquominus2rdquo and k is the sequence length count of i

1 μ1 for the score matrix row and j 1 μ2 for thescore matrix column

4 Results and Discussion

41 Pattern Investigations in the Score Matrix of Needle-manndashWunschAlgorithm )e sequences α and β that will beused are arbitrary sample sequences that were chosen basedon consideration of the following

(1) μ1 μ2 (equal character length of sequences)(2) μ1 ne μ2 (unequal character length of sequences)

411 Equal Character Length (μ1 μ2 μ)

Example 1 Let langαrang CTTGA and langβrang CTAGA Becausethe length of the sequence characters of langαrang and langβrang isμ1 μ2 5 respectively we align α and β in a score matrixfollowing the above formulation

(1) Initialization P(0 0) 0 for score matrix initiali-zation and then for the initial row pivot Pm(i 0)

Pm(i minus 1 0) + gap penalty where forallk i we find thegap penalty as follows

for i 1 Pm(1 0) P(0 0) + gap penalty

⟹ 0 +(minus2) minus2

for i 2 Pm(2 0) Pm(1 0) + gap penalty

⟹ minus 2 +(minus2) minus4

for i 3 Pm(3 0) Pm(2 0) + gap penalty

⟹ minus 4 +(minus2) minus6

for i 4 Pm(4 0) Pm(3 0) + gap penalty

⟹ minus 6 +(minus2) minus8

for i 5 Pm(5 0) Pm(4 0) + gap penalty

⟹ minus 8 +(minus2) minus10

(5)

and for the initial column pivot Pm(0 j) Pm(0 j minus 1) +gap penalty forallk j the gap penalty is obtained as follows

International Journal of Mathematics and Mathematical Sciences 3

for j 1 Pm(0 1) P(0 0) + gap penalty

⟹ 0 +(minus2) minus2

for j 2 Pm(0 2) Pm(0 1) + gap penalty

⟹ minus 2 +(minus2) minus4

for j 3 Pm(0 3) Pm(0 2) + gap penalty

⟹ minus 4 +(minus2) minus6

for j 4 Pm(0 4) Pm(0 3) + gap penalty

⟹ minus 6 +(minus2) minus8

for j 5 Pm(0 5) Pm(0 4) + gap penalty

⟹ minus 8 +(minus2) minus10

(6)

Before following with further calculation some iden-tification is placed on each cell box for easy noting asshown in Table 1

(2) Cell box calculations

Pm(1 1) max

PD(1 1) P(0 0) + σ(C C) 0 + 5 5

PR(1 1) Pm(0 1) + gap minus2 +(minus2) minus4

PL(1 1) Pm(1 0) + gap minus2 +(minus2) minus4

⎧⎪⎪⎨

⎪⎪⎩

(7)

Remark 2 σ(C C) in PD(1 1) was ldquo5rdquo because that is theassigned score for matched characters Pm(1 1) maxPD(1 1) PR(1 1) PL(1 1)1113864 1113865 max 5 minus4 minus41113858 1113859 5

Pm(1 2) max

PD(1 2) Pm(0 1) + σ(C T) minus2 +(minus1) minus3

PR(1 2) Pm(0 2) + gap minus4 +(minus2) minus6

PL(1 2) Pm(1 1) + gap 5 +(minus2) 3

⎧⎪⎪⎨

⎪⎪⎩

(8)

Remark 3 σ(C T) in PD(1 2) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(12)

max PD1113864 (12)PR(12)PL(12) max minus3 minus6 31113858 1113859 3

Pm(1 3) max

PD(1 3) Pm(0 2) + σ(C A) minus4 +(minus1) minus5

PR(1 3) Pm(0 3) + gap minus6 +(minus2) minus8

PL(1 3) Pm(1 2) + gap 3 +(minus2) 1

⎧⎪⎪⎨

⎪⎪⎩

(9)

Remark 4 σ(C A) in PD(1 3) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(1 3)

max PD1113864 (1 3) PR(1 3) PL(1 3) max minus5 minus8 11113858 1113859 1Consequently the recursion follows similarly until all

the cell boxes are filled In the event of a pivot tie in a cell boxone tie value is picked For the tabular display these no-tations are used interchangeably PD(i j) is the same as DVPL(i j) is the same as LV PR(i j) is the same as RV andPm(i j) is the same as the pivot

412 Pattern Results for Equal Character Length Withreference to Tables 1 and 2 and Figure 1

(1) )e filled-in cell boxes for column 1 ie cell boxes(i vi xi xvi and xxi) have values coinciding withthat of row 1 ie cell boxes (i ii iii iv and v)

(2) )e leading value (pivot) of each cell box in both row1 and column 1 remains the same

(3) )e bottom value in column 1 switches to becomethe beside value in row 1 and vice versa

(4) For each 3 pointed arrow intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box

(5) For each cell box the value of the preceding bottomvalue is less than the value of the immediate nextadjacent diagonal value

413 Traceback and Alignment for Equal Character Length)e traceback and alignment stages for equal characterlength of sequences are respectively shown in this section

Figure 2 shows the pivot of each cell box calculation ofthe score matrix table of CTTGA and CTAGA in Figure 1)us Figure 2 is a simpler score matrix table constructedfrom the original score matrix table of CTTGA and CTAGAin Figure 1 )e arrow pointers direct the path from theoptimal value and traceback to the initialization value ofzero Based on the traceback and the diagonal direction ofthe arrow pointers the alignment is written as

CTTGA

CTAGA(10)

To confirm the correctness of the alignment done wecheck using calculations Recall the scoring schemematch+5 mismatch minus1 and gap minus2 We have fouralignment matches C minus C T minus T G minus G and A minus A and onemismatched alignment T minus A Hence

5 + 5 minus 1 + 5 + 5 19 (11)

which is the same as the optimal value from the score matrixtable )e alignment is thus optimal

414 Unequal Character Length (μ1 ne μ2)

Example 2 Suppose langαrang AGCTG and langβrang TCAG then tofill in the score matrix values of the cell boxes the prior-stated recursive formulation is used )is results in Figures 3and 4

Table 1 Identifying cell boxes of CTTGA and CTAGA

C T T G A0 minus2 minus4 minus6 minus8 minus10

C minus2 i ii iii iv vT minus4 vi vii viii ix xA minus6 xi xii xiii xiv xvG minus8 xvi xvii xviii xix xxA minus10 xxi xxii xxiii xxiv xxv

4 International Journal of Mathematics and Mathematical Sciences

415 Pattern Results for Unequal Character Length

(1) )e filled-in cell boxes for column 1 ie(i vi xi and xvi) cell boxes have consistent valueswith that of row 1 ie (i ii iii and iv) up untilwhere they terminate at (v) and at the inconsistent(iii)

(2) Except for where an inconsistency is noted at (v) and(iii) the pivot values for each cell box of row 1 andcolumn 1 remain the same

(3) For each 3 pointed arrows intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box

GTC T A

C

T

A

A

G

0

9

6

ndash2 ndash4

3 1

3

ndash3

ndash4

ndash6

ndash6 ndash8 ndash10

ndash15

ndash1

8 11

ndash8

ndash10

ndash2

10 4

1 7

8

1476 12

ndash3 1254 19

Figure 2 Traceback in the score matrix of CTTGA and CTAGA

Table 2 Filled-in table of CTTGA and CTAGAi DV 5 LV minus4 RV minus4 Pivot 5ii DV minus3 LV 3 RV minus6 Pivot 3iii DV minus5 LV 1 RV minus8 Pivot 1iv DV minus7 LV minus1 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV 3 Pivot 3vii DV 10 LV 1 RV 1 Pivot 10viii DV 8 LV 8 RV minus1 Pivot 8ix DV 0 LV 6 RV minus3 Pivot 6x DV minus2 LV 4 RV minus5 Pivot 4xi DV minus5 LV minus8 RV 1 Pivot 1xii DV 2 LV minus1 RV 8 Pivot 8xiii DV 9 LV 6 RV 6 Pivot 9xiv DV 7 LV 7 RV 4 Pivot 7xv DV 11 LV 5 RV 2 Pivot 11xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 0 LV minus3 RV 6 Pivot 6xviii DV 7 LV 4 RV 7 Pivot 7xix DV 14 LV 5 RV 5 Pivot 14xx DV 6 LV 12 RV 9 Pivot 12xxi DV minus9 LV minus12 RV minus3 Pivot minus3xxii DV minus2 LV minus5 RV 4 Pivot 4xxiii DV 5 LV 2 RV 5 Pivot 5xxiv DV 6 LV 3 RV 12 Pivot 12xxv DV 19 LV 10 RV 10 Pivot 19

GTC T A

C

T

A

A

G

0 ndash2 ndash4

ndash4

ndash6

ndash6 ndash8 ndash10

ndash8

ndash10

ndash25 ndash4 ndash3 ndash6 ndash5 ndash8 ndash7 ndash10 ndash12ndash9

ndash4 ndash3ndash113

3ndash3 10 1 8 ndash1 0 ndash3 ndash2 ndash54

4

6

6

8

98

1

1 2

ndash6

ndash5

ndash8 ndash1ndash1ndash7

ndash10 ndash3ndash3

66 6

7

77

7

5

5

11

912

12 1954

ndash9

ndash12 ndash5ndash2 4 5 5

2 3

6

1010

140

2

Figure 1 Score matrix of the cell boxes of CTTGA and CTAGA

T

C

A

G

A G C T G

0

ndash2

ndash2

ndash4ndash4

ndash4 ndash4ndash4

ndash4

ndash4

ndash4

ndash6

ndash6

ndash6

ndash6

ndash8

ndash8

ndash10

ndash8

ndash8

ndash10

ndash10 ndash9ndash1 ndash12ndash1 ndash3

ndash3ndash3

ndash5

ndash5ndash5

ndash5

ndash5

ndash2 ndash22

2

ndash2

ndash2

ndash2

ndash2ndash2

ndash5

ndash7

ndash7

ndash7 ndash1

ndash1 ndash1

ndash1 ndash1

ndash1

ndash6

6 6

ndash3

ndash3

ndash3

ndash3

ndash3ndash3 ndash3

ndash3

0

0

04

11

Figure 3 Score matrix of AGCTG and TCAG

T

T

C

C

A

A

G

G G

0 ndash10

ndash8

ndash8

ndash6

ndash6

ndash4

ndash4

ndash2

ndash2

i ii iii iv v

vi vii vii ix x

x xii xiii xiv xv

xvi xvii xviii xix xx

Figure 4 Labelling of the cell boxes of AGCTG and TCAG

International Journal of Mathematics and Mathematical Sciences 5

(4) For each cell box the value of the preceding bottomvalue is seen to be always less than the value of theimmediate next adjacent diagonal value

416 Traceback and Alignment for Unequal CharacterLength )e traceback and alignment stages for the exampleon unequal character length of sequences are respectivelyshown below Figure 5 shows the pivot of each cell boxcalculation of the score matrix table of AGCTG and TCAGof Table 3 Based on the traceback and the direction of thearrow pointers the alignment is written as

AGCTG

T minus CAG(12)

We check the correctness of the alignment done usingcalculations Recall the scoring scheme match+5 mis-match minus1 and gap minus2 We have two alignment matchesG minus G and C minus C two alignment mismatches A T and T Aand one gapped alignment G minus Hence

minus1 minus 2 + 5 minus 1 + 5 6 (13)

which is the same as the optimal value from the score matrixtable )e alignment is thus optimal

42 Propositions and Proofs for Equal Sequence Length)e following propositions are a priori results deduced fromthe pattern results of equal character length of sequences

Definition 3 Let the linear gap penalty be v(k) gk whereg is a constant of minus2 and k is the sequence length count ofk 1 μ1 for the score matrix row and k 1 μ2 forthe score matrix column

Definition 4 Define P(0 0) 0 to be the initialization pivotfor the score matrix

Definition 5 Define Pm(i 0) Pm(i minus 1 0) + gap penaltyforalli 1 μ1 to be the initial row pivot for the score matrix

Definition 6 Define Pm(0 j) Pm(0 j minus 1) + gap penaltyforallj 1 μ2 to be the initial column pivot for the scorematrix

Definition 7 Define

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

(14)

foralli 1 μ1 and forallj 1 μ2 where Pm(i j) is the pivotfor each cell box calculation PD(i j) Pm(i minus 1 j minus 1) +

σ(α(i) β(j)) is the diagonal value of a cell box PR(i j)

Pm(i minus 1 j)+ gap penalty is the right value of a cell boxPL(i j) Pm(i j minus 1)+ gap penalty is the left value of a cellbox and σ(α(i) β(j)) is the score for aligning the sequencecharacters of α(i) and β(j)

Definition 8 Define

σ(α(i) β(j)) +5 for α(i) β(j) charactermatch

minus1 for α(i) β(j) charactermismatch1113896

(15)foralli 1 μ1 and forallj 1 μ2

Proposition 2 Filled-in cell boxes for Pm(1 j)forallj

1 μ2 are analogous to filled-in cell boxes for Pm(i 1)foralli

1 μ1 for equal length comparison of sequences

Proof By Definition 7 we have

Pm(1 j) max

PD(1 j) Pm(0 j minus 1) + σ(α(1) β(j))

PR(1 j) Pm(0 j) + gap

PL(1 j) Pm(1 j minus 1) + gap

⎧⎪⎪⎨

⎪⎪⎩

Pm(i 1) max

PD(i 1) Pm(i minus 1 0) + σ(α(i) β(1))

PR(i 1) Pm(i minus 1 1) + gap

PL(i 1) Pm(i 0) + gap

⎧⎪⎪⎨

⎪⎪⎩

(16)

TCA G G

T

C

A

G

0

0

0

ndash2 ndash4

ndash3 ndash5

ndash3

ndash3

ndash4

ndash6

ndash6 ndash8 ndash10

ndash1ndash1

ndash1

ndash1 ndash1

ndash8

ndash2

ndash2 ndash2

1 1

2

246 6

Figure 5 Traceback in the score matrix of AGCTG and TCAG

Table 3 Filled-in table of AGCTG and TCAGi DV minus1 LV minus4 RV minus4 Pivot minus1ii DV minus3 LV minus3 RV minus6 Pivot minus3iii DV minus5 LV minus5 RV minus8 Pivot minus5iv DV minus1 LV minus7 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV minus3 Pivot minus3vii DV minus2 LV minus5 RV minus5 Pivot minus2viii DV 2 LV minus4 RV minus7 Pivot 2ix DV minus6 LV 0 RV minus3 Pivot 0x DV minus2 LV minus2 RV minus5 Pivot minus2xi DV 1 LV minus8 RV minus5 Pivot 1xii DV minus4 LV minus1 RV minus4 Pivot minus1xiii DV minus3 LV minus3 RV 0 Pivot 0xiv DV 1 LV minus2 RV minus2 Pivot minus2xv DV minus1 LV minus1 RV minus4 Pivot minus1xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 6 LV minus3 RV minus3 Pivot 6xviii DV minus2 LV 4 RV minus2 Pivot 4xix DV minus1 LV 2 RV minus1 Pivot 2xx DV 6 LV 0 RV minus3 Pivot 6

6 International Journal of Mathematics and Mathematical Sciences

foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6

Pm(i 0) Pm(i minus 1 0) + gap

⟹PL(i 1) Pm(i 0) + gap

Pm(i minus 1 0) + gap( 1113857 + gap

(17)

also

Pm(0 j) Pm(0 j minus 1) + gap

⟹PR(1 j) Pm(0 j) + gap

Pm(0 j minus 1) + gap( 1113857 + gap

(18)

Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4Pm(i 0) Pm(0 j)

(19)

Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof

Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)

Pm(i minus 1 1) + gap and by induction let i 1 j then

PL(1 j) Pm(1 0) + gap

P(0 0) + gap + gap

P(0 0) + 2 gaps

PR(i 1) Pm(0 1) + gap

P(0 0) + gap + gap

P(0 0) + 2gaps

(20)

Assume Pm(k 0) Pm(0 k) then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4PR(i 1) PL(1 j) where i j

(21)

which completes the second part of our proofRecall

PL(i 1) Pm(i minus 1 0) + 2 gaps

PR(1 j) Pm(0 j minus 1) + 2 gaps

PL(i 1) PR(1 j)was established

(22)

It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)

Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))

For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)

and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us

σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)

Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)

Pm(0 j minus 1)

Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa

Proof From Proposition 2 and Definitions 5ndash7 it is clearthat

PL(i 1) PR(1 j)

PR(i 1) PL(1 j)

where i j

(24)

Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences

Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2

Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1

(25)

and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved

Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7

International Journal of Mathematics and Mathematical Sciences 7

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j) max

PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))

PR(i + 1 j) Pm(i j) + gap penalty

PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i j + 1) max

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))

PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j + 1) max

PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))

PR(i + 1 j + 1) Pm(i j + 1) + gap penalty

PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

PR(i + 1 j) Pm(i j) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

(26)

Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write

PR(i j) Pm(i minus 1 j) + gap penalty

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)

Weare left to show that gap penalty lt σ(α(i) β(j+ 1))

Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match

For any k count forallk 1 μ the linear gap penalty gk

decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us

gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score

match diagonal score

foralli j 1 μ

(28)

Hence PR(i j)ltPD(i j + 1)foralli j 1 μ

43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences

Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1

j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof

Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof

8 International Journal of Mathematics and Mathematical Sciences

5 Conclusion

In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs

Data Availability

No data were used to support this study

Conflicts of Interest

)e authors declare that there are no conflicts of interestregarding the publication of this paper

References

[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005

[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003

[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007

[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005

[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970

[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008

[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005

[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981

[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004

[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002

[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957

[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004

[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018

[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication

company using prometheerdquo Applied Informatics vol 5 no 12018

[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019

[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010

[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003

[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012

[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017

[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009

[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986

International Journal of Mathematics and Mathematical Sciences 9

Page 3: FormulationandAnalysisofPatternsinaScoreMatrixforGlobal ...downloads.hindawi.com/journals/ijmms/2020/3858057.pdf · wise sequence alignment or multiple sequence alignment when any

)us for both the vertical and horizontal arrow posi-tions one character and a gap are written for the alignmentwhere the gap explicitly replaces a character position in thealignment An alignment can only be inferred as the best ifthe optimal value from the score matrix table corresponds tothe alignment score calculation (based on the scoringscheme defined)

232 0e Problem of Aligning Any Two Sequences )eproblem of aligning any two sequences can be simplified asdiscovering the optimal means of aligning any two arbitrarysequences say langαrang and langβrang such that the character ldquominusrdquo notedas a gap is filled into both langαrang and langβrang or either of themwhere

(1) Any single character in langαrang matches a single char-acter in langβrang or a gap

(2) )e final sum of scores from the scoring functionover the aligned pairs and the gap penalties as givenby the function of gap penalty is maximized

233 Letter Choice for Arbitrary Sequences )e letterchoice of this study shall be that of DNA considered as astring of four characters of adeninemdashA guaninemdashGcytosinemdashC and thyminemdashT

3 Scheme for Scoring

Definition 1 )e constant gap penalty v(k) g times k whereg is an assigned constant and k is the sequence length countof i 1 μ1 for the score matrix row and j 1 μ2 forthe score matrix column)is is the penalty awarded to gapsand is also known as the linear gap function

Definition 2 )e affine gap penalty is the penalty awarded togaps where a greatest consecutive sequence of k gaps is givenas v(k) q + gk where q is the penalty charged for openingthe gap and g is the penalty charged for extending it

)us more formally we define

(1) A constant linear gap as

v(k) gk (2)

(2) An affine gap as

v(k) q + gk kge 1

0 k 01113896 (3)

Altschulrsquos theory assign positive scores to an identityand conserved replacements and assign negative scores toless likely replacements

Remark 1 Needleman and Wunsch use the ldquoidentity ma-trixrdquo in scoring with 1 for a match and 0 for a mismatchNeedlemanndashWunschrsquos score was criticised for not reflecting

observations from the nature because purine-purine orpyrimidine-pyrimidine is less prone to be mutated incomparison with mutations of purine-pyrimidine Becausethere is no definite defined score for DNA alignment thechoice of scoring for this study is +5 for a match minus1 for amismatch and minus2 for a gap as proposed in [18] following theabove theory

More formally let

σ(α(i) β(j)) +5 for α(i) β(j) charactermatch

minus1 for α(i) β(j) charactermismatch1113896

(4)

foralli 1 μ1 andforallj 1 μ2Also let the linear gap penalty v(k) gk where g is a

constant of ldquominus2rdquo and k is the sequence length count of i

1 μ1 for the score matrix row and j 1 μ2 for thescore matrix column

4 Results and Discussion

41 Pattern Investigations in the Score Matrix of Needle-manndashWunschAlgorithm )e sequences α and β that will beused are arbitrary sample sequences that were chosen basedon consideration of the following

(1) μ1 μ2 (equal character length of sequences)(2) μ1 ne μ2 (unequal character length of sequences)

411 Equal Character Length (μ1 μ2 μ)

Example 1 Let langαrang CTTGA and langβrang CTAGA Becausethe length of the sequence characters of langαrang and langβrang isμ1 μ2 5 respectively we align α and β in a score matrixfollowing the above formulation

(1) Initialization P(0 0) 0 for score matrix initiali-zation and then for the initial row pivot Pm(i 0)

Pm(i minus 1 0) + gap penalty where forallk i we find thegap penalty as follows

for i 1 Pm(1 0) P(0 0) + gap penalty

⟹ 0 +(minus2) minus2

for i 2 Pm(2 0) Pm(1 0) + gap penalty

⟹ minus 2 +(minus2) minus4

for i 3 Pm(3 0) Pm(2 0) + gap penalty

⟹ minus 4 +(minus2) minus6

for i 4 Pm(4 0) Pm(3 0) + gap penalty

⟹ minus 6 +(minus2) minus8

for i 5 Pm(5 0) Pm(4 0) + gap penalty

⟹ minus 8 +(minus2) minus10

(5)

and for the initial column pivot Pm(0 j) Pm(0 j minus 1) +gap penalty forallk j the gap penalty is obtained as follows

International Journal of Mathematics and Mathematical Sciences 3

for j 1 Pm(0 1) P(0 0) + gap penalty

⟹ 0 +(minus2) minus2

for j 2 Pm(0 2) Pm(0 1) + gap penalty

⟹ minus 2 +(minus2) minus4

for j 3 Pm(0 3) Pm(0 2) + gap penalty

⟹ minus 4 +(minus2) minus6

for j 4 Pm(0 4) Pm(0 3) + gap penalty

⟹ minus 6 +(minus2) minus8

for j 5 Pm(0 5) Pm(0 4) + gap penalty

⟹ minus 8 +(minus2) minus10

(6)

Before following with further calculation some iden-tification is placed on each cell box for easy noting asshown in Table 1

(2) Cell box calculations

Pm(1 1) max

PD(1 1) P(0 0) + σ(C C) 0 + 5 5

PR(1 1) Pm(0 1) + gap minus2 +(minus2) minus4

PL(1 1) Pm(1 0) + gap minus2 +(minus2) minus4

⎧⎪⎪⎨

⎪⎪⎩

(7)

Remark 2 σ(C C) in PD(1 1) was ldquo5rdquo because that is theassigned score for matched characters Pm(1 1) maxPD(1 1) PR(1 1) PL(1 1)1113864 1113865 max 5 minus4 minus41113858 1113859 5

Pm(1 2) max

PD(1 2) Pm(0 1) + σ(C T) minus2 +(minus1) minus3

PR(1 2) Pm(0 2) + gap minus4 +(minus2) minus6

PL(1 2) Pm(1 1) + gap 5 +(minus2) 3

⎧⎪⎪⎨

⎪⎪⎩

(8)

Remark 3 σ(C T) in PD(1 2) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(12)

max PD1113864 (12)PR(12)PL(12) max minus3 minus6 31113858 1113859 3

Pm(1 3) max

PD(1 3) Pm(0 2) + σ(C A) minus4 +(minus1) minus5

PR(1 3) Pm(0 3) + gap minus6 +(minus2) minus8

PL(1 3) Pm(1 2) + gap 3 +(minus2) 1

⎧⎪⎪⎨

⎪⎪⎩

(9)

Remark 4 σ(C A) in PD(1 3) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(1 3)

max PD1113864 (1 3) PR(1 3) PL(1 3) max minus5 minus8 11113858 1113859 1Consequently the recursion follows similarly until all

the cell boxes are filled In the event of a pivot tie in a cell boxone tie value is picked For the tabular display these no-tations are used interchangeably PD(i j) is the same as DVPL(i j) is the same as LV PR(i j) is the same as RV andPm(i j) is the same as the pivot

412 Pattern Results for Equal Character Length Withreference to Tables 1 and 2 and Figure 1

(1) )e filled-in cell boxes for column 1 ie cell boxes(i vi xi xvi and xxi) have values coinciding withthat of row 1 ie cell boxes (i ii iii iv and v)

(2) )e leading value (pivot) of each cell box in both row1 and column 1 remains the same

(3) )e bottom value in column 1 switches to becomethe beside value in row 1 and vice versa

(4) For each 3 pointed arrow intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box

(5) For each cell box the value of the preceding bottomvalue is less than the value of the immediate nextadjacent diagonal value

413 Traceback and Alignment for Equal Character Length)e traceback and alignment stages for equal characterlength of sequences are respectively shown in this section

Figure 2 shows the pivot of each cell box calculation ofthe score matrix table of CTTGA and CTAGA in Figure 1)us Figure 2 is a simpler score matrix table constructedfrom the original score matrix table of CTTGA and CTAGAin Figure 1 )e arrow pointers direct the path from theoptimal value and traceback to the initialization value ofzero Based on the traceback and the diagonal direction ofthe arrow pointers the alignment is written as

CTTGA

CTAGA(10)

To confirm the correctness of the alignment done wecheck using calculations Recall the scoring schemematch+5 mismatch minus1 and gap minus2 We have fouralignment matches C minus C T minus T G minus G and A minus A and onemismatched alignment T minus A Hence

5 + 5 minus 1 + 5 + 5 19 (11)

which is the same as the optimal value from the score matrixtable )e alignment is thus optimal

414 Unequal Character Length (μ1 ne μ2)

Example 2 Suppose langαrang AGCTG and langβrang TCAG then tofill in the score matrix values of the cell boxes the prior-stated recursive formulation is used )is results in Figures 3and 4

Table 1 Identifying cell boxes of CTTGA and CTAGA

C T T G A0 minus2 minus4 minus6 minus8 minus10

C minus2 i ii iii iv vT minus4 vi vii viii ix xA minus6 xi xii xiii xiv xvG minus8 xvi xvii xviii xix xxA minus10 xxi xxii xxiii xxiv xxv

4 International Journal of Mathematics and Mathematical Sciences

415 Pattern Results for Unequal Character Length

(1) )e filled-in cell boxes for column 1 ie(i vi xi and xvi) cell boxes have consistent valueswith that of row 1 ie (i ii iii and iv) up untilwhere they terminate at (v) and at the inconsistent(iii)

(2) Except for where an inconsistency is noted at (v) and(iii) the pivot values for each cell box of row 1 andcolumn 1 remain the same

(3) For each 3 pointed arrows intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box

GTC T A

C

T

A

A

G

0

9

6

ndash2 ndash4

3 1

3

ndash3

ndash4

ndash6

ndash6 ndash8 ndash10

ndash15

ndash1

8 11

ndash8

ndash10

ndash2

10 4

1 7

8

1476 12

ndash3 1254 19

Figure 2 Traceback in the score matrix of CTTGA and CTAGA

Table 2 Filled-in table of CTTGA and CTAGAi DV 5 LV minus4 RV minus4 Pivot 5ii DV minus3 LV 3 RV minus6 Pivot 3iii DV minus5 LV 1 RV minus8 Pivot 1iv DV minus7 LV minus1 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV 3 Pivot 3vii DV 10 LV 1 RV 1 Pivot 10viii DV 8 LV 8 RV minus1 Pivot 8ix DV 0 LV 6 RV minus3 Pivot 6x DV minus2 LV 4 RV minus5 Pivot 4xi DV minus5 LV minus8 RV 1 Pivot 1xii DV 2 LV minus1 RV 8 Pivot 8xiii DV 9 LV 6 RV 6 Pivot 9xiv DV 7 LV 7 RV 4 Pivot 7xv DV 11 LV 5 RV 2 Pivot 11xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 0 LV minus3 RV 6 Pivot 6xviii DV 7 LV 4 RV 7 Pivot 7xix DV 14 LV 5 RV 5 Pivot 14xx DV 6 LV 12 RV 9 Pivot 12xxi DV minus9 LV minus12 RV minus3 Pivot minus3xxii DV minus2 LV minus5 RV 4 Pivot 4xxiii DV 5 LV 2 RV 5 Pivot 5xxiv DV 6 LV 3 RV 12 Pivot 12xxv DV 19 LV 10 RV 10 Pivot 19

GTC T A

C

T

A

A

G

0 ndash2 ndash4

ndash4

ndash6

ndash6 ndash8 ndash10

ndash8

ndash10

ndash25 ndash4 ndash3 ndash6 ndash5 ndash8 ndash7 ndash10 ndash12ndash9

ndash4 ndash3ndash113

3ndash3 10 1 8 ndash1 0 ndash3 ndash2 ndash54

4

6

6

8

98

1

1 2

ndash6

ndash5

ndash8 ndash1ndash1ndash7

ndash10 ndash3ndash3

66 6

7

77

7

5

5

11

912

12 1954

ndash9

ndash12 ndash5ndash2 4 5 5

2 3

6

1010

140

2

Figure 1 Score matrix of the cell boxes of CTTGA and CTAGA

T

C

A

G

A G C T G

0

ndash2

ndash2

ndash4ndash4

ndash4 ndash4ndash4

ndash4

ndash4

ndash4

ndash6

ndash6

ndash6

ndash6

ndash8

ndash8

ndash10

ndash8

ndash8

ndash10

ndash10 ndash9ndash1 ndash12ndash1 ndash3

ndash3ndash3

ndash5

ndash5ndash5

ndash5

ndash5

ndash2 ndash22

2

ndash2

ndash2

ndash2

ndash2ndash2

ndash5

ndash7

ndash7

ndash7 ndash1

ndash1 ndash1

ndash1 ndash1

ndash1

ndash6

6 6

ndash3

ndash3

ndash3

ndash3

ndash3ndash3 ndash3

ndash3

0

0

04

11

Figure 3 Score matrix of AGCTG and TCAG

T

T

C

C

A

A

G

G G

0 ndash10

ndash8

ndash8

ndash6

ndash6

ndash4

ndash4

ndash2

ndash2

i ii iii iv v

vi vii vii ix x

x xii xiii xiv xv

xvi xvii xviii xix xx

Figure 4 Labelling of the cell boxes of AGCTG and TCAG

International Journal of Mathematics and Mathematical Sciences 5

(4) For each cell box the value of the preceding bottomvalue is seen to be always less than the value of theimmediate next adjacent diagonal value

416 Traceback and Alignment for Unequal CharacterLength )e traceback and alignment stages for the exampleon unequal character length of sequences are respectivelyshown below Figure 5 shows the pivot of each cell boxcalculation of the score matrix table of AGCTG and TCAGof Table 3 Based on the traceback and the direction of thearrow pointers the alignment is written as

AGCTG

T minus CAG(12)

We check the correctness of the alignment done usingcalculations Recall the scoring scheme match+5 mis-match minus1 and gap minus2 We have two alignment matchesG minus G and C minus C two alignment mismatches A T and T Aand one gapped alignment G minus Hence

minus1 minus 2 + 5 minus 1 + 5 6 (13)

which is the same as the optimal value from the score matrixtable )e alignment is thus optimal

42 Propositions and Proofs for Equal Sequence Length)e following propositions are a priori results deduced fromthe pattern results of equal character length of sequences

Definition 3 Let the linear gap penalty be v(k) gk whereg is a constant of minus2 and k is the sequence length count ofk 1 μ1 for the score matrix row and k 1 μ2 forthe score matrix column

Definition 4 Define P(0 0) 0 to be the initialization pivotfor the score matrix

Definition 5 Define Pm(i 0) Pm(i minus 1 0) + gap penaltyforalli 1 μ1 to be the initial row pivot for the score matrix

Definition 6 Define Pm(0 j) Pm(0 j minus 1) + gap penaltyforallj 1 μ2 to be the initial column pivot for the scorematrix

Definition 7 Define

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

(14)

foralli 1 μ1 and forallj 1 μ2 where Pm(i j) is the pivotfor each cell box calculation PD(i j) Pm(i minus 1 j minus 1) +

σ(α(i) β(j)) is the diagonal value of a cell box PR(i j)

Pm(i minus 1 j)+ gap penalty is the right value of a cell boxPL(i j) Pm(i j minus 1)+ gap penalty is the left value of a cellbox and σ(α(i) β(j)) is the score for aligning the sequencecharacters of α(i) and β(j)

Definition 8 Define

σ(α(i) β(j)) +5 for α(i) β(j) charactermatch

minus1 for α(i) β(j) charactermismatch1113896

(15)foralli 1 μ1 and forallj 1 μ2

Proposition 2 Filled-in cell boxes for Pm(1 j)forallj

1 μ2 are analogous to filled-in cell boxes for Pm(i 1)foralli

1 μ1 for equal length comparison of sequences

Proof By Definition 7 we have

Pm(1 j) max

PD(1 j) Pm(0 j minus 1) + σ(α(1) β(j))

PR(1 j) Pm(0 j) + gap

PL(1 j) Pm(1 j minus 1) + gap

⎧⎪⎪⎨

⎪⎪⎩

Pm(i 1) max

PD(i 1) Pm(i minus 1 0) + σ(α(i) β(1))

PR(i 1) Pm(i minus 1 1) + gap

PL(i 1) Pm(i 0) + gap

⎧⎪⎪⎨

⎪⎪⎩

(16)

TCA G G

T

C

A

G

0

0

0

ndash2 ndash4

ndash3 ndash5

ndash3

ndash3

ndash4

ndash6

ndash6 ndash8 ndash10

ndash1ndash1

ndash1

ndash1 ndash1

ndash8

ndash2

ndash2 ndash2

1 1

2

246 6

Figure 5 Traceback in the score matrix of AGCTG and TCAG

Table 3 Filled-in table of AGCTG and TCAGi DV minus1 LV minus4 RV minus4 Pivot minus1ii DV minus3 LV minus3 RV minus6 Pivot minus3iii DV minus5 LV minus5 RV minus8 Pivot minus5iv DV minus1 LV minus7 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV minus3 Pivot minus3vii DV minus2 LV minus5 RV minus5 Pivot minus2viii DV 2 LV minus4 RV minus7 Pivot 2ix DV minus6 LV 0 RV minus3 Pivot 0x DV minus2 LV minus2 RV minus5 Pivot minus2xi DV 1 LV minus8 RV minus5 Pivot 1xii DV minus4 LV minus1 RV minus4 Pivot minus1xiii DV minus3 LV minus3 RV 0 Pivot 0xiv DV 1 LV minus2 RV minus2 Pivot minus2xv DV minus1 LV minus1 RV minus4 Pivot minus1xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 6 LV minus3 RV minus3 Pivot 6xviii DV minus2 LV 4 RV minus2 Pivot 4xix DV minus1 LV 2 RV minus1 Pivot 2xx DV 6 LV 0 RV minus3 Pivot 6

6 International Journal of Mathematics and Mathematical Sciences

foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6

Pm(i 0) Pm(i minus 1 0) + gap

⟹PL(i 1) Pm(i 0) + gap

Pm(i minus 1 0) + gap( 1113857 + gap

(17)

also

Pm(0 j) Pm(0 j minus 1) + gap

⟹PR(1 j) Pm(0 j) + gap

Pm(0 j minus 1) + gap( 1113857 + gap

(18)

Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4Pm(i 0) Pm(0 j)

(19)

Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof

Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)

Pm(i minus 1 1) + gap and by induction let i 1 j then

PL(1 j) Pm(1 0) + gap

P(0 0) + gap + gap

P(0 0) + 2 gaps

PR(i 1) Pm(0 1) + gap

P(0 0) + gap + gap

P(0 0) + 2gaps

(20)

Assume Pm(k 0) Pm(0 k) then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4PR(i 1) PL(1 j) where i j

(21)

which completes the second part of our proofRecall

PL(i 1) Pm(i minus 1 0) + 2 gaps

PR(1 j) Pm(0 j minus 1) + 2 gaps

PL(i 1) PR(1 j)was established

(22)

It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)

Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))

For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)

and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us

σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)

Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)

Pm(0 j minus 1)

Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa

Proof From Proposition 2 and Definitions 5ndash7 it is clearthat

PL(i 1) PR(1 j)

PR(i 1) PL(1 j)

where i j

(24)

Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences

Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2

Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1

(25)

and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved

Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7

International Journal of Mathematics and Mathematical Sciences 7

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j) max

PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))

PR(i + 1 j) Pm(i j) + gap penalty

PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i j + 1) max

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))

PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j + 1) max

PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))

PR(i + 1 j + 1) Pm(i j + 1) + gap penalty

PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

PR(i + 1 j) Pm(i j) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

(26)

Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write

PR(i j) Pm(i minus 1 j) + gap penalty

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)

Weare left to show that gap penalty lt σ(α(i) β(j+ 1))

Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match

For any k count forallk 1 μ the linear gap penalty gk

decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us

gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score

match diagonal score

foralli j 1 μ

(28)

Hence PR(i j)ltPD(i j + 1)foralli j 1 μ

43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences

Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1

j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof

Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof

8 International Journal of Mathematics and Mathematical Sciences

5 Conclusion

In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs

Data Availability

No data were used to support this study

Conflicts of Interest

)e authors declare that there are no conflicts of interestregarding the publication of this paper

References

[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005

[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003

[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007

[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005

[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970

[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008

[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005

[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981

[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004

[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002

[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957

[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004

[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018

[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication

company using prometheerdquo Applied Informatics vol 5 no 12018

[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019

[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010

[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003

[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012

[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017

[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009

[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986

International Journal of Mathematics and Mathematical Sciences 9

Page 4: FormulationandAnalysisofPatternsinaScoreMatrixforGlobal ...downloads.hindawi.com/journals/ijmms/2020/3858057.pdf · wise sequence alignment or multiple sequence alignment when any

for j 1 Pm(0 1) P(0 0) + gap penalty

⟹ 0 +(minus2) minus2

for j 2 Pm(0 2) Pm(0 1) + gap penalty

⟹ minus 2 +(minus2) minus4

for j 3 Pm(0 3) Pm(0 2) + gap penalty

⟹ minus 4 +(minus2) minus6

for j 4 Pm(0 4) Pm(0 3) + gap penalty

⟹ minus 6 +(minus2) minus8

for j 5 Pm(0 5) Pm(0 4) + gap penalty

⟹ minus 8 +(minus2) minus10

(6)

Before following with further calculation some iden-tification is placed on each cell box for easy noting asshown in Table 1

(2) Cell box calculations

Pm(1 1) max

PD(1 1) P(0 0) + σ(C C) 0 + 5 5

PR(1 1) Pm(0 1) + gap minus2 +(minus2) minus4

PL(1 1) Pm(1 0) + gap minus2 +(minus2) minus4

⎧⎪⎪⎨

⎪⎪⎩

(7)

Remark 2 σ(C C) in PD(1 1) was ldquo5rdquo because that is theassigned score for matched characters Pm(1 1) maxPD(1 1) PR(1 1) PL(1 1)1113864 1113865 max 5 minus4 minus41113858 1113859 5

Pm(1 2) max

PD(1 2) Pm(0 1) + σ(C T) minus2 +(minus1) minus3

PR(1 2) Pm(0 2) + gap minus4 +(minus2) minus6

PL(1 2) Pm(1 1) + gap 5 +(minus2) 3

⎧⎪⎪⎨

⎪⎪⎩

(8)

Remark 3 σ(C T) in PD(1 2) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(12)

max PD1113864 (12)PR(12)PL(12) max minus3 minus6 31113858 1113859 3

Pm(1 3) max

PD(1 3) Pm(0 2) + σ(C A) minus4 +(minus1) minus5

PR(1 3) Pm(0 3) + gap minus6 +(minus2) minus8

PL(1 3) Pm(1 2) + gap 3 +(minus2) 1

⎧⎪⎪⎨

⎪⎪⎩

(9)

Remark 4 σ(C A) in PD(1 3) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(1 3)

max PD1113864 (1 3) PR(1 3) PL(1 3) max minus5 minus8 11113858 1113859 1Consequently the recursion follows similarly until all

the cell boxes are filled In the event of a pivot tie in a cell boxone tie value is picked For the tabular display these no-tations are used interchangeably PD(i j) is the same as DVPL(i j) is the same as LV PR(i j) is the same as RV andPm(i j) is the same as the pivot

412 Pattern Results for Equal Character Length Withreference to Tables 1 and 2 and Figure 1

(1) )e filled-in cell boxes for column 1 ie cell boxes(i vi xi xvi and xxi) have values coinciding withthat of row 1 ie cell boxes (i ii iii iv and v)

(2) )e leading value (pivot) of each cell box in both row1 and column 1 remains the same

(3) )e bottom value in column 1 switches to becomethe beside value in row 1 and vice versa

(4) For each 3 pointed arrow intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box

(5) For each cell box the value of the preceding bottomvalue is less than the value of the immediate nextadjacent diagonal value

413 Traceback and Alignment for Equal Character Length)e traceback and alignment stages for equal characterlength of sequences are respectively shown in this section

Figure 2 shows the pivot of each cell box calculation ofthe score matrix table of CTTGA and CTAGA in Figure 1)us Figure 2 is a simpler score matrix table constructedfrom the original score matrix table of CTTGA and CTAGAin Figure 1 )e arrow pointers direct the path from theoptimal value and traceback to the initialization value ofzero Based on the traceback and the diagonal direction ofthe arrow pointers the alignment is written as

CTTGA

CTAGA(10)

To confirm the correctness of the alignment done wecheck using calculations Recall the scoring schemematch+5 mismatch minus1 and gap minus2 We have fouralignment matches C minus C T minus T G minus G and A minus A and onemismatched alignment T minus A Hence

5 + 5 minus 1 + 5 + 5 19 (11)

which is the same as the optimal value from the score matrixtable )e alignment is thus optimal

414 Unequal Character Length (μ1 ne μ2)

Example 2 Suppose langαrang AGCTG and langβrang TCAG then tofill in the score matrix values of the cell boxes the prior-stated recursive formulation is used )is results in Figures 3and 4

Table 1 Identifying cell boxes of CTTGA and CTAGA

C T T G A0 minus2 minus4 minus6 minus8 minus10

C minus2 i ii iii iv vT minus4 vi vii viii ix xA minus6 xi xii xiii xiv xvG minus8 xvi xvii xviii xix xxA minus10 xxi xxii xxiii xxiv xxv

4 International Journal of Mathematics and Mathematical Sciences

415 Pattern Results for Unequal Character Length

(1) )e filled-in cell boxes for column 1 ie(i vi xi and xvi) cell boxes have consistent valueswith that of row 1 ie (i ii iii and iv) up untilwhere they terminate at (v) and at the inconsistent(iii)

(2) Except for where an inconsistency is noted at (v) and(iii) the pivot values for each cell box of row 1 andcolumn 1 remain the same

(3) For each 3 pointed arrows intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box

GTC T A

C

T

A

A

G

0

9

6

ndash2 ndash4

3 1

3

ndash3

ndash4

ndash6

ndash6 ndash8 ndash10

ndash15

ndash1

8 11

ndash8

ndash10

ndash2

10 4

1 7

8

1476 12

ndash3 1254 19

Figure 2 Traceback in the score matrix of CTTGA and CTAGA

Table 2 Filled-in table of CTTGA and CTAGAi DV 5 LV minus4 RV minus4 Pivot 5ii DV minus3 LV 3 RV minus6 Pivot 3iii DV minus5 LV 1 RV minus8 Pivot 1iv DV minus7 LV minus1 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV 3 Pivot 3vii DV 10 LV 1 RV 1 Pivot 10viii DV 8 LV 8 RV minus1 Pivot 8ix DV 0 LV 6 RV minus3 Pivot 6x DV minus2 LV 4 RV minus5 Pivot 4xi DV minus5 LV minus8 RV 1 Pivot 1xii DV 2 LV minus1 RV 8 Pivot 8xiii DV 9 LV 6 RV 6 Pivot 9xiv DV 7 LV 7 RV 4 Pivot 7xv DV 11 LV 5 RV 2 Pivot 11xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 0 LV minus3 RV 6 Pivot 6xviii DV 7 LV 4 RV 7 Pivot 7xix DV 14 LV 5 RV 5 Pivot 14xx DV 6 LV 12 RV 9 Pivot 12xxi DV minus9 LV minus12 RV minus3 Pivot minus3xxii DV minus2 LV minus5 RV 4 Pivot 4xxiii DV 5 LV 2 RV 5 Pivot 5xxiv DV 6 LV 3 RV 12 Pivot 12xxv DV 19 LV 10 RV 10 Pivot 19

GTC T A

C

T

A

A

G

0 ndash2 ndash4

ndash4

ndash6

ndash6 ndash8 ndash10

ndash8

ndash10

ndash25 ndash4 ndash3 ndash6 ndash5 ndash8 ndash7 ndash10 ndash12ndash9

ndash4 ndash3ndash113

3ndash3 10 1 8 ndash1 0 ndash3 ndash2 ndash54

4

6

6

8

98

1

1 2

ndash6

ndash5

ndash8 ndash1ndash1ndash7

ndash10 ndash3ndash3

66 6

7

77

7

5

5

11

912

12 1954

ndash9

ndash12 ndash5ndash2 4 5 5

2 3

6

1010

140

2

Figure 1 Score matrix of the cell boxes of CTTGA and CTAGA

T

C

A

G

A G C T G

0

ndash2

ndash2

ndash4ndash4

ndash4 ndash4ndash4

ndash4

ndash4

ndash4

ndash6

ndash6

ndash6

ndash6

ndash8

ndash8

ndash10

ndash8

ndash8

ndash10

ndash10 ndash9ndash1 ndash12ndash1 ndash3

ndash3ndash3

ndash5

ndash5ndash5

ndash5

ndash5

ndash2 ndash22

2

ndash2

ndash2

ndash2

ndash2ndash2

ndash5

ndash7

ndash7

ndash7 ndash1

ndash1 ndash1

ndash1 ndash1

ndash1

ndash6

6 6

ndash3

ndash3

ndash3

ndash3

ndash3ndash3 ndash3

ndash3

0

0

04

11

Figure 3 Score matrix of AGCTG and TCAG

T

T

C

C

A

A

G

G G

0 ndash10

ndash8

ndash8

ndash6

ndash6

ndash4

ndash4

ndash2

ndash2

i ii iii iv v

vi vii vii ix x

x xii xiii xiv xv

xvi xvii xviii xix xx

Figure 4 Labelling of the cell boxes of AGCTG and TCAG

International Journal of Mathematics and Mathematical Sciences 5

(4) For each cell box the value of the preceding bottomvalue is seen to be always less than the value of theimmediate next adjacent diagonal value

416 Traceback and Alignment for Unequal CharacterLength )e traceback and alignment stages for the exampleon unequal character length of sequences are respectivelyshown below Figure 5 shows the pivot of each cell boxcalculation of the score matrix table of AGCTG and TCAGof Table 3 Based on the traceback and the direction of thearrow pointers the alignment is written as

AGCTG

T minus CAG(12)

We check the correctness of the alignment done usingcalculations Recall the scoring scheme match+5 mis-match minus1 and gap minus2 We have two alignment matchesG minus G and C minus C two alignment mismatches A T and T Aand one gapped alignment G minus Hence

minus1 minus 2 + 5 minus 1 + 5 6 (13)

which is the same as the optimal value from the score matrixtable )e alignment is thus optimal

42 Propositions and Proofs for Equal Sequence Length)e following propositions are a priori results deduced fromthe pattern results of equal character length of sequences

Definition 3 Let the linear gap penalty be v(k) gk whereg is a constant of minus2 and k is the sequence length count ofk 1 μ1 for the score matrix row and k 1 μ2 forthe score matrix column

Definition 4 Define P(0 0) 0 to be the initialization pivotfor the score matrix

Definition 5 Define Pm(i 0) Pm(i minus 1 0) + gap penaltyforalli 1 μ1 to be the initial row pivot for the score matrix

Definition 6 Define Pm(0 j) Pm(0 j minus 1) + gap penaltyforallj 1 μ2 to be the initial column pivot for the scorematrix

Definition 7 Define

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

(14)

foralli 1 μ1 and forallj 1 μ2 where Pm(i j) is the pivotfor each cell box calculation PD(i j) Pm(i minus 1 j minus 1) +

σ(α(i) β(j)) is the diagonal value of a cell box PR(i j)

Pm(i minus 1 j)+ gap penalty is the right value of a cell boxPL(i j) Pm(i j minus 1)+ gap penalty is the left value of a cellbox and σ(α(i) β(j)) is the score for aligning the sequencecharacters of α(i) and β(j)

Definition 8 Define

σ(α(i) β(j)) +5 for α(i) β(j) charactermatch

minus1 for α(i) β(j) charactermismatch1113896

(15)foralli 1 μ1 and forallj 1 μ2

Proposition 2 Filled-in cell boxes for Pm(1 j)forallj

1 μ2 are analogous to filled-in cell boxes for Pm(i 1)foralli

1 μ1 for equal length comparison of sequences

Proof By Definition 7 we have

Pm(1 j) max

PD(1 j) Pm(0 j minus 1) + σ(α(1) β(j))

PR(1 j) Pm(0 j) + gap

PL(1 j) Pm(1 j minus 1) + gap

⎧⎪⎪⎨

⎪⎪⎩

Pm(i 1) max

PD(i 1) Pm(i minus 1 0) + σ(α(i) β(1))

PR(i 1) Pm(i minus 1 1) + gap

PL(i 1) Pm(i 0) + gap

⎧⎪⎪⎨

⎪⎪⎩

(16)

TCA G G

T

C

A

G

0

0

0

ndash2 ndash4

ndash3 ndash5

ndash3

ndash3

ndash4

ndash6

ndash6 ndash8 ndash10

ndash1ndash1

ndash1

ndash1 ndash1

ndash8

ndash2

ndash2 ndash2

1 1

2

246 6

Figure 5 Traceback in the score matrix of AGCTG and TCAG

Table 3 Filled-in table of AGCTG and TCAGi DV minus1 LV minus4 RV minus4 Pivot minus1ii DV minus3 LV minus3 RV minus6 Pivot minus3iii DV minus5 LV minus5 RV minus8 Pivot minus5iv DV minus1 LV minus7 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV minus3 Pivot minus3vii DV minus2 LV minus5 RV minus5 Pivot minus2viii DV 2 LV minus4 RV minus7 Pivot 2ix DV minus6 LV 0 RV minus3 Pivot 0x DV minus2 LV minus2 RV minus5 Pivot minus2xi DV 1 LV minus8 RV minus5 Pivot 1xii DV minus4 LV minus1 RV minus4 Pivot minus1xiii DV minus3 LV minus3 RV 0 Pivot 0xiv DV 1 LV minus2 RV minus2 Pivot minus2xv DV minus1 LV minus1 RV minus4 Pivot minus1xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 6 LV minus3 RV minus3 Pivot 6xviii DV minus2 LV 4 RV minus2 Pivot 4xix DV minus1 LV 2 RV minus1 Pivot 2xx DV 6 LV 0 RV minus3 Pivot 6

6 International Journal of Mathematics and Mathematical Sciences

foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6

Pm(i 0) Pm(i minus 1 0) + gap

⟹PL(i 1) Pm(i 0) + gap

Pm(i minus 1 0) + gap( 1113857 + gap

(17)

also

Pm(0 j) Pm(0 j minus 1) + gap

⟹PR(1 j) Pm(0 j) + gap

Pm(0 j minus 1) + gap( 1113857 + gap

(18)

Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4Pm(i 0) Pm(0 j)

(19)

Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof

Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)

Pm(i minus 1 1) + gap and by induction let i 1 j then

PL(1 j) Pm(1 0) + gap

P(0 0) + gap + gap

P(0 0) + 2 gaps

PR(i 1) Pm(0 1) + gap

P(0 0) + gap + gap

P(0 0) + 2gaps

(20)

Assume Pm(k 0) Pm(0 k) then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4PR(i 1) PL(1 j) where i j

(21)

which completes the second part of our proofRecall

PL(i 1) Pm(i minus 1 0) + 2 gaps

PR(1 j) Pm(0 j minus 1) + 2 gaps

PL(i 1) PR(1 j)was established

(22)

It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)

Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))

For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)

and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us

σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)

Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)

Pm(0 j minus 1)

Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa

Proof From Proposition 2 and Definitions 5ndash7 it is clearthat

PL(i 1) PR(1 j)

PR(i 1) PL(1 j)

where i j

(24)

Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences

Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2

Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1

(25)

and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved

Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7

International Journal of Mathematics and Mathematical Sciences 7

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j) max

PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))

PR(i + 1 j) Pm(i j) + gap penalty

PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i j + 1) max

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))

PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j + 1) max

PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))

PR(i + 1 j + 1) Pm(i j + 1) + gap penalty

PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

PR(i + 1 j) Pm(i j) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

(26)

Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write

PR(i j) Pm(i minus 1 j) + gap penalty

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)

Weare left to show that gap penalty lt σ(α(i) β(j+ 1))

Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match

For any k count forallk 1 μ the linear gap penalty gk

decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us

gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score

match diagonal score

foralli j 1 μ

(28)

Hence PR(i j)ltPD(i j + 1)foralli j 1 μ

43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences

Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1

j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof

Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof

8 International Journal of Mathematics and Mathematical Sciences

5 Conclusion

In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs

Data Availability

No data were used to support this study

Conflicts of Interest

)e authors declare that there are no conflicts of interestregarding the publication of this paper

References

[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005

[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003

[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007

[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005

[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970

[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008

[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005

[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981

[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004

[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002

[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957

[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004

[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018

[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication

company using prometheerdquo Applied Informatics vol 5 no 12018

[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019

[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010

[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003

[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012

[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017

[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009

[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986

International Journal of Mathematics and Mathematical Sciences 9

Page 5: FormulationandAnalysisofPatternsinaScoreMatrixforGlobal ...downloads.hindawi.com/journals/ijmms/2020/3858057.pdf · wise sequence alignment or multiple sequence alignment when any

415 Pattern Results for Unequal Character Length

(1) )e filled-in cell boxes for column 1 ie(i vi xi and xvi) cell boxes have consistent valueswith that of row 1 ie (i ii iii and iv) up untilwhere they terminate at (v) and at the inconsistent(iii)

(2) Except for where an inconsistency is noted at (v) and(iii) the pivot values for each cell box of row 1 andcolumn 1 remain the same

(3) For each 3 pointed arrows intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box

GTC T A

C

T

A

A

G

0

9

6

ndash2 ndash4

3 1

3

ndash3

ndash4

ndash6

ndash6 ndash8 ndash10

ndash15

ndash1

8 11

ndash8

ndash10

ndash2

10 4

1 7

8

1476 12

ndash3 1254 19

Figure 2 Traceback in the score matrix of CTTGA and CTAGA

Table 2 Filled-in table of CTTGA and CTAGAi DV 5 LV minus4 RV minus4 Pivot 5ii DV minus3 LV 3 RV minus6 Pivot 3iii DV minus5 LV 1 RV minus8 Pivot 1iv DV minus7 LV minus1 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV 3 Pivot 3vii DV 10 LV 1 RV 1 Pivot 10viii DV 8 LV 8 RV minus1 Pivot 8ix DV 0 LV 6 RV minus3 Pivot 6x DV minus2 LV 4 RV minus5 Pivot 4xi DV minus5 LV minus8 RV 1 Pivot 1xii DV 2 LV minus1 RV 8 Pivot 8xiii DV 9 LV 6 RV 6 Pivot 9xiv DV 7 LV 7 RV 4 Pivot 7xv DV 11 LV 5 RV 2 Pivot 11xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 0 LV minus3 RV 6 Pivot 6xviii DV 7 LV 4 RV 7 Pivot 7xix DV 14 LV 5 RV 5 Pivot 14xx DV 6 LV 12 RV 9 Pivot 12xxi DV minus9 LV minus12 RV minus3 Pivot minus3xxii DV minus2 LV minus5 RV 4 Pivot 4xxiii DV 5 LV 2 RV 5 Pivot 5xxiv DV 6 LV 3 RV 12 Pivot 12xxv DV 19 LV 10 RV 10 Pivot 19

GTC T A

C

T

A

A

G

0 ndash2 ndash4

ndash4

ndash6

ndash6 ndash8 ndash10

ndash8

ndash10

ndash25 ndash4 ndash3 ndash6 ndash5 ndash8 ndash7 ndash10 ndash12ndash9

ndash4 ndash3ndash113

3ndash3 10 1 8 ndash1 0 ndash3 ndash2 ndash54

4

6

6

8

98

1

1 2

ndash6

ndash5

ndash8 ndash1ndash1ndash7

ndash10 ndash3ndash3

66 6

7

77

7

5

5

11

912

12 1954

ndash9

ndash12 ndash5ndash2 4 5 5

2 3

6

1010

140

2

Figure 1 Score matrix of the cell boxes of CTTGA and CTAGA

T

C

A

G

A G C T G

0

ndash2

ndash2

ndash4ndash4

ndash4 ndash4ndash4

ndash4

ndash4

ndash4

ndash6

ndash6

ndash6

ndash6

ndash8

ndash8

ndash10

ndash8

ndash8

ndash10

ndash10 ndash9ndash1 ndash12ndash1 ndash3

ndash3ndash3

ndash5

ndash5ndash5

ndash5

ndash5

ndash2 ndash22

2

ndash2

ndash2

ndash2

ndash2ndash2

ndash5

ndash7

ndash7

ndash7 ndash1

ndash1 ndash1

ndash1 ndash1

ndash1

ndash6

6 6

ndash3

ndash3

ndash3

ndash3

ndash3ndash3 ndash3

ndash3

0

0

04

11

Figure 3 Score matrix of AGCTG and TCAG

T

T

C

C

A

A

G

G G

0 ndash10

ndash8

ndash8

ndash6

ndash6

ndash4

ndash4

ndash2

ndash2

i ii iii iv v

vi vii vii ix x

x xii xiii xiv xv

xvi xvii xviii xix xx

Figure 4 Labelling of the cell boxes of AGCTG and TCAG

International Journal of Mathematics and Mathematical Sciences 5

(4) For each cell box the value of the preceding bottomvalue is seen to be always less than the value of theimmediate next adjacent diagonal value

416 Traceback and Alignment for Unequal CharacterLength )e traceback and alignment stages for the exampleon unequal character length of sequences are respectivelyshown below Figure 5 shows the pivot of each cell boxcalculation of the score matrix table of AGCTG and TCAGof Table 3 Based on the traceback and the direction of thearrow pointers the alignment is written as

AGCTG

T minus CAG(12)

We check the correctness of the alignment done usingcalculations Recall the scoring scheme match+5 mis-match minus1 and gap minus2 We have two alignment matchesG minus G and C minus C two alignment mismatches A T and T Aand one gapped alignment G minus Hence

minus1 minus 2 + 5 minus 1 + 5 6 (13)

which is the same as the optimal value from the score matrixtable )e alignment is thus optimal

42 Propositions and Proofs for Equal Sequence Length)e following propositions are a priori results deduced fromthe pattern results of equal character length of sequences

Definition 3 Let the linear gap penalty be v(k) gk whereg is a constant of minus2 and k is the sequence length count ofk 1 μ1 for the score matrix row and k 1 μ2 forthe score matrix column

Definition 4 Define P(0 0) 0 to be the initialization pivotfor the score matrix

Definition 5 Define Pm(i 0) Pm(i minus 1 0) + gap penaltyforalli 1 μ1 to be the initial row pivot for the score matrix

Definition 6 Define Pm(0 j) Pm(0 j minus 1) + gap penaltyforallj 1 μ2 to be the initial column pivot for the scorematrix

Definition 7 Define

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

(14)

foralli 1 μ1 and forallj 1 μ2 where Pm(i j) is the pivotfor each cell box calculation PD(i j) Pm(i minus 1 j minus 1) +

σ(α(i) β(j)) is the diagonal value of a cell box PR(i j)

Pm(i minus 1 j)+ gap penalty is the right value of a cell boxPL(i j) Pm(i j minus 1)+ gap penalty is the left value of a cellbox and σ(α(i) β(j)) is the score for aligning the sequencecharacters of α(i) and β(j)

Definition 8 Define

σ(α(i) β(j)) +5 for α(i) β(j) charactermatch

minus1 for α(i) β(j) charactermismatch1113896

(15)foralli 1 μ1 and forallj 1 μ2

Proposition 2 Filled-in cell boxes for Pm(1 j)forallj

1 μ2 are analogous to filled-in cell boxes for Pm(i 1)foralli

1 μ1 for equal length comparison of sequences

Proof By Definition 7 we have

Pm(1 j) max

PD(1 j) Pm(0 j minus 1) + σ(α(1) β(j))

PR(1 j) Pm(0 j) + gap

PL(1 j) Pm(1 j minus 1) + gap

⎧⎪⎪⎨

⎪⎪⎩

Pm(i 1) max

PD(i 1) Pm(i minus 1 0) + σ(α(i) β(1))

PR(i 1) Pm(i minus 1 1) + gap

PL(i 1) Pm(i 0) + gap

⎧⎪⎪⎨

⎪⎪⎩

(16)

TCA G G

T

C

A

G

0

0

0

ndash2 ndash4

ndash3 ndash5

ndash3

ndash3

ndash4

ndash6

ndash6 ndash8 ndash10

ndash1ndash1

ndash1

ndash1 ndash1

ndash8

ndash2

ndash2 ndash2

1 1

2

246 6

Figure 5 Traceback in the score matrix of AGCTG and TCAG

Table 3 Filled-in table of AGCTG and TCAGi DV minus1 LV minus4 RV minus4 Pivot minus1ii DV minus3 LV minus3 RV minus6 Pivot minus3iii DV minus5 LV minus5 RV minus8 Pivot minus5iv DV minus1 LV minus7 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV minus3 Pivot minus3vii DV minus2 LV minus5 RV minus5 Pivot minus2viii DV 2 LV minus4 RV minus7 Pivot 2ix DV minus6 LV 0 RV minus3 Pivot 0x DV minus2 LV minus2 RV minus5 Pivot minus2xi DV 1 LV minus8 RV minus5 Pivot 1xii DV minus4 LV minus1 RV minus4 Pivot minus1xiii DV minus3 LV minus3 RV 0 Pivot 0xiv DV 1 LV minus2 RV minus2 Pivot minus2xv DV minus1 LV minus1 RV minus4 Pivot minus1xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 6 LV minus3 RV minus3 Pivot 6xviii DV minus2 LV 4 RV minus2 Pivot 4xix DV minus1 LV 2 RV minus1 Pivot 2xx DV 6 LV 0 RV minus3 Pivot 6

6 International Journal of Mathematics and Mathematical Sciences

foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6

Pm(i 0) Pm(i minus 1 0) + gap

⟹PL(i 1) Pm(i 0) + gap

Pm(i minus 1 0) + gap( 1113857 + gap

(17)

also

Pm(0 j) Pm(0 j minus 1) + gap

⟹PR(1 j) Pm(0 j) + gap

Pm(0 j minus 1) + gap( 1113857 + gap

(18)

Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4Pm(i 0) Pm(0 j)

(19)

Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof

Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)

Pm(i minus 1 1) + gap and by induction let i 1 j then

PL(1 j) Pm(1 0) + gap

P(0 0) + gap + gap

P(0 0) + 2 gaps

PR(i 1) Pm(0 1) + gap

P(0 0) + gap + gap

P(0 0) + 2gaps

(20)

Assume Pm(k 0) Pm(0 k) then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4PR(i 1) PL(1 j) where i j

(21)

which completes the second part of our proofRecall

PL(i 1) Pm(i minus 1 0) + 2 gaps

PR(1 j) Pm(0 j minus 1) + 2 gaps

PL(i 1) PR(1 j)was established

(22)

It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)

Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))

For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)

and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us

σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)

Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)

Pm(0 j minus 1)

Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa

Proof From Proposition 2 and Definitions 5ndash7 it is clearthat

PL(i 1) PR(1 j)

PR(i 1) PL(1 j)

where i j

(24)

Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences

Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2

Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1

(25)

and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved

Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7

International Journal of Mathematics and Mathematical Sciences 7

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j) max

PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))

PR(i + 1 j) Pm(i j) + gap penalty

PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i j + 1) max

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))

PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j + 1) max

PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))

PR(i + 1 j + 1) Pm(i j + 1) + gap penalty

PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

PR(i + 1 j) Pm(i j) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

(26)

Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write

PR(i j) Pm(i minus 1 j) + gap penalty

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)

Weare left to show that gap penalty lt σ(α(i) β(j+ 1))

Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match

For any k count forallk 1 μ the linear gap penalty gk

decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us

gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score

match diagonal score

foralli j 1 μ

(28)

Hence PR(i j)ltPD(i j + 1)foralli j 1 μ

43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences

Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1

j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof

Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof

8 International Journal of Mathematics and Mathematical Sciences

5 Conclusion

In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs

Data Availability

No data were used to support this study

Conflicts of Interest

)e authors declare that there are no conflicts of interestregarding the publication of this paper

References

[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005

[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003

[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007

[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005

[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970

[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008

[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005

[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981

[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004

[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002

[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957

[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004

[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018

[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication

company using prometheerdquo Applied Informatics vol 5 no 12018

[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019

[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010

[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003

[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012

[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017

[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009

[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986

International Journal of Mathematics and Mathematical Sciences 9

Page 6: FormulationandAnalysisofPatternsinaScoreMatrixforGlobal ...downloads.hindawi.com/journals/ijmms/2020/3858057.pdf · wise sequence alignment or multiple sequence alignment when any

(4) For each cell box the value of the preceding bottomvalue is seen to be always less than the value of theimmediate next adjacent diagonal value

416 Traceback and Alignment for Unequal CharacterLength )e traceback and alignment stages for the exampleon unequal character length of sequences are respectivelyshown below Figure 5 shows the pivot of each cell boxcalculation of the score matrix table of AGCTG and TCAGof Table 3 Based on the traceback and the direction of thearrow pointers the alignment is written as

AGCTG

T minus CAG(12)

We check the correctness of the alignment done usingcalculations Recall the scoring scheme match+5 mis-match minus1 and gap minus2 We have two alignment matchesG minus G and C minus C two alignment mismatches A T and T Aand one gapped alignment G minus Hence

minus1 minus 2 + 5 minus 1 + 5 6 (13)

which is the same as the optimal value from the score matrixtable )e alignment is thus optimal

42 Propositions and Proofs for Equal Sequence Length)e following propositions are a priori results deduced fromthe pattern results of equal character length of sequences

Definition 3 Let the linear gap penalty be v(k) gk whereg is a constant of minus2 and k is the sequence length count ofk 1 μ1 for the score matrix row and k 1 μ2 forthe score matrix column

Definition 4 Define P(0 0) 0 to be the initialization pivotfor the score matrix

Definition 5 Define Pm(i 0) Pm(i minus 1 0) + gap penaltyforalli 1 μ1 to be the initial row pivot for the score matrix

Definition 6 Define Pm(0 j) Pm(0 j minus 1) + gap penaltyforallj 1 μ2 to be the initial column pivot for the scorematrix

Definition 7 Define

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

(14)

foralli 1 μ1 and forallj 1 μ2 where Pm(i j) is the pivotfor each cell box calculation PD(i j) Pm(i minus 1 j minus 1) +

σ(α(i) β(j)) is the diagonal value of a cell box PR(i j)

Pm(i minus 1 j)+ gap penalty is the right value of a cell boxPL(i j) Pm(i j minus 1)+ gap penalty is the left value of a cellbox and σ(α(i) β(j)) is the score for aligning the sequencecharacters of α(i) and β(j)

Definition 8 Define

σ(α(i) β(j)) +5 for α(i) β(j) charactermatch

minus1 for α(i) β(j) charactermismatch1113896

(15)foralli 1 μ1 and forallj 1 μ2

Proposition 2 Filled-in cell boxes for Pm(1 j)forallj

1 μ2 are analogous to filled-in cell boxes for Pm(i 1)foralli

1 μ1 for equal length comparison of sequences

Proof By Definition 7 we have

Pm(1 j) max

PD(1 j) Pm(0 j minus 1) + σ(α(1) β(j))

PR(1 j) Pm(0 j) + gap

PL(1 j) Pm(1 j minus 1) + gap

⎧⎪⎪⎨

⎪⎪⎩

Pm(i 1) max

PD(i 1) Pm(i minus 1 0) + σ(α(i) β(1))

PR(i 1) Pm(i minus 1 1) + gap

PL(i 1) Pm(i 0) + gap

⎧⎪⎪⎨

⎪⎪⎩

(16)

TCA G G

T

C

A

G

0

0

0

ndash2 ndash4

ndash3 ndash5

ndash3

ndash3

ndash4

ndash6

ndash6 ndash8 ndash10

ndash1ndash1

ndash1

ndash1 ndash1

ndash8

ndash2

ndash2 ndash2

1 1

2

246 6

Figure 5 Traceback in the score matrix of AGCTG and TCAG

Table 3 Filled-in table of AGCTG and TCAGi DV minus1 LV minus4 RV minus4 Pivot minus1ii DV minus3 LV minus3 RV minus6 Pivot minus3iii DV minus5 LV minus5 RV minus8 Pivot minus5iv DV minus1 LV minus7 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV minus3 Pivot minus3vii DV minus2 LV minus5 RV minus5 Pivot minus2viii DV 2 LV minus4 RV minus7 Pivot 2ix DV minus6 LV 0 RV minus3 Pivot 0x DV minus2 LV minus2 RV minus5 Pivot minus2xi DV 1 LV minus8 RV minus5 Pivot 1xii DV minus4 LV minus1 RV minus4 Pivot minus1xiii DV minus3 LV minus3 RV 0 Pivot 0xiv DV 1 LV minus2 RV minus2 Pivot minus2xv DV minus1 LV minus1 RV minus4 Pivot minus1xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 6 LV minus3 RV minus3 Pivot 6xviii DV minus2 LV 4 RV minus2 Pivot 4xix DV minus1 LV 2 RV minus1 Pivot 2xx DV 6 LV 0 RV minus3 Pivot 6

6 International Journal of Mathematics and Mathematical Sciences

foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6

Pm(i 0) Pm(i minus 1 0) + gap

⟹PL(i 1) Pm(i 0) + gap

Pm(i minus 1 0) + gap( 1113857 + gap

(17)

also

Pm(0 j) Pm(0 j minus 1) + gap

⟹PR(1 j) Pm(0 j) + gap

Pm(0 j minus 1) + gap( 1113857 + gap

(18)

Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4Pm(i 0) Pm(0 j)

(19)

Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof

Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)

Pm(i minus 1 1) + gap and by induction let i 1 j then

PL(1 j) Pm(1 0) + gap

P(0 0) + gap + gap

P(0 0) + 2 gaps

PR(i 1) Pm(0 1) + gap

P(0 0) + gap + gap

P(0 0) + 2gaps

(20)

Assume Pm(k 0) Pm(0 k) then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4PR(i 1) PL(1 j) where i j

(21)

which completes the second part of our proofRecall

PL(i 1) Pm(i minus 1 0) + 2 gaps

PR(1 j) Pm(0 j minus 1) + 2 gaps

PL(i 1) PR(1 j)was established

(22)

It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)

Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))

For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)

and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us

σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)

Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)

Pm(0 j minus 1)

Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa

Proof From Proposition 2 and Definitions 5ndash7 it is clearthat

PL(i 1) PR(1 j)

PR(i 1) PL(1 j)

where i j

(24)

Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences

Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2

Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1

(25)

and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved

Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7

International Journal of Mathematics and Mathematical Sciences 7

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j) max

PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))

PR(i + 1 j) Pm(i j) + gap penalty

PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i j + 1) max

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))

PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j + 1) max

PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))

PR(i + 1 j + 1) Pm(i j + 1) + gap penalty

PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

PR(i + 1 j) Pm(i j) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

(26)

Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write

PR(i j) Pm(i minus 1 j) + gap penalty

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)

Weare left to show that gap penalty lt σ(α(i) β(j+ 1))

Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match

For any k count forallk 1 μ the linear gap penalty gk

decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us

gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score

match diagonal score

foralli j 1 μ

(28)

Hence PR(i j)ltPD(i j + 1)foralli j 1 μ

43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences

Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1

j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof

Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof

8 International Journal of Mathematics and Mathematical Sciences

5 Conclusion

In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs

Data Availability

No data were used to support this study

Conflicts of Interest

)e authors declare that there are no conflicts of interestregarding the publication of this paper

References

[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005

[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003

[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007

[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005

[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970

[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008

[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005

[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981

[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004

[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002

[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957

[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004

[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018

[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication

company using prometheerdquo Applied Informatics vol 5 no 12018

[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019

[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010

[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003

[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012

[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017

[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009

[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986

International Journal of Mathematics and Mathematical Sciences 9

Page 7: FormulationandAnalysisofPatternsinaScoreMatrixforGlobal ...downloads.hindawi.com/journals/ijmms/2020/3858057.pdf · wise sequence alignment or multiple sequence alignment when any

foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6

Pm(i 0) Pm(i minus 1 0) + gap

⟹PL(i 1) Pm(i 0) + gap

Pm(i minus 1 0) + gap( 1113857 + gap

(17)

also

Pm(0 j) Pm(0 j minus 1) + gap

⟹PR(1 j) Pm(0 j) + gap

Pm(0 j minus 1) + gap( 1113857 + gap

(18)

Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4Pm(i 0) Pm(0 j)

(19)

Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof

Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)

Pm(i minus 1 1) + gap and by induction let i 1 j then

PL(1 j) Pm(1 0) + gap

P(0 0) + gap + gap

P(0 0) + 2 gaps

PR(i 1) Pm(0 1) + gap

P(0 0) + gap + gap

P(0 0) + 2gaps

(20)

Assume Pm(k 0) Pm(0 k) then

Pm(k + 1 0) Pm(k 0) + gap

Pm(0 k) + gap

Pm(0 k + 1)

there4PR(i 1) PL(1 j) where i j

(21)

which completes the second part of our proofRecall

PL(i 1) Pm(i minus 1 0) + 2 gaps

PR(1 j) Pm(0 j minus 1) + 2 gaps

PL(i 1) PR(1 j)was established

(22)

It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)

Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))

For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)

and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us

σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)

Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)

Pm(0 j minus 1)

Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa

Proof From Proposition 2 and Definitions 5ndash7 it is clearthat

PL(i 1) PR(1 j)

PR(i 1) PL(1 j)

where i j

(24)

Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences

Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2

Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1

(25)

and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved

Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7

International Journal of Mathematics and Mathematical Sciences 7

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j) max

PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))

PR(i + 1 j) Pm(i j) + gap penalty

PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i j + 1) max

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))

PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j + 1) max

PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))

PR(i + 1 j + 1) Pm(i j + 1) + gap penalty

PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

PR(i + 1 j) Pm(i j) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

(26)

Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write

PR(i j) Pm(i minus 1 j) + gap penalty

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)

Weare left to show that gap penalty lt σ(α(i) β(j+ 1))

Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match

For any k count forallk 1 μ the linear gap penalty gk

decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us

gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score

match diagonal score

foralli j 1 μ

(28)

Hence PR(i j)ltPD(i j + 1)foralli j 1 μ

43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences

Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1

j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof

Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof

8 International Journal of Mathematics and Mathematical Sciences

5 Conclusion

In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs

Data Availability

No data were used to support this study

Conflicts of Interest

)e authors declare that there are no conflicts of interestregarding the publication of this paper

References

[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005

[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003

[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007

[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005

[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970

[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008

[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005

[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981

[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004

[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002

[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957

[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004

[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018

[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication

company using prometheerdquo Applied Informatics vol 5 no 12018

[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019

[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010

[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003

[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012

[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017

[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009

[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986

International Journal of Mathematics and Mathematical Sciences 9

Page 8: FormulationandAnalysisofPatternsinaScoreMatrixforGlobal ...downloads.hindawi.com/journals/ijmms/2020/3858057.pdf · wise sequence alignment or multiple sequence alignment when any

Pm(i j) max

PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))

PR(i j) Pm(i minus 1 j) + gap penalty

PL(i j) Pm(i j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j) max

PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))

PR(i + 1 j) Pm(i j) + gap penalty

PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i j + 1) max

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))

PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

Pm(i + 1 j + 1) max

PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))

PR(i + 1 j + 1) Pm(i j + 1) + gap penalty

PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty

⎧⎪⎪⎨

⎪⎪⎩

PR(i + 1 j) Pm(i j) + gap penalty

PL(i j + 1) Pm(i j) + gap penalty

(26)

Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write

PR(i j) Pm(i minus 1 j) + gap penalty

PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)

Weare left to show that gap penalty lt σ(α(i) β(j+ 1))

Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match

For any k count forallk 1 μ the linear gap penalty gk

decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us

gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score

match diagonal score

foralli j 1 μ

(28)

Hence PR(i j)ltPD(i j + 1)foralli j 1 μ

43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences

Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box

Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1

j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof

Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box

Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof

8 International Journal of Mathematics and Mathematical Sciences

5 Conclusion

In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs

Data Availability

No data were used to support this study

Conflicts of Interest

)e authors declare that there are no conflicts of interestregarding the publication of this paper

References

[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005

[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003

[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007

[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005

[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970

[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008

[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005

[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981

[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004

[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002

[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957

[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004

[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018

[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication

company using prometheerdquo Applied Informatics vol 5 no 12018

[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019

[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010

[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003

[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012

[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017

[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009

[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986

International Journal of Mathematics and Mathematical Sciences 9

Page 9: FormulationandAnalysisofPatternsinaScoreMatrixforGlobal ...downloads.hindawi.com/journals/ijmms/2020/3858057.pdf · wise sequence alignment or multiple sequence alignment when any

5 Conclusion

In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs

Data Availability

No data were used to support this study

Conflicts of Interest

)e authors declare that there are no conflicts of interestregarding the publication of this paper

References

[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005

[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003

[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007

[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005

[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970

[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008

[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005

[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981

[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004

[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002

[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957

[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004

[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018

[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication

company using prometheerdquo Applied Informatics vol 5 no 12018

[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019

[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010

[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003

[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012

[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017

[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009

[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986

International Journal of Mathematics and Mathematical Sciences 9