Introduction To Bioinformatics Tutorial 2

Post on 21-Dec-2015


Introduction To Bioinformatics

Tutorial 2: Local Alignment

Edit Distance

• Usage: spelling, …
• Different types:
– Hamming (substitutions only; sequences of equal length)
– Levenshtein (substitutions, insertions and deletions)
• Algorithm:
– Naïve solution
– Dynamic programming
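The two distance types can be sketched in a few lines of Python. This is a minimal illustration rather than the tutorial's own code: the function names `hamming` and `levenshtein` and the sample strings are assumptions made for this example.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when the letters match)
            ))
        prev = curr
    return prev[-1]


print(hamming("TACTAA", "TAATAA"))     # 1
print(levenshtein("kitten", "sitting"))  # 3
```

The Levenshtein function keeps only the previous row of the table, which is enough when the distance is needed but the alignment itself is not.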

Dynamic Programming

• Richard Bellman (1950s)
• “Program”
– Computer program? No.
– An optimal schedule (a plan)

Dynamic Programming

• Conditions
– The problem can be divided into sub-problems
– (Optimal) sub-problem solutions can be reused (possibly many times)
– “Bottom-up” approach

Dynamic Programming

• Examples
– Shortest path
– Fibonacci
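The Fibonacci example shows why reusing sub-problem solutions matters. The sketch below contrasts plain recursion with a bottom-up computation; the function names `fib_naive` and `fib_bottom_up` are chosen for this illustration only.

```python
def fib_naive(n: int) -> int:
    """Plain recursion: the same sub-problems are recomputed again and again."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


def fib_bottom_up(n: int) -> int:
    """Bottom-up dynamic programming: each sub-problem is solved exactly once."""
    a, b = 0, 1  # F(0), F(1)
    for _ in range(n):
        a, b = b, a + b  # slide the window one step up the sequence
    return a


print(fib_bottom_up(10))  # 55
```

Both functions return the same values, but the naive version takes exponential time while the bottom-up version takes linear time.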

Edit Distance

• Usage: spelling, biology, …
• Compare sequences
• Similar sequences suggest:
– Ancestral origin
– Function, …

Local Alignment

• Dynamic programming algorithm for finding local matches between two sequences.
• What is a local match?
– It is the best-matching, highest-scoring region between two sequences.
– It is a well-conserved region between two sequences.

Alignment

[Figure: alignment matrix with N1..Nn along one axis and M1..Mm along the other]

• Cell [I, J] holds the best alignment of M1..I with N1..J.
• All possible alignments are encoded as paths in the matrix.

Global vs Local

The differences:
1. We can start a new match instead of extending a previous alignment.
2. Instead of looking only at the far corner, we look anywhere in the table for the best score.

[Figure: two panels, Global and Local]

Scoring System
– Match: +1
– Mismatch: -2
– Indel: -1

Local Alignment

Scoring System
– Match: +1 (Ni = Mj)
– Mismatch: -1 (Ni ≠ Mj)
– Indel: -2

[Figure: matrix with N1..Nn along one axis and M1..Mm along the other; a step along a single axis pairs a letter with a gap (N1 with -, or - with M1)]

Local Alignment

Fill:
1. We fill the table as in global alignment, but we don't allow negative numbers (every negative number is turned into 0).
2. No arrows come out of cells with a 0.

Scoring System
– Match: +1
– Mismatch: -1
– Indel: -2

[Figure: the three moves into a cell of the N1..Nn by M1..Mm matrix]
– Diagonal, N1N2.. aligned with M1M2..: +1 if M2 = N2; -1 if M2 ≠ N2
– Gap in N, N1 ..- aligned with M1M2..: -2
– Gap in M, N1N2.. aligned with M1 ..-: -2
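The fill rule can be sketched as a short Python function. This is an illustrative sketch, not code from the tutorial: the name `local_score_table`, its parameter names, and the two short DNA strings used as sample input are assumptions. The scores default to the +1 / -1 / -2 system above.

```python
def local_score_table(n_seq, m_seq, match=1, mismatch=-1, indel=-2):
    """Fill the local-alignment table; rows follow m_seq, columns follow n_seq."""
    rows, cols = len(m_seq) + 1, len(n_seq) + 1
    table = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if m_seq[i - 1] == n_seq[j - 1] else mismatch
            diag = table[i - 1][j - 1] + pair  # letter aligned with letter
            up = table[i - 1][j] + indel       # letter of m_seq aligned with a gap
            left = table[i][j - 1] + indel     # letter of n_seq aligned with a gap
            # The 0 is the "no negative numbers" rule: start a new match here.
            table[i][j] = max(0, diag, up, left)
    return table


table = local_score_table("TACTAA", "TAATA")
for row in table:
    print(row)
print(max(max(row) for row in table))  # 3
```

The only change from a global fill is the extra `0` inside `max`, which lets an alignment restart at any cell.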

Local Alignment

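Trace:
We trace back from the highest-scoring cells. Each path follows the arrows back until it reaches a cell with a 0.

[Figure: matrix with the three moves into a cell: diagonal (+1 if M2 = N2; -1 if M2 ≠ N2) and the two gap moves (-2)]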
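The trace step can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `local_alignments` is hypothetical, the arrows are recomputed from the scores rather than stored, and when several arrows are possible only one path is followed per starting cell (diagonal first).

```python
def local_alignments(n_seq, m_seq, match=1, mismatch=-1, indel=-2):
    """Return (best score, [(aligned n, aligned m), ...]), one per best cell."""
    rows, cols = len(m_seq) + 1, len(n_seq) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if m_seq[i - 1] == n_seq[j - 1] else mismatch
            table[i][j] = max(0, table[i - 1][j - 1] + pair,
                              table[i - 1][j] + indel, table[i][j - 1] + indel)

    best = max(max(row) for row in table)
    if best == 0:
        return 0, []  # nothing scores above zero, so there is no local match

    results = []
    for i0 in range(1, rows):
        for j0 in range(1, cols):
            if table[i0][j0] != best:
                continue
            i, j, top, bottom = i0, j0, [], []
            while table[i][j] > 0:  # stop at the first cell holding 0
                pair = match if m_seq[i - 1] == n_seq[j - 1] else mismatch
                if table[i][j] == table[i - 1][j - 1] + pair:  # diagonal arrow
                    top.append(n_seq[j - 1]); bottom.append(m_seq[i - 1])
                    i, j = i - 1, j - 1
                elif table[i][j] == table[i - 1][j] + indel:   # gap in n_seq
                    top.append("-"); bottom.append(m_seq[i - 1])
                    i -= 1
                else:                                          # gap in m_seq
                    top.append(n_seq[j - 1]); bottom.append("-")
                    j -= 1
            results.append(("".join(reversed(top)), "".join(reversed(bottom))))
    return best, results


score, alignments = local_alignments("TACTAA", "TAATA")
print(score)       # 3
print(alignments)  # [('TAA', 'TAA'), ('TACTA', 'TAATA')]
```

Because the walk stops at a 0 and starts at a maximum, the reported alignments cover only the matching regions, not the whole sequences.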

Local Alignment

Question:

Will there be gaps at the start/end?

[Figure: alignment matrix, N1..Nn by M1..Mm]

Example: N = TACTAA (columns 1 to 6), M = TAATA (rows 1 to 5)

1. Row 0 and column 0 are filled with 0.
2. Cell (T 1, T 1): the candidates are +1 (T aligned with T), -2 (- aligned with T) and -2 (T aligned with -), so the cell gets 1.
3. The remaining cells are filled row by row:

          T   A   C   T   A   A
      0   1   2   3   4   5   6
0     0   0   0   0   0   0   0
T 1   0   1   0   0   1   0   0
A 2   0   0   2   0   0   2   1
A 3   0   0   1   1   0   1   3
T 4   0   1   0   0   2   0   1
A 5   0   0   2   0   0   3   1

Leave only paths from highest score

[Figure: score table with the traceback paths from the two cells that score 3]

TAA
TAA

TACTA
TAATA

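And Now… Global Alignment

1. We keep negative numbers.
2. Arrows can come out of any cell.
3. We trace back from the bottom-right to the top-left of the table.

Scoring System
– Match: +1
– Mismatch: -1
– Indel: -2

[Figure: the three moves into a cell of the N1..Nn by M1..Mm matrix]
– Diagonal, N1N2.. aligned with M1M2..: +1 if M2 = N2; -1 if M2 ≠ N2
– Gap in N, N1 ..- aligned with M1M2..: -2
– Gap in M, N1N2.. aligned with M1 ..-: -2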
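The three global rules can be sketched in Python as well. This is an illustrative sketch: the name `global_alignment` and the sample strings are assumptions, and when several arrows tie, only one path is followed (diagonal first), so only one of the possible optimal alignments is returned.

```python
def global_alignment(n_seq, m_seq, match=1, mismatch=-1, indel=-2):
    """Return (score, aligned n, aligned m) for one optimal global alignment."""
    rows, cols = len(m_seq) + 1, len(n_seq) + 1
    table = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        table[0][j] = j * indel  # leading gaps are paid for (negatives are kept)
    for i in range(1, rows):
        table[i][0] = i * indel
        for j in range(1, cols):
            pair = match if m_seq[i - 1] == n_seq[j - 1] else mismatch
            table[i][j] = max(table[i - 1][j - 1] + pair,
                              table[i - 1][j] + indel, table[i][j - 1] + indel)

    # Trace back from the bottom-right corner to the top-left corner.
    i, j, top, bottom = rows - 1, cols - 1, [], []
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and m_seq[i - 1] == n_seq[j - 1] else mismatch
        if i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + pair:
            top.append(n_seq[j - 1]); bottom.append(m_seq[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + indel:  # gap in n_seq
            top.append("-"); bottom.append(m_seq[i - 1])
            i -= 1
        else:                                                   # gap in m_seq
            top.append(n_seq[j - 1]); bottom.append("-")
            j -= 1
    return table[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = global_alignment("TACTAA", "TAATA")
print(score)   # 1
print(top)     # TACTAA
print(bottom)  # TAAT-A
```

Compared with a local fill, the `0` option is gone, the first row and column carry gap penalties, and the score is read from the bottom-right corner instead of the table maximum.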

Example: global alignment of N = TACTAA (columns 1 to 6) with M = TAATA (rows 1 to 5)

Match: +1
Mismatch: -1
Indel: -2

           T    A    C    T    A    A
      0    1    2    3    4    5    6
0     0   -2   -4   -6   -8  -10  -12
T 1  -2    1   -1   -3   -5   -7   -9
A 2  -4   -1    2    0   -2   -4   -6
A 3  -6   -3    0    1   -1   -1   -3
T 4  -8   -5   -2   -1    2    0   -2
A 5 -10   -7   -4   -3    0    3    1

[Figure: score table with the traceback paths from the bottom-right cell]

TACTAA
TAATA-

TACTAA
TAAT-A