Spring 2007 Bioinformatics Ch. 2 - Sequence Alignment
Post on 21-Dec-2015
![Page 1: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/1.jpg)
Spring 2007 Bioinformatics
Ch. 2 - Sequence Alignment
![Page 2: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/2.jpg)
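| | C | G | P | S | A | T | D | E | N | Q | H | K | R | V | M | I | L | F | Y | W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C | 12 | | | | | | | | | | | | | | | | | | | |
| G | -3 | 5 | | | | | | | | | | | | | | | | | | |
| P | -3 | -1 | 6 | | | | | | | | | | | | | | | | | |
| S | 0 | 1 | 1 | 1 | | | | | | | | | | | | | | | | |
| A | -2 | 1 | 1 | 1 | 2 | | | | | | | | | | | | | | | |
| T | -2 | 0 | 0 | 1 | 1 | 3 | | | | | | | | | | | | | | |
| D | -5 | 1 | -1 | 0 | 0 | 0 | 4 | | | | | | | | | | | | | |
| E | -5 | 0 | -1 | 0 | 0 | 0 | 3 | 4 | | | | | | | | | | | | |
| N | -4 | 0 | -1 | 1 | 0 | 0 | 2 | 1 | 2 | | | | | | | | | | | |
| Q | -5 | -1 | 0 | -1 | 0 | -1 | 2 | 2 | 1 | 4 | | | | | | | | | | |
| H | -3 | -2 | 0 | -1 | -1 | -1 | 1 | 1 | 2 | 3 | 6 | | | | | | | | | |
| K | -5 | -2 | -1 | 0 | -1 | 0 | 0 | 0 | 1 | 1 | 0 | 5 | | | | | | | | |
| R | -4 | -3 | 0 | 0 | -2 | -1 | -1 | -1 | 0 | 1 | 2 | 3 | 6 | | | | | | | |
| V | -2 | -1 | -1 | -1 | 0 | 0 | -2 | -2 | -2 | -2 | -2 | -2 | -2 | 4 | | | | | | |
| M | -5 | -3 | -2 | -2 | -1 | -1 | -3 | -2 | 0 | -1 | -2 | 0 | 0 | 2 | 6 | | | | | |
| I | -2 | -3 | -2 | -1 | -1 | 0 | -2 | -2 | -2 | -2 | -2 | -2 | -2 | 4 | 2 | 5 | | | | |
| L | -6 | -4 | -3 | -3 | -2 | -2 | -4 | -3 | -3 | -2 | -2 | -3 | -3 | 2 | 4 | 2 | 6 | | | |
| F | -4 | -5 | -5 | -3 | -4 | -3 | -6 | -5 | -4 | -5 | -2 | -5 | -4 | -1 | 0 | 1 | 2 | 9 | | |
| Y | 0 | -5 | -5 | -3 | -3 | -3 | -4 | -4 | -2 | -4 | 0 | -4 | -5 | -2 | -2 | -1 | -1 | 7 | 10 | |
| W | -8 | -7 | -6 | -2 | -6 | -5 | -7 | -7 | -4 | -5 | -3 | -3 | 2 | -6 | -4 | -5 | -2 | 0 | 0 | 17 |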
PAM 250 Matrix
![Page 3: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/3.jpg)
BLOSUM 62 Matrix
![Page 4: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/4.jpg)
![Page 5: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/5.jpg)
SSU Secondary Structure
![Page 6: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/6.jpg)
![Page 7: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/7.jpg)
Cytochrome C
![Page 8: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/8.jpg)
~82,000,000 DNA sequences as of April 2008
![Page 9: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/9.jpg)
When is a database hit significant?
• Problem:
– Even unrelated sequences can be aligned (yielding a low score)
– How do we know if a database hit is meaningful?
– When is an alignment score sufficiently high?
• Solution:
– Determine the range of alignment scores you would expect to get for random reasons (i.e., when aligning unrelated sequences).
– Compare actual scores to the distribution of random scores.
– Is the real score much higher than you’d expect by chance?
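The idea in the bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scoring values (`match=1`, `mismatch=-1`, `gap=-2`), the function names, and the two toy sequences are hypothetical and not taken from the slides, and shuffling the subject sequence is just one simple way to obtain "unrelated" sequences of the same composition.

```python
import random

def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score (score only, linear gap penalty).

    The scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def shuffled_scores(query, subject, trials=200, seed=0):
    """Scores of the query against shuffled copies of the subject.

    Shuffling keeps the base composition but destroys any real similarity,
    so these scores approximate what arises "for random reasons".
    """
    rng = random.Random(seed)
    letters = list(subject)
    scores = []
    for _ in range(trials):
        rng.shuffle(letters)
        scores.append(local_alignment_score(query, "".join(letters)))
    return scores

# Toy data (hypothetical): the subject contains the query as a substring.
query = "acgtgcgttaatgcgatgagtacgg"
subject = "ttg" + query + "ca"

real = local_alignment_score(query, subject)
null = shuffled_scores(query, subject)
fraction_as_high = sum(s >= real for s in null) / len(null)
print("real score:", real)
print("highest random score:", max(null))
print("fraction of random scores >= real score:", fraction_as_high)
```

If the real score sits far above every score in `null`, the hit is unlikely to be a chance match; if it falls inside the bulk of the random scores, it is not meaningful.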
![Page 10: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/10.jpg)
Random alignment scores follow extreme value distributions
The exact shape and location of the distribution depend on the exact nature of the database and the query sequence.
Searching a database of unrelated sequences results in scores that follow an extreme value distribution.
[Figure: number of sequences (y-axis) plotted against alignment score (x-axis)]
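For reference, the extreme value distribution usually meant in this context is the Gumbel form. The symbols here are introduced for illustration and do not appear on the slides: $\mu$ is the location (roughly where the peak sits) and $\lambda$ is the decay rate of the right-hand tail. The probability that a random alignment scores at least $x$ is then

$$P(S \ge x) = 1 - \exp\left(-e^{-\lambda (x - \mu)}\right)$$

For scores well above $\mu$ this is approximately $e^{-\lambda (x - \mu)}$, which is why the right tail falls off more slowly than that of a normal distribution with the same mean and variance.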
![Page 11: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/11.jpg)
Significance of a hit: one possible solution
(1) Align the query sequence to all sequences in the database and note the scores.
(2) Fit the actual scores to a mixture of two sub-distributions: (a) an extreme value distribution and (b) a normal distribution.
(3) Use the fitted extreme-value distribution to predict how many random hits to expect for any given score (the “E-value”).
[Figure: number of sequences (y-axis) plotted against alignment score (x-axis)]
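A simplified sketch of steps (2) and (3) follows. It is not the mixture fit described on the slide: it assumes all scores come from unrelated sequences and fits a single extreme value (Gumbel) distribution by the method of moments, a technique swapped in here for brevity. The function names, the synthetic scores, and the parameters `mu` (location) and `beta` (scale) are all hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel(scores):
    """Method-of-moments fit of a Gumbel distribution; returns (mu, beta).

    Uses mean = mu + gamma * beta and variance = (pi * beta)**2 / 6.
    """
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta

def tail_probability(score, mu, beta):
    """P(S >= score) for a single random alignment under the fitted Gumbel."""
    return 1.0 - math.exp(-math.exp(-(score - mu) / beta))

def e_value(score, mu, beta, db_size):
    """Expected number of random hits scoring >= score in db_size comparisons."""
    return db_size * tail_probability(score, mu, beta)

# Synthetic "random" scores drawn from a known Gumbel (assumed parameters),
# standing in for the scores of a query against unrelated database sequences.
rng = random.Random(1)
true_mu, true_beta = 40.0, 6.0
scores = [
    true_mu - true_beta * math.log(-math.log(max(rng.random(), 1e-12)))
    for _ in range(10_000)
]

mu, beta = fit_gumbel(scores)
print("fitted mu, beta:", round(mu, 2), round(beta, 2))
for s in (60, 80, 100):
    print("score", s, "E-value", e_value(s, mu, beta, db_size=len(scores)))
```

With real search results, true homologs would inflate the high-score tail, which is the reason the slide separates them out with a second sub-distribution before reading E-values off the extreme-value component.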
![Page 12: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/12.jpg)
Significance of a hit: example
Search against a database of 10,000 sequences.
An extreme-value distribution (blue) is fitted to the distribution of all scores.
It is found that 99.9% of the blue distribution has a score below 112.
This means that when searching a database of 10,000 sequences you’d expect to get 0.1% × 10,000 = 10 hits with a score of 112 or better for random reasons.
10 is the E-value of a hit with score 112. You want E-values well below 1!
[Figure: number of sequences (y-axis) plotted against alignment score (x-axis)]
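The arithmetic of this example can be checked with a short Python sketch. The function names are hypothetical. The conversion from E-value to the probability of seeing at least one random hit assumes that the number of chance hits is Poisson-distributed, which is an assumption added here and not stated on the slide.

```python
import math

def e_value_from_tail(tail_fraction, db_size):
    """Expected number of random hits: tail fraction times database size."""
    return tail_fraction * db_size

def p_at_least_one_hit(e):
    """Probability of one or more random hits, assuming a Poisson count."""
    return 1.0 - math.exp(-e)

# Numbers from the example: 0.1% of random scores are >= 112,
# and the database holds 10,000 sequences.
e = e_value_from_tail(0.001, 10_000)
p = p_at_least_one_hit(e)
print("E-value:", e)
print("P(at least one random hit):", p)

# For comparison, a much rarer score (tail fraction chosen for illustration).
e_small = e_value_from_tail(1e-7, 10_000)
print("E-value:", e_small, "P:", p_at_least_one_hit(e_small))
```

An E-value of 10 means a score of 112 is almost certain to turn up by chance somewhere in the database, whereas for E-values far below 1 the probability of a chance hit is roughly equal to the E-value itself.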
![Page 13: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/13.jpg)
Example of Blast E-values
agtgaagtacgtgcgttaatgcgatgagtacggtaaaaagaccggcgtgctttatgggtgagcgggagtttgtgccagcgaagcgtccttggacttagagagtgtcgggttcgggacgtccggctacagaatagtaaa
• Semi-random sequence
Blast these sequences
agcggaccggtacttaagcgcggaccggcgtgtccttggacttagagagtggggacgtccggcttcggagcgggagtgttcgttgtgccagcgactaaaaagagaattaaatatgggtga
• Non-random sequence
![Page 14: Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d565503460f94a34fa7/html5/thumbnails/14.jpg)