sequence similarity search glance to the protein world
Post on 21-Dec-2015
TRANSCRIPT
![Page 1: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/1.jpg)
Sequence similarity search
Glance to the protein world
![Page 2: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/2.jpg)
WHAT'S TODAY?
• BLASTing Proteins
- Similarity scores for protein sequences
- Advanced BLAST (PSI BLAST)
![Page 3: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/3.jpg)
Protein Sequence Alignment
Rule of thumb:
• Proteins are homologous if 25% identical (length >100)
• DNA sequences are homologous if 70% identical
![Page 4: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/4.jpg)
Protein Pairwise Sequence Alignment
• The alignment tools are similar to the DNA alignment tools
– BLASTN for nucleotides
– BLASTP for proteins
• Main difference: instead of scoring match (+2) and mismatch (-1) we have similarity scores:
– Score s(i,j) > 0 if amino acids i and j have similar properties
– Score s(i,j) ≤ 0 otherwise
• How should we score s(i,j)?
![Page 5: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/5.jpg)
The 20 Amino Acids
![Page 6: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/6.jpg)
Chemical Similarities Between Amino Acids
Acids & Amides DENQ (Asp, Glu, Asn, Gln)
Basic HKR (His, Lys, Arg)
Aromatic FYW (Phe, Tyr, Trp)
Hydrophilic ACGPST (Ala, Cys, Gly, Pro, Ser, Thr)
Hydrophobic ILMV (Ile, Leu, Met, Val)
![Page 7: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/7.jpg)
Sequence Alignment based on AA similarity
    TQSPSSLSASVGDTVTITCRASQSISTYLNWYQQKP----GKAPKLLIYAASSSQSGVPS
    ||    +    |||| +|| |||  |   +|         |     |    |       |
    TQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADS

    RFSGSGSGTDFTLTINSLQPEDFATYYCQ---------------QSYSTPHFSQGTKLEI
    | |    | +| | | +|+ ||  || |+                + |  |  ||  | +
    RRSLWDQG-NFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTL

    ---KRTVAAPSVFIFPPSDEQLKSGTASVVCLLN---------NFYPREAKVQWKVD
           ++|||    |  + ++ |    |  |               + ||++|+|
    TLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKID

| = identity, + = similarity
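The identity/similarity marks can be sketched in a few lines of Python. This is a minimal illustration that treats two residues as "similar" when they fall in the same chemical group from the earlier slide; the function name `match_line` is made up for this example, and a real aligner marks "+" from a substitution matrix, so its marks can differ from these.

```python
# Chemical groups copied from the "Chemical Similarities" slide.
GROUPS = ["DENQ", "HKR", "FYW", "ACGPST", "ILMV"]
GROUP_OF = {aa: index for index, group in enumerate(GROUPS) for aa in group}


def match_line(seq_a, seq_b):
    """Return '|' for identity, '+' for same chemical group, ' ' otherwise."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            marks.append(" ")  # a gap is never a match
        elif a == b:
            marks.append("|")
        elif a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            marks.append("+")
        else:
            marks.append(" ")
    return "".join(marks)


# First ten columns of the alignment shown on the slide.
top = "TQSPSSLSAS"
bottom = "TQGKKVVLGK"
print(top)
print(match_line(top, bottom))
print(bottom)
```

Grouping by chemistry is coarser than a substitution matrix: here S/G and A/G count as similar, whereas the slide's alignment leaves those columns unmarked.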
![Page 8: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/8.jpg)
Amino Acid Substitutions Matrices
• When scoring protein sequence alignments it is common to use a 20 × 20 matrix, representing all pairwise comparisons:
– Score Matrix
– Substitution Matrix
![Page 9: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/9.jpg)
Scoring Matrices
• Scoring Matrix - match/mismatch score
– Not bad for similar sequences
– Does not show distantly related sequences
• Substitution matrix
– Scores residues dependent upon likelihood substitution is found in nature
– More applicable for amino acid sequences
![Page 10: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/10.jpg)
Given an alignment of closely related sequences we can score the relation between amino acids based on how frequently they substitute each other

    M G Y D E
    M G Y D E
    M G Y E E
    M G Y D E
    M G Y Q E
    M G Y D E
    M G Y E E
    M G Y E E

In the fourth column, E & D are found in 7/8 of the sequences
Substitution Matrix
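The 7/8 figure can be checked by counting residues in one column of the alignment. A small sketch (the helper name `column_counts` is an assumption for illustration):

```python
from collections import Counter

# The eight aligned sequences from the slide.
alignment = [
    "MGYDE", "MGYDE", "MGYEE", "MGYDE",
    "MGYQE", "MGYDE", "MGYEE", "MGYEE",
]


def column_counts(rows, col):
    """Count how often each residue appears in one alignment column."""
    return Counter(row[col] for row in rows)


counts = column_counts(alignment, 3)  # fourth column, zero-based index 3
share_d_or_e = (counts["D"] + counts["E"]) / len(alignment)
print(dict(counts), share_d_or_e)
```

Columns that mix D and E this often are the kind of evidence that gives the D/E pair a high substitution score.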
![Page 11: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/11.jpg)
[Figure: chemical structures of Aspartate (Asp, D) and Glutamate (Glu, E)]
D / E
![Page 12: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/12.jpg)
PAM - Point Accepted Mutations
• Developed by Margaret Dayhoff, 1978.
• Analyzed very similar protein sequences
• “Accepted” mutations – do not negatively affect a protein’s fitness
• Used global alignment.
• Counted the number of substitutions (i,j) per amino acid pair: Many i<->j substitutions => high score s(i,j)
![Page 13: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/13.jpg)
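Basic matrix: normalized probabilities multiplied by 10000

| | Ala (A) | Arg (R) | Asn (N) | Asp (D) | Cys (C) | Gln (Q) | Glu (E) | Gly (G) | His (H) | Ile (I) | Leu (L) | Lys (K) | Met (M) | Phe (F) | Pro (P) | Ser (S) | Thr (T) | Trp (W) | Tyr (Y) | Val (V) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 9867 | 2 | 9 | 10 | 3 | 8 | 17 | 21 | 2 | 6 | 4 | 2 | 6 | 2 | 22 | 35 | 32 | 0 | 2 | 18 |
| R | 1 | 9913 | 1 | 0 | 1 | 10 | 0 | 0 | 10 | 3 | 1 | 19 | 4 | 1 | 4 | 6 | 1 | 8 | 0 | 1 |
| N | 4 | 1 | 9822 | 36 | 0 | 4 | 6 | 6 | 21 | 3 | 1 | 13 | 0 | 1 | 2 | 20 | 9 | 1 | 4 | 1 |
| D | 6 | 0 | 42 | 9859 | 0 | 6 | 53 | 6 | 4 | 1 | 0 | 3 | 0 | 0 | 1 | 5 | 3 | 0 | 0 | 1 |
| C | 1 | 1 | 0 | 0 | 9973 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 5 | 1 | 0 | 3 | 2 |
| Q | 3 | 9 | 4 | 5 | 0 | 9876 | 27 | 1 | 23 | 1 | 3 | 6 | 4 | 0 | 6 | 2 | 2 | 0 | 0 | 1 |
| E | 10 | 0 | 7 | 56 | 0 | 35 | 9865 | 4 | 2 | 3 | 1 | 4 | 1 | 0 | 3 | 4 | 2 | 0 | 1 | 2 |
| G | 21 | 1 | 12 | 11 | 1 | 3 | 7 | 9935 | 1 | 0 | 1 | 2 | 1 | 1 | 3 | 21 | 3 | 0 | 0 | 5 |
| H | 1 | 8 | 18 | 3 | 1 | 20 | 1 | 0 | 9912 | 0 | 1 | 1 | 0 | 2 | 3 | 1 | 1 | 1 | 4 | 1 |
| I | 2 | 2 | 3 | 1 | 2 | 1 | 2 | 0 | 0 | 9872 | 9 | 2 | 12 | 7 | 0 | 1 | 7 | 0 | 1 | 33 |
| L | 3 | 1 | 3 | 0 | 0 | 6 | 1 | 1 | 4 | 22 | 9947 | 2 | 45 | 13 | 3 | 1 | 3 | 4 | 2 | 15 |
| K | 2 | 37 | 25 | 6 | 0 | 12 | 7 | 2 | 2 | 4 | 1 | 9926 | 20 | 0 | 3 | 8 | 11 | 0 | 1 | 1 |
| M | 1 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 5 | 8 | 4 | 9874 | 1 | 0 | 1 | 2 | 0 | 0 | 4 |
| F | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 2 | 8 | 6 | 0 | 4 | 9946 | 0 | 2 | 1 | 3 | 28 | 0 |
| P | 13 | 5 | 2 | 1 | 1 | 8 | 3 | 2 | 5 | 1 | 2 | 2 | 1 | 1 | 9926 | 12 | 4 | 0 | 0 | 2 |
| S | 28 | 11 | 34 | 7 | 11 | 4 | 6 | 16 | 2 | 2 | 1 | 7 | 4 | 3 | 17 | 9840 | 38 | 5 | 2 | 2 |
| T | 22 | 2 | 13 | 4 | 1 | 3 | 2 | 2 | 1 | 11 | 2 | 8 | 6 | 1 | 5 | 32 | 9871 | 0 | 2 | 9 |
| W | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 9976 | 1 | 0 |
| Y | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 0 | 4 | 1 | 1 | 0 | 0 | 21 | 0 | 1 | 1 | 2 | 9945 | 1 |
| V | 13 | 2 | 1 | 1 | 3 | 2 | 2 | 3 | 3 | 57 | 11 | 1 | 17 | 1 | 3 | 2 | 10 | 0 | 2 | 9901 |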
![Page 14: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/14.jpg)
Log Odds Matrices
• PAM matrices converted to log-odds matrix
– Calculate odds ratio for each substitution
  • Taking scores in previous matrix
  • Divide by frequency of amino acid
– Convert ratio to log10 and multiply by 10
– Take average of log odds ratio for converting A to B and converting B to A
– Result: Symmetric matrix
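The conversion steps above can be sketched as a short calculation. The two substitution probabilities are the D/E entries of the basic matrix (53 and 56 per 10000); the background frequencies 0.047 for D and 0.050 for E are illustrative assumptions, not values given on the slides, and dividing by the frequency of the replacement residue is one possible reading of "divide by frequency of amino acid".

```python
import math


def log_odds(probability, background):
    """10 * log10 of the odds ratio, as described on the slide."""
    return 10 * math.log10(probability / background)


def symmetric_score(p_ab, p_ba, f_a, f_b):
    """Average the A->B and B->A log odds and round to an integer score."""
    s_ab = log_odds(p_ab, f_b)  # assumption: divide by frequency of the replacement
    s_ba = log_odds(p_ba, f_a)
    return round((s_ab + s_ba) / 2)


# D <-> E at a distance of 1 PAM (probabilities = matrix entries / 10000).
score_de = symmetric_score(0.0053, 0.0056, f_a=0.047, f_b=0.050)
print(score_de)
```

At 1 PAM almost no substitutions have happened yet, so off-diagonal scores come out strongly negative; the familiar positive D/E score only appears at larger distances such as PAM250.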
![Page 15: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/15.jpg)
PAM250 Log odds matrix
Entry (i,i) is greater than any entry (i,j), j ≠ i.
Entry (i,j): the score of aligning amino acid i against amino acid j.
Similar amino acids have high scores
![Page 16: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/16.jpg)
Selecting a PAM Matrix
• There are different PAM matrices (PAM1 - PAM250). The matrices are derived from each other: PAM N is obtained by multiplying the PAM1 matrix by itself N times
• Low PAM numbers: short sequences, strong local similarities.
• High PAM numbers: long sequences, weak similarities.
– PAM120 recommended for general use (40% identity)
– PAM60 for close relations (60% identity)
– PAM250 for distant relations (20% identity)
• If uncertain, try several different matrices
– PAM40, PAM120, PAM250 recommended
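Deriving a higher PAM matrix by repeated multiplication can be illustrated with a toy example. The two-letter "alphabet" and its probabilities below are hypothetical, chosen only so the arithmetic is easy to follow; the real PAM1 matrix is 20 × 20.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(matrix, power):
    """Multiply a matrix by itself `power` times, starting from the identity."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = mat_mul(result, matrix)
    return result


# Hypothetical 1-PAM transition probabilities; each row sums to 1.
toy_pam1 = [[0.99, 0.01],
            [0.02, 0.98]]
toy_pam250 = mat_pow(toy_pam1, 250)
print(toy_pam250)
```

After 250 steps the diagonal has dropped from about 0.99 to roughly 0.67 in this toy case, which is the sense in which high PAM numbers describe weak similarities.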
![Page 17: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/17.jpg)
BLOSUM
• Blocks Substitution Matrix
– Steven and Jorja G. Henikoff (1992)
• Based on BLOCKS database (www.blocks.fhcrc.org)
– Families of proteins with identical function
– Highly conserved protein domains
• Ungapped local alignment to identify motifs
– Each motif is a block of local alignment
– Counts amino acids observed in same column
– Symmetrical model of substitution

    AABCDA… BBCDA
    DABCDA. A.BBCBB
    BBBCDABA.BCCAA
    AAACDAC.DCBCDB
    CCBADAB.DBBDCC
    AAACAA… BBCCC
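Counting residues "observed in the same column" under a symmetrical model can be sketched as follows. The three-row block is a hypothetical toy input using the same A/B/C/D letters as the schematic, and `block_pair_counts` is a name invented for this example.

```python
from collections import Counter
from itertools import combinations


def block_pair_counts(block):
    """Count unordered residue pairs within each column of an ungapped block."""
    pairs = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            # Sorting the pair makes (A, B) and (B, A) the same key: symmetric model.
            pairs[tuple(sorted((a, b)))] += 1
    return pairs


toy_block = ["ABC",
             "ABD",
             "BBC"]
pair_counts = block_pair_counts(toy_block)
print(pair_counts)
```

These raw pair counts are the starting point; turning them into scores would still require comparing them with the frequencies expected by chance.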
![Page 18: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/18.jpg)
BLOSUM Matrices
• Different BLOSUMn matrices are calculated independently from BLOCKS
• BLOSUMn is based on blocks that are at most n percent identical.
![Page 19: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/19.jpg)
Selecting a BLOSUM Matrix
• For BLOSUMn, higher n suitable for sequences which are more similar
– BLOSUM62 recommended for general use
– BLOSUM80 for close relations
– BLOSUM45 for distant relations
![Page 20: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/20.jpg)
Summary:
• BLOSUM matrices are based on the replacement patterns found in more highly conserved regions of the sequences without gaps (local alignment)
• PAM matrices are based on mutations observed throughout a global alignment, which includes both highly conserved and highly mutable regions
BLAST uses BLOSUM62 as a default
REMEMBER: you can always change it
![Page 21: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/21.jpg)
Gap penalty in protein alignments
• Scoring for gap opening & for extension depends on the substitution matrix used
• Default gap parameters are given for each matrix:
– PAM30: open=9, extension=1
– PAM250: open=14, extension=2
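How the two parameters combine into a penalty for one gap can be shown with a small calculation. The sketch assumes the convention in which a gap of k residues costs open + k × extension; other tools charge the opening cost slightly differently, so treat the formula as an assumption.

```python
# Default gap parameters quoted on the slide.
GAP_DEFAULTS = {
    "PAM30": {"open": 9, "extension": 1},
    "PAM250": {"open": 14, "extension": 2},
}


def gap_penalty(matrix_name, gap_length):
    """Affine gap cost, assuming cost = open + gap_length * extension."""
    params = GAP_DEFAULTS[matrix_name]
    return params["open"] + gap_length * params["extension"]


for name in GAP_DEFAULTS:
    print(name, [gap_penalty(name, k) for k in (1, 3, 10)])
```

Under this convention one long gap is much cheaper than many short ones, which matches the idea that a single insertion or deletion event can span several residues.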
![Page 22: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/22.jpg)
Remote homologues
• Sometimes BLAST isn’t enough.
• Large protein family, and BLAST only gives close members. We want more distant members
PSI-BLAST
![Page 23: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/23.jpg)
PSI-BLAST
[1] Select a query and search it against a protein database
[2] PSI-BLAST constructs a multiple sequence alignment then creates a “profile” or specialized position-specific scoring matrix (PSSM)
Page 138
![Page 24: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/24.jpg)
R,I,K C D,E,T K,R,T N,L,Y,G
![Page 25: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/25.jpg)
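| Pos | Res | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | M | -1 | -2 | -2 | -3 | -2 | -1 | -2 | -3 | -2 | 1 | 2 | -2 | 6 | 0 | -3 | -2 | -1 | -2 | -1 | 1 |
| 2 | K | -1 | 1 | 0 | 1 | -4 | 2 | 4 | -2 | 0 | -3 | -3 | 3 | -2 | -4 | -1 | 0 | -1 | -3 | -2 | -3 |
| 3 | W | -3 | -3 | -4 | -5 | -3 | -2 | -3 | -3 | -3 | -3 | -2 | -3 | -2 | 1 | -4 | -3 | -3 | 12 | 2 | -3 |
| 4 | V | 0 | -3 | -3 | -4 | -1 | -3 | -3 | -4 | -4 | 3 | 1 | -3 | 1 | -1 | -3 | -2 | 0 | -3 | -1 | 4 |
| 5 | W | -3 | -3 | -4 | -5 | -3 | -2 | -3 | -3 | -3 | -3 | -2 | -3 | -2 | 1 | -4 | -3 | -3 | 12 | 2 | -3 |
| 6 | A | 5 | -2 | -2 | -2 | -1 | -1 | -1 | 0 | -2 | -2 | -2 | -1 | -1 | -3 | -1 | 1 | 0 | -3 | -2 | 0 |
| 7 | L | -2 | -2 | -4 | -4 | -1 | -2 | -3 | -4 | -3 | 2 | 4 | -3 | 2 | 0 | -3 | -3 | -1 | -2 | -1 | 1 |
| 8 | L | -1 | -3 | -3 | -4 | -1 | -3 | -3 | -4 | -3 | 2 | 2 | -3 | 1 | 3 | -3 | -2 | -1 | -2 | 0 | 3 |
| 9 | L | -1 | -3 | -4 | -4 | -1 | -2 | -3 | -4 | -3 | 2 | 4 | -3 | 2 | 0 | -3 | -3 | -1 | -2 | -1 | 2 |
| 10 | L | -2 | -2 | -4 | -4 | -1 | -2 | -3 | -4 | -3 | 2 | 4 | -3 | 2 | 0 | -3 | -3 | -1 | -2 | -1 | 1 |
| 11 | A | 5 | -2 | -2 | -2 | -1 | -1 | -1 | 0 | -2 | -2 | -2 | -1 | -1 | -3 | -1 | 1 | 0 | -3 | -2 | 0 |
| 12 | A | 5 | -2 | -2 | -2 | -1 | -1 | -1 | 0 | -2 | -2 | -2 | -1 | -1 | -3 | -1 | 1 | 0 | -3 | -2 | 0 |
| 13 | W | -2 | -3 | -4 | -4 | -2 | -2 | -3 | -4 | -3 | 1 | 4 | -3 | 2 | 1 | -3 | -3 | -2 | 7 | 0 | 0 |
| 14 | A | 3 | -2 | -1 | -2 | -1 | -1 | -2 | 4 | -2 | -2 | -2 | -1 | -2 | -3 | -1 | 1 | -1 | -3 | -3 | -1 |
| 15 | A | 2 | -1 | 0 | -1 | -2 | 2 | 0 | 2 | -1 | -3 | -3 | 0 | -2 | -3 | -1 | 3 | 0 | -3 | -2 | -2 |
| 16 | A | 4 | -2 | -1 | -2 | -1 | -1 | -1 | 3 | -2 | -2 | -2 | -1 | -1 | -3 | -1 | 1 | 0 | -3 | -2 | -1 |
| ... | | | | | | | | | | | | | | | | | | | | | |
| 37 | S | 2 | -1 | 0 | -1 | -1 | 0 | 0 | 0 | -1 | -2 | -3 | 0 | -2 | -3 | -1 | 4 | 1 | -3 | -2 | -2 |
| 38 | G | 0 | -3 | -1 | -2 | -3 | -2 | -2 | 6 | -2 | -4 | -4 | -2 | -3 | -4 | -2 | 0 | -2 | -3 | -3 | -4 |
| 39 | T | 0 | -1 | 0 | -1 | -1 | -1 | -1 | -2 | -2 | -1 | -1 | -1 | -1 | -2 | -1 | 1 | 5 | -3 | -2 | 0 |
| 40 | W | -3 | -3 | -4 | -5 | -3 | -2 | -3 | -3 | -3 | -3 | -2 | -3 | -2 | 1 | -4 | -3 | -3 | 12 | 2 | -3 |
| 41 | Y | -2 | -2 | -2 | -3 | -3 | -2 | -2 | -3 | 2 | -2 | -1 | -2 | -1 | 3 | -3 | -2 | -2 | 2 | 7 | -1 |
| 42 | A | 4 | -2 | -2 | -2 | -1 | -1 | -1 | 0 | -2 | -2 | -2 | -1 | -1 | -3 | -1 | 1 | 0 | -3 | -2 | 0 |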
![Page 26: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/26.jpg)
![Page 27: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/27.jpg)
PSI-BLAST
[1] Select a query and search it against a protein database
[2] PSI-BLAST constructs a multiple sequence alignment then creates a “profile” or specialized position-specific scoring matrix (PSSM)
[3] The PSSM is used as a query against the database
[4] PSI-BLAST estimates statistical significance (E values)
[5] Repeat steps [3] and [4] iteratively, typically 5 times. At each new search, a new profile is used as the query.
Page 138
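Step [2], turning an alignment of hits into a position-specific scoring matrix, can be sketched in a simplified form. This is only a toy: it assumes a uniform background frequency and a flat pseudocount of 1, and the three six-residue fragments are illustrative input. PSI-BLAST itself applies sequence weighting and more careful pseudocounts, so its numbers will differ.

```python
import math
from collections import Counter

ALPHABET = "ARNDCQEGHILKMFPSTWYV"


def build_pssm(aligned, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    background = 1.0 / len(ALPHABET)  # assumption: all residues equally likely
    pssm = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def score(pssm, sequence):
    """Sum the position-specific scores of an ungapped sequence."""
    return sum(row[aa] for row, aa in zip(pssm, sequence))


toy_hits = ["GTWYEI", "GKWYEV", "GTWYAM"]
toy_pssm = build_pssm(toy_hits)
print(round(score(toy_pssm, "GTWYEI"), 2), round(score(toy_pssm, "AAAAAA"), 2))
```

Unlike a fixed 20 × 20 matrix, the score for a residue here depends on its position, which is why conserved columns such as the invariant G and W dominate the total.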
![Page 28: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/28.jpg)
Searching for remote homology using PSI-BLAST
![Page 29: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/29.jpg)
The universe of lipocalins (each dot is a protein)
retinol-binding protein
odorant-binding protein
apolipoprotein D
Retinol binding Protein
β-lactoglobulin
![Page 30: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/30.jpg)
Score = 46.2 bits (108), Expect = 2e-04
Identities = 40/150 (26%), Positives = 70/150 (46%), Gaps = 37/150 (24%)

    Query: 27  VKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWDVC 86
    Sbjct: 33  VQENFDVKKYLGRWYEI-EKIPASFEKGNCIQANYSLMENGNIEVLNK---------ELS 82

    Query: 87  ADMVGTF---------TDTEDPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAVQYSCR 137
    Sbjct: 83  PD--GTMNQVKGEAKQSNVSEPAKLEVQFFPLMP-----PAPYWILATDYENYALVYSCT 135

    Query: 138 ----LLNLDGTCADSYSFVFSRDPNGLPPE 163
    Sbjct: 136 TFFWLFHVD------FFWILGRNPY-LPPE 158

PSI-BLAST alignment of RBP (retinol binding protein) and β-lactoglobulin: iteration 1
Example is taken from Bioinformatics and Functional Genomics by Jonathan Pevsner (ISBN 0-471-21004-8). Copyright © 2003 by John Wiley & Sons, Inc.
![Page 31: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/31.jpg)
PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 2
Score = 140 bits (353), Expect = 1e-32
Identities = 45/176 (25%), Positives = 78/176 (43%), Gaps = 33/176 (18%)

    Query: 4   VWALLLLAAWAAAERDCRVSSF--------RVKENFDKARFSGTWYAMAKKDPEGLFLQD 55
    Sbjct: 2   VTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEI-EKIPASFEKGN 60

    Query: 56  NIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMV---GTFTDTEDPAKFKMKYWGVASF 112
    Sbjct: 61  CIQANYSLMENGNIEVLNKEL-----SPDGTMNQVKGEAKQSNVSEPAKLEVQFFPL--- 112

    Query: 113 LQKGNDDHWIVDTDYDTYAVQYSCR----LLNLDGTCADSYSFVFSRDPNGLPPEA 164
    Sbjct: 113 --MPPAPYWILATDYENYALVYSCTTFFWLFHVD------FFWILGRNPY-LPPET 159
![Page 32: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/32.jpg)
PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

    Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
    Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

    Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
    Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

    Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
    Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
![Page 33: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/33.jpg)
Iteration 3 compared with iteration 1: the score rises from 46.2 bits (Expect = 2e-04) to 159 bits (Expect = 1e-38), and gaps fall from 37/150 (24%) to 19/170 (11%).
![Page 34: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/34.jpg)
The universe of lipocalins (each dot is a protein)
retinol-binding protein
odorant-binding protein
apolipoprotein D
![Page 35: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/35.jpg)
Scoring matrices let you focus on the big (or small) picture
retinol-binding protein
![Page 36: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/36.jpg)
Scoring matrices let you focus on the big (or small) picture
retinol-binding protein
retinol-binding protein
PAM250
PAM30
Blosum45
Blosum80
![Page 37: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/37.jpg)
PSI-BLAST generates scoring matrices more powerful than PAM or BLOSUM
retinol-binding protein
retinol-binding protein
![Page 38: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/38.jpg)
PSI-BLAST
- PSI-BLAST is useful to detect weak but biologically meaningful relationships between proteins.
- The main source of false positives is the spurious amplification of sequences not related to the query.
- Once even a single spurious protein is included in a PSI-BLAST search above threshold, it will not go away.
Page 144
![Page 39: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/39.jpg)
PSI-BLAST
Three approaches to prevent false positive results:
[1] Apply filtering
[2] Adjust E value to a lower value
[3] Visually inspect the output from each iteration. Remove suspicious hits.
Page 144
![Page 40: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/40.jpg)
PHI-BLAST
Searching for a specific sequence pattern, with local alignments surrounding the match.
Page 145
PHI-BLAST may be preferable to just searching for pattern occurrences because it filters out those cases where the pattern occurrence is probably random and not indicative of homology.
EXAMPLE: Search for a short sequence motif in the lipocalin family
![Page 41: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/41.jpg)
PHI-BLAST
Given
1) a protein sequence S
2) a pattern P occurring in S,
PHI-BLAST helps answer the question: What other protein sequences both contain an occurrence of P and are homologous to S in the vicinity of the pattern occurrences?
Page 145
![Page 42: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/42.jpg)
          1                                                   50
    ecblc MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD
    vc    MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD
    hsrbp ~~~MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD
Align three lipocalins (RBP and two bacterial lipocalins)
![Page 43: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/43.jpg)
          1                                                   50
    ecblc MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD
    vc    MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD
    hsrbp ~~~MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD

    GTWYEI
     K  AV
         M
Concentrate on the conserved region of interest and see which amino acid residues are used
![Page 44: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/44.jpg)
          1                                                   50
    ecblc MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD
    vc    MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD
    hsrbp ~~~MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD

    GTWYEI
     K  AV
         M
GXW[YF][EA][IVLM]
Create a pattern using the appropriate syntax
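The pattern can be tried out locally by translating it into a regular expression and searching the three aligned lipocalins. This is only a plain pattern search, without the surrounding local alignment and statistics that PHI-BLAST adds; the helper names are invented for this sketch, and the simple translation assumes `X` never appears inside a bracketed set.

```python
import re


def pattern_to_regex(pattern):
    """Translate a simple motif pattern: X means any residue, [..] is a residue set."""
    return re.compile(pattern.replace("-", "").replace("X", ".").replace("x", "."))


def ungapped(aligned):
    """Strip spaces and the gap/padding characters used in the alignment."""
    return "".join(c for c in aligned if c.isalpha())


sequences = {
    "ecblc": "MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD",
    "vc": "MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD",
    "hsrbp": "~~~MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD",
}

motif = pattern_to_regex("GXW[YF][EA][IVLM]")
hits = {}
for name, aligned in sequences.items():
    match = motif.search(ungapped(aligned))
    hits[name] = match.group(0) if match else None
print(hits)
```

All three sequences contain one occurrence of the motif, at the conserved GxW region highlighted on the slide.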
![Page 45: Sequence similarity search Glance to the protein world](https://reader030.vdocument.in/reader030/viewer/2022032522/56649d6b5503460f94a49954/html5/thumbnails/45.jpg)
Results