![Page 1: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/1.jpg)
Advanced BLAST Searching
Courtesy of Jonathan Pevsner Johns Hopkins U.
![Page 2: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/2.jpg)
Outline of today’s lecture
Organism-specific BLAST sites
Specialized BLAST-related algorithms
BLAST-like tools for genomic DNA searches
PSI-BLAST
PHI-BLAST
Find-a-gene
![Page 3: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/3.jpg)
Specialized BLAST servers
Organism-specific BLAST sites
![Page 4: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/4.jpg)
![Page 5: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/5.jpg)
Ensembl BLAST output includes an ideogram
![Page 6: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/6.jpg)
TIGR BLAST
![Page 7: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/7.jpg)
![Page 8: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/8.jpg)
![Page 9: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/9.jpg)
BLAST output of ProDom server: graphical view of domains
![Page 10: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/10.jpg)
Specialized BLAST servers
Molecule-specific BLAST sites
Species-specific BLAST sites
Specialized algorithms (WU-BLAST 2.0)
![Page 11: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/11.jpg)
![Page 12: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/12.jpg)
![Page 13: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/13.jpg)
FASTA server at the University of Virginia
![Page 14: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/14.jpg)
Conserved Domain Database (CDD)
![Page 15: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/15.jpg)
![Page 16: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/16.jpg)
![Page 17: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/17.jpg)
![Page 18: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/18.jpg)
BLAST-related tools for genomic DNA
The analysis of genomic DNA presents special challenges:
• There are exons (protein-coding sequence) and introns (intervening sequences).
• There may be sequencing errors or polymorphisms.
• The comparison may be between related species (e.g. human and mouse).
![Page 19: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/19.jpg)
BLAST-related tools for genomic DNA
Recently developed tools include:
• MegaBLAST at NCBI.
• BLAT (BLAST-like alignment tool). BLAT parses an entire genomic DNA database into words (11mers) and indexes them, then looks up the words of the query in that index. BLAST does the reverse: it builds its word list from the query and scans the database. BLAT is thus a mirror image of the BLAST strategy. See http://genome.ucsc.edu
• SSAHA at Ensembl uses a strategy similar to that of BLAT. See http://www.ensembl.org
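The database-indexing idea behind BLAT can be sketched in a few lines of Python. This is only a toy illustration, assuming exact word matches, non-overlapping database words, and a short word size so the example stays readable; the function names `build_index` and `find_seeds` and the sequences are made up for illustration and are not part of BLAT.

```python
from collections import defaultdict

def build_index(database, k=11):
    """Index the non-overlapping k-mers of the database by start position."""
    index = defaultdict(list)
    for start in range(0, len(database) - k + 1, k):
        index[database[start:start + k]].append(start)
    return index

def find_seeds(query, index, k=11):
    """Look up every overlapping k-mer of the query in the database index.

    Returns (query_position, database_position) pairs for exact word hits.
    """
    seeds = []
    for q_pos in range(len(query) - k + 1):
        for db_pos in index.get(query[q_pos:q_pos + k], []):
            seeds.append((q_pos, db_pos))
    return seeds

# Hypothetical toy sequences; k=4 is used only to keep the example small.
database = "AAAACCCCGGGGTTTT"
query = "TTCCCCGG"
seeds = find_seeds(query, build_index(database, k=4), k=4)
print(seeds)
```

The index is built once per database and reused for every query, which is why this layout suits repeated searches against a fixed genome. A real tool would go on to extend and chain the seed hits into alignments.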
![Page 20: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/20.jpg)
To access BLAT, visit http://genome.ucsc.edu
![Page 21: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/21.jpg)
Paste DNA or protein sequence here in the FASTA format
![Page 22: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/22.jpg)
BLAT output includes browser and other formats
![Page 23: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/23.jpg)
Position specific iterated BLAST: PSI-BLAST
The purpose of PSI-BLAST is to look deeper into the database for matches to your query protein sequence by employing a scoring matrix that is customized to your query.
![Page 24: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/24.jpg)
PSI-BLAST is performed in five steps
[1] Select a query and search it against a protein database
![Page 25: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/25.jpg)
PSI-BLAST is performed in five steps
[1] Select a query and search it against a protein database
[2] PSI-BLAST constructs a multiple sequence alignment, then creates a “profile” or specialized position-specific scoring matrix (PSSM)
![Page 26: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/26.jpg)
R,I,K C D,E,T K,R,T N,L,Y,G
![Page 27: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/27.jpg)
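| Pos | Res | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | M | -1 | -2 | -2 | -3 | -2 | -1 | -2 | -3 | -2 | 1 | 2 | -2 | 6 | 0 | -3 | -2 | -1 | -2 | -1 | 1 |
| 2 | K | -1 | 1 | 0 | 1 | -4 | 2 | 4 | -2 | 0 | -3 | -3 | 3 | -2 | -4 | -1 | 0 | -1 | -3 | -2 | -3 |
| 3 | W | -3 | -3 | -4 | -5 | -3 | -2 | -3 | -3 | -3 | -3 | -2 | -3 | -2 | 1 | -4 | -3 | -3 | 12 | 2 | -3 |
| 4 | V | 0 | -3 | -3 | -4 | -1 | -3 | -3 | -4 | -4 | 3 | 1 | -3 | 1 | -1 | -3 | -2 | 0 | -3 | -1 | 4 |
| 5 | W | -3 | -3 | -4 | -5 | -3 | -2 | -3 | -3 | -3 | -3 | -2 | -3 | -2 | 1 | -4 | -3 | -3 | 12 | 2 | -3 |
| 6 | A | 5 | -2 | -2 | -2 | -1 | -1 | -1 | 0 | -2 | -2 | -2 | -1 | -1 | -3 | -1 | 1 | 0 | -3 | -2 | 0 |
| 7 | L | -2 | -2 | -4 | -4 | -1 | -2 | -3 | -4 | -3 | 2 | 4 | -3 | 2 | 0 | -3 | -3 | -1 | -2 | -1 | 1 |
| 8 | L | -1 | -3 | -3 | -4 | -1 | -3 | -3 | -4 | -3 | 2 | 2 | -3 | 1 | 3 | -3 | -2 | -1 | -2 | 0 | 3 |
| 9 | L | -1 | -3 | -4 | -4 | -1 | -2 | -3 | -4 | -3 | 2 | 4 | -3 | 2 | 0 | -3 | -3 | -1 | -2 | -1 | 2 |
| 10 | L | -2 | -2 | -4 | -4 | -1 | -2 | -3 | -4 | -3 | 2 | 4 | -3 | 2 | 0 | -3 | -3 | -1 | -2 | -1 | 1 |
| 11 | A | 5 | -2 | -2 | -2 | -1 | -1 | -1 | 0 | -2 | -2 | -2 | -1 | -1 | -3 | -1 | 1 | 0 | -3 | -2 | 0 |
| 12 | A | 5 | -2 | -2 | -2 | -1 | -1 | -1 | 0 | -2 | -2 | -2 | -1 | -1 | -3 | -1 | 1 | 0 | -3 | -2 | 0 |
| 13 | W | -2 | -3 | -4 | -4 | -2 | -2 | -3 | -4 | -3 | 1 | 4 | -3 | 2 | 1 | -3 | -3 | -2 | 7 | 0 | 0 |
| 14 | A | 3 | -2 | -1 | -2 | -1 | -1 | -2 | 4 | -2 | -2 | -2 | -1 | -2 | -3 | -1 | 1 | -1 | -3 | -3 | -1 |
| 15 | A | 2 | -1 | 0 | -1 | -2 | 2 | 0 | 2 | -1 | -3 | -3 | 0 | -2 | -3 | -1 | 3 | 0 | -3 | -2 | -2 |
| 16 | A | 4 | -2 | -1 | -2 | -1 | -1 | -1 | 3 | -2 | -2 | -2 | -1 | -1 | -3 | -1 | 1 | 0 | -3 | -2 | -1 |
| ... | ... | | | | | | | | | | | | | | | | | | | | |
| 37 | S | 2 | -1 | 0 | -1 | -1 | 0 | 0 | 0 | -1 | -2 | -3 | 0 | -2 | -3 | -1 | 4 | 1 | -3 | -2 | -2 |
| 38 | G | 0 | -3 | -1 | -2 | -3 | -2 | -2 | 6 | -2 | -4 | -4 | -2 | -3 | -4 | -2 | 0 | -2 | -3 | -3 | -4 |
| 39 | T | 0 | -1 | 0 | -1 | -1 | -1 | -1 | -2 | -2 | -1 | -1 | -1 | -1 | -2 | -1 | 1 | 5 | -3 | -2 | 0 |
| 40 | W | -3 | -3 | -4 | -5 | -3 | -2 | -3 | -3 | -3 | -3 | -2 | -3 | -2 | 1 | -4 | -3 | -3 | 12 | 2 | -3 |
| 41 | Y | -2 | -2 | -2 | -3 | -3 | -2 | -2 | -3 | 2 | -2 | -1 | -2 | -1 | 3 | -3 | -2 | -2 | 2 | 7 | -1 |
| 42 | A | 4 | -2 | -2 | -2 | -1 | -1 | -1 | 0 | -2 | -2 | -2 | -1 | -1 | -3 | -1 | 1 | 0 | -3 | -2 | 0 |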
![Page 28: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/28.jpg)
![Page 29: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/29.jpg)
PSI-BLAST is performed in five steps
[1] Select a query and search it against a protein database
[2] PSI-BLAST constructs a multiple sequence alignment, then creates a “profile” or specialized position-specific scoring matrix (PSSM)
[3] The PSSM is used as a query against the database
[4] PSI-BLAST estimates statistical significance (E values)
![Page 30: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/30.jpg)
![Page 31: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/31.jpg)
PSI-BLAST is performed in five steps
[1] Select a query and search it against a protein database
[2] PSI-BLAST constructs a multiple sequence alignment, then creates a “profile” or specialized position-specific scoring matrix (PSSM)
[3] The PSSM is used as a query against the database
[4] PSI-BLAST estimates statistical significance (E values)
[5] Repeat steps [3] and [4] iteratively, typically 5 times. At each new search, a new profile is used as the query.
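Step [2] can be illustrated with a toy calculation. The sketch below turns an ungapped alignment into log-odds scores per position, assuming a uniform background frequency and a flat pseudocount; both are simplifications chosen for illustration and are not the sequence weighting or pseudocount scheme that PSI-BLAST itself uses. The function names `build_pssm` and `score` are hypothetical.

```python
import math

# Column order matches the PSSM header used in the slides.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log2 odds} dict per column of an ungapped alignment.

    Assumption: uniform background (1/20) and a flat pseudocount per residue.
    """
    background = 1.0 / len(AMINO_ACIDS)
    denominator = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        row = {}
        for aa in AMINO_ACIDS:
            frequency = (column.count(aa) + pseudocount) / denominator
            row[aa] = math.log2(frequency / background)
        pssm.append(row)
    return pssm

def score(segment, pssm):
    """Sum the position-specific scores of a segment as long as the PSSM."""
    return sum(row[aa] for aa, row in zip(segment, pssm))

# The conserved lipocalin region from three aligned sequences.
alignment = ["GTWYEI", "GKWYEV", "GTWYAM"]
pssm = build_pssm(alignment)
print(round(score("GTWYEI", pssm), 2), round(score("AAAAAA", pssm), 2))
```

Unlike PAM or BLOSUM, the score for a residue here depends on the column it falls in: G is rewarded at position 1 because every aligned sequence has it there, and penalized elsewhere.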
![Page 32: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/32.jpg)
Results of a PSI-BLAST search
| Iteration | # hits | # hits > threshold |
|---|---|---|
| 1 | 104 | 49 |
| 2 | 173 | 96 |
| 3 | 236 | 178 |
| 4 | 301 | 240 |
| 5 | 344 | 283 |
| 6 | 342 | 298 |
| 7 | 378 | 310 |
| 8 | 382 | 320 |
![Page 33: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/33.jpg)
Score = 46.2 bits (108), Expect = 2e-04
Identities = 40/150 (26%), Positives = 70/150 (46%), Gaps = 37/150 (24%)

Query: 27  VKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWDVC 86
           V+ENFD ++ G WY + +K P + I A +S+ E G + K ++
Sbjct: 33  VQENFDVKKYLGRWYEI-EKIPASFEKGNCIQANYSLMENGNIEVLNK---------ELS 82

Query: 87  ADMVGTF---------TDTEDPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAVQYSCR 137
           D GT ++ +PAK +++++ + +WI+ TDY+ YA+ YSC
Sbjct: 83  PD--GTMNQVKGEAKQSNVSEPAKLEVQFFPLMP-----PAPYWILATDYENYALVYSCT 135

Query: 138 ----LLNLDGTCADSYSFVFSRDPNGLPPE 163
           L ++D + ++ R+P LPPE
Sbjct: 136 TFFWLFHVD------FFWILGRNPY-LPPE 158

PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 1
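The relationship between the bit score and the Expect value reported in these alignments follows the standard BLAST formula E = m·n·2^(−S′), where S′ is the bit score and m·n is the search space. A minimal sketch follows; the search-space value of 1.5e10 is an assumption back-calculated from the scores and E values shown on these slides, not a number stated in the lecture, and BLAST itself uses effective (edge-corrected) lengths.

```python
def expect_value(bit_score, search_space):
    """E = (m * n) * 2^(-S'), with S' the bit score and m * n the search space."""
    return search_space * 2.0 ** (-bit_score)

# Assumed search space (query length times database length), inferred for illustration.
ASSUMED_SEARCH_SPACE = 1.5e10

for iteration, bits in [(1, 46.2), (2, 140.0), (3, 159.0)]:
    print(iteration, f"{expect_value(bits, ASSUMED_SEARCH_SPACE):.1e}")
```

Because E falls by a factor of two for every additional bit, the jump from about 46 bits in iteration 1 to 140 bits in iteration 2 corresponds to a drop of many orders of magnitude in E.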
![Page 34: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/34.jpg)
PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 2

Score = 140 bits (353), Expect = 1e-32
Identities = 45/176 (25%), Positives = 78/176 (43%), Gaps = 33/176 (18%)

Query: 4   VWALLLLAAWAAAERDCRVSSF--------RVKENFDKARFSGTWYAMAKKDPEGLFLQD 55
           V L+ LA A + +F V+ENFD ++ G WY + +K P +
Sbjct: 2   VTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEI-EKIPASFEKGN 60

Query: 56  NIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMV---GTFTDTEDPAKFKMKYWGVASF 112
           I A +S+ E G + K + D + V ++ +PAK +++++ +
Sbjct: 61  CIQANYSLMENGNIEVLNKEL-----SPDGTMNQVKGEAKQSNVSEPAKLEVQFFPL--- 112

Query: 113 LQKGNDDHWIVDTDYDTYAVQYSCR----LLNLDGTCADSYSFVFSRDPNGLPPEA 164
           +WI+ TDY+ YA+ YSC L ++D + ++ R+P LPPE
Sbjct: 113 --MPPAPYWILATDYENYALVYSCTTFFWLFHVD------FFWILGRNPY-LPPET 159
![Page 35: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/35.jpg)
PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3

Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
           V L+ LA A + S V+ENFD ++ G WY + K
Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
           + I A +S+ E G + K V + ++ +PAK +++++ +
Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
           +WI+ TDY+ YA+ YSC + ++ R+P LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
![Page 36: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/36.jpg)
Iteration 3: Score = 159 bits (404), Expect = 1e-38; Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)
Iteration 1: Score = 46.2 bits (108), Expect = 2e-04; Identities = 40/150 (26%), Positives = 70/150 (46%), Gaps = 37/150 (24%)
![Page 37: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/37.jpg)
The universe of lipocalins (each dot is a protein)
retinol-binding protein
odorant-binding protein
apolipoprotein D
![Page 38: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/38.jpg)
Scoring matrices let you focus on the big (or small) picture
retinol-binding protein
your RBP query
![Page 39: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/39.jpg)
Scoring matrices let you focus on the big (or small) picture
retinol-binding protein
retinol-binding protein
PAM250
PAM30
BLOSUM45
BLOSUM80
![Page 40: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/40.jpg)
PSI-BLAST generates scoring matrices more powerful than PAM or BLOSUM
retinol-binding protein
retinol-binding protein
![Page 41: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/41.jpg)
PSI-BLAST: performance assessment
Evaluate PSI-BLAST results using a database in which protein structures have been solved and all proteins in a group share < 40% amino acid identity.
![Page 42: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/42.jpg)
PSI-BLAST: the problem of corruption
PSI-BLAST is useful to detect weak but biologically meaningful relationships between proteins.
The main source of false positives is the spurious amplification of sequences not related to the query. For instance, a query with a coiled-coil motif may detect thousands of other proteins with this motif that are not homologous.
Once even a single spurious protein is included in a PSI-BLAST search above threshold, it will not go away.
![Page 43: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/43.jpg)
PSI-BLAST: the problem of corruption
Corruption is defined as the presence of at least one false positive alignment with an E value < 10⁻⁴ after five iterations.
Three approaches to stopping corruption:
[1] Apply filtering of biased composition regions
[2] Lower the E value threshold for including hits in the profile from 0.001 (default) to a stricter value such as E = 0.0001.
[3] Visually inspect the output from each iteration. Remove suspicious hits by unchecking the box.
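For the stand-alone program, approaches [1] and [2] map onto command-line options. The sketch below only assembles a `psiblast` command line and prints it; it does not run a search. The flag names are those of the NCBI BLAST+ `psiblast` program as far as I know and should be checked against the installed version, and the file name `rbp4.fasta`, the database name `nr`, and the helper `psiblast_command` are hypothetical.

```python
import shlex

def psiblast_command(query_fasta, database, iterations=5,
                     inclusion_ethresh=1e-4, seg_filter=True):
    """Build an argument list for a stricter-than-default PSI-BLAST run."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-num_iterations", str(iterations),
        # Approach [2]: only hits below this E value enter the next profile.
        "-inclusion_ethresh", str(inclusion_ethresh),
        # Approach [1]: filter biased-composition regions of the query with SEG.
        "-seg", "yes" if seg_filter else "no",
    ]

cmd = psiblast_command("rbp4.fasta", "nr")  # hypothetical file and database names
print(shlex.join(cmd))
```

Approach [3], inspecting each iteration and unchecking suspicious hits, is a feature of the web interface and has no direct equivalent in this sketch.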
![Page 44: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/44.jpg)
![Page 45: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/45.jpg)
See problem 5-1 (page 153): create an artificial protein consisting of RBP4 and protein kinase C. How does PSI-BLAST perform?
![Page 46: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/46.jpg)
![Page 47: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/47.jpg)
PHI-BLAST: Pattern hit initiated BLAST
Launches from the same page as PSI-BLAST
Combines matching of regular expressions with local alignments surrounding the match.
![Page 48: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/48.jpg)
PHI-BLAST: Pattern hit initiated BLAST
Launches from the same page as PSI-BLAST
Combines matching of regular expressions with local alignments surrounding the match.
Given a protein sequence S and a regular expression pattern P occurring in S, PHI-BLAST helps answer the question: What other protein sequences both contain an occurrence of P and are homologous to S in the vicinity of the pattern occurrences? PHI-BLAST may be preferable to just searching for pattern occurrences because it filters out those cases where the pattern occurrence is probably random and not indicative of homology.
![Page 49: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/49.jpg)
        1                                                   50
ecblc   MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD
vc      MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD
hsrbp   ~~~MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD
Align three lipocalins (RBP and two bacterial lipocalins)
![Page 50: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/50.jpg)
        1                                                   50
ecblc   MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD
vc      MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD
hsrbp   ~~~MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD

GTWYEI
 K  AV
     M
Pick a small, conserved region and see which amino acidresidues are used
![Page 51: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/51.jpg)
GXW[YF][EA][IVLM]
Create a pattern using the appropriate syntax
![Page 52: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/52.jpg)
![Page 53: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/53.jpg)
![Page 54: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/54.jpg)
Syntax rules for PHI-BLAST
The syntax for patterns in PHI-BLAST follows the conventions of PROSITE (protein lecture, Chapter 8).
When using the stand-alone program, it is permissible to have multiple patterns. When using the web page, only one pattern is allowed per query.
[ ] means any one of the characters enclosed in the brackets, e.g., [LFYT] means one occurrence of L or F or Y or T
- means nothing (spacer character)
x(5) means 5 positions in which any residue is allowed
x(2,4) means 2 to 4 positions where any residue is allowed
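The syntax rules listed above translate almost directly into an ordinary regular expression, which is a convenient way to check a pattern against your own sequences before submitting it. The sketch below covers only the four rules on this slide (brackets, the spacer, `x(n)`, and `x(n,m)`), treats a bare `x` or `X` as a single wildcard as in the lipocalin pattern, and does no homology scoring, so it is not a replacement for PHI-BLAST. The function name `prosite_to_regex` is made up for illustration.

```python
import re

def _repeat(match):
    """Convert x(5) to .{5} and x(2,4) to .{2,4}."""
    low, high = match.group(1), match.group(2)
    return ".{%s}" % low if high is None else ".{%s,%s}" % (low, high)

def prosite_to_regex(pattern):
    """Translate the subset of PROSITE-style syntax described above to a regex."""
    pattern = pattern.replace("-", "")                          # '-' is only a spacer
    pattern = re.sub(r"[xX]\((\d+)(?:,(\d+))?\)", _repeat, pattern)
    return re.sub(r"[xX]", ".", pattern)                        # bare x: any one residue

regex = prosite_to_regex("GXW[YF][EA][IVLM]")

# The three lipocalin sequences from the alignment slides (first 50 columns).
sequences = {
    "ecblc": "MRLLPLVAAATAAFLVVACSSPTPPRGVTVVNNFDAKRYLGTWYEIARFD",
    "vc": "MRAIFLILCSVLLNGCLGMPESVKPVSDFELNNYLGKWYEVARLD",
    "hsrbp": "MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKD",
}
for name, seq in sequences.items():
    hit = re.search(regex, seq)
    print(name, hit.group() if hit else None)
```

Bracketed residue sets already mean the same thing in both notations, so they pass through unchanged.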
![Page 55: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/55.jpg)
BLAST for gene discovery
You can use BLAST to find a “novel” gene
![Page 56: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/56.jpg)
Start with the sequence of a known protein
tblastn: search a DNA database (e.g. HTGS, dbEST, or genomic sequence from a specific organism)
inspect: find matches…
[1] to DNA encoding known proteins
[2] to DNA encoding related (novel!) proteins
[3] to false positives
blastx or blastp against nr: search your DNA or protein against a protein database (nr) to confirm you have identified a novel gene
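The tblastn step works because the DNA database is compared to the protein query after conceptual translation in all six reading frames. The sketch below shows only that translation step, assuming the standard genetic code and unambiguous A/C/G/T input; the function names are hypothetical and no searching or scoring is performed.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Return the six conceptual translations keyed '+1'..'+3' and '-1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

A candidate novel gene shows up as a DNA hit whose translation resembles the query in one of these frames but does not correspond to an already annotated protein, which is what the final blastx or blastp search against nr is meant to confirm.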
![Page 57: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/57.jpg)
![Page 58: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/58.jpg)
![Page 59: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/59.jpg)
![Page 60: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/60.jpg)
this is a good candidate for a novel gene/protein
![Page 61: Advanced BLAST Searching](https://reader035.vdocument.in/reader035/viewer/2022062409/56814c47550346895db94af6/html5/thumbnails/61.jpg)
A blastp nr search confirms that the Salmonella query is closely related to other lipocalins