Tweaking BLAST
Although you normally see BLAST as a web page with boxes to place data in, tick boxes and so on, it is actually a command-line program that can be run just by typing the right command and options, e.g.
> blastall -p blastn -i my_sequence.fasta -d refseq
This is the simplest form, where the basic program 'blastall' takes a number of different options or parameters, each indicated by -x and followed by its value:
-p <which BLAST flavour to run>
-i <file with query sequence in>
-d <pre-indexed database name>
There are many other parameters, and any that are not listed explicitly will use a default value most appropriate to the BLAST flavour requested. E.g. for -W <word size>, blastn uses -W 11 whereas blastx uses -W 3.
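As a rough illustration of how explicit options sit on top of per-flavour defaults, here is a minimal Python sketch. The helper names are hypothetical, and the only defaults included are the two word sizes quoted above; it just assembles the argument list and does not run BLAST.

```python
# Default word sizes quoted in the text; other flavours are deliberately left out.
DEFAULT_WORD_SIZE = {"blastn": 11, "blastx": 3}


def build_blastall_command(program, query_file, database, **options):
    """Return the argument list for a legacy blastall run (hypothetical helper)."""
    command = ["blastall", "-p", program, "-i", query_file, "-d", database]
    # Any extra single-letter option, e.g. W=7, is appended as "-W 7".
    for flag, value in options.items():
        command += [f"-{flag}", str(value)]
    return command


def effective_word_size(program, **options):
    """Word size actually in force: the explicit -W if given, else the default."""
    return options.get("W", DEFAULT_WORD_SIZE[program])


cmd = build_blastall_command("blastn", "my_sequence.fasta", "refseq", W=7)
print(" ".join(cmd))
```

Leaving out W=7 gives the simplest form of the command, in which case blastn falls back to its default word size.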
There are also some options on the web pages that are not really parameters but manage the job in some way. One of the most useful of these is on the NCBI BLAST pages, where you can use Entrez queries or pick from an organism list to modify your search.
The Many Parameters of BLAST
There are almost literally hundreds of parameters, but most are way too obscure even for die-hard techies like me. Very few of them are regularly useful in any but their default value, but just occasionally they are very necessary.
Here are some of the ones that I have used:
-e max expected value
-m output format (graphical or tabular/spreadsheet)
-F filter query sequence for low complexity (default TRUE)
-U use only upper case regions of query (default FALSE)
-G gap opening cost
-E gap extension cost
-q nucleotide mismatch penalty (BLASTx uses matrices)
-r nucleotide match reward
-b number of matching sequences to report
-g allow gaps (default TRUE)
-W word size
-z effective database size (removes effect of actual database size)
-S query strands to search (default both directions)
-l restrict database sequences to given list of 'gi' numbers
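To see why -e and -z interact, it may help to look at the standard relationship between a bit score S' and the expected number of chance hits, E = m × n × 2^(-S'), where m is the query length and n the database length. The sketch below is a simplification: real BLAST uses edge-corrected "effective" lengths, and the function name and the numbers are purely illustrative.

```python
def expected_hits(bit_score, query_length, database_length):
    """Expected number of chance hits: E = m * n * 2**(-S') for bit score S'."""
    return query_length * database_length * 2.0 ** (-bit_score)


# The same alignment (bit score 40, query of 500 letters) in two databases.
e_small_db = expected_hits(40, 500, 1_000_000)
e_large_db = expected_hits(40, 500, 1_000_000_000)

# E scales linearly with database size, so the larger database looks 1000x worse.
print(e_large_db / e_small_db)
```

Fixing the database size with -z to the same value for every search removes this scaling, so E-values from searches against different databases become comparable.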
BLAST Parameters Exercises
1. BLASTn vs BLASTp
Go to informatics.gurdon.cam.ac.uk/online/workshops/useful-web-sites.html
Open blast-parameter-sequences.html
Copy the sequence >blastn-vs-blastp and go to the NCBI BLAST Home Page.
This is a Xenopus tropicalis cDNA sequence.
Go to the NUCLEOTIDE BLAST section.
Run BLASTn against the nr nucleotide database using all default options.
Then hit [format] to wait for the results in a new page.
Now repeat, but go to the TRANSLATED BLAST section and BLAST against the nr protein database using BLASTx.
How might the different results help us view the presence of this gene in other vertebrates?
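One reason a translated search can reach further than a nucleotide search is that different codons can encode the same amino acid. The following Python sketch uses two made-up 12-base sequences (not the workshop sequence) and the standard genetic code to show DNA that differs at several positions yet translates to the same peptide.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate frame 1 of an upper-case DNA string, ignoring trailing bases."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identical_positions(seq_a, seq_b):
    """Count positions at which two equal-length sequences agree."""
    return sum(a == b for a, b in zip(seq_a, seq_b))


# Two hypothetical coding sequences that differ only by synonymous changes.
gene_a = "ATGGCTCTGAGC"
gene_b = "ATGGCACTTTCA"

print(identical_positions(gene_a, gene_b), "of", len(gene_a), "bases identical")
print(translate(gene_a), translate(gene_b))
```

At the nucleotide level the two sequences look fairly diverged, while the translated comparison sees a perfect match.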
BLAST Parameters Exercises
2. Low complexity filtering
Go to informatics.gurdon.cam.ac.uk/online/workshops/useful-web-sites.html
Open blast-parameter-sequences.html
Copy the sequence >low-complexity-filtering-A and go to the NCBI BLAST Home Page.
Go to the TRANSLATED BLAST section, BLASTx.
Carefully UNTICK the "Choose filter [ ] Low complexity" box in the second section, and then run BLASTx against the nr database.
What do you feel about these alignments?
Re-run, but leave the low-complexity filter ON this time.
Does this change our view of the protein matches?
Now continue with >low-complexity-filtering-B and -C.
C is an especially interesting case: what can we deduce about the cDNA sequence? Annotators beware!
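To get a feel for what "low complexity" means, here is a toy Python sketch that flags windows with low Shannon entropy. This is not the algorithm BLAST uses (its real filters, SEG for protein and DUST for nucleotide, are more sophisticated); the window size, threshold and example sequence are arbitrary assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy in bits per letter of the residues in a segment."""
    total = len(segment)
    counts = Counter(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_windows(sequence, window=12, threshold=1.5):
    """Start positions of windows whose entropy falls below the threshold."""
    return [
        start
        for start in range(len(sequence) - window + 1)
        if shannon_entropy(sequence[start:start + window]) < threshold
    ]


# Hypothetical protein with a poly-glutamine run in the middle.
protein = "MKVLAAGIDE" + "Q" * 12 + "RTWFSPHNCY"
flagged = low_complexity_windows(protein)
print(flagged)
```

Windows overlapping the repeat are flagged, while the varied ends of the sequence are not. Matches that rest only on such a region say little about true relatedness.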
BLAST Parameters Exercises
3. BLASTn vs tBLASTx and nucleotide mismatch penalties
Go to informatics.gurdon.cam.ac.uk/online/workshops/useful-web-sites.html
Open blast-parameter-sequences.html
Also open the NCBI BLAST Home Page and go to the SPECIAL section, "Align two sequences".
There are several Xenopus tropicalis cyclins.
Copy the sequence >cyclin-A1-Xt to the Sequence 1 window.
Copy the sequence >cyclin-A2-Xt to the Sequence 2 window.
Run the default comparison, which should be BLASTn. Note the alignment.
Now run again using tBLASTx. What does this do to our understanding of the relationship between these two sequences? Are they homologs, orthologs or paralogs, or none of these?
Revert to BLASTn and try varying the values for mismatch penalties and gapping; start by reducing the mismatch penalty to -1.
Can we learn anything from this?
Now repeat the first parts of the exercise with cyclin-D1 in place of cyclin-A2...
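The effect of the mismatch penalty can be sketched with a simple ungapped score: each matching base adds the reward (-r) and each mismatching base adds the penalty (-q). The Python below uses two made-up 20-base sequences that are 70% identical; the function name is hypothetical, and gaps and the real BLAST extension logic are ignored.

```python
def ungapped_score(seq_a, seq_b, reward=1, penalty=-3):
    """Raw score of an ungapped alignment of two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(reward if a == b else penalty for a, b in zip(seq_a, seq_b))


# Hypothetical pair: 14 matching and 6 mismatching positions.
seq_1 = "ACGTACGTACGTACGTACGT"
seq_2 = "ACTTAAGTGCGCACATATGT"

strict = ungapped_score(seq_1, seq_2, reward=1, penalty=-3)
relaxed = ungapped_score(seq_1, seq_2, reward=1, penalty=-1)
print(strict, relaxed)
```

With a penalty of -3 this stretch scores below zero and would not be extended, whereas with -1 it scores positively. A gentler penalty therefore lets BLASTn report more diverged nucleotide matches.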
BLAST Parameters Exercises
4. Limit Entrez query
Entrez queries can be used in the NCBI BLAST web page to restrict the search to more specific items. For instance, to find only matches in fruit fly proteins, enter 'Drosophila melanogaster[ORGN]' in the "Limit by entrez query" box in the second section (you can also select the organism from the adjacent drop-down list). To combine items, use logical AND, OR or NOT.
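The query strings themselves are easy to put together. A minimal Python sketch, with hypothetical helper names, that builds an organism restriction and joins several terms with one of the logical operators:

```python
def organism_term(name):
    """Entrez term restricting a search to one organism, e.g. 'X y[ORGN]'."""
    return f"{name}[ORGN]"


def combine(terms, operator="OR"):
    """Join Entrez terms with a logical operator (AND, OR or NOT)."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR or NOT")
    return f" {operator} ".join(terms)


query = combine(
    [organism_term("Drosophila melanogaster"), organism_term("Xenopus tropicalis")]
)
print(query)
```

The resulting string is what would be typed into the Entrez query box to match sequences from either organism.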
Go to informatics.gurdon.cam.ac.uk/online/workshops/useful-web-sites.html
Open blast-parameter-sequences.html
Copy the sequence >cyclin-D1-Xt and go to the NCBI BLAST Home Page.
Go to the TRANSLATED BLAST section, BLASTx, and paste the sequence.
Use an Entrez query to find all rodent sequences (rat and mouse) with a good match to cyclin-D1. At what E-value do we expect we are no longer looking at cyclins? Try running the search again with that E-value as a limit...
BLAST Parameters Exercises
5. Word Size
Go to informatics.gurdon.cam.ac.uk/online/workshops/useful-web-sites.html
Open blast-parameter-sequences.html
Copy the sequence >morpholino and go to the NCBI BLAST Home Page.
Go to the NUCLEOTIDE BLAST section, BLASTn, and paste the sequence.
Switch OFF the low complexity filter and then run the search.
Now re-run the search, setting the following parameters:
Low complexity: OFF
Expect: 100
Word Size: 7
Other advanced: -q -1 (mismatch penalty -1 instead of default -3)
What difference does this make?
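Word size matters most for short queries because BLAST only starts an alignment where the query and a database sequence share an exact word. The Python sketch below uses a made-up 25-base query (not the workshop morpholino) and a target with two mismatches; it only checks for shared exact words on one strand and does not reproduce the rest of the BLAST search.

```python
def words(sequence, size):
    """All overlapping words (k-mers) of the given size in a sequence."""
    return {sequence[i:i + size] for i in range(len(sequence) - size + 1)}


def shares_seed(query, subject, word_size):
    """True if query and subject have at least one exact word in common."""
    return bool(words(query, word_size) & words(subject, word_size))


# Hypothetical 25-mer and a target differing at two positions (indices 8 and 17),
# so the longest exactly matching stretch is only 8 bases.
query = "GCATTCGGACTTAGCCATGGTCAAC"
target = "GCATTCGGTCTTAGCCACGGTCAAC"

print(shares_seed(query, target, 11))
print(shares_seed(query, target, 7))
```

With the default word size of 11 no seed exists and the near-match is never examined, whereas a word size of 7 finds it.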