exclude blast hits - university at...

Exclude blast hits - University at Buffalo (ubwp.buffalo.edu/.../uploads/sites/5/2017/02/Exclude_blast_hits.pdf)

What do I do if my blast searches seem to have all the top hits from the same genus or species?

If the bacterial species you are annotating is clinically significant or of great research interest, you may find that your blast searches (particularly in nr) seem to return only hits from different strains or isolates of the same species. This doesn’t give you much information about how well conserved the protein you are working on is compared to proteins in other genera. There is a way to set up blast so that it excludes such hits from your searches. I will use a gene from Clostridium botulinum as an example, working with the protein sequence of the gene with the locus tag CLJ_B3418. Figure 1 shows the top nr blast hits for this protein. You can easily see that all of the hits but one are from Clostridium botulinum, with very high levels of coverage and identity. They are essentially all the same protein from different isolates of Clostridium botulinum.

Figure 1. The blast results using a non-filtered nr blast search for CLJ_B3418.

The blast search can be set up slightly differently to prevent this problem from occurring. As noted in Figure 2, we can set the search up to exclude, in this case, the taxid: 1485 (Clostridium). The taxid is the NCBI Taxonomy ID number. By excluding taxid 1485, all blast hits that fall within that taxonomic classification are left out of the results. To do this, we type the genus name Clostridium in the Organism text box below the sequence input box. As you type, a pulldown menu of options will appear, and you can click on the one you want (see the highlighted menu item in Figure 2). Then simply click the Exclude checkbox next to the organism name and run the search by clicking BLAST.
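The same exclusion can also be written as text rather than clicked. The following is a minimal sketch, assuming the filter is entered in the optional Entrez Query field of the BLAST search form (or passed as the `entrez_query` argument of Biopython's `NCBIWWW.qblast`) instead of the Organism box. The helper name `exclusion_query` is hypothetical and is not part of any NCBI tool.

```python
def exclusion_query(taxids):
    """Build an Entrez query that leaves out every listed NCBI Taxonomy ID."""
    terms = []
    for taxid in taxids:
        taxid = int(taxid)  # rejects anything that is not a whole number
        if taxid <= 0:
            raise ValueError("a taxid must be a positive integer")
        # txidNNNN[ORGN] is the Entrez way of naming a taxonomy node
        terms.append(f"txid{taxid}[ORGN]")
    if not terms:
        raise ValueError("give at least one taxid to exclude")
    return "all[filter] NOT (" + " OR ".join(terms) + ")"


# Exclude the genus Clostridium (taxid 1485), as in the walkthrough.
print(exclusion_query([1485]))
```

For taxid 1485 this prints `all[filter] NOT (txid1485[ORGN])`. Passing several taxids joins them with OR, so each listed group is left out of the search.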

Figure 2. Setting up a blast search to exclude the Clostridium taxid: 1485.

Figure 3 shows the results of the blast search for the same protein AFTER excluding the Clostridium taxid 1485. Note the different names appearing in the search results. However, also note that the top hit is no longer the one that matches the protein under investigation in the species you are working on. Thus you would take the FIRST nr hit as the top hit in this case instead of skipping over the first one. Note also that if you use this blast result to select sequences for the T-Coffee alignment that you will subsequently do in the Sequence Based Similarity Module, you will need to add the FASTA formatted sequence of the protein under investigation to the top of the list before constructing the alignment. Students should also add a comment in their textbook noting which taxid number was excluded from their search. Experiment with different levels of exclusion: a single species, the whole genus, or something in between (several specific species, each entered in its own organism box; use the + option to the right of the Exclude checkbox to add another box).
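Adding the query protein to the top of the sequence list is easy to get wrong by hand, so here is a small sketch of that step. The function name `prepend_query` and both sets of records are hypothetical; the residues are made-up placeholders, not the real CLJ_B3418 sequence or real blast hits.

```python
def prepend_query(query_fasta, hits_fasta):
    """Return FASTA text with the query record placed above the hit records."""
    query = query_fasta.strip()
    hits = hits_fasta.strip()
    if not query.startswith(">"):
        raise ValueError("the query must be a FASTA record starting with '>'")
    if hits and not hits.startswith(">"):
        raise ValueError("the hits must be FASTA records starting with '>'")
    if not hits:
        return query + "\n"
    return query + "\n" + hits + "\n"


# Placeholder records: the residues below are invented for illustration only.
query = ">CLJ_B3418 protein under investigation\nMKTAYIAKQR\n"
hits = ">hit_1 first nr hit after exclusion\nMKSAYLAKQR\n>hit_2\nMRTAYIGKQK\n"

combined = prepend_query(query, hits)
print(combined)
```

The combined text can then be pasted into the alignment tool, with the protein under investigation as the first record.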

Figure 3. The nr blast results for CLJ_B3418 AFTER excluding the Clostridium taxid: 1485. Note the different genus and species names of the top hits.

You can also use the NCBI Taxonomy Browser to find different levels of taxid to use in your exclusion searches, especially if it is not clear what you should choose from the pulldown menu in BLAST. The use of the Taxonomy Browser is described in general terms in the Horizontal Gene Transfer section of the project manual. Briefly, go to https://www.ncbi.nlm.nih.gov/taxonomy, enter the name of your organism in the search window (Clostridium botulinum is the example used below), and click on Search. A result similar to Figure 4 will display. Click on the organism hyperlink in blue, and you will be taken to the full lineage of the organism (Figure 5).

Figure 4. Results of searching for Clostridium botulinum in the NCBI Taxonomy browser.
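The Taxonomy Browser lookup can also be done without the web page. This is a rough sketch, assuming NCBI's E-utilities `esearch` endpoint is queried against the taxonomy database with JSON output. The function names are hypothetical, and the `sample` dictionary is a hand-written stand-in for a decoded response, not real server output.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def taxonomy_search_url(name):
    """Build an esearch URL that looks a name up in the taxonomy database."""
    params = {"db": "taxonomy", "term": name, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def taxids_from_result(result):
    """Extract the Taxonomy IDs from a decoded esearch JSON result."""
    return [int(taxid) for taxid in result["esearchresult"]["idlist"]]


# Stand-in for a decoded response, using the genus taxid quoted in the text.
sample = {"esearchresult": {"idlist": ["1485"]}}

print(taxonomy_search_url("Clostridium"))
print(taxids_from_result(sample))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching Taxonomy IDs. A name that matches more than one entry may return several IDs, so check the result against the Taxonomy Browser before using it.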

Figure 5 below shows a portion of the C. botulinum results. In the lineage line, the last entry is the genus (Clostridium), but you can hover the cursor over any of the levels of taxonomy and see the name of the level (e.g., family, order, etc.). Figure 6 shows what will display when the Clostridium hyperlink is selected.

Figure 5. The Clostridium botulinum results from NCBI Taxonomy browser (not complete).

Of interest in the results from clicking on the Clostridium hyperlink, displayed in Figure 6, is the Taxonomy ID of 1485 (exactly the one we found by limiting the BLAST results from within the BLAST tool). You could use this information to simply type “Clostridium (taxid:1485)” (do not, however, include the quotation marks) in the organism window of the BLAST search and click Exclude as before.

Figure 6. The Clostridium genus taxid information.
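Because the organism box expects the name and taxid in one fixed pattern, a short sketch of that pattern may help when preparing entries or recording which taxid was excluded. Both function names are hypothetical, and the pattern is simply the one quoted in the text, not an official NCBI format definition.

```python
import re

# A name followed by "(taxid:NNNN)", with or without a space before the bracket.
_ENTRY = re.compile(r"^(?P<name>.+?)\s*\(taxid:\s*(?P<taxid>\d+)\)$")


def organism_entry(name, taxid):
    """Format a name and taxid the way the text types them into the organism box."""
    return f"{name} (taxid:{int(taxid)})"


def parse_organism_entry(text):
    """Split an organism entry back into its name and taxid."""
    cleaned = text.strip().strip('"“”')  # drop any quotation marks copied along
    match = _ENTRY.match(cleaned)
    if match is None:
        raise ValueError(f"not an organism entry: {text!r}")
    return match.group("name"), int(match.group("taxid"))


print(organism_entry("Clostridium", 1485))
print(parse_organism_entry("“Clostridium (taxid:1485)”"))
```

The parser strips surrounding quotation marks on purpose, since copying them into the organism box is the mistake the text warns about.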

We can also go further “up” in the taxonomy to exclude more than one genus (though you should not have to do that routinely). For example, Figure 7 shows the display that would come up if we clicked on the Clostridiaceae hyperlink (i.e., the family to which the genus Clostridium belongs) instead of the Clostridium hyperlink. Different genera will appear that are part of this family. Clicking on the Clostridiaceae link from this page will result in the information shown in Figure 8.

Figure 7. The display resulting from selection of the Family Clostridiaceae in the taxonomy browser.

Here we see that the Clostridiaceae family has the Taxonomy ID of 31979. To exclude this Family from the BLAST results, we would simply type “Clostridiaceae (taxid:31979)” into the organism box in the BLAST search and click Exclude. Figure 9 shows how the autofill option will highlight once we paste in the taxid.

Figure 8. The taxonomy identification number of the Family Clostridiaceae.

Figure 9. A BLAST search set up to exclude members of the Family Clostridiaceae from the search results.

Finally, Figure 10 shows the BLAST results from doing the exclusion at this level.

Figure 10. BLAST results after excluding the Family Clostridiaceae from the search.
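Choosing how far up the lineage to exclude comes down to picking one rank and reading off its taxid. The sketch below shows that choice with a hand-built lineage holding only the two levels quoted in this handout (genus Clostridium, taxid 1485; family Clostridiaceae, taxid 31979). The names `LINEAGE` and `taxid_for_rank` are hypothetical, and a real lineage has more levels than these two.

```python
# Partial lineage for the example organism, ordered from broader to narrower.
# Only the two levels named in the handout are included.
LINEAGE = [
    ("family", "Clostridiaceae", 31979),
    ("genus", "Clostridium", 1485),
]


def taxid_for_rank(lineage, rank):
    """Return the (name, taxid) pair recorded for the requested rank."""
    for level, name, taxid in lineage:
        if level == rank:
            return name, taxid
    raise KeyError(f"rank {rank!r} is not in this lineage")


# The routine choice is the genus; the family is the broader, less common option.
for rank in ("genus", "family"):
    name, taxid = taxid_for_rank(LINEAGE, rank)
    print(f"exclude at {rank} level: {name} (taxid:{taxid})")
```

Each printed line is in the form typed into the organism box, so the output doubles as a record of the exclusion level used for a given search.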