exclude blast hits - university at...
What do I do if my BLAST searches seem to have all the top hits from the same genus or species?
If the bacterial species you are annotating is clinically significant or of great research interest, you may find that when you perform BLAST searches (particularly in nr) you seemingly only get hits that are different strains or isolates of the same species. This doesn’t give you much information about how well conserved the protein you are working on is compared to proteins in other genera. There is a way to set up BLAST so that such hits are excluded from your searches. I will use a gene from Clostridium botulinum as an example, using the protein sequence of the gene with the locus tag CLJ_B3418. Figure 1 shows the top nr BLAST hits for this protein. You can easily see that all of the hits but one are from Clostridium botulinum, with very high levels of coverage and identity. They are essentially all the same protein from different isolates of Clostridium botulinum.
Figure 1. The BLAST results using a non-filtered nr BLAST search for CLJ_B3418.
The BLAST search can be set up slightly differently to prevent this problem from occurring. As noted in Figure 2, we can set the search up to exclude, in this case, the taxid:1485 (Clostridium). Taxid stands for the NCBI Taxonomy ID number. When taxid 1485 is excluded, no BLAST hits from that taxonomic group will be included in the results. To do this, type the genus name Clostridium in the Organism text box below the sequence input box. As you type, a pull-down menu of options will appear, and you can click on an option to select it (see the highlighted menu item in Figure 2). Then simply click the Exclude check box next to the organism name and select BLAST.
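The same exclusion can also be written as a text filter. The following is a minimal sketch, assuming the Entrez-style organism limit `txid<ID>[ORGN]` that NCBI BLAST accepts in its Entrez query field. The function name `build_exclusion_query` is hypothetical, and the code only builds the filter string; it does not run a search.

```python
def build_exclusion_query(taxids):
    """Return an Entrez-style filter that excludes every taxid given.

    Assumption: the search accepts organism limits written as
    txid<ID>[ORGN], combined with NOT to exclude a taxonomic group.
    """
    if not taxids:
        raise ValueError("at least one taxid is required")
    # Start from "everything", then subtract each excluded group.
    parts = ["all[filter]"]
    for taxid in taxids:
        parts.append(f"NOT txid{int(taxid)}[ORGN]")
    return " ".join(parts)


# Exclude the genus Clostridium (taxid 1485), as in Figure 2.
print(build_exclusion_query([1485]))
```

If Biopython is available, a string like this could be passed as the `entrez_query` argument of `Bio.Blast.NCBIWWW.qblast`. Treat that as one possible route rather than the method used in this handout, which relies on the Exclude check box.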
Figure 2. Setting up a BLAST search to exclude the Clostridium taxid:1485.
Figure 3 shows the results of the BLAST search for the same protein AFTER excluding the Clostridium taxid 1485. Note the different names appearing in the search results. However, also note that the top hit is no longer the one that matches the protein under investigation in the species you are working on. Thus you would take the FIRST nr hit as the top hit in this case instead of skipping over the first one. Note also that if you use this BLAST result to select sequences for the T-Coffee alignment that you will subsequently do in the Sequence Based Similarity Module, you will need to add the FASTA formatted sequence of the protein under investigation to the top of the list before constructing the alignment. Students should also add a comment in their textbook noting which taxid number was excluded from their search. Experiment with different levels of exclusion: only one species, the whole genus, or somewhere in between (several specific species, each entered in an additional organism box added with the + option to the right of the Exclude check box).
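Because the protein under investigation no longer appears among the hits, it has to be added back by hand before the alignment. A minimal sketch of that step, assuming the query and the selected hits are available as FASTA text; the function name `prepend_query` and the short sequences are placeholders for illustration, not real data:

```python
def prepend_query(query_fasta, hits_fasta):
    """Return FASTA text with the query record first, then the hit records."""
    query = query_fasta.strip()
    if not query.startswith(">"):
        raise ValueError("query must be FASTA formatted (header starts with '>')")
    hits = hits_fasta.strip()
    if not hits:
        return query + "\n"
    return query + "\n" + hits + "\n"


# Placeholder records only; substitute the real sequences.
query = ">CLJ_B3418 protein under investigation\nMAAA\n"
hits = ">hit_1 placeholder\nMCCC\n>hit_2 placeholder\nMDDD\n"
print(prepend_query(query, hits))
```

The combined text can then be pasted into the alignment tool with the query at the top of the list.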
Figure 3. The nr BLAST results for CLJ_B3418 AFTER excluding the Clostridium taxid:1485. Note the different genus and species names of the top hits.
You can also use the NCBI Taxonomy Browser to find different levels of taxid to use in your exclusion searches, especially if it is not clear what you should choose from the pull-down menu in BLAST. The use of the Taxonomy Browser is described in general terms in the Horizontal Gene Transfer section of the project manual. Briefly, go to https://www.ncbi.nlm.nih.gov/taxonomy, enter the name of your organism’s genus in the search window (Clostridium botulinum is the example used below), and click on Search. A result similar to Figure 4 will display. Click on the organism hyperlink in blue, and you will be taken to the full lineage of the organism (next figure).
Figure 4. Results of searching for Clostridium botulinum in the NCBI Taxonomy Browser.
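The lookup above is done by hand in the Taxonomy Browser. As an alternative that is not part of this handout, the same name search can be expressed as a request to the NCBI E-utilities ESearch service. The sketch below only builds the request URL and does not contact NCBI; the function name `taxonomy_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (a different tool from the
# Taxonomy Browser page used in the walkthrough).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def taxonomy_search_url(name):
    """Build an ESearch URL that looks up an organism name in the taxonomy database."""
    params = {"db": "taxonomy", "term": name, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


print(taxonomy_search_url("Clostridium botulinum"))
```

Opening the printed URL in a web browser returns the matching Taxonomy IDs. The Taxonomy Browser remains the simpler route for viewing the full lineage.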
Figure 5 below shows a portion of the C. botulinum results. In the lineage line, the last entry is the genus (Clostridium), but you can hover the cursor over any of the levels of taxonomy and see the name of the level (e.g., family, order, etc.). Figure 6 shows what will display when the Clostridium hyperlink is selected.
Figure 5. The Clostridium botulinum results from the NCBI Taxonomy Browser (not complete).
Of interest in the results from clicking on the Clostridium hyperlink, displayed in Figure 6, is the Taxonomy ID of 1485 (exactly the one we found by limiting the BLAST results from within the BLAST tool). You could use this information to simply type “Clostridium (taxid:1485)” (without the quotation marks) in the organism window of the BLAST search and click Exclude as before.
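The organism entry follows a fixed pattern: the name, a space, and the taxid in parentheses. A minimal sketch of building and checking such an entry; the helper names `organism_entry` and `parse_organism_entry` are hypothetical:

```python
import re


def organism_entry(name, taxid):
    """Format a name and Taxonomy ID the way the organism box expects."""
    return f"{name} (taxid:{int(taxid)})"


# Pattern: any name, optional whitespace, then "(taxid:<digits>)".
_ENTRY = re.compile(r"^(?P<name>.+?)\s*\(taxid:(?P<taxid>\d+)\)$")


def parse_organism_entry(entry):
    """Split an entry back into (name, taxid), ignoring stray quotation marks."""
    cleaned = entry.strip().strip('"\u201c\u201d')
    match = _ENTRY.match(cleaned)
    if match is None:
        raise ValueError(f"not a valid organism entry: {entry!r}")
    return match.group("name"), int(match.group("taxid"))


print(organism_entry("Clostridium", 1485))
print(parse_organism_entry("Clostridium (taxid:1485)"))
```

Parsing the entry back is a quick way to confirm that no quotation marks or extra characters were pasted along with it.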
Figure 6. The Clostridium genus taxid information.
We can also go further “up” in the taxonomic lineage to exclude more than one genus (though you should not have to do that routinely). For example, Figure 7 shows the display that would come up if we clicked on the Clostridiaceae hyperlink (i.e., the family to which the genus Clostridium belongs) instead of the Clostridium hyperlink. The different genera that are part of this family will appear. Clicking on the Clostridiaceae link from this page will result in the information shown in Figure 8.
Figure 7. The display resulting from selection of the Family Clostridiaceae in the Taxonomy Browser.
Here we see that the Clostridiaceae family has the Taxonomy ID of 31979. To exclude this Family from the BLAST results, we would simply type “Clostridiaceae (taxid:31979)” into the organism box in the BLAST search and click Exclude. Figure 9 shows how the autofill option is highlighted once we paste in the taxid.
Figure 8. The taxonomy identification number of the Family Clostridiaceae.
Figure 9. A BLAST search set up to exclude members of the Family Clostridiaceae from the search results.
Finally, Figure 10 shows the BLAST results from doing the exclusion at this level.
Figure 10. BLAST results after excluding the Family Clostridiaceae from the search.
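To keep track of which level was excluded in a given search, the two levels used in this walkthrough can be held in a small lookup. This is only a sketch: the names `LINEAGE` and `exclusion_entry` are hypothetical, and the table holds just the two Taxonomy IDs given above.

```python
# Levels and Taxonomy IDs taken from the walkthrough:
# genus Clostridium = 1485, family Clostridiaceae = 31979.
LINEAGE = {
    "genus": ("Clostridium", 1485),
    "family": ("Clostridiaceae", 31979),
}


def exclusion_entry(rank, lineage=LINEAGE):
    """Return the organism-box text for the requested taxonomic level."""
    try:
        name, taxid = lineage[rank.lower()]
    except KeyError:
        raise ValueError(f"no taxid recorded for rank {rank!r}") from None
    return f"{name} (taxid:{taxid})"


for rank in ("genus", "family"):
    print(rank, "->", exclusion_entry(rank))
```

Recording the rank alongside the taxid makes it easy to note which exclusion produced which set of results.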