Enriching Workflow Tools With TERMite
Data workflow/pipelining tools such as Pipeline Pilot [http://accelrys.com/products/collaborative-science/biovia-pipeline-pilot/] and KNIME [https://www.knime.org/] enjoy a strong user community within the life science industry. Workflow tools enable data-savvy scientists to perform complex analysis without the need to learn a complex programming language. They also aid scientific reproducibility by providing a mechanism to repeat in silico experiments using exactly the same conditions. Naturally, there is a strong use case for connecting TERMite with these tools via a simple, easy-to-use process. This document will review TERMite’s support for these tools and some of the use cases to which they have been applied.

Pipeline Pilot Support

Out of the box, TERMite ships with a Pipeline Pilot collection that makes using the software in Pipeline Pilot very easy. Feed text into the TERMite Annotator component, and that’s it. The component will take the text from any source (Medline shown here) and provide a rich annotation layer that can be used for any text-mining project.

Of course, more complex workflows are possible, such as the example below, which searches for drug-gene relationships in phrases such as: “The GTPase, RhoB, was synergistically up-regulated in cells treated with ixabepilone and sunitinib”.
This protocol can be broken down into the following stages:

1. Collect articles from the Medline database mentioning a particular drug (Ixabepilone).
2. Annotate the corpus using the SciBite VOCabs.
3. Filter papers to remove any that don’t focus on Ixabepilone, using SciBite’s relevancy algorithm, which identifies the most important topics within any article.
4. Using the TExpress module, identify specific semantic patterns within a sentence, such as Gene-Verb-Drug, and extract these into a table.

Once the protocol is built and tested, it is a relatively simple process to repeat it for other variables or datasets at the click of a button. A collection of protocols can be created to explore the various nuances of a particular topic, extracting valuable insight from text and serving the needs of a wider team without requiring them to be experts in text mining or programming.

Support For KNIME

KNIME has gained a lot of support due to its open, Java-based framework that’s enriched by a thriving community of scientists and developers. Many of SciBite’s customers are KNIME users and, as such, it was important that we serve this community to a similar level.
Regardless of the workflow software used, it’s still a simple process to bring text annotation into your protocols. Here, the protocol:

1. Reads a file of text.
2. Passes it to the CallTERMite node to execute on the server.
3. Sends the result to ParseTERMiteJson, which transforms the results into an extensive data table.
The CallTERMite and ParseTERMiteJson nodes are separate because users may wish to customise the “parse” component to filter the comprehensive set of results returned by the CallTERMite node (which terms were used, where they were found, with what confidence, etc.).
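To make the “call, then parse and filter” split concrete, here is a minimal Python sketch of what a customised parse step might do outside KNIME. The JSON shape, the field names (annotations, entity_type, term, location, confidence) and the parse_annotations function are all assumptions invented for illustration; this document does not describe TERMite’s actual output schema or the internals of the ParseTERMiteJson node.

```python
import json

# Hypothetical response, invented for illustration only. The real TERMite
# JSON schema is not described in this document and will differ.
SAMPLE_RESPONSE = json.dumps({
    "annotations": [
        {"entity_type": "GENE", "term": "RhoB",
         "location": "abstract", "confidence": 0.95},
        {"entity_type": "DRUG", "term": "ixabepilone",
         "location": "abstract", "confidence": 0.90},
        {"entity_type": "DRUG", "term": "sunitinib",
         "location": "title", "confidence": 0.40},
    ]
})


def parse_annotations(raw_json, min_confidence=0.0):
    """Flatten the (hypothetical) response into a list of table rows.

    Rows whose confidence is below min_confidence are dropped, which is
    the kind of filtering a customised parse step could apply.
    """
    payload = json.loads(raw_json)
    rows = []
    for hit in payload.get("annotations", []):
        if hit["confidence"] < min_confidence:
            continue
        rows.append({
            "type": hit["entity_type"],
            "term": hit["term"],
            "location": hit["location"],
            "confidence": hit["confidence"],
        })
    return rows


if __name__ == "__main__":
    for row in parse_annotations(SAMPLE_RESPONSE, min_confidence=0.5):
        print(row)
```

Keeping the filter in the parse step, rather than in the call, means the full result set stays available if a later analysis needs a different threshold.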
As with Pipeline Pilot, it is easy to hook this into a database such as Medline. Here we’ve taken the articles on the protein BRCA2 and asked which are the most frequent co-occurring proteins in the literature. Of course, BRCA1 is there at the top, but you can start to see the landscape of the different players in BRCA2 biology.
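The co-occurrence question can be sketched in a few lines of Python once each article has been reduced to its list of annotated proteins. The cooccurring function and the toy article lists below are hypothetical and made up purely to show the counting logic; they are not the data or results behind the BRCA2 workflow, and the counting rule (one count per article in which both proteins appear) is an assumption.

```python
from collections import Counter

# Toy input, invented for illustration: one list of annotated protein
# names per article. These are not real Medline results.
TOY_ARTICLES = [
    ["BRCA2", "BRCA1", "RAD51"],
    ["BRCA2", "BRCA1"],
    ["BRCA2", "PALB2", "BRCA1", "BRCA1"],
    ["TP53", "BRCA1"],
    ["BRCA2", "RAD51"],
]


def cooccurring(articles, target):
    """Count, per article, the proteins mentioned alongside `target`.

    Each protein is counted at most once per article, and articles that
    never mention `target` are ignored. Returns (protein, count) pairs
    sorted from most to least frequent.
    """
    counts = Counter()
    for proteins in articles:
        unique = set(proteins)
        if target in unique:
            counts.update(unique - {target})
    return counts.most_common()


if __name__ == "__main__":
    for protein, n_articles in cooccurring(TOY_ARTICLES, "BRCA2"):
        print(protein, n_articles)
```

Counting per article rather than per mention keeps a single paper that repeats one protein name many times from dominating the ranking.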
These simple examples are just the tip of an iceberg, but hopefully they demonstrate how easy it is to connect TERMite to the two most popular workflow tools in life science today.

Use Cases

While the examples above are deliberately simple, our customers are performing a variety of deep data mining activities using TERMite in combination with workflow tools. Current projects concern topics such as:
• Competitive intelligence - Target-specific reports and alerting
• Data integration - Aligning different commercial data to a common ontology
• Pharmacovigilance - Mining electronic health records for drug-adverse event relationships
• Annotation as a service - Central research teams serving therapeutic teams using collections of protocols to analyse any form of scientific text
• Creating protein-protein interaction networks
• Gene expression workflows, enriching with data from the literature
If you’d like to know more about using TERMite in your workflows or want to discuss a particular use case in more detail, get in touch!