biomas: a multi-agent system for automated genomic annotation

BioMAS: A Multi-Agent System for Automated Genomic Annotation. Keith Decker, Department of Computer and Information Sciences, University of Delaware. Salim Khan, Ravi Makkena, Gang Situ, Computer & Information Sciences. Dr. Carl Schmidt, Heebal Kim, Animal & Food Sciences.

Author: wyanet

Post on 11-Feb-2016


  • BioMAS: A Multi-Agent System for Automated Genomic Annotation
    Keith Decker, Department of Computer and Information Sciences, University of Delaware
    Salim Khan, Ravi Makkena, Gang Situ, Computer & Information Sciences
    Dr. Carl Schmidt, Heebal Kim, Animal & Food Sciences

  • Outline
    - General class of problems and MAS solution approach
    - BioMAS: Automated Genomic Annotation
    - HVDB: HerpesVirus Database
    - ChickDB: Gallus gallus Database
    - GOFigure!
    - CoPrDom
    - Signal Transduction Pathway Discovery

  • What problems are we addressing?
    - Huge, dynamic Primary Source Databases
      - Highly distributed, overlapping
      - Heterogeneous content, structure, curation
    - Multitude of analysis algorithms
      - Different interfaces, output formats
      - Create contingent process plans chaining many analyses together
    - Individual PIs, working on non-model organisms
      - Learn, then hand-navigate sea of DBs and analysis tools
      - Easily overwhelmed by new sequence and EST data
      - Struggle to make results available usefully to others

  • Approach: Multi-Agent Information Gathering
    - Software agents for information retrieval, filtering, integration, analysis, and display
    - Embody heterogeneous database technology (wrappers, mediators, ...)
    - Deal with dynamic data and changing data sources
    - Efficient and robust distributed computation (for both info retrieval and analysis)
    - Deal with issues of data organization and ownership
    - Natural approach to providing integrated information
      - To humans via web
      - To other agents via semantic markup [XML/OIL/DAML]

  • Example: Multi-Agent System for Automated Herpesvirus Annotation
    - Input: raw sequence data
    - Output: an annotated database that allows fairly complex queries
      - BLAST homologs
      - Motifs
      - Protein domains [Prodomain records]
      - PSORT sub-cellular location predictions
      - GO [Gene Ontology] electronic annotation
    - "Show me all the genes in Marek's Disease virus with a tyrosine phosphorylation motif and a transmembrane domain value 2"
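The kind of query quoted on this slide can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the `GeneAnnotation` record, its field names, the toy data, and in particular the `>= 2` comparison, since the slide text does not preserve the comparison operator for the transmembrane domain value. It is not the BioMAS schema or query interface.

```python
from dataclasses import dataclass, field


@dataclass
class GeneAnnotation:
    """Hypothetical record for one annotated gene (field names are illustrative)."""
    gene: str
    organism: str
    motifs: set = field(default_factory=set)
    transmembrane_domains: int = 0


def query(records, organism, motif, tm_predicate):
    """Return names of genes in `organism` that carry `motif` and whose
    transmembrane-domain count satisfies `tm_predicate`."""
    return [
        r.gene
        for r in records
        if r.organism == organism
        and motif in r.motifs
        and tm_predicate(r.transmembrane_domains)
    ]


# Made-up toy data, not real annotations.
records = [
    GeneAnnotation("geneA", "Marek's disease virus", {"tyrosine phosphorylation"}, 3),
    GeneAnnotation("geneB", "Marek's disease virus", {"tyrosine phosphorylation"}, 0),
    GeneAnnotation("geneC", "Marek's disease virus", {"N-glycosylation"}, 4),
    GeneAnnotation("geneD", "Herpes simplex virus 1", {"tyrosine phosphorylation"}, 5),
]

# The comparison operator is an assumption; it is passed in so it can be swapped.
hits = query(records, "Marek's disease virus", "tyrosine phosphorylation",
             lambda n: n >= 2)
print(hits)
```

Passing the threshold test as a function keeps the unknown comparison explicit rather than baking a guess into the query itself.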

  • How does this help?
    - Automates collection of information from various primary source databases
      - If the info changes, can be updated automatically. PI can be notified.
    - Allows various analyses to be done automatically
      - Can encode complex (contingent) sequences of info retrieval and linked analyses, report interesting results only
    - New data sources, annotation, analyses can be applied as they are developed, automatically (open system)
    - Made available on internet to others, or private data
    - Much more sophisticated queries than keyword search
      - Dynamic menu of keys
      - Concept hierarchies (ontology) allow more concise queries
      - Query planning (e.g., time, resource usage)
    - Can search across multiple databases (i.e., from other researchers)

  • How does it work?
    - RETSINA-style Multi-Agent Organization

  • DECAF: A multi-agent system toolkit
    - Focus on programming agents, not designing internal architecture
    - Programming at the multi-agent level
    - Value-added architecture
    - Support for persistent, flexible, robust actions

  • DECAF
    - Focus on programming agents, not designing internal architecture
      - Avoiding the API approach
      - DECAF as agent operating system, programmers have strictly limited access
      - Communication, planning, scheduling, [coordination], execution
      - Graphical dataflow plan editor

  • DECAF
    - Programming at the multi-agent level
      - Standardized, domain-independent, reusable middle agents
        - Agent Name Server (white pages)
        - Matchmaker (yellow pages/directory service)
        - Brokers (managers)
        - Information extraction (learning [STALKER] + knowledgebase [PARKA])
        - Proxy (web interfaces)
        - [Agent Management Agent (debugging, demos, external control)]
      - Note: heterogeneous architectures are OK!
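The white pages / yellow pages distinction between the Agent Name Server and the Matchmaker can be sketched in a few lines of Python. This is only an illustration of the two lookup roles; the class names, method names, agent names, and addresses below are hypothetical and are not DECAF's actual interfaces.

```python
class AgentNameServer:
    """White pages: maps an agent's name to its network address."""

    def __init__(self):
        self._addresses = {}

    def register(self, name, address):
        self._addresses[name] = address

    def unregister(self, name):
        self._addresses.pop(name, None)

    def lookup(self, name):
        return self._addresses.get(name)


class Matchmaker:
    """Yellow pages: maps a capability to the agents that advertise it."""

    def __init__(self):
        self._providers = {}

    def advertise(self, name, capability):
        self._providers.setdefault(capability, set()).add(name)

    def find(self, capability):
        return sorted(self._providers.get(capability, set()))


# Hypothetical agents and addresses.
ans = AgentNameServer()
mm = Matchmaker()
ans.register("blast-agent", ("localhost", 9001))
ans.register("motif-agent", ("localhost", 9002))
mm.advertise("blast-agent", "homology-search")
mm.advertise("motif-agent", "motif-search")

# A requester first asks who offers a service, then resolves that agent's address.
providers = mm.find("homology-search")
address = ans.lookup(providers[0])
print(providers, address)
```

Keeping the two directories separate means an agent can change address without re-advertising its capabilities, and a new provider of an existing capability can appear without requesters changing.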

  • DECAF
    - Value-added architecture
      - Taking care of details (social/individual)
        - ANS registration/dereg (eventually MM)
        - Standard behaviors (AMA, error, FIPA, libraries)
        - Message dispatching (ontology, conversation)
        - Coordination (GPGP)
      - Efficient use of computational resources
        - Highly threaded: internally + domain actions
        - Memory efficient (ran systems for weeks, hundreds of thousands of messages)

  • DECAF
    - Support for persistent, flexible, robust actions
      - HTN-style programming
        - Task alternatives and contingencies
      - RETSINA-style dataflow
        - Provisions/Parameters determine task activation
        - Multiple outcomes, Loops
      - TÆMS-style task network annotations
        - Dynamic overall utility: Quality, cost, duration task characteristics
        - Explicit representation of non-local tasks
        - Example: Time/Quality tradeoff
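The dataflow idea on this slide, where a task activates once its provisions are filled and each named outcome routes its result to different downstream tasks, can be sketched as follows. This is a toy interpreter written for illustration, not DECAF's plan-file format or scheduler; the `Task` class, the `run` function, and the BLAST-like example tasks are all assumptions.

```python
class Task:
    """A task that becomes runnable once every provision (named input) is filled."""

    def __init__(self, name, provisions, action):
        self.name = name
        self.needed = set(provisions)
        self.values = {}
        self.action = action  # callable(**values) -> (outcome_name, result)

    def provide(self, provision, value):
        self.values[provision] = value

    def ready(self):
        return self.needed <= set(self.values)


def run(tasks, links, initial):
    """Run tasks as their provisions fill; `links` maps (task, outcome) to a
    list of (target_task, provision) pairs that receive the result."""
    for (task_name, provision), value in initial.items():
        tasks[task_name].provide(provision, value)
    trace, pending = [], set(tasks)
    while True:
        runnable = sorted(n for n in pending if tasks[n].ready())
        if not runnable:
            return trace
        name = runnable[0]
        outcome, result = tasks[name].action(**tasks[name].values)
        trace.append((name, outcome))
        pending.discard(name)
        for target, provision in links.get((name, outcome), []):
            tasks[target].provide(provision, result)


# Stand-in actions; a real agent would call out to analysis tools here.
def blast(sequence):
    if sequence.startswith("ATG"):
        return "hit", "homolog-of-" + sequence
    return "no_hit", sequence


def annotate(homolog):
    return "ok", "annotated:" + homolog


def flag_novel(sequence):
    return "ok", "novel:" + sequence


def make_tasks():
    return {
        "blast": Task("blast", ["sequence"], blast),
        "annotate": Task("annotate", ["homolog"], annotate),
        "flag_novel": Task("flag_novel", ["sequence"], flag_novel),
    }


links = {
    ("blast", "hit"): [("annotate", "homolog")],
    ("blast", "no_hit"): [("flag_novel", "sequence")],
}

trace_hit = run(make_tasks(), links, {("blast", "sequence"): "ATGCCC"})
trace_miss = run(make_tasks(), links, {("blast", "sequence"): "CCCGGG"})
print(trace_hit, trace_miss)
```

The two runs take different branches purely because of which outcome the first task reports, which is the contingency mechanism the slide refers to.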

  • DECAF Architecture
    [Figure: DECAF architecture diagram. Inputs and outputs: Plan file; Incoming KQML/FIPA messages; Domain Facts and Beliefs; Outgoing KQML/FIPA messages. Data structures: Incoming Message Queue; Objectives Queue; Task Queue; Agenda Queue; Task Templates Hash Table; Pending Action Queue; Action Results Queue. Modules: Agent Initialization; Dispatcher; Planner; Scheduler; Executor [concurrent].]

  • Plan Editor

  • Expanding the Genomic Annotation System

  • Functional Annotation Suborganization
    - Gene Ontology Consortium (www.geneontology.org)
      - Biological process
      - Molecular Function
      - Cellular Component

  • Co-present Domain Networks (CoPrDom)
    - Proteins can be viewed as conserved sets of domains
    - Vertex = domain, edge = co-present in some protein, edge weight = # of proteins co-present in
    - Network constructed from InterPro domain markup of proteins in 10 species (human, Drosophila, C. elegans, S. cerevisiae among them)
    - Functional characterization via InterPro to GO mapping
    - Network constructed per organism per functional group, e.g., apoptosis regulation in human
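The network definition on this slide (vertex = domain, edge = two domains occurring in the same protein, edge weight = number of proteins in which they co-occur) translates directly into a short Python sketch. The function name and the toy protein-to-domain table are assumptions; real input would come from InterPro domain markup.

```python
from collections import Counter
from itertools import combinations


def build_coprdom_network(protein_domains):
    """Return a Counter mapping each unordered domain pair (as a sorted tuple)
    to the number of proteins in which both domains are present."""
    edges = Counter()
    for domains in protein_domains.values():
        # Use a set so a domain repeated within one protein is counted once.
        for a, b in combinations(sorted(set(domains)), 2):
            edges[(a, b)] += 1
    return edges


# Made-up toy input: protein id -> list of domain names.
proteins = {
    "P1": ["SH2", "SH3", "Kinase"],
    "P2": ["SH2", "SH3"],
    "P3": ["Kinase"],
    "P4": ["SH2", "Kinase", "SH2"],
}

network = build_coprdom_network(proteins)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

Building one such network per organism and per functional group, as the slide describes, would amount to filtering the input table before calling the function.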

  • Uses for CoPrDom
    - Functional characterization of unknown domains
    - Identification of core domains/groups in a functional group
    - Tracking domain evolution through species evolution
    - Predicting protein-protein interaction by identifying evolutionary merging of domain groups

  • Biological Pathway Discovery through AI Planning Techniques
    - AI planning is a computational method for developing complex plans of action from a representation of the initial states, the actions that manipulate those states, and the goal states to be achieved.
      - Initial States: The initial state representation of objects in the "plan world"
      - Actions: Logical descriptions of preconditions and effects
      - Goals: The end states desired
    - HTN (Hierarchical Task Network) planning proceeds by task decomposition of networks, and a successful plan is one that satisfies a task network.
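To make the initial state / actions / goals vocabulary concrete, here is a minimal Python sketch of a planner. Note the substitution: it uses plain breadth-first forward search over sets of facts, which is far simpler than HTN decomposition and is not how O-Plan works. The fact names and the three toy actions are assumptions, loosely echoing the receptor-to-Ras example used in this deck, and are not a biologically accurate model.

```python
from collections import deque
from typing import FrozenSet, NamedTuple


class Action(NamedTuple):
    name: str
    preconditions: FrozenSet[str]  # facts that must hold before the action
    add: FrozenSet[str]            # facts made true by the action
    delete: FrozenSet[str]         # facts made false by the action


def plan(initial, goals, actions):
    """Breadth-first search for a shortest action sequence reaching the goals.
    Returns a list of action names, or None if no plan exists."""
    start, goals = frozenset(initial), frozenset(goals)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goals <= state:
            return steps
        for a in actions:
            if a.preconditions <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [a.name]))
    return None


# Toy, heavily simplified actions (illustrative only).
actions = [
    Action("bind-hormone", frozenset({"egf-present", "rtk-free"}),
           frozenset({"rtk-active"}), frozenset({"rtk-free"})),
    Action("phosphorylate-grb2", frozenset({"rtk-active"}),
           frozenset({"grb2-phosphorylated"}), frozenset()),
    Action("activate-ras", frozenset({"grb2-phosphorylated"}),
           frozenset({"ras-active"}), frozenset()),
]

pathway = plan({"egf-present", "rtk-free"}, {"ras-active"}, actions)
print(pathway)

# Removing one action makes the goal unreachable in this toy model.
knockout = [a for a in actions if a.name != "phosphorylate-grb2"]
print(plan({"egf-present", "rtk-free"}, {"ras-active"}, knockout))
```

The second call shows how dropping an operator leaves the goal unreachable, a crude stand-in for asking what happens when one step of a pathway is unavailable.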

  • Uses of the Signal Transduction Planner
    - To produce computer-interpretable plans capturing relevant qualitative information regarding signal transduction pathways.
    - To produce testable hypotheses regarding gaps in knowledge of the pathway, and drive future signal transduction research in an ordered manner.
    - To identify key nodes where many pathways are regulated by a node with only 1 functional protein serving as a critical checkpoint.
    - To perform in silico experiments of hyper expression and deletion mutation.
    - To enable pathway visualization tools by providing human- and machine-readable pathway descriptions.

  • Advantages of Planning
    - Operator schema: Abstracted axiomatic definitions of sub-cellular processes, understandable to human + computer
    - Task abstraction: Decomposition of complex task into simpler, interchangeable actions.
      - Reduces search space, conflicts
      - Modeling of pathways at different levels of biochemical detail
    - Search conducted in Plan Space: Most planners perform bi-directional search (vs. Pathway Tools, Prolog implementations, etc.)
    - Partial-order Planning: Succinct representation of multiple pathways helps identify key causal relationships

  • Advantages of Planning (contd.)
    - Conditional effects can be used to model special cases ("exceptions") when applying operator schema
    - Resource utilization can be used to model quantitative aspects such as amplification of a signal, feedback and feed-forward loops
    - Plan re-use: Old plans can be successfully inserted into new ones (if initial and final conditions are met) without additional computation

  • (ontologically driven) Operator Schema Example: Transport

    (action: transport
     :parameters (?mol - macromolecule, ?compfrom, ?compto - compartment)
     :condition (and (in ?mol ?compfrom) (open ?compfrom ?compto))
     :effects (and (in ?mol ?compto) (not (in ?mol ?compfrom))))
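The transport operator can be read as a function on a set of ground facts: check the condition, then apply the add and delete effects. The Python sketch below mirrors that reading. The tuple encoding of facts and the example molecule and compartment names are assumptions, and the type constraints from the parameter list (macromolecule, compartment) are not checked here.

```python
def transport(state, mol, comp_from, comp_to):
    """Apply the transport operator to a set of ground facts.

    Returns the successor state, or None when the condition
    (in mol comp_from) and (open comp_from comp_to) does not hold.
    """
    condition = {("in", mol, comp_from), ("open", comp_from, comp_to)}
    if not condition <= state:
        return None
    # Effects: the molecule is now in comp_to and no longer in comp_from.
    return (state - {("in", mol, comp_from)}) | {("in", mol, comp_to)}


# Hypothetical facts chosen only for illustration.
state = frozenset({
    ("in", "ERK", "cytoplasm"),
    ("open", "cytoplasm", "nucleus"),
})

after = transport(state, "ERK", "cytoplasm", "nucleus")
print(sorted(after))

# Not applicable: ERK is not in the nucleus in the starting state.
print(transport(state, "ERK", "nucleus", "cytoplasm"))
```

Returning a new state instead of mutating the old one keeps earlier states available, which is what a search over alternative pathways needs.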

  • RTK-MAPK pathway
    - Activation of Ras following binding of a hormone (e.g., EGF) to a receptor

  • RTK-MAPK pathway step: O-Plan Output
    - Phosphorylation of GRB2 at domain SH2 by the RTK receptor

  • Summary
    - Bioinformatics has many features amenable to a multi-agent information gathering approach
    - BioMAS: Automated Analysis: EST processing to functional annotation ontologies
      - DECAF / RETSINA / TÆMS
    - GOFigure! and electronic GO annotation
    - CoPrDom Co-Present Domain Analysis
    - Signal Transduction Pathway Discovery

  • BioMAS Future Work
    - Sophisticated queries are possible, but how to make them available to biologists?
      - "Show me all glycoproteins in Marek's Disease virus with a tyrosine phosphorylation motif and a transmembrane domain value 2 that are expressed in feather follicles"
    - Robustness, efficiency, scale, data materialization issues
    - Automating and integrating more complex analysis processes (using existing software!)
      - Estimating physical location of genes by synteny
    - Integrate new data sources
      - Microarray and other gene expression data
      - And thus, more analyses: QTL mapping, metabolic pathway learning
    - New off-site organism databases and analysis agents
    - http://www.cis.udel.edu/~decaf/
    - http://udgenome.ags.udel.edu/