Transparent access to multiple bioinformatics information sources (TAMBIS) Goble, C.A. et al. (2001) IBM Systems Journal 40(2), 532-551 Genome Analysis Paper Presentation March 24, 2005



Page 1: Transparent access to multiple bioinformatics information sources (TAMBIS)

Transparent access to multiple bioinformatics information sources (TAMBIS)

Goble, C.A. et al. (2001) IBM Systems Journal 40(2), 532-551

Genome Analysis Paper Presentation

March 24, 2005

Page 2: Transparent access to multiple bioinformatics information sources (TAMBIS)

Presentation Overview
  Why the need to integrate
  Definitions (“MW”s)
  Biologists’ burden
  What is TAMBIS
  The TaO
  Brains of TAMBIS
  What makes TAMBIS “service-oriented”?
  GRAIL
  TAMBIS Architecture
  What can you do at TAMBIS?
  Related Work
  More current Work
  Ongoing challenges for integration

Page 3: Transparent access to multiple bioinformatics information sources (TAMBIS)

Why the need to Integrate?
  The Molecular Biology Database Collection has 500+ resources
    719 in 2005 NAR DB issue
    Adding ~150 in the past two years
  Independent development and differing scopes
    heterogeneous formats, interfaces, input, outputs
  Most popular resources:
    DNA and Protein sequences (GenBank, Swiss-Prot)
    Genome data (ACeDB)
    Protein structure and motifs (PDB, PROSITE)
    Similarity searching (BLAST)

Page 4: Transparent access to multiple bioinformatics information sources (TAMBIS)

Definitions (MW*)
  Extensional coverage: number of entries / instances covered by the source
  Intensional coverage: number of information fields / meta-data in each source
  Description Logic: A family of knowledge representation languages which can be used to represent the terminological knowledge of an application domain in a structured and formally well-understood way.
  CPL (Collection Programming Language): A functional multidatabase language; models complex data types such as lists, sets, and variants with drivers (wrappers) that execute requests over data sources

* MW = “misunderstood word” (from a Montessori class)
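
The two coverage measures can be illustrated with a rough Python sketch. It assumes a source is modelled as a list of records; the records and field names below are invented for illustration and do not come from any real database.

```python
# Toy source: each record maps field names to values (illustrative data only).
toy_source = [
    {"accession": "X0001", "name": "protein A", "organism": "human"},
    {"accession": "X0002", "name": "protein B", "organism": "yeast"},
    {"accession": "X0003", "name": "protein C"},
]

def extensional_coverage(records):
    """Number of entries (instances) held by the source."""
    return len(records)

def intensional_coverage(records):
    """Number of distinct information fields used anywhere in the source."""
    return len({field for record in records for field in record})

print("entries:", extensional_coverage(toy_source))
print("fields:", intensional_coverage(toy_source))
```

A source can score high on one measure and low on the other: a large sequence archive has many entries with few fields, while a small curated resource may have few entries with many fields.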

Page 5: Transparent access to multiple bioinformatics information sources (TAMBIS)

Definitions (MW*)
  Terminology server: Encapsulates the reasoning services associated with the Description Logic, supporting concept reasoning, role sanctioning, thesaurus, extrinsics services
  Sanctioning: Capability of inferring more (biological) concepts by way of compositional constraints encompassed in the ontology
  Ontology: An explicit formal specification of how to represent the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them.

* MW = “misunderstood word” (from a Montessori class)
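
One way to picture sanctioning is as a table of permitted compositions. The Python sketch below is a deliberately simplified, single-layer toy; the table contents and function names are assumptions made for illustration and do not reproduce the ontology's actual constraints or the terminology server's interface.

```python
# Hypothetical sanction table: (concept, role) -> concepts allowed as fillers.
SANCTIONS = {
    ("Protein", "hasFunction"): {"Receptor", "Enzyme", "Antigen"},
    ("Protein", "hasName"): {"ProteinName"},
    ("Motif", "isComponentOf"): {"Protein"},
}

def is_sanctioned(concept, role, filler):
    """True if the ontology permits composing 'concept which role filler'."""
    return filler in SANCTIONS.get((concept, role), set())

def sanctioned_extensions(concept):
    """All (role, filler) pairs a query builder could offer for a concept."""
    return sorted(
        (role, filler)
        for (c, role), fillers in SANCTIONS.items()
        if c == concept
        for filler in fillers
    )

print(is_sanctioned("Protein", "hasFunction", "Receptor"))
print(is_sanctioned("Motif", "hasFunction", "Receptor"))
print(sanctioned_extensions("Motif"))
```

A query interface built on such a table only ever offers compositions that make biological sense, which is the role sanctioning plays during query formulation.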

Page 6: Transparent access to multiple bioinformatics information sources (TAMBIS)

Biologists’ burden
  Construct a view of the meta-data
  Resolve structural and semantic differences in the information
  Locate and communicate with the sources
  Interoperate between resources
  Transformation process

…. “fragile” process …. undoubtedly specialized

Page 7: Transparent access to multiple bioinformatics information sources (TAMBIS)

TAMBIS
  A prototype mediation system
    Designed to lessen the burden as described previously
    Service-oriented
    Based on an extensive source-independent global ontology of molecular biology and bioinformatics
    Represented in a Description Logic
    Managed by a terminology server
    A mixed top-down and bottom-up iterative methodology

Providing a single access point for biological information sources around the world

Page 8: Transparent access to multiple bioinformatics information sources (TAMBIS)

Emphasis of TAMBIS
  High transparency
  Read-only access
  Retrieval-oriented architecture
  Efficiency and correctness
  Heterogeneity management
  Visual query interface

Page 9: Transparent access to multiple bioinformatics information sources (TAMBIS)

Features of TAMBIS
  Very rich domain ontology (1,800 biological concepts)
  Web-based…
    Query formation
    Ontology browsing
  Query translation and planning process

More than GO, more than SRS

Page 10: Transparent access to multiple bioinformatics information sources (TAMBIS)

The TaO
  Aim is to capture biological and bioinformatics knowledge in a logical conceptual framework
  Constraints… or features…
    Only biologically sensible concepts classify correctly
  Can encompass different user views
  Makes biological concepts and their relationships computationally accessible

Could have used another ontology but this one was developed concurrently for TAMBIS

Page 11: Transparent access to multiple bioinformatics information sources (TAMBIS)

The TaO

Page 12: Transparent access to multiple bioinformatics information sources (TAMBIS)

Current state of TaO
  Big Model
    Covers proteins, nucleic acids, their components, function, location, publishing
  Baby model (Baby TaO)
    Covers only the protein subset of the big model
    Used for the “fully functional version” of TAMBIS
  Reconciled model
    Merged version of the big and baby TAMBIS ontologies

Page 13: Transparent access to multiple bioinformatics information sources (TAMBIS)

Brains of TAMBIS … Query translation and planning process
  “A concept formed as a query is resolved when its extension is retrieved”
  Sample query: Protein which hasFunction Receptor
  Takes a query phrased in terms of the conceptual layer and converts it into an executable plan in terms of the classes and methods of the physical layer.
  Plans an efficient way of executing a query, i.e., evaluates the alternative paths
  The various resources do not need to provide query language interfaces
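
The translation and planning step can be sketched in Python under heavy simplification. Everything named here is hypothetical: the source names, the wrapper method names, and the cost figures are made up, and the plan is rendered as a plain string rather than the executable plan the real system produces.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceMethod:
    source: str   # hypothetical source name
    method: str   # hypothetical wrapper function exposed for that source
    cost: int     # made-up relative cost estimate

# Hypothetical mapping layer: conceptual (concept, role) -> physical alternatives.
MAPPING = {
    ("Protein", "hasFunction"): [
        SourceMethod("source_a", "get_proteins_by_function", cost=5),
        SourceMethod("source_b", "scan_all_proteins_then_filter", cost=50),
    ],
}

def plan(concept, role, filler):
    """Pick the cheapest physical method that can answer the conceptual query."""
    candidates = MAPPING.get((concept, role))
    if not candidates:
        raise LookupError(f"no source can answer: {concept} which {role} {filler}")
    best = min(candidates, key=lambda m: m.cost)
    return f"{best.source}.{best.method}({filler!r})"

# The sample query from the slide: Protein which hasFunction Receptor
print(plan("Protein", "hasFunction", "Receptor"))
```

Because the plan is expressed as calls to wrapper functions, a source only needs to expose callable methods, which is why the resources themselves need no query language interface.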

Page 14: Transparent access to multiple bioinformatics information sources (TAMBIS)

(Definitions revisited)

concept

relationship

Page 15: Transparent access to multiple bioinformatics information sources (TAMBIS)

What makes TAMBIS “service-oriented”?
  Reasoning services for description logics
    Subsumption
    Classification
    Satisfiability
    Retrieval
  Sanctioned term construction
  Querying
  Terminology Services

Page 16: Transparent access to multiple bioinformatics information sources (TAMBIS)

(Definitions revisited)

sanction

subsumption

Page 17: Transparent access to multiple bioinformatics information sources (TAMBIS)

An example of subsumption
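
A toy subsumption check can be sketched in Python. This is a structural simplification, not a description logic reasoner: the is-a hierarchy, the concept encoding as a (base, roles) pair, and all names are assumptions chosen for illustration.

```python
# Hypothetical is-a hierarchy: child concept -> parent concept.
PARENT = {
    "Receptor": "Function",
    "Enzyme": "Function",
    "Protein": "Biomolecule",
}

def is_a(child, ancestor):
    """True if child equals ancestor or lies below it in the hierarchy."""
    while child is not None:
        if child == ancestor:
            return True
        child = PARENT.get(child)
    return False

def subsumes(general, specific):
    """A concept is a pair (base, {role: filler}).

    'general' subsumes 'specific' when the specific base is-a general base
    and every restriction on the general concept is met by an equal or
    narrower restriction on the specific concept.
    """
    g_base, g_roles = general
    s_base, s_roles = specific
    if not is_a(s_base, g_base):
        return False
    return all(
        role in s_roles and is_a(s_roles[role], filler)
        for role, filler in g_roles.items()
    )

protein = ("Protein", {})
functional_protein = ("Protein", {"hasFunction": "Function"})
receptor_protein = ("Protein", {"hasFunction": "Receptor"})

print(subsumes(protein, receptor_protein))
print(subsumes(receptor_protein, functional_protein))
```

Here "Protein which hasFunction Receptor" falls under both "Protein" and "Protein which hasFunction Function", but not the other way round, which is the ordering a classifier uses to place a new query concept in the hierarchy.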

Page 18: Transparent access to multiple bioinformatics information sources (TAMBIS)

GRAIL
  A concept modelling language
  A Description Logic in the KL-ONE family….
    In this case, used to describe biological concepts
  Two major services provided:
    Supporting transitive roles, role hierarchies, a powerful set of concept assertion axioms
    Novel multilayered sanctioning mechanism

Page 19: Transparent access to multiple bioinformatics information sources (TAMBIS)

TAMBIS Architecture
  Three layers (“models”)
    Physical
    Conceptual
    Mapping
  Five components
    Ontology of biological terms (A)
    Knowledge-driven query formulation interface (B)
    Sources and Services Model linking the biological ontology with the source schemas (C)
    Query transformation rewriting process (D)
    Wrapper service dealing with external sources (E)

Page 20: Transparent access to multiple bioinformatics information sources (TAMBIS)

Query translation

Page 21: Transparent access to multiple bioinformatics information sources (TAMBIS)

What can you do at TAMBIS?
  Browse the ontology
  Build a query with a visual interface and reference to an ontology
  Give values to concepts (for a query)
  Identify desired concepts as results
  Bookmark your queries

Page 22: Transparent access to multiple bioinformatics information sources (TAMBIS)

Ontology browser

Page 23: Transparent access to multiple bioinformatics information sources (TAMBIS)

Specific questions for TAMBIS
  Find human homologues of yeast receptor proteins
  Find rat proteins that have a domain with a seven-propeller domain architecture
  Find the binding sites of human enzymes with zinc cofactors

…. How many sources are involved per question?
…. How difficult to find these answers without integration?
…. For someone unfamiliar with the resources?

Page 24: Transparent access to multiple bioinformatics information sources (TAMBIS)

TAMBIS Overview
  Natural language: Select motifs for antigenic human proteins that participate in apoptosis and are homologous to the lymphocyte associated receptor of death (also known as lard).

  TAMBIS Translation: Select patterns in the proteins that invoke an immunological response and participate in programmed cell death that are similar in their sequence of amino acids to the protein that is associated with triggering cell death in the white cells of the immune system.

  Concept expression in GRAIL:
    Motif which <isComponentOf (Protein which
      <hasOrganismClassification Species
       FunctionsInProcess Apoptosis
       HasFunction Antigen
       isHomologousTo (Protein which <hasName ProteinName>)>)>
    (Species given value “human” and ProteinName given value “lard”)
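
The nesting of such a concept expression can be made explicit with a small Python sketch. The `Concept` class, its `which` method, and the `render` function are hypothetical helpers invented for illustration; the output only mimics the bracketing style shown on the slide and is not a validated GRAIL serialisation.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    base: str
    criteria: list = field(default_factory=list)  # list of (role, Concept) pairs

    def which(self, role, filler):
        """Add a 'role filler' criterion; a plain string becomes a bare Concept."""
        if not isinstance(filler, Concept):
            filler = Concept(filler)
        self.criteria.append((role, filler))
        return self

def render(concept):
    """Render as 'Base which <role Filler ...>', parenthesising nested concepts."""
    if not concept.criteria:
        return concept.base
    parts = []
    for role, filler in concept.criteria:
        text = render(filler)
        parts.append(f"{role} ({text})" if filler.criteria else f"{role} {text}")
    return f"{concept.base} which <{' '.join(parts)}>"

query = Concept("Motif").which(
    "isComponentOf",
    Concept("Protein")
    .which("hasOrganismClassification", "Species")
    .which("isHomologousTo", Concept("Protein").which("hasName", "ProteinName")),
)
print(render(query))
```

Holding the query as a tree rather than a string is what lets a reasoner classify it and a planner walk it criterion by criterion.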

Page 25: Transparent access to multiple bioinformatics information sources (TAMBIS)

Related Work
  Closest work: Object-Protocol Model (OPM)
    No source transparency
  SRS, Entrez, BioNavigator
    Does not handle as complex queries
    TAMBIS is query based, these are clicking-based
  BioKleisli, DiscoveryLink
    Middleware solutions, TAMBIS sits on top of this
  Carnot
    General rather than detailed ontology

Page 26: Transparent access to multiple bioinformatics information sources (TAMBIS)

More current work
  DAML + OIL (new DL for TAMBIS)
    DARPA Agent Markup Language: provides a rich set of constructs to create ontologies and to mark up information so that it is machine-readable
  CPL/BioKleisli (wrapper language) replaced by DiscoveryHub (commercial)
  GO: more complete and widely used
  Protégé OWL
    Ontology editor for the Semantic Web
  BioMOBY, BioConductor
    Complementary systems

Page 27: Transparent access to multiple bioinformatics information sources (TAMBIS)

Ongoing challenges to integration
  Evaluation
    Technical efficiency
    User usability
  Changing underlying resources
    Resources disappear
    Changes in popularity
  MAINTENANCE

…. Widespread acceptance and use?

Page 28: Transparent access to multiple bioinformatics information sources (TAMBIS)

References
  Goble, C.A. et al. (2001) “Transparent access to multiple bioinformatics information sources.” IBM Systems Journal 40(2), 532-551.
  Baker, P.G. et al. (1999) “An ontology for bioinformatics applications.” Bioinformatics 15(6), 510-520.
  Ontology definition: dli.grainger.uiuc.edu/glossary.htm
  Description Logic definition: www.absoluteastronomy.com/encyclopedia/D/De/Description_Logic.htm
  TAMBIS website: http://imgproj.cs.man.ac.uk/tambis/