managing, querying and extracting biomedical knowledge trabajo de investigación extracción de...

33
MANAGING, QUERYING AND EXTRACTING BIOMEDICAL KNOWLEDGE Trabajo de Investigación Extracción de Conocimiento para la Web Semántica (1241119) Sistemas Informáticos Avanzados Ernesto Jiménez-Ruiz Supervisor: Rafael Berlanga

Post on 18-Dec-2015

212 views

Category:

Documents


0 download

TRANSCRIPT

MANAGING, QUERYINGAND EXTRACTING BIOMEDICAL KNOWLEDGE

Trabajo de Investigación

Extracción de Conocimiento para la Web Semántica (1241119)

Sistemas Informáticos Avanzados

Ernesto Jiménez-Ruiz

Supervisor: Rafael Berlanga

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 2

Outline

Context and Motivation Application of Ontologies Biomedical issues and Health-e-Project

Proposed Methodologies

Ontology Management System

Conclusions and Future Work

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 3

Application of Ontologies

«An ontology is a formal specification of a shared conceptualization» (Borst (1997))

Applications: E-Commerce

Web Browsers

Digital Libraries

Biomedicine

etc…

Context and Motivation

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 4

Health-e-Child Project

Aims to develop an integrated healthcare platform for European pediatrics, achieving a comprehensive view of children’s health

Grid Architecture

Main Upper Level Applications: KDS, DSS

Our tasks: Integration of biomedical data, information, and knowledge.

Context and Motivation

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 5

Health-e-Child Project

The biomedical information sources will cover six distinct levels (vertical levels): Molecular Cellular Tissue Organ Individual Population

And will focus on three representative diseases (inside paediatrics): Heart diseases Inflammatory diseases Brain tumours.

Context and Motivation

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 6

Application of current Ontologiesin HeC

HeC vertical levels expressed by Ontologies Available several large biomedical ontologies and

taxonomies, e.g: GO, GALEN, FMA, NCI-Thesurus, Tambis, BioPax, etc.

Difficult too apply in concrete applications like HeC: Scalability in reasoning. Specificity: local view of the domain Visualization and treatment

Context and Motivation

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 7

View Mechanism

Operation through views or used defined fragments/modules.

Working on the development of OntoPath

Future: to formalize the extracted views

Context and Motivation

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 8

Outline

Context and Motivation

Proposed Methodologies Development of Ontologies in a Collaborative Way From Domain Ontologies to Application Views

Ontology Management System

Conclusions and Future Work

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 9

Development Requirements

Development Methodologies with new dimensions: Dynamism distribution of team structure and partially controlled development.

Proposed Requirements: Modularization/Particionamiento Local Adaptation Knowledge Abstraction User-defined modules (Views) Argumentation and Consensus

Development of Ontologies in a Collaborative Way

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 10

Development Phases

We distinguish 5 different phases:RequirementsDevelopmentPublication and ArgumentationEvaluation and MaintenanceApplication

Development of Ontologies in a Collaborative Way

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 11

Knowledge Spaces

Development of Ontologies in a Collaborative Way

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 12

View Derivation Hierarchy

Development of Ontologies in a Collaborative Way

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 13

Outline

Context and Motivation

Proposed Methodologies Development of Ontologies in a Collaborative Way From Domain Ontologies to Application Views

Ontology Management System

Conclusions and Future Work

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 14

‘Current’ Biomedical Sources

Creation of large and important Biomedical Ontologies and Taxonomies: GALEN, FMA, NCI-Thesurus, Tambis, BioPax, etc Open Biomedical Ontologies (OBO)

Metathesaurus, Dictionaries and Lexicons: Unified Medical Language System (UMLS) MesH (Medical Subject Headings) SNOMED

Bioinformatics public databases (OMIM, UNIPROT, DrugBank, etc.)

Hospital Resources (databases, texts, forms, images, etc.)

From Domain Ontologies to Applications

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 15

New Issues and Challenges

Many domain ontologies in Biomedicine do not cover completely the requirements of specific applications.

Concepts may involve different abstraction levels (e.g. molecular, organ, disease, etc.) that can be in the same or in different domain ontologies.

Domain ontologies are normally rather large: Users find them hard to use for annotating and querying information

sources Only a subset of those is used by system applications.

From Domain Ontologies to Applications

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 16

New Issues and Challenges

The work to be presented mainly focuses on this issues:

Do not cover requirements Integration and Enrichment

Involve different abstraction levels Integration, enrichment and definition of user-defined modules

(Views)

Are rather large (hard to use and only a subset are used) User-defined modules (Views)

From Domain Ontologies to Applications

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 17

From Ontologies to Applications

From Domain Ontologies to Applications

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 18

Enrichment from Textual Sources

Automatic Instance Generation (Danger (2004)) Look for aggregation paths (i.e.: concept-relation-concept…) in texts. Necessity of a well created and consistent ontology,

UMLS-based Biomedical Entity Recognizer A treated version of UMLS as a dictionary Entity relation co-occurrences Problem: How to discover named relations between entities. Technical Report: Jimeno-Yepes and Jiménez-Ruiz et al. (2007)

From Domain Ontologies to Applications

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 19

Outline

Context and Motivation

Proposed Methodologies

Ontology Management System Parser and Storage in G database OntoPath and Builder Plugin-Protégé

Conclusions and Future Work

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 20

System Architecture

Ontology Management System

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 21

OWL Parser

Greater flexibility in the OWL treatment and storage capabilities (e.g. indexes)

The OWL parser creates from the OWL file a set of structures for classes, properties, nominal and individuals.

These structures will be stored in the graph-based database G.

Ontology Management System

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 22

G Semi-structured Database

Backend to store, index and retrieve the OWL ontologies as graphs.

Four database object types are needed: ontology, property, concept, and enumeration (nominals) O=ontology(name=’Simple.owl’, rootConcept=C1,

rootProperty=P1) C1=concept(name=’Thing’) P1= property (name=’PropertyThing’) C2=concept(name=’Person’, subClassOf=C1) P2= property(name=’hasFriend’, range=C2, domain=C2,

subPropertyOf=P1)

Ontology Management System

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 23

OWL Parser: DL Treatment (I)

Inferred parents: C ≡C1 … Cn ⊓ ⊓ C.subClassOf=C1,…,Cn

Inferred children: C ≡C1 … Cn ⊔ ⊔ C1.subClassOf.append(C) , …,

Cn.subClassOf.append(C)

Inferred domains: C ⊓R.D i-property(name=R, domain=C, range=D).

Ontology Management System

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 24

OWL Parser: DL Treatment (II)

Creation of new classes: C ⊑ R.(D ⊓S.E) C and D named classes D’ is created, with D’.subClassOf=D. It is also created: i-property(name=R, domain=C, range=D’). D’.name=D_with_S_E.

Nominals: (C ⊓R.{i1, i2…, in})… i-property(name=R, domain=C,

enumeration=E).(C ⊓R.{ i1})… i-property(name=R, domain=C, enumeration=E). E=enumeration(name=R-nominals, list-of-values=i1,…, in)

Ontology Management System

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 25

OWL Parser: DL Treatment (III)

ResistanceToInsulin isWithReferenceTo (presence ⊓isPresenceAbsenceOf.Insulin)

C = concept(name=ResistanceToInsulin) P = concept(name=presence, …) I = concept(name=insulin,…) A = concept(name=presence_with_isPresenceAbsenceOf_Insulin,

subClassOf=P) P1 = i-property(name=isWithReferenceTo, domain=C, range=A, …) P2 = i-property(name=isPresenceAbsenceOf, domain=A, range=I,…)

Ontology Management System

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 26

OWL Parser: DL Treatment (IV)

Not properly stored (and queried) : Some Union cases Negation

Expressivity a subset of SHIF(D), The closest DL underlying OWL-Lite. OWL-DL Ontologies uses a subset of the DL SHOIN(D)

And it will produce approximate answers to OntoPath queries.

Ontology Management System

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 27

OntoPath Query Language

To retrieve consistent fragments (personalized modules or views) from domain ontologies.

Syntax simple like XPath.

Results as a new OWL Ontology

Example: Disease / related_to / Rheumatoid_Factor

Ontology Management System

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 28

Protégé Extensions

Storing Ontologies

Retrieving full ontologies or fragments

Representation in a definition hierarchy

Connection with Python codes

Ontology Management System

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 29

Storing Ontologies

Ontology Management System

OWL File

Selection

References to

other Ontologies

(Views)

Biomedical (HeC) Coverage

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 30

Retrieving full ontologies or fragments

Ontology Management System

Several Fragments

Source Ontology

Set of OntoPath Queries

Metadata

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 31

Representation in a definition hierarchy

Ontology Management System

Organization of Views in a Definition Hierarchy

Classification by Biomedical Level

New Tab Created

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 32

Conclusions and Future Work

The system is still work in progress Adaptation to new versions of Protégé Further tests to the OntoPath language Formalizations of connections between fragments and

source knowledge. e-connections? Manchester

Enrichment by text mining techniques Work at EBI: from text to ontologies

Apply the ontologies in HeC: evaluation and validation

PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 33

Questions and Feedback

Thank you very much!!!

Ernesto Jimé[email protected]://www3.uji.es/~ejimenez http://krono.act.uji.es/people/Ernesto/