Post on 18-Dec-2015
MANAGING, QUERYING AND EXTRACTING BIOMEDICAL KNOWLEDGE
Research Work
Knowledge Extraction for the Semantic Web (1241119)
Advanced Computer Systems
Ernesto Jiménez-Ruiz
Supervisor: Rafael Berlanga
PhD Research Work - Ernesto Jiménez Ruiz - TKBG Group 2
Outline
Context and Motivation
  Application of Ontologies
  Biomedical Issues and the Health-e-Child Project
Proposed Methodologies
Ontology Management System
Conclusions and Future Work
Application of Ontologies
«An ontology is a formal specification of a shared conceptualization» (Borst, 1997)
Applications:
  E-Commerce
  Web Browsers
  Digital Libraries
  Biomedicine
  etc.
Context and Motivation
Health-e-Child Project
Aims to develop an integrated healthcare platform for European pediatrics, achieving a comprehensive view of children’s health
Grid Architecture
Main Upper Level Applications: KDS, DSS
Our tasks: Integration of biomedical data, information, and knowledge.
Health-e-Child Project
The biomedical information sources will cover six distinct (vertical) levels:
  Molecular
  Cellular
  Tissue
  Organ
  Individual
  Population
The project will focus on three representative paediatric diseases:
  Heart diseases
  Inflammatory diseases
  Brain tumours
Application of Current Ontologies in HeC
The HeC vertical levels are expressed by ontologies.
Several large biomedical ontologies and taxonomies are available, e.g. GO, GALEN, FMA, NCI Thesaurus, TAMBIS, BioPAX, etc.
They are difficult to apply in concrete applications like HeC:
  Scalability in reasoning
  Specificity: local view of the domain
  Visualization and treatment
View Mechanism
Operation through views, i.e. user-defined fragments/modules.
Working on the development of OntoPath
Future: to formalize the extracted views
Outline
Context and Motivation
Proposed Methodologies
  Development of Ontologies in a Collaborative Way
  From Domain Ontologies to Application Views
Ontology Management System
Conclusions and Future Work
Development Requirements
Development methodologies must deal with new dimensions:
  Dynamism
  Distribution of the team structure
  Partially controlled development
Proposed requirements:
  Modularization/Partitioning
  Local Adaptation
  Knowledge Abstraction
  User-defined modules (Views)
  Argumentation and Consensus
Development of Ontologies in a Collaborative Way
Development Phases
We distinguish 5 different phases:
  Requirements
  Development
  Publication and Argumentation
  Evaluation and Maintenance
  Application
Knowledge Spaces
View Derivation Hierarchy
‘Current’ Biomedical Sources
Large and important biomedical ontologies and taxonomies have been created:
  GALEN, FMA, NCI Thesaurus, TAMBIS, BioPAX, etc.
  Open Biomedical Ontologies (OBO)
Metathesauri, dictionaries and lexicons:
  Unified Medical Language System (UMLS)
  MeSH (Medical Subject Headings)
  SNOMED
Bioinformatics public databases (OMIM, UNIPROT, DrugBank, etc.)
Hospital Resources (databases, texts, forms, images, etc.)
From Domain Ontologies to Applications
New Issues and Challenges
Many domain ontologies in biomedicine do not completely cover the requirements of specific applications.
Concepts may involve different abstraction levels (e.g. molecular, organ, disease, etc.), which can be in the same or in different domain ontologies.
Domain ontologies are normally rather large:
  Users find them hard to use for annotating and querying information sources.
  Only a subset of each ontology is used by system applications.
New Issues and Challenges
The work presented here mainly focuses on these issues, each paired with the proposed response:
  Domain ontologies do not cover the requirements → Integration and enrichment
  They involve different abstraction levels → Integration, enrichment and definition of user-defined modules (Views)
  They are rather large (hard to use, and only a subset is used) → User-defined modules (Views)
From Ontologies to Applications
Enrichment from Textual Sources
Automatic Instance Generation (Danger, 2004)
  Looks for aggregation paths (i.e. concept-relation-concept…) in texts.
  Requires a well-built and consistent ontology.
UMLS-based Biomedical Entity Recognizer
  A preprocessed version of UMLS is used as a dictionary.
  Entity-relation co-occurrences
  Problem: how to discover named relations between entities.
  Technical Report: Jimeno-Yepes, Jiménez-Ruiz et al. (2007)
Outline
Context and Motivation
Proposed Methodologies
Ontology Management System
  Parser and Storage in the G Database
  OntoPath and Builder
  Protégé Plugin
Conclusions and Future Work
System Architecture
Ontology Management System
OWL Parser
Goal: greater flexibility in OWL treatment and storage capabilities (e.g. indexes).
From the OWL file, the OWL parser creates a set of structures for classes, properties, nominals and individuals.
These structures are then stored in the graph-based database G.
G Semi-structured Database
Backend to store, index and retrieve the OWL ontologies as graphs.
Four database object types are needed: ontology, property, concept, and enumeration (nominals)
  O = ontology(name='Simple.owl', rootConcept=C1, rootProperty=P1)
  C1 = concept(name='Thing')
  P1 = property(name='PropertyThing')
  C2 = concept(name='Person', subClassOf=C1)
  P2 = property(name='hasFriend', range=C2, domain=C2, subPropertyOf=P1)
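The object types above can be mirrored in plain Python to make their shape concrete. This is only a minimal sketch: the dataclass and field names are assumptions made for illustration and are not the actual API of the G database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    name: str
    sub_class_of: List["Concept"] = field(default_factory=list)


@dataclass
class Property:
    name: str
    domain: Optional[Concept] = None
    range: Optional[Concept] = None
    sub_property_of: Optional["Property"] = None


@dataclass
class Enumeration:
    # Holds the individuals of a nominal, e.g. {i1, ..., in}.
    name: str
    values: List[str] = field(default_factory=list)


@dataclass
class Ontology:
    name: str
    root_concept: Concept
    root_property: Property


# The 'Simple.owl' example from the slide, rebuilt with the hypothetical classes.
c1 = Concept(name="Thing")
p1 = Property(name="PropertyThing")
c2 = Concept(name="Person", sub_class_of=[c1])
p2 = Property(name="hasFriend", domain=c2, range=c2, sub_property_of=p1)
o = Ontology(name="Simple.owl", root_concept=c1, root_property=p1)
```

Objects reference each other directly (C2 points to C1, P2 points to C2 and P1), which is what lets the ontology be stored and traversed as a graph.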
OWL Parser: DL Treatment (I)
Inferred parents: C ≡ C1 ⊓ … ⊓ Cn ⇒ C.subClassOf = C1, …, Cn
Inferred children: C ≡ C1 ⊔ … ⊔ Cn ⇒ C1.subClassOf.append(C), …, Cn.subClassOf.append(C)
Inferred domains: C ⊓ ∃R.D ⇒ i-property(name=R, domain=C, range=D)
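The parent and child rules can be sketched in a few lines of Python. The tuple encoding of the axioms, the function name and the class names in the example are assumptions made for illustration; the actual parser works on OWL files.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical axiom encoding: (defined_class, operator, operands).
# "and" stands for an intersection definition, "or" for a union definition.
Axiom = Tuple[str, str, List[str]]


def infer_sub_class_of(axioms: List[Axiom]) -> Dict[str, List[str]]:
    """Derive subClassOf links from intersection and union definitions."""
    sub_class_of: Dict[str, List[str]] = defaultdict(list)
    for defined, operator, operands in axioms:
        if operator == "and":
            # Inferred parents: every conjunct is a parent of the defined class.
            sub_class_of[defined].extend(operands)
        elif operator == "or":
            # Inferred children: every disjunct becomes a child of the defined class.
            for operand in operands:
                sub_class_of[operand].append(defined)
    return dict(sub_class_of)


# Toy definitions, not taken from any real ontology.
axioms = [
    ("Mother", "and", ["Woman", "Parent"]),
    ("Person", "or", ["Man", "Woman"]),
]
hierarchy = infer_sub_class_of(axioms)
```

Here `hierarchy` maps each class to the parents inferred for it, so the stored graph already contains the subsumptions that follow directly from the definitions.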
OWL Parser: DL Treatment (II)
Creation of new classes: C ⊑ ∃R.(D ⊓ ∃S.E), with C and D named classes
  D’ is created, with D’.subClassOf=D and D’.name=D_with_S_E.
  An i-property is also created: i-property(name=R, domain=C, range=D’).
Nominals:
  (C ⊓ ∃R.{i1, i2, …, in}) … ⇒ i-property(name=R, domain=C, enumeration=E)
  (C ⊓ ∃R.{i1}) … ⇒ i-property(name=R, domain=C, enumeration=E)
  E=enumeration(name=R-nominals, list-of-values=i1, …, in)
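The nominal rule can be illustrated with a short Python sketch. The class names, the helper `store_nominal_restriction` and the example values are hypothetical; only the naming scheme `R-nominals` comes from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Enumeration:
    name: str
    values: List[str]


@dataclass
class IProperty:
    name: str
    domain: str
    enumeration: Enumeration


def store_nominal_restriction(cls: str, prop: str, individuals: List[str]) -> IProperty:
    """Turn a restriction of `prop` to a set of individuals into an i-property."""
    # The enumeration is named after the property, following the slide's R-nominals scheme.
    enum = Enumeration(name=f"{prop}-nominals", values=list(individuals))
    return IProperty(name=prop, domain=cls, enumeration=enum)


# Toy example: class, property and individuals are made up.
has_color = store_nominal_restriction("Wine", "hasColor", ["Red", "White", "Rose"])
```

A single-valued restriction is handled by the same helper with a one-element list.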
OWL Parser: DL Treatment (III)
ResistanceToInsulin ⊑ ∃isWithReferenceTo.(presence ⊓ ∃isPresenceAbsenceOf.Insulin)
  C = concept(name=ResistanceToInsulin)
  P = concept(name=presence, …)
  I = concept(name=Insulin, …)
  A = concept(name=presence_with_isPresenceAbsenceOf_Insulin, subClassOf=P)
  P1 = i-property(name=isWithReferenceTo, domain=C, range=A, …)
  P2 = i-property(name=isPresenceAbsenceOf, domain=A, range=I, …)
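The example above can be reproduced with a small Python sketch of the class-creation step. The `Restriction` and `Qualified` types and the `flatten` function are assumptions made for illustration; only the `D_with_S_E` naming scheme and the resulting objects come from the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Restriction:
    prop: str                         # restricted property, e.g. "isWithReferenceTo"
    filler: Union[str, "Qualified"]   # a named class, or a class with a nested restriction


@dataclass
class Qualified:
    base: str                  # named class D
    restriction: Restriction   # nested restriction on S with filler E


def flatten(cls: str, restriction: Restriction) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Return (new concepts as (name, parent), i-properties as (name, domain, range))."""
    concepts: List[Tuple[str, str]] = []
    i_properties: List[Tuple[str, str, str]] = []
    domain, current = cls, restriction
    while True:
        filler = current.filler
        if isinstance(filler, str):
            # Plain case: the filler is already a named class.
            i_properties.append((current.prop, domain, filler))
            return concepts, i_properties
        inner = filler.restriction
        inner_name = inner.filler if isinstance(inner.filler, str) else inner.filler.base
        # Create the auxiliary named class D' = D_with_S_E as a subclass of D.
        aux = f"{filler.base}_with_{inner.prop}_{inner_name}"
        concepts.append((aux, filler.base))
        i_properties.append((current.prop, domain, aux))
        # Continue with the nested restriction, now attached to D'.
        domain, current = aux, inner


axiom = Restriction(
    "isWithReferenceTo",
    Qualified("presence", Restriction("isPresenceAbsenceOf", "Insulin")),
)
new_concepts, new_properties = flatten("ResistanceToInsulin", axiom)
```

Running it on the slide's axiom yields one auxiliary concept and the two i-properties P1 and P2 listed above.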
OWL Parser: DL Treatment (IV)
Not properly stored (or queried):
  Some union cases
  Negation
Expressivity: a subset of SHIF(D), the DL closest to OWL-Lite.
  OWL-DL ontologies use a subset of the DL SHOIN(D).
As a result, the system will produce approximate answers to OntoPath queries.
OntoPath Query Language
Retrieves consistent fragments (personalized modules or views) from domain ontologies.
Simple, XPath-like syntax.
Results are returned as a new OWL ontology.
Example: Disease / related_to / Rheumatoid_Factor
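To give a feel for how such a path could be evaluated, here is a toy Python sketch. It is not the OntoPath implementation: it assumes one possible reading of the example (a concept step, a property step, a concept step, with subclasses matching their ancestors), and the function names and data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (domain concept, property, range concept)


def descendants(concept: str, sub_class_of: Dict[str, List[str]]) -> Set[str]:
    """Return the concept plus every class that reaches it through subClassOf."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in sub_class_of.items():
            if child not in found and any(p in found for p in parents):
                found.add(child)
                changed = True
    return found


def evaluate(query: str, triples: List[Triple], sub_class_of: Dict[str, List[str]]) -> List[Triple]:
    """Evaluate a three-step 'Concept / property / Concept' path."""
    source, prop, target = [step.strip() for step in query.split("/")]
    sources = descendants(source, sub_class_of)
    targets = descendants(target, sub_class_of)
    return [t for t in triples if t[0] in sources and t[1] == prop and t[2] in targets]


# Toy data, made up for this sketch.
sub_class_of = {"Rheumatoid_Arthritis": ["Disease"], "Disease": ["Thing"]}
triples = [
    ("Rheumatoid_Arthritis", "related_to", "Rheumatoid_Factor"),
    ("Disease", "has_symptom", "Symptom"),
]
fragment = evaluate("Disease / related_to / Rheumatoid_Factor", triples, sub_class_of)
```

The matching triples form the fragment; in the real system the result would be serialized as a new OWL ontology rather than returned as a list.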
Protégé Extensions
Storing Ontologies
Retrieving full ontologies or fragments
Representation in a definition hierarchy
Connection with Python code
Storing Ontologies
[Figure: storing an ontology. Labels: OWL file selection; references to other ontologies (views); biomedical (HeC) coverage]
Retrieving full ontologies or fragments
[Figure: retrieving several fragments. Labels: source ontology; set of OntoPath queries; metadata]
Representation in a definition hierarchy
[Figure: organization of views in a definition hierarchy. Labels: classification by biomedical level; new tab created]
Conclusions and Future Work
The system is still work in progress:
  Adaptation to new versions of Protégé
  Further tests of the OntoPath language
  Formalization of the connections between fragments and source knowledge (e-connections? Manchester)
Enrichment by text mining techniques
  Work at EBI: from text to ontologies
Apply the ontologies in HeC: evaluation and validation
Questions and Feedback
Thank you very much!!!
Ernesto Jiménez-Ruiz
http://www3.uji.es/~ejimenez
http://krono.act.uji.es/people/Ernesto/