CIS607, Fall 2006: Semantic Information Integration. Instructor: Dejing Dou. Week 10 (Nov. 29)


1

CIS607, Fall 2006

Semantic Information Integration

Instructor: Dejing Dou

Week 10 (Nov. 29)

2

On the Semantics of Linking and Importing in Modular Ontologies (Bao et al. @ ISWC 2006)

Individuals in each local domain are private to that domain, and DDL semantics does not take into account whether individuals in different local domains may represent the same physical-world object.

We need bridge rules.

3

Bridge rules

4

E-connections

5

Outline

Personal Information Management (PIM)
Dataspaces
Semantic Integration in PIM
Medical Informatics and Bioinformatics
Semantic Integration in Biomedical Informatics

6

Personal Information

Homepages (HTML, XML)
Personal Emails (Text)
Spreadsheets (e.g. Microsoft Excel)
Contact Lists (Text)
Calendar
Publications and Presentations (Word, LaTeX, PowerPoint)
Personal Databases (SQL)
……

7

Personal Information Management (PIM)

How to organize personal information resources
– They are currently organized by applications and locations.
How to integrate and share the data
– Mostly manually (e.g. copy and paste).
How to search (query)
– E.g. Prof. Wang wants to know which papers his students presented at conferences and the travel expenses paid from grants.

Good news: The development of the Internet, the Web, and wireless communication makes personal information accessible from desktops, laptops, palmtops, and cellphones.

The problems: Different formats and data structures, and different contents depending on the application.

8

Association(Relationship)-based PIM

Organize the personal information resources based on their associations (relationships).
– Emails ↔ Contact Lists
– Homepage ↔ Publications
– Calendar ↔ Spreadsheets

Use a domain ontology to define those concepts, and store the associations (relationships) as mappings.

Develop an integration engine that processes data and queries based on the domain ontology and the mappings.
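
The idea of storing associations as mappings and querying over them can be sketched as follows. This is a minimal illustration only, not the system described in the slides: the class names `Resource` and `AssociationStore`, the relation names, and the sample items are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    concept: str  # concept from a (hypothetical) domain ontology
    key: str      # identifier of the item inside its own application


@dataclass
class AssociationStore:
    # Each mapping is a triple: (source resource, relation name, target resource).
    links: set = field(default_factory=set)

    def associate(self, source, relation, target):
        self.links.add((source, relation, target))

    def related(self, source, relation):
        """Return the keys of all resources linked from `source` by `relation`."""
        return sorted(t.key for (s, r, t) in self.links
                      if s == source and r == relation)


# Made-up sample data in the spirit of the Prof. Wang query.
store = AssociationStore()
student = Resource("Contact", "student-1")
paper = Resource("Publication", "paper-1")
expense = Resource("SpreadsheetRow", "trip-1")
store.associate(student, "presented", paper)
store.associate(paper, "travel_expense", expense)

# Follow two associations: student -> papers presented -> travel expenses.
papers = store.related(student, "presented")
expenses = [e for p in papers
            for e in store.related(Resource("Publication", p), "travel_expense")]
print(papers, expenses)
```

The query crosses application boundaries (contacts, publications, spreadsheets) only through the stored associations, which is the point of organizing by relationship rather than by application.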

9

Association(Relationship)-based PIM (cont’d)

[Figure: architecture diagram. Labels: User; Integration Engine; Domain ontology; Person; Homepage; Contacts; SpreadSheet; Publications; Calendar; Emails; Information Resources (Data); Personal DBs; SQL.]

10

Main Topics in Association-based PIM

How to integrate structured data and unstructured data
– Databases and spreadsheets are structured; XML and LaTeX are semi-structured.
– Emails, HTML, contacts, and Word documents are unstructured text.

How to define the domain ontology. The concepts of different resources use different hierarchies.

How to express the mappings (rules) between different information resources. How can the integration engine use those mappings to integrate data and answer queries?
– Emails ↔ Contact Lists
– Homepage ↔ Publications ↔ Personal Databases
– Calendar ↔ Spreadsheets
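
As a rough illustration of bringing a structured source and a text source into one representation, the sketch below reads a person from a spreadsheet export and from an email header and normalizes both to the same record shape. The sample data, the column names, and the record layout are assumptions made for this example; only Python standard-library calls are used.

```python
import csv
import io
from email import message_from_string
from email.utils import parseaddr

# Made-up sample inputs: a spreadsheet exported as CSV and a raw email.
SPREADSHEET = "name,email\nAlice Smith,alice@example.org\n"
RAW_EMAIL = (
    "From: Alice Smith <alice@example.org>\n"
    "Subject: Draft\n"
    "\n"
    "See attached.\n"
)


def persons_from_spreadsheet(text):
    """Structured source: columns map directly onto Person attributes."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": row["name"], "email": row["email"].lower()} for row in reader]


def person_from_email(raw):
    """Text source: the sender must be extracted from the From header."""
    message = message_from_string(raw)
    name, address = parseaddr(message["From"])
    return {"name": name, "email": address.lower()}


contacts = persons_from_spreadsheet(SPREADSHEET)
sender = person_from_email(RAW_EMAIL)

# Once both sources yield the same record shape, an Emails <-> Contact Lists
# association can be found by simple comparison.
print(sender in contacts)
```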

11

Bioinformatics and Medical Informatics

What it is
The analysis of biological and medical information using computers and statistical techniques; the science of developing and utilizing computer databases and algorithms to accelerate and enhance biological and medical research.

What it can do
– In genomics, bioinformatics includes the development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data.
– In neuroscience, medical informatics can analyze EEG and MRI data to study the functions of neurons and the human brain.
– In pharmacy, medical informatics can help study drug use and drug interactions.
– In clinical studies, medical informatics (e.g. expert systems) can help study diseases and the treatment of patients.

12

Good news and problems

Good news
– Most biomedical data is stored in databases, so it is structured data.
– Statistics-based data mining techniques have been used successfully to find patterns in the data.

Problems in biomedical data integration
– Most biomedical databases were developed locally and are application-oriented, so there is little agreement among their schemas.
– It is difficult for other people, especially people without biomedical knowledge, to understand the schemas.
– Database schemas are not expressive enough to capture the meaning (“semantics”) of the data and of patterns in the data.

13

Integrating Neuronal Databases

Cooperation with Yale Medical Informatics Center to integrate the web-based neuronal databases of Senselab (Yale) and CNDB (Cornell).
– Senselab: model and structure information for a particular class of neurons.
– CNDB: experimental data for individual neurons measured on a particular day.

Researchers at Senselab have marked up their data and database schema with EDSP [Marenco et al. 03], an XML specification. Cornell’s researchers have also marked up their data and database schema, using another XML dialect.

Structure image

Experimental EEG Data

Electroencephalography

14

Integrating Neuronal Databases (cont’d)

Get their database schemas from the XML files and transform them into class and property definitions.
Find the mapping between these two neuronal database schemas with the help of domain experts (neuroscientists).
Merge these two database schemas with bridging axioms, e.g.:

(forall (n - neuron)
   (if (@cndb:funct_area n hippocampal.CA1)
       (@senselab:Neurons @senselab:Hippocampus n)))

We have developed some initial semi-automatic tools and GUIs to help domain experts, such as neuroscientists, map and merge the two neuronal database schemas.
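
To make the role of a bridging axiom concrete, the sketch below applies the example axiom as a single forward rule over a small set of facts. It is only an illustration: the tuple encoding of facts, the function name, and the neuron identifiers are assumptions, and the argument order of the derived fact simply copies the axiom as written on the slide.

```python
# Facts are encoded as (predicate, arg1, arg2) tuples; this encoding is an
# assumption of the sketch, not the representation used by the actual tools.
CNDB_FACTS = {
    ("cndb:funct_area", "neuron-17", "hippocampal.CA1"),
    ("cndb:funct_area", "neuron-18", "cortex.V1"),
}


def apply_bridging_axiom(facts):
    """If (cndb:funct_area n hippocampal.CA1) holds, derive
    (senselab:Neurons senselab:Hippocampus n)."""
    derived = set()
    for predicate, neuron, area in facts:
        if predicate == "cndb:funct_area" and area == "hippocampal.CA1":
            derived.add(("senselab:Neurons", "senselab:Hippocampus", neuron))
    return derived


senselab_facts = apply_bridging_axiom(CNDB_FACTS)
print(senselab_facts)
```

Only the CA1 neuron is translated; the other CNDB fact has no matching axiom in this sketch and is left untouched.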

15

Interactive Axiom Composition by Domain Experts

Ontology mapping: By similarity matching using dictionaries, e.g. Protein vs. Enzyme.

Axiom production: Allow domain experts to give some concrete examples of how two symbols in different ontologies (database schemas) are related. Generalize the examples into usable bridging axioms, a machine learning approach to generating mapping rules.

Pattern reuse: Based on the fact that a large number of correspondences can usually be sorted into a small set of patterns, allow domain experts to note and reuse these patterns.

Consistency testing: Detect contradictions among the generated bridging axioms; display the bugs to domain experts and allow the axioms to be edited.
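
Two of the steps above, dictionary-based matching and consistency testing, might look roughly like this in code. The sketch is illustrative only: the tiny `IS_A` dictionary, the `DISJOINT` set, and both function names are hypothetical, and the check covers just one simple kind of conflict (one class mapped under two classes declared disjoint).

```python
# Toy dictionary of "is-a" links; a real tool would consult a thesaurus or
# lexical database. Its contents here are assumptions for illustration.
IS_A = {"enzyme": "protein"}

# Pairs of target classes assumed to be declared disjoint (hypothetical).
DISJOINT = {frozenset({"Protein", "Gene"})}


def dictionary_match(term_a, term_b):
    """Suggest how term_a relates to term_b, or None if the dictionary is silent."""
    a, b = term_a.lower(), term_b.lower()
    if a == b:
        return "equivalent"
    if IS_A.get(a) == b:
        return "subclass-of"
    if IS_A.get(b) == a:
        return "superclass-of"
    return None


def find_contradictions(axioms):
    """axioms: list of (source, target), each read as 'source is a subclass of target'.
    Flags a source class that is placed under two disjoint target classes."""
    conflicts = []
    for i, (src_a, tgt_a) in enumerate(axioms):
        for src_b, tgt_b in axioms[i + 1:]:
            if src_a == src_b and frozenset({tgt_a, tgt_b}) in DISJOINT:
                conflicts.append((src_a, tgt_a, tgt_b))
    return conflicts


relation = dictionary_match("Protein", "Enzyme")
axioms = [("Enzyme", "Protein"), ("Enzyme", "Gene"), ("Neuron", "Cell")]
conflicts = find_contradictions(axioms)
print(relation, conflicts)
```

Each flagged triple would be shown to the domain expert, who decides which of the two axioms to edit or drop.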

16

The mappings between EEG and MRI data

EEG Data acquisition

Magnetic resonance imaging (MRI)

17

Ontology-based Data Analysis (Mining)

You can consider it as an expert system. It is at least useful for training purposes.

[Figure: diagram. Labels: DataM; DataR; OR; OM; Inference Engine; EEG, MRI … data; Computational tools.]

What are the features (patterns) of the processed data?

What can the patterns tell us (e.g. any function or disease of the brain)?

18

Ontology-based Genome DB Mediation

Integrate the databases through the domain ontology. The system can process meaningful queries and data based on the mapping rules.

[Figure: mediation diagram. Labels: Query based on domain ontology; Domain Ontology (includes GO); Onto1; Onto2; Onto3; DB1; DB2; DB3; ……; e.g. ZFIN; e.g. another Zebrafish Lab DB; e.g. Human DB.]
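
A mediated query of this kind can be sketched with two in-memory SQLite databases whose schemas differ. Everything specific here is an assumption for illustration: the table and column names, the ontology property names `Gene.name` and `Gene.function`, the sample rows, and the `mediated_query` function. The sketch rewrites one ontology-level query into each source's own schema using the mapping rules.

```python
import sqlite3

# Mapping rules from ontology properties to each source's schema (hypothetical).
MAPPINGS = {
    "db1": {"table": "genes", "Gene.name": "symbol", "Gene.function": "go_term"},
    "db2": {"table": "gene_records", "Gene.name": "gene_name", "Gene.function": "func"},
}


def make_db(table, name_col, func_col, rows):
    """Create an in-memory database with a two-column gene table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({name_col} TEXT, {func_col} TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn


def mediated_query(sources, function):
    """Answer 'which genes have this function?' across all sources."""
    results = []
    for source_id, conn in sources.items():
        m = MAPPINGS[source_id]
        # Rewrite the ontology-level query into this source's schema.
        sql = (f"SELECT {m['Gene.name']} FROM {m['table']} "
               f"WHERE {m['Gene.function']} = ?")
        results.extend((source_id, name) for (name,) in conn.execute(sql, (function,)))
    return sorted(results)


# Made-up rows; gene names and functions are placeholders.
sources = {
    "db1": make_db("genes", "symbol", "go_term",
                   [("geneA", "signal transduction"), ("geneB", "transcription")]),
    "db2": make_db("gene_records", "gene_name", "func",
                   [("geneC", "signal transduction")]),
}
answers = mediated_query(sources, "signal transduction")
print(answers)
```

The user never sees `symbol` or `gene_name`; the query is phrased once against the ontology, and the per-source differences live entirely in the mapping table.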

19

Genotypes + Environment => Phenotypes

[Figure: diagram. Labels: DataG, OG; DataE, OE; DataP, OP; "+".]

Genotype: the features (makeup) of the gene.
GO (Gene Ontology): Cellular Component, Molecular Functions, Biological Process.

Environment features: temperature, pressure, light, …… Too many features.

Phenotype: the observable characteristics produced by the genotype interacting with the environment.