Model-Based Mediation with Domain Maps

Bertram Ludäscher* Amarnath Gupta*

Maryann E. Martone+

*San Diego Supercomputer Center (SDSC)
+National Center for Microscopy and Imaging Research (NCMIR)

University of California, San Diego (UCSD)

Overview

• Motivation
– Problem with current Mediator Architecture
– Complex Scientific Multiple-World Scenarios

• Model-Based Mediation Architecture
– Lifting from XML to level of Conceptual Models (CMs)

• Formal Framework
– Domain Maps (DMs)
– Generic Conceptual Model (GCM)
– Integrated View Definition

• Example Query Evaluation

• Open Issues

A Standard Mediator Architecture (MIX -- Mediation of Information using XML, SDSC/UCSD)

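[Figure: MIX mediator architecture. A user query is posed against the integrated view of the MIX mediator; the view is specified by an XML Integrated View Definition in XMAS/XQuery. The mediator exchanges XML queries and answers (XML Q/A) with one wrapper per data source (Lab1, Lab2, Lab3; DB, files, WWW).]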

The Problem: Complex Multiple-World Scenarios

• Current Integration Issues
– Structural/Schema Conflicts
   • common semistructured data model (XML)
   • schema transformations/integration (XML queries & transforms)
– Limited Query Capabilities
   • capability based rewriting (e.g., TSIMMIS)
– ...

• BUT scenarios are “one-world” (amazon.com vs. bn.com) or simple multiple world (home buyer)

• Problem: No Support for Semantic Mediation
– “complex multiple-world” scenarios (Neuroscience, Geoscience):
   • complex, disjoint, seemingly unrelated data
   • “hidden semantics” in complex, indirect relationships

A Neuroscience Question

What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity?

How about other rodents?

[Figure: three wrapped sources, protein localization (NCMIR), neurotransmission (SENSELAB), and morphometry (SYNAPSE), feed a mediator. The mediator, its integrated view, and the integrated view definition are all marked “???” as open questions.]

Hidden Semantics: Protein Localization (NCMIR)

<protein_localization>
  <neuron type="purkinje cell" />
  <protein channel="red">
    <name>RyR</name>
    ….
  </protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8">
        <name>spine</name>
        <amount name="RyR">0</amount>
      </structure>
      <structure fraction="0.2">
        <name>branchlet</name>
        <amount name="RyR">30</amount>
      </structure>

[Figure labels: Molecular layer of Cerebellar Cortex; Purkinje Cell layer of Cerebellar Cortex; Fragment of dendrite]

Hidden Semantics: Morphometry (SYNAPSE)

<neuron name="purkinje cell">
  <branch level="10">
    <shaft>
      …
    </shaft>
    <spine number="1">
      <attachment x="5.3" y="-3.2" z="8.7" />
      <length>12.348</length>
      <min_section>1.93</min_section>
      <max_section>4.47</max_section>
      <surface_area>9.884</surface_area>
      <volume>7.930</volume>
      <head>
        <width>4.47</width>
        <length>1.79</length>
      </head>
    </spine>
    …

• Branch level beyond 4 is a branchlet
• Must be dendritic because Purkinje cells don’t have somatic spines
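The two annotations on this slide are exactly the kind of knowledge that is not in the XML itself. A minimal Python sketch of making them explicit follows; the XML string is a hypothetical, well-formed completion of the slide's fragment, and `classify_spines` and its output keys are illustrative names, not part of SYNAPSE.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed completion of the slide's SYNAPSE fragment.
SYNAPSE_XML = """
<neuron name="purkinje cell">
  <branch level="10">
    <spine number="1">
      <length>12.348</length>
      <volume>7.930</volume>
    </spine>
  </branch>
</neuron>
"""

BRANCHLET_MIN_LEVEL = 4  # "branch level beyond 4 is a branchlet"

def classify_spines(xml_text):
    """Return one dict per spine with the implicit domain knowledge made explicit."""
    neuron = ET.fromstring(xml_text)
    cell_type = neuron.get("name")
    result = []
    for branch in neuron.iter("branch"):
        level = int(branch.get("level"))
        structure = "branchlet" if level > BRANCHLET_MIN_LEVEL else "branch"
        for spine in branch.findall("spine"):  # direct children only
            result.append({
                "cell_type": cell_type,
                "spine": spine.get("number"),
                "on_structure": structure,
                # Purkinje cells have no somatic spines, so the spine is dendritic.
                "compartment": "dendrite" if cell_type == "purkinje cell" else "unknown",
                "volume": float(spine.findtext("volume")),
            })
    return result

spines = classify_spines(SYNAPSE_XML)
```

Only after this step does the morphometry record share a term ("branchlet") with the NCMIR protein localization data.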

Approach: Model-Based Mediation

• Complex Multiple Worlds Integration Problem
– terms not directly joinable
– complex, indirect associations
– unstated, “hidden” semantics (not just schema conflicts)

• Missing “Semantic Link”
=> how to define complex, indirect semantic links?
=> lift mediation to the level of conceptual models (CMs)
=> domain expert’s knowledge formalized as rules over CMs
=> Model-Based Mediation

XML-Based vs. Model-Based Mediation

[Figure: the two mediation stacks side by side, summarized below.]

• XML-Based Mediation
– Raw Data => XML Elements => XML Models
– Structural Constraints (DTDs): Parent, Child, Sibling, ... (e.g., A = (B*|C),D; B = ...)
– No Domain Constraints
– Integrated-DTD := XQuery(Src1-DTD,...)

• Model-Based Mediation
– Raw Data => (XML) Objects => Conceptual Models
– Classes, Relations, is-a, has-a, ... (e.g., classes C1, C2, C3 and relation R)
– Logical Domain Constraints (IF ... THEN ...)
– DOMAIN MAP
– Integrated-CM := CM-QL(Src1-CM,...)

Extended Mediator Architecture

• Wrappers export Conceptual Models (CMs)
– facts & rules for classes, relationships, ICs, ...
– source data is “put into context” (“aboutness” index) by linking to domain maps (DMs)

• Mediator employs CMs and DMs
– ... to define complex semantic relationships on the formalized domain knowledge

• Generic Conceptual Model (GCM)
– as a common target CM
– minimal requirements/core expressions:
   • instance(O,C), subclass(C1,C2)
   • method_type(C,M,C’), method_value(O,M,R)
   • relation_type(R,A1/C1,...,An/Cn)
   • relation_value(R,a1,...,an)

• Expressiveness, Extensibility
– allow inductive properties (inheritance, closures, ...)
– employ a declarative rule language (e.g. F-Logic)
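To make the GCM core expressions concrete, here is a small Python sketch of `instance(O,C)` and `subclass(C1,C2)` facts together with one inductive property, inheritance along the subclass relation. The object `s1` and the class `compartment` are made up for illustration; the prototype itself uses a rule language such as F-Logic, not Python.

```python
# GCM-style facts as plain tuples; all constants are illustrative only.
subclass = {("mushroom_spine", "spine"), ("spine", "compartment")}
instance = {("s1", "mushroom_spine")}

def superclasses(cls):
    """All classes reachable from cls via subclass/2 (reflexive-transitive closure)."""
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for sub, sup in subclass:
            if sub == current and sup not in seen:
                seen.add(sup)
                todo.append(sup)
    return seen

def is_instance(obj, cls):
    """instance(O,C) holds directly or by inheritance along subclass/2."""
    return any(o == obj and cls in superclasses(c) for o, c in instance)
```

In a deductive engine the same closure would be a two-line recursive rule, which is the argument for a declarative rule language on this slide.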

Model-Based Mediator Architecture

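[Figure: model-based mediator architecture. Each source S1, S2, S3 sits behind an XML-Wrapper and a CM-Wrapper that exports its conceptual model (CM S1, CM S2, CM S3) in GCM form; CM queries and results are exchanged in XML. The mediator engine (XSB engine with FL rule processing, LP rule processing, and graph processing) combines the source CMs with the domain map (DM) and the integrated view definition (IVD) into a CM (integrated view) for the user/client. Also labeled: Logic API (capabilities), CM Plug-In.]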

Formalizing Domain Knowledge: Domain Map for SYNAPSE and NCMIR

A domain map comprises
• Description Logic facts ...
– concepts ("classes")
– roles ("associations")
• derived properties ...
• ... expressed as logic rules
– (e.g. F-logic)

Domain expert knowledge:
Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).

[Figure: domain map showing the equivalent Description Logic facts]
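As a rough illustration of how such a domain map supplies the missing "semantic link", the expert's sentences can be written as (concept, role, concept) arcs and searched for a connecting chain. This is an informal encoding assumed for the sketch, not the Description Logic axioms from the original figure; the concept and role names are paraphrased from the sentences above.

```python
from collections import deque

# (concept, role, concept) arcs; an informal encoding of the expert's
# sentences, not the Description Logic axioms from the original figure.
DOMAIN_MAP = [
    ("purkinje_cell", "has", "dendrite"),
    ("pyramidal_cell", "has", "dendrite"),
    ("dendrite", "has", "higher_order_branch"),
    ("higher_order_branch", "contains", "spine"),
    ("spine", "is_a", "ion_regulating_component"),
    ("spine", "has", "ion_binding_protein"),
    ("neurotransmission", "involves", "ionic_activity"),
    ("ion_binding_protein", "controls", "ionic_activity"),
    ("ion_regulating_component", "affects", "ionic_activity"),
]

def find_link(start, goal):
    """Return a shortest chain of arcs from start to goal, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for src, role, dst in DOMAIN_MAP:
            if src == node and dst not in visited:
                visited.add(dst)
                queue.append((dst, path + [(src, role, dst)]))
    return None

link = find_link("purkinje_cell", "ionic_activity")
```

The returned chain runs from Purkinje cell through dendrite, branch, and spine to ionic activity, which is the concept that neurotransmission also touches.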

Domain Map Refinement

In addition to registering (“hanging off”) data, a source may also refine the mediator’s domain map...

... source can register new concepts at the mediator ...

Definition of Integrated Views (Deja Vu?) ...

• XML/CM-2-FL Translators

<!ELEMENT Studies (Study)*>
<!ELEMENT Study (study_id, … animal, experiments, experimenters)>
<!ELEMENT experiments (experiment)*>
<!ELEMENT experiment (description, instrument, parameters)>

studyDB[studies =>> study].
study[study_id => string; … animal => animal; experiments =>> experiment; experimenters =>> string].
…

• Specification of Domain Knowledge
– Subclasses
   mushroom_spine :: spine
– Data Classification
   DERIVE S:mushroom_spine FROM S:spine[head -> _; neck -> _].
– Integrity Constraints
   ic1(S):ALERT[type -> "invalid spine"; object -> S] IF S:spine[undef ->> {head, neck}].
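The classification rule and the integrity constraint can be mimicked in a few lines of Python. This is only a sketch: the spine records are invented, a missing dictionary key stands in for an undefined attribute, and `undef ->> {head, neck}` is read here as "head and neck are both undefined", which is an interpretation rather than something the slide states.

```python
# Hypothetical spine records; a missing key models an undefined attribute.
spines = [
    {"id": "s1", "head": {"width": 4.47}, "neck": {"length": 1.2}},
    {"id": "s2", "head": {"width": 3.10}},   # neck undefined
    {"id": "s3"},                            # head and neck undefined
]

def classes_of(spine):
    """Data classification: a spine with both head and neck is a mushroom_spine."""
    classes = {"spine"}
    if "head" in spine and "neck" in spine:
        classes.add("mushroom_spine")   # mushroom_spine :: spine
    return classes

def ic1(spine):
    """Integrity constraint ic1: alert when head and neck are both undefined."""
    if "head" not in spine and "neck" not in spine:
        return {"type": "invalid spine", "object": spine["id"]}
    return None

alerts = [alert for alert in map(ic1, spines) if alert is not None]
```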

... Definition of Integrated Views (Multiple Sources)

• Integrated View Definition

DERIVE
   protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
FROM
   I:protein_label_image[proteins ->> {Protein}; organism -> Organism;
      anatomical_structures ->> {AS:anatomical_structure[name -> Anatom]}],   % from PROLAB
   AS..segments..features[name -> Feature_name; value -> Value],
   NAE:neuro_anatomic_entity[name -> Anatom;   % from ANATOM
      located_in ->> {Brain_region}].

• Schema Reasoning & Dynamic Classes

TAXON DB Schema:
   taxon[subspecies => string; species => string; genus => string; … phylum => string; kingdom => string; superkingdom => string].

TAXON Rank Hierarchy:
   subspecies::species::genus:: … kingdom::superkingdom

Create Classes from TAXON data:
   DERIVE T:TR, TR::TR1 FROM T:‘TAXON’.taxon[Taxon_Rank -> TR, Taxon_Rank1 -> TR1], Taxon_Rank::Taxon_Rank1.
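The protein_distribution view on this slide is essentially a join between PROLAB images and ANATOM entities on the anatomical structure name. A rough Python rendering follows, assuming both sources are available as nested dictionaries; the sample records are invented and the field names simply mirror the attribute names in the rule.

```python
# Hypothetical stand-ins for the two sources; field names follow the rule.
PROLAB = [{
    "proteins": ["RyR"],
    "organism": "rat",
    "anatomical_structures": [{
        "name": "purkinje cell",
        "segments": [{"features": [{"name": "protein_amount", "value": 30}]}],
    }],
}]
ANATOM = [{"name": "purkinje cell", "located_in": ["cerebellar cortex"]}]

def protein_distribution(images, entities):
    """Return (Protein, Organism, Brain_region, Feature_name, Anatom, Value) tuples."""
    located_in = {e["name"]: e["located_in"] for e in entities}
    rows = []
    for image in images:
        for structure in image["anatomical_structures"]:
            anatom = structure["name"]
            for region in located_in.get(anatom, []):       # join on the shared name
                for segment in structure["segments"]:       # AS..segments..features
                    for feature in segment["features"]:
                        for protein in image["proteins"]:
                            rows.append((protein, image["organism"], region,
                                         feature["name"], anatom, feature["value"]))
    return rows

rows = protein_distribution(PROLAB, ANATOM)
```

The nested loops correspond to the set-valued attributes (`->>`) and the path expression in the rule; a structure with no matching ANATOM entity contributes no rows.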

Query Evaluation Example

push selection
   @SENSELAB: X1 := select output from parallel fiber;

determine source context
   @MEDIATOR: X2 := “hang off” X1 from Domain Map;

compute region of interest (here: downward closure)
   @MEDIATOR: X3 := subregion-closure(X2);

push selection
   @NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);

compute protein distribution
   @MEDIATOR: X5 := compute aggregate(X4);
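The five plan steps can be traced end to end with toy data. The sketch below assumes in-memory stand-ins for the domain map and the two sources; the region names, records, and amounts are invented, and the aggregate is taken to be a per-region sum, which the slide does not specify.

```python
# Hypothetical in-memory stand-ins for the domain map and the two sources.
HAS_SUBREGION = {
    "cerebellar cortex": ["molecular layer", "purkinje cell layer"],
    "molecular layer": [],
    "purkinje cell layer": [],
}
SENSELAB = [{"neuron": "parallel fiber", "output_region": "cerebellar cortex"}]
NCMIR = [
    {"region": "molecular layer", "protein": "Ryanodine Receptor", "amount": 30},
    {"region": "purkinje cell layer", "protein": "Ryanodine Receptor", "amount": 12},
    {"region": "molecular layer", "protein": "other", "amount": 5},
]

def subregion_closure(regions):
    """Downward closure: the given regions plus everything below them."""
    closed, todo = set(), list(regions)
    while todo:
        region = todo.pop()
        if region not in closed:
            closed.add(region)
            todo.extend(HAS_SUBREGION.get(region, []))
    return closed

# X1: push selection to SENSELAB
x1 = [r for r in SENSELAB if r["neuron"] == "parallel fiber"]
# X2: "hang off" the result from the domain map (here: its region concepts)
x2 = {r["output_region"] for r in x1}
# X3: region of interest
x3 = subregion_closure(x2)
# X4: push selection to NCMIR
x4 = [r for r in NCMIR
      if r["region"] in x3 and r["protein"] == "Ryanodine Receptor"]
# X5: aggregate at the mediator (assumed: sum per region)
x5 = {}
for row in x4:
    x5[row["region"]] = x5.get(row["region"], 0) + row["amount"]
```

Only steps X1 and X4 touch a source; the domain map work in X2 and X3 happens entirely at the mediator, which is what lets the NCMIR selection be restricted to the relevant regions.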

"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"

ANATOM Domain Map with Registered Data

[Figure: ANATOM DATA]

Deductive Closure of “has_a” with “tc(is_a)”: (YES -- Real Recursive Views!! ;-)

[Figure: ANATOM CLOSURE]
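One plausible reading of this closure is that has_a is made transitive and is also inherited downward along the transitive closure of is_a. The Python sketch below computes that fixpoint over a made-up fragment of an anatomy map; the exact rule used in the prototype is not given on the slide, so treat the definition as an assumption.

```python
# Hypothetical fragment of an anatomy domain map.
IS_A = {("purkinje_cell", "spiny_neuron"), ("spiny_neuron", "neuron")}
HAS_A = {("neuron", "dendrite"), ("dendrite", "spine")}

def transitive_closure(pairs):
    """Smallest transitive relation containing pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

def has_a_closure(is_a, has_a):
    """has_a made transitive and inherited downward along tc(is_a)."""
    is_a_star = transitive_closure(is_a)
    derived = set(has_a)
    while True:
        inherited = {(sub, part) for sub, sup in is_a_star
                     for whole, part in derived if sup == whole}
        new = transitive_closure(derived | inherited) - derived
        if not new:
            return derived
        derived |= new

closure = has_a_closure(IS_A, HAS_A)
```

With these facts a Purkinje cell is derived to have spines even though no source states it directly, which is the kind of recursive view a plain XML query language of the time could not express.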

Interactive Queries

[Figure: KIND01]

Resulting Sub DOMAIN MAP “Browser”

[Figure: PROTLOC]

Computed Protein Localization Data

[Figure: PROTLOC]

Client-Side Result Visualization (using AxioMap Viewer: Ilya Zaslavsky)

[Figure: PROTLOC-AxioMap]

Comparison & Summary: Model-Based Mediation

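Each item compares (complex) single world / simple multiple world integration with complex multiple world integration.

• Integration target
– single / simple multiple world: global schema (common / shared)
– complex multiple world: 1..n shared domain maps

• Example scenario
– single / simple multiple world: suppliers’ catalogs / home buyer
– complex multiple world: complex scientific data (neuroscience, geoscience, …)

• Schema level overlap
– single / simple multiple world: large / small
– complex multiple world: none … small

• Instance level overlap
– single / simple multiple world: large / none
– complex multiple world: none

• Source correlation
– single / simple multiple world: direct, instance / schema level
– complex multiple world: indirect, conceptual (knowledge) level

• Techniques
– single / simple multiple world: schema transformations, schema integration; “structural” integration
– complex multiple world: domain maps, formalized domain knowledge (“semantic bridges”) => model-based (“semantic”) mediation

• Integration languages, Expressiveness
– single / simple multiple world: relational, semistructured, queries & transformations (e.g., SQL, XQuery, XSLT)
– complex multiple world: conceptual (description logics), object-oriented, deductive features (e.g., GCM, F-logic)

• Integrators
– single / simple multiple world: DB expert
– complex multiple world: domain expert + KRDB expert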

Conclusions and Outlook

• Model-based Mediation Architecture
– for complex multiple worlds scenarios (Neuroscience, ...)
– sources export CMs (data “lifted” to conceptual level)
– mediator employs DMs (“semantic road map”)

• Simple Prototype based on XSB/FLORA
– source and result data situated in DM context
– domain scientists are excited ...

• Some Open Issues
– striking the right balance between complexity and expressiveness of DMs (e.g. subsumption and satisfiability of DMs should be decidable)
– query processing/optimization
– modeling query capabilities
– semantic annotation tools for “dumb” sources
– re-implement ... *sigh* ...
– ...

ADDITIONAL MATERIAL STARTS HERE

ANATOM Domain Map

[Figure: ANATOM]

Model-Based Mediation with DOMAIN MAPS (DMs)

Integrated-CM(Z1,...) :=
   get X1,... from Src1;
   get X2,... from Src2;
   LINK (Xi, Yj);
   Zj = CM-QL(X1,...,Y1,...)

LINK(X,Y):
   X.zip = Y.zip
   X.addr in Y.zip
   X.zip overlaps Y.county
   ...

• “Semantic Road Maps” for situating source data
=> navigational aid (browsing source classes at the conceptual level)
=> basis for integrated views across multiple worlds
=> link points (concepts) and labeled arcs (roles)
=> formal semantics (in FL and/or DLs)

Example: ANATOM DM = anatomical entities (concepts) + is_a, has_a, overlaps, ... (roles)

=> from syntactic equality to semantic joins
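The step from syntactic equality to a semantic join amounts to swapping the join predicate. A minimal Python sketch, assuming a small lookup table as the "domain knowledge"; the records and the table are illustrative, and "overlaps" is simplified to containment of a ZIP code in a county.

```python
# Hypothetical lookup standing in for geographic domain knowledge.
ZIP_TO_COUNTY = {"92093": "San Diego", "92037": "San Diego", "94720": "Alameda"}

def link_equal_zip(x, y):
    """Syntactic link: X.zip = Y.zip."""
    return x["zip"] == y["zip"]

def link_zip_in_county(x, y):
    """Semantic link: X.zip overlaps Y.county, decided via the lookup table."""
    return ZIP_TO_COUNTY.get(x["zip"]) == y["county"]

def semantic_join(xs, ys, link):
    """Pair up records from two sources whenever the link predicate holds."""
    return [(x, y) for x in xs for y in ys if link(x, y)]

houses = [{"addr": "1 Main St", "zip": "92037"}]
schools = [{"name": "A", "county": "San Diego", "zip": "92093"}]

by_equality = semantic_join(houses, schools, link_equal_zip)
by_county = semantic_join(houses, schools, link_zip_in_county)
```

The equality join finds nothing because the ZIP codes differ, while the county-based link pairs the two records. In the neuroscience setting the lookup table is replaced by the domain map and its is_a / has_a / overlaps roles.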

Example Query Evaluation (I)

• Example: protein_distribution
– given: organism, protein, brain_region
– ANATOM DM:
   • recursively traverse the has_a_star paths under brain_region, collect all anatomical_entities
– Source PROLAB:
   • join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
– Mediator:
   • aggregate over all parents up to brain_region
   • report distribution

Interactive Queries

[Figure: KIND]