Model-Based Information Integration in a Neuroscience Mediator System

Page 1: Model-Based Information Integration in a Neuroscience Mediator System

Model-Based Information Integration in a Neuroscience Mediator System

Bertram Ludaescher

Amarnath Gupta

Maryann E. Martone

University of California San Diego

Page 2: Model-Based Information Integration in a Neuroscience Mediator System

VLDB2000, Cairo

A Standard Mediator Architecture (MIX -- Mediation of Information using XML)

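[Architecture diagram, top to bottom]

USER-Query

MIX MEDIATOR: INTEGRATED VIEW, specified by an XML Integrated View Definition

Wrapper, Wrapper, Wrapper (one per source)

Data Sources: DB, Files, WWW (Lab1, Lab2, Lab3)

The layers exchange XML Q/A (XML queries and answers).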

Page 3: Model-Based Information Integration in a Neuroscience Mediator System

Integration Issues

SEMANTIC Integration???

SYNTACTIC/STRUCTURAL Integration

• Integrated Views (Src-XML => Intgr-XML)

• Schema Integration (DTD =>DTD)

• Wrapping, Data Extraction (Text => XML)

(addressed by MIX: Mediation of Information using XML)

SYSTEM Integration (SRB/MCAT)

TCP/IP, HTTP, CORBA; storage, query capabilities; protocols & services

Distributed Query Processing

Page 4: Model-Based Information Integration in a Neuroscience Mediator System

Integration Issues: Mediating across Multiple-Worlds

• Structural Integration
=> common semistructured data model (XML)
=> XML queries & transformations to resolve schema conflicts

• Limited Query Capabilities
=> mediator is aware of the query capabilities (QCs) exported by wrappers

• ...

• Semantic Integration
– most work deals with issues for “one-world” scenarios (e.g., amazon.com vs. bn.com)
– what if data comes from a “multiple-world” scenario (like Neuroscience), where data objects from different sources are not even similar, and only the hidden semantics (known to the domain expert) provides the “semantic link”?

Page 5: Model-Based Information Integration in a Neuroscience Mediator System

A Neuroscience Question

What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?

[Architecture diagram, top to bottom]

??? Integrated View ???

??? Mediator ???

??? Integrated View Definition ???

Wrapper, Wrapper, Wrapper, Wrapper

Sources: protein localization, morphometry, neurotransmission, Web (CaBP, Expasy)

Page 6: Model-Based Information Integration in a Neuroscience Mediator System

Hidden Semantics: Protein Localization

<protein_localization>
  <neuron type="purkinje cell" />
  <protein channel="red">
    <name>RyR</name>
    ….
  </protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8">
        <name>spine</name>
        <amount name="RyR">0</amount>
      </structure>
      <structure fraction="0.2">
        <name>branchlet</name>
        <amount name="RyR">30</amount>
      </structure>
      …

[Image labels: Molecular layer of Cerebellar Cortex; Purkinje Cell layer of Cerebellar Cortex; Fragment of dendrite]

Page 7: Model-Based Information Integration in a Neuroscience Mediator System

Hidden Semantics: Morphometry

<neuron name="purkinje cell">
  <branch level="10">
    <shaft>
      …
    </shaft>
    <spine number="1">
      <attachment x="5.3" y="-3.2" z="8.7" />
      <length>12.348</length>
      <min_section>1.93</min_section>
      <max_section>4.47</max_section>
      <surface_area>9.884</surface_area>
      <volume>7.930</volume>
      <head>
        <width>4.47</width>
        <length>1.79</length>
      </head>
    </spine>
    …

Hidden semantics (known to the domain expert, not stated in the data):

• Branch level beyond 4 is a branchlet

• Must be dendritic because Purkinje cells don’t have somatic spines
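The two annotations above are exactly the kind of knowledge a mediator has to make explicit before morphometry data can be linked to other sources. A minimal Python sketch of that step follows. The XML is a well-formed stand-in for the slide fragment; the second branch (level 3), the label "branch" for low levels, and all function names are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the morphometry fragment; the level-3 branch
# and its spine are hypothetical data added for contrast.
MORPHOMETRY_XML = """
<neuron name="purkinje cell">
  <branch level="10">
    <spine number="1"><length>12.348</length></spine>
  </branch>
  <branch level="3">
    <spine number="2"><length>9.1</length></spine>
  </branch>
</neuron>
"""

def branch_kind(level):
    # Slide annotation: a branch level beyond 4 is a branchlet.
    # Calling everything else "branch" is an assumption of this sketch.
    return "branchlet" if level > 4 else "branch"

def spine_location(neuron_name):
    # Slide annotation: Purkinje cells have no somatic spines,
    # so their spines must be dendritic. Other cell types are left open.
    return "dendritic" if neuron_name == "purkinje cell" else "unknown"

def annotate_spines(xml_text):
    """Return (spine number, branch kind, location) for every spine."""
    neuron = ET.fromstring(xml_text)
    name = neuron.get("name")
    rows = []
    for branch in neuron.findall("branch"):
        kind = branch_kind(int(branch.get("level")))
        for spine in branch.findall("spine"):
            rows.append((spine.get("number"), kind, spine_location(name)))
    return rows

SPINES = annotate_spines(MORPHOMETRY_XML)
```

With the derived term "branchlet" attached, a morphometry record can be matched against the `<structure>` names used in the protein localization source, even though neither source states that correspondence.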

Page 8: Model-Based Information Integration in a Neuroscience Mediator System

The Problem

• Multiple Worlds Integration
– compatible terms not directly joinable
– complex, indirect associations among schema elements
– unstated integrity constraints

• Why not just use Ontologies?
– typical ontologies associate terms along a limited number of dimensions

• What’s needed?
– a “theory” under which non-identical terms can be “semantically joined”

=> lift mediation to the level of conceptual models (CMs)
=> domain knowledge and integrity constraints (ICs) become rules over CMs
=> Model-Based Mediation

Page 9: Model-Based Information Integration in a Neuroscience Mediator System

XML-Based vs. Model-Based Mediation

[Comparison diagram; both columns start from Raw Data]

XML-Based Mediation

• Raw Data => (XML) Objects: XML Elements, XML Models

• Structural Constraints (DTDs), Parent, Child, Sibling, ..., e.g., A = (B*|C),D and B = ...

• No Domain Constraints

• Integrated-DTD := XML-QL(Src1-DTD,...)

Model-Based Mediation

• Raw Data => (XML) Objects => Conceptual Models: Classes, Relations, is-a, has-a, ... (drawn as classes C1, C2, C3 and relation R)

• Logical Domain Constraints (IF ... THEN ... rules)

• DOMAIN MAP

• Integrated-CM := CM-QL(Src1-CM,...)

Page 10: Model-Based Information Integration in a Neuroscience Mediator System

Extended Mediator Architecture

=> Wrappers export Conceptual Models (CMs), i.e., facts + rules for classes, relationships, ICs, ...

=> Mediator imports CMs from sources, auxiliary knowledge bases, and domain maps (DMs)

=> a generic conceptual model (GCM, a subset of F-logic), extensible via rules = common target CM language

=> new CMs can be plugged in by specifying them in GCM + F-logic rules

=> prototype implementation in FLORA:
• global-as-view approach
• compiler: F-logic => XSB-Prolog
• top-down evaluation => virtual (demand-driven) views
• external interfaces (XML, RDBs, DM visualization, ...)

Page 11: Model-Based Information Integration in a Neuroscience Mediator System

Model-Based Mediator Architecture

[Architecture diagram, top to bottom]

USER/Client

CM Queries & Results (exchanged in XML)

Mediator Engine: CM (Integrated View), driven by the Domain Map (DM) and the Integrated View Definition (IVD); XSB Engine with FL rule proc., LP rule proc., Graph proc.

Logic API (capabilities); CM Plug-In

CM S1, CM S2, CM S3, each expressed in GCM

For each source S1, S2, S3: a CM-Wrapper on top of an XML-Wrapper

Page 12: Model-Based Information Integration in a Neuroscience Mediator System

Definition of Integrated Views ...

• XML-2-FL and CM-2-FL Translators

<!ELEMENT Studies (Study)*>
<!ELEMENT Study (study_id, … animal, experiments, experimenters)>
<!ELEMENT experiments (experiment)*>
<!ELEMENT experiment (description, instrument, parameters)>

studyDB[studies =>> study].
study[study_id => string; … animal => animal; experiments =>> experiment; experimenters =>> string].
…

• Specification of Domain Knowledge

• Subclasses

mushroom_spine :: spine

• Rules

S:mushroom_spine IF S:spine[head -> _; neck -> _].

• Integrity Constraints

ic1(S):alert[type -> "invalid spine"; object -> S] IF S:spine[undef ->> {head, neck}].

• Integrated View Definition

protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value) IF
  I:protein_label_image[proteins ->> {Protein}; organism -> Organism;
    anatomical_structures ->> {AS:anatomical_structure[name -> Anatom]}],
  NAE:neuro_anatomic_entity[name -> Anatom; located_in ->> {Brain_region}],
  AS..segments..features[name -> Feature_name; value -> Value].
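To make the subclass rule and the integrity constraint concrete, here is a small Python sketch of how they could be evaluated over spine records. It is not the FLORA implementation: the dictionary encoding, the function names, and the sample spines are assumptions, and the constraint is read as "alert when both head and neck are undefined".

```python
def undefined_attrs(spine, attrs=("head", "neck")):
    """Return the subset of attrs that the spine record leaves undefined."""
    return {a for a in attrs if spine.get(a) is None}

def is_mushroom_spine(spine):
    # Rule from the slide: a spine with both a head and a neck
    # is classified as a mushroom_spine.
    return not undefined_attrs(spine)

def ic1(spine_id, spine):
    # Integrity constraint, under the reading assumed here: raise an
    # "invalid spine" alert when head and neck are both undefined.
    if undefined_attrs(spine) == {"head", "neck"}:
        return {"type": "invalid spine", "object": spine_id}
    return None

# Hypothetical spine records keyed by an illustrative identifier.
SPINES = {
    "s1": {"head": {"width": 4.47}, "neck": {"length": 1.2}},
    "s2": {"head": {"width": 3.1}},
    "s3": {},
}

MUSHROOM = sorted(sid for sid, s in SPINES.items() if is_mushroom_spine(s))
ALERTS = [a for a in (ic1(sid, s) for sid, s in SPINES.items()) if a]
```

In the mediator these definitions stay declarative, so the classification and the alerts are derived on demand instead of being stored with the source data.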

Page 13: Model-Based Information Integration in a Neuroscience Mediator System

... Definition of Integrated Views (Multiple Sources)

• Creating Mediated Classes

Union over all classes:

animal[M => R] IF S:source, S.animal[M => R].

Association rule:

X[taxon -> T] IF X:'PROLAB'.animal[name -> N], words(N,[W1,W2|_]), T:'TAXON'.taxon[genus -> W1; species -> W2].

• Reasoning with Schema

Schema:

taxon[subspecies => string; species => string; genus => string; … phylum => string; kingdom => string; superkingdom => string].

At Mediator:

subspecies::species::genus:: … kingdom::superkingdom

Class creation by schema reasoning:

T:TR, TR::TR1 IF T:'TAXON'.taxon[Taxon_Rank -> TR; Taxon_Rank1 -> TR1], Taxon_Rank::Taxon_Rank1.
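The association rule links a PROLAB animal to a TAXON entry by taking the first two words of the animal name as genus and species. A hedged Python sketch of that join is shown here; the record layout, the sample names, and the function name are illustrative assumptions, not the sources' actual schemas.

```python
# Hypothetical excerpts of the two sources.
TAXA = [
    {"genus": "Rattus", "species": "norvegicus"},
    {"genus": "Mus", "species": "musculus"},
]
ANIMALS = [
    {"name": "Rattus norvegicus Sprague-Dawley"},
    {"name": "Mus musculus"},
    {"name": "rat"},
]

def associate_taxon(animal, taxa):
    """Mimic words(N,[W1,W2|_]): match the first two words of the name
    against genus and species; return None if there is no match."""
    words = animal["name"].split()
    if len(words) < 2:
        return None  # the list pattern [W1,W2|_] needs at least two words
    w1, w2 = words[0], words[1]
    for taxon in taxa:
        if taxon["genus"] == w1 and taxon["species"] == w2:
            return taxon
    return None

LINKED = [(a["name"], associate_taxon(a, TAXA)) for a in ANIMALS]
```

An animal named only "rat" stays unlinked in this sketch, which illustrates why such association rules belong at the mediator, where they can be refined without touching the sources.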

Page 14: Model-Based Information Integration in a Neuroscience Mediator System

Model-Based Mediation with DOMAIN MAPS (DMs)

Integrated-CM(Z1,...) :=
  get X1,... from Src1;
  get Y1,... from Src2;
  LINK(Xi, Yj);
  Zj = CM-QL(X1,...,Y1,...)

LINK(X,Y):
  X.zip = Y.zip
  X.addr in Y.zip
  X.zip overlaps Y.county
  ...

=> from syntactic equality to semantic joins

• “Semantic Road Maps” for situating source data

=> navigational aid (browsing source classes at the conceptual level)

=> basis for integrated views across multiple worlds

=> link points (concepts) and labeled arcs (roles)

=> formal semantics (in FL and/or DLs)

Example: ANATOM DM = anatomical entities (concepts) + is_a, has_a, overlaps, ... (roles)
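The step from syntactic equality to semantic joins can be sketched by making the LINK predicate a parameter of the join. In the Python sketch below, the zip and county labels, the `ZIP_IN_COUNTY` lookup (standing in for a domain map), and all function names are made-up placeholders.

```python
# Stand-in for domain knowledge: which county a zip area lies in.
ZIP_IN_COUNTY = {"zip-A": "county-1", "zip-B": "county-1", "zip-C": "county-2"}

SRC1 = [{"id": "x1", "zip": "zip-A"}, {"id": "x2", "zip": "zip-C"}]
SRC2 = [{"id": "y1", "zip": "zip-B", "county": "county-1"},
        {"id": "y2", "zip": "zip-C", "county": "county-2"}]

def link_equal(x, y):
    # Syntactic join condition: X.zip = Y.zip
    return x["zip"] == y["zip"]

def link_overlaps(x, y):
    # Semantic join condition: X.zip overlaps Y.county,
    # decided by consulting the domain knowledge above.
    return ZIP_IN_COUNTY.get(x["zip"]) == y["county"]

def semantic_join(xs, ys, link):
    """Pair up objects from two sources whenever link(x, y) holds."""
    return [(x["id"], y["id"]) for x in xs for y in ys if link(x, y)]

EQUAL_PAIRS = semantic_join(SRC1, SRC2, link_equal)
OVERLAP_PAIRS = semantic_join(SRC1, SRC2, link_overlaps)
```

The equality join misses the pair (x1, y1) because the zip values differ; the domain-knowledge join recovers it, which is the role the ANATOM domain map plays for anatomical terms.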

Page 15: Model-Based Information Integration in a Neuroscience Mediator System

ANATOM Domain Map

[Figure: ANATOM]

Page 16: Model-Based Information Integration in a Neuroscience Mediator System

ANATOM Domain Map with Registered Data

[Figure: ANATOM DATA]

Page 17: Model-Based Information Integration in a Neuroscience Mediator System

Deductive Closure of “has_a” with “tc(is_a)”: (YES -- Real Recursive Views!! ;-)

[Figure: ANATOM CLOSURE]
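A recursive view of this kind can be sketched as a fixpoint computation. The Python sketch below assumes one plausible reading of the slide title: has_a is inherited along the transitive closure of is_a and then closed transitively. That reading, the relation contents, and the function names are assumptions, not the ANATOM definition.

```python
def transitive_closure(edges):
    """Naive fixpoint: add (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def has_a_star(is_a, has_a):
    # Assumed reading: if X is_a* Y and Y has_a Z, then X has_a Z;
    # the result is then closed under transitivity.
    isa_tc = transitive_closure(is_a)
    inherited = {(x, z) for (x, y) in isa_tc for (y2, z) in has_a if y == y2}
    return transitive_closure(set(has_a) | inherited)

# Toy relations with illustrative concept names.
IS_A = {("purkinje_cell", "neuron")}
HAS_A = {
    ("cerebellum", "cerebellar_cortex"),
    ("cerebellar_cortex", "purkinje_cell_layer"),
    ("purkinje_cell_layer", "purkinje_cell"),
    ("neuron", "dendrite"),
}

CLOSURE = has_a_star(IS_A, HAS_A)
```

In a top-down engine such as the one named in the slides, the same closure would be evaluated on demand for the concepts a query touches; the bottom-up loop here is only the simplest way to show the result.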

Page 18: Model-Based Information Integration in a Neuroscience Mediator System

Example Query Evaluation (I)

• Example: protein_distribution
– given: organism, protein, brain_region
– ANATOM DM:
  • recursively traverse the has_a_star paths under brain_region and collect all anatomical_entities
– Source PROLAB:
  • join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
– Mediator:
  • aggregate over all parents up to brain_region
  • report distribution
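The three evaluation steps (traverse the domain map, join with PROLAB data, aggregate upward) can be sketched in Python as follows. The has_a tree, the PROLAB records, the use of a sum as the aggregate, and the single-parent assumption are all illustrative choices of this sketch.

```python
# Hypothetical has_a tree of the domain map (parent -> children).
HAS_A = {
    "cerebellum": ["cerebellar_cortex"],
    "cerebellar_cortex": ["molecular_layer", "purkinje_cell_layer"],
    "molecular_layer": ["branchlet", "spine"],
}

# Hypothetical flattened PROLAB records.
PROLAB = [
    {"organism": "rat", "protein": "RyR", "anatom": "branchlet", "amount": 30},
    {"organism": "rat", "protein": "RyR", "anatom": "spine", "amount": 0},
    {"organism": "rat", "protein": "RyR", "anatom": "purkinje_cell_layer", "amount": 12},
    {"organism": "mouse", "protein": "RyR", "anatom": "branchlet", "amount": 99},
]

def descendants(region):
    """Step 1 (domain map): collect every entity reachable below region."""
    found, stack = set(), [region]
    while stack:
        for child in HAS_A.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def protein_distribution(organism, protein, brain_region):
    entities = descendants(brain_region) | {brain_region}
    parent = {c: p for p, cs in HAS_A.items() for c in cs}
    totals = {e: 0 for e in entities}
    for rec in PROLAB:
        # Step 2 (source): select matching records and join on the entity.
        if (rec["organism"], rec["protein"]) != (organism, protein):
            continue
        if rec["anatom"] not in entities:
            continue
        # Step 3 (mediator): add the amount to the entity and every
        # parent on the path up to brain_region.
        node = rec["anatom"]
        while True:
            totals[node] += rec["amount"]
            if node == brain_region:
                break
            node = parent[node]
    return totals

DIST = protein_distribution("rat", "RyR", "cerebellum")
```

The mouse record is filtered out at the source step, and the branchlet amount shows up again in the totals of each enclosing structure.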

Page 19: Model-Based Information Integration in a Neuroscience Mediator System

Interactive Queries (I)

[Figure: KIND]

Page 20: Model-Based Information Integration in a Neuroscience Mediator System

Example Query Evaluation (II)

@SENSELAB: X1 := select output from parallel fiber;

@MEDIATOR: X2 := “hang off” X1 from Domain Map;

@MEDIATOR: X3 := subregion-closure(X2);

@NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);

@MEDIATOR: X5 := compute aggregate(X4);

"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"

Page 21: Model-Based Information Integration in a Neuroscience Mediator System

Interactive Queries (II)

[Figure: KIND01]

Page 22: Model-Based Information Integration in a Neuroscience Mediator System

Resulting Sub DOMAIN MAP “Browser”

[Figure: PROTLOC]

Page 23: Model-Based Information Integration in a Neuroscience Mediator System

Computed Protein Localization Data

[Figure: PROTLOC]

Page 24: Model-Based Information Integration in a Neuroscience Mediator System

Client-Side Result Visualization (using AxioMap Viewer: Ilya Zaslavsky)

[Figure: PROTLOC-AxioMap]

Page 25: Model-Based Information Integration in a Neuroscience Mediator System

Summary & Outlook: Federation of Brain Data

[Federation diagram]

MODEL-BASED Mediation over:

• CCB, Montana SU

• Surface atlas, Van Essen Lab

• NCMIR, UCSD

• stereotaxic atlas, LONI

• MCell, CNL, Salk

ANATOM, PROTLOC

Result (VML), Result (XML/XSLT)