Model-Based Information Integration in a Neuroscience Mediator System
Bertram Ludaescher
Amarnath Gupta
Maryann E. Martone
University of California San Diego
VLDB2000, Cairo
A Standard Mediator Architecture (MIX -- Mediation of Information using XML)
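[Figure: the USER sends a USER-Query to the MIX MEDIATOR and receives answers as XML Q/A. The mediator exposes an INTEGRATED VIEW, specified by an XML Integrated View Definition. Below the mediator, the Data Sources (DB, Files, WWW; Lab1, Lab2, Lab3) each sit behind a Wrapper and exchange XML Q/A with the mediator.]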
Integration Issues
• SEMANTIC Integration ???
• SYNTACTIC/STRUCTURAL Integration (MIX: Mediation of Information using XML)
  – Integrated Views (Src-XML => Intgr-XML)
  – Schema Integration (DTD => DTD)
  – Wrapping, Data Extraction (Text => XML)
• SYSTEM Integration
  – SRB/MCAT, TCP/IP, HTTP, CORBA
  – storage, query capabilities
  – protocols & services
• Distributed Query Processing (spans all integration levels)
Integration Issues: Mediating across Multiple-Worlds
• Structural Integration
  => common semistructured data model (XML)
  => XML queries & transformations to resolve schema conflicts
• Limited Query Capabilities
  => mediator is aware of the query capabilities (QCs) exported by wrappers
• ...
• Semantic Integration
  – most work deals with issues for “one-world” scenarios (e.g., amazon.com vs. bn.com)
  – what if data comes from a “multiple-world” scenario (like Neuroscience), where data objects from different sources are not even similar, and only the hidden semantics (known to the domain expert) provides the “semantic link”?
A Neuroscience Question
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?

[Figure: four kinds of sources (protein localization, morphometry, neurotransmission, and Web sources such as CaBP and Expasy), each behind a Wrapper. Above them: “??? Mediator ???”, “??? Integrated View ???”, and “??? Integrated View Definition ???”.]
Hidden Semantics: Protein Localization
<protein_localization>
  <neuron type="purkinje cell" />
  <protein channel="red">
    <name>RyR</name>
    ….
  </protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8">
        <name>spine</name>
        <amount name="RyR">0</amount>
      </structure>
      <structure fraction="0.2">
        <name>branchlet</name>
        <amount name="RyR">30</amount>
      </structure>

Image annotations:
• Molecular layer of Cerebellar Cortex
• Purkinje Cell layer of Cerebellar Cortex
• Fragment of dendrite
Hidden Semantics: Morphometry
<neuron name="purkinje cell">
  <branch level="10">
    <shaft>
      …
    </shaft>
    <spine number="1">
      <attachment x="5.3" y="-3.2" z="8.7" />
      <length>12.348</length>
      <min_section>1.93</min_section>
      <max_section>4.47</max_section>
      <surface_area>9.884</surface_area>
      <volume>7.930</volume>
      <head>
        <width>4.47</width>
        <length>1.79</length>
      </head>
    </spine>
    …

Hidden semantics (known to the domain expert):
• Branch level beyond 4 is a branchlet
• The spine must be dendritic because Purkinje cells don’t have somatic spines
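The two sources share no directly joinable key: the protein localization source names a "branchlet" structure, while the morphometry source records only numeric branch levels. The following is a minimal sketch of how the expert rule "branch level beyond 4 is a branchlet" could serve as the semantic link. Element names follow the slides; the XML stand-ins, the second branch, the "branch" label for lower levels, and the helper names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-ins for the two source fragments (values are illustrative).
MORPHOMETRY = """
<neuron name="purkinje cell">
  <branch level="10"><spine number="1"><volume>7.930</volume></spine></branch>
  <branch level="2"/>
</neuron>
"""
LOCALIZATION = """
<density>
  <structure fraction="0.8"><name>spine</name><amount name="RyR">0</amount></structure>
  <structure fraction="0.2"><name>branchlet</name><amount name="RyR">30</amount></structure>
</density>
"""

def structure_of(branch_level):
    """Expert rule from the slide: a branch level beyond 4 is a branchlet."""
    return "branchlet" if branch_level > 4 else "branch"

def amounts_by_structure(xml_text, protein):
    """Map each structure name to the recorded amount of the given protein."""
    root = ET.fromstring(xml_text)
    return {
        s.findtext("name"): float(s.find(f"amount[@name='{protein}']").text)
        for s in root.iter("structure")
    }

def link_branches(morph_xml, loc_xml, protein):
    """Join morphometry branches with localization data via the derived name."""
    amounts = amounts_by_structure(loc_xml, protein)
    rows = []
    for branch in ET.fromstring(morph_xml).iter("branch"):
        level = int(branch.get("level"))
        name = structure_of(level)
        rows.append((level, name, amounts.get(name)))  # None: no matching data
    return rows

print(link_branches(MORPHOMETRY, LOCALIZATION, "RyR"))
```

The join succeeds only because `structure_of` encodes knowledge that appears in neither schema, which is the point of the "hidden semantics" slides.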
The Problem
• Multiple Worlds Integration
  – compatible terms not directly joinable
  – complex, indirect associations among schema elements
  – unstated integrity constraints
• Why not just use Ontologies?
  – typical ontologies associate terms along a limited number of dimensions
• What’s needed?
  – a “theory” under which non-identical terms can be “semantically joined”
  => lift mediation to the level of conceptual models (CMs)
  => domain knowledge, ICs become rules over CMs
  => Model-Based Mediation
XML-Based vs. Model-Based Mediation
XML-Based Mediation:
• Raw Data => XML Elements => XML Models
• Structural Constraints (DTDs): Parent, Child, Sibling, ...
    A = (B*|C),D
    B = ...
• No Domain Constraints
• Integrated-DTD := XML-QL(Src1-DTD,...)

Model-Based Mediation:
• Raw Data => (XML) Objects => Conceptual Models
• Classes, Relations, is-a, has-a, ... (classes C1, C2, C3; relation R)
• DOMAIN MAP
• Logical Domain Constraints (IF ... THEN ...)
• Integrated-CM := CM-QL(Src1-CM,...)
Extended Mediator Architecture
=> Wrappers export Conceptual Models (CMs), i.e., facts + rules for classes, relationships, ICs, ...
=> Mediator imports CMs from sources, auxiliary knowledge bases, and domain maps (DMs)
=> a generic conceptual model (GCM, a subset of F-logic), extensible via rules = common target CM language
=> new CMs can be plugged in by specifying them in GCM + F-logic rules
=> prototype implementation in FLORA:
  • global-as-view approach
  • compiler: F-logic => XSB-Prolog
  • top-down evaluation => virtual (demand-driven) views
  • external interfaces (XML, RDBs, DM visualization, ...)
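To illustrate what "virtual (demand-driven) views" means in a global-as-view setting, here is a loose Python analogy using generators. It is not how FLORA or XSB evaluate rules; the source names, rows, and function names are hypothetical. The integrated view is defined as a query over the sources, and a source is contacted only when a consumer actually pulls data that depends on it.

```python
from typing import Callable, Iterator, List

contacted: List[str] = []  # log of sources that were actually queried

def make_source(name: str, rows: List[dict]) -> Callable[[], Iterator[dict]]:
    """Wrap a list of rows as a lazily evaluated source (hypothetical wrapper)."""
    def fetch() -> Iterator[dict]:
        contacted.append(name)  # runs only when iteration reaches this source
        yield from rows
    return fetch

def animal_view(sources) -> Iterator[dict]:
    """Global-as-view: the integrated class is a union over the source classes."""
    for fetch in sources:
        yield from fetch()

lab1 = make_source("lab1", [{"name": "rat"}])
lab2 = make_source("lab2", [{"name": "mouse"}])

view = animal_view([lab1, lab2])  # defining the view contacts no source
first = next(view)                # pulling one answer contacts only lab1
print(first, contacted)
```

A materialized view would instead fetch from every source up front; the demand-driven variant trades that cost for per-query work.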
Model-Based Mediator Architecture
[Figure: sources S1, S2, S3 each sit behind an XML-Wrapper and a CM-Wrapper, which export the source conceptual models CM S1, CM S2, CM S3 in GCM. The Mediator Engine (FL rule proc., LP rule proc., Graph proc.; XSB Engine) combines them with the Domain Map (DM) and the Integrated View Definition (IVD) into the CM (Integrated View) offered to the USER/Client. Other labels: Logic API (capabilities); CM Queries & Results (exchanged in XML); CM Plug-In.]
Definition of Integrated Views ...
• XML-2-FL and CM-2-FL Translators

  Source DTD:
    <!ELEMENT Studies (Study)*>
    <!ELEMENT Study (study_id, … animal, experiments, experimenters)>
    <!ELEMENT experiments (experiment)*>
    <!ELEMENT experiment (description, instrument, parameters)>

  Translated F-logic signatures:
    studyDB[studies =>> study].
    study[study_id => string; … animal => animal; experiments =>> experiment; experimenters =>> string].
    …
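The slide does not spell out the translation rules, so the following is only a sketch of one plausible simplification, not the actual XML-2-FL translator. It assumes three rules: a child declared as a starred wrapper such as `(experiment)*` becomes a multi-valued attribute (`=>>`), a child with its own declaration becomes a single-valued attribute typed by that element (`=>`), and an undeclared child is treated as a `string`. The DTD below is a shortened, hypothetical variant of the one on the slide.

```python
import re

# Matches declarations of the form <!ELEMENT name (a, b, c)> or <!ELEMENT name (a)*>
ELEMENT = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)(\*?)\s*>")

DTD = """
<!ELEMENT study (study_id, animal, experiments)>
<!ELEMENT animal (name)>
<!ELEMENT experiments (experiment)*>
<!ELEMENT experiment (description, instrument)>
"""

def parse_dtd(dtd):
    """Map each element name to (list of child names, is-starred flag)."""
    decls = {}
    for name, body, star in ELEMENT.findall(dtd):
        children = [c.strip() for c in body.split(",") if c.strip()]
        decls[name] = (children, star == "*")
    return decls

def to_flogic(dtd):
    """Emit one F-logic-style signature per non-wrapper element (assumed rules)."""
    decls = parse_dtd(dtd)
    lines = []
    for name, (children, starred) in decls.items():
        if starred:
            continue  # collection wrappers are folded into their parent
        attrs = []
        for child in children:
            if child in decls and decls[child][1]:
                attrs.append(f"{child} =>> {decls[child][0][0]}")  # multi-valued
            elif child in decls:
                attrs.append(f"{child} => {child}")                # typed by class
            else:
                attrs.append(f"{child} => string")                 # leaf element
        lines.append(f"{name}[{'; '.join(attrs)}].")
    return lines

for line in to_flogic(DTD):
    print(line)
```

The regular expression handles only flat, comma-separated content models; choice groups, nesting, and attribute lists would need a real DTD parser.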
• Specification of Domain Knowledge
  • Subclasses
      mushroom_spine :: spine
  • Rules
      S:mushroom_spine IF S:spine[head -> _; neck -> _].
  • Integrity Constraints
      ic1(S):alert[type -> "invalid spine"; object -> S] IF S:spine[undef ->> {head, neck}].
• Integrated View Definition
    protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value) IF
      I:protein_label_image[proteins ->> {Protein}; organism -> Organism;
        anatomical_structures ->> {AS:anatomical_structure[name -> Anatom]}],
      NAE:neuro_anatomic_entity[name -> Anatom; located_in ->> {Brain_region}],
      AS..segments..features[name -> Feature_name; value -> Value].
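As a rough illustration of how the subclass rule and the integrity constraint behave, here is a small Python sketch. It reads the rule as "a spine with both a head and a neck is a mushroom_spine" and the constraint as "raise an alert when both head and neck are undefined"; that reading of the `undef` set, the `Spine` class, and the sample values are assumptions rather than the authors' definitions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Spine:
    ident: str
    head: Optional[float] = None  # None plays the role of "undefined"
    neck: Optional[float] = None

def classes_of(s: Spine) -> Set[str]:
    """Derive class membership: mushroom_spine :: spine, plus the rule."""
    classes = {"spine"}
    if s.head is not None and s.neck is not None:
        classes.add("mushroom_spine")
    return classes

def ic1(spines: List[Spine]) -> List[dict]:
    """Integrity constraint: report spines with neither head nor neck defined."""
    return [
        {"type": "invalid spine", "object": s.ident}
        for s in spines
        if s.head is None and s.neck is None
    ]

spines = [Spine("s1", head=4.47, neck=1.2), Spine("s2", head=3.0), Spine("s3")]
print([sorted(classes_of(s)) for s in spines])
print(ic1(spines))
```

In the mediator, such constraints are themselves rules, so violations show up as derived `alert` objects that can be queried like any other data.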
... Definition of Integrated Views (Multiple Sources)
• Creating Mediated Classes
  – union over all classes:
      animal[M => R] IF S:source, S.animal[M => R].
  – association rule:
      X[taxon -> T] IF X:'PROLAB'.animal[name -> N], words(N,[W1,W2|_]), T:'TAXON'.taxon[genus -> W1; species -> W2].
• Reasoning with Schema
  – Schema:
      taxon[subspecies => string; species => string; genus => string; … phylum => string; kingdom => string; superkingdom => string].
  – At Mediator:
      subspecies::species::genus:: … kingdom::superkingdom
  – Class creation by schema reasoning:
      T:TR, TR::TR1 IF T:'TAXON'.taxon[Taxon_Rank -> TR, Taxon_Rank1 -> TR1], Taxon_Rank::Taxon_Rank1.
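The association rule links a PROLAB animal to a TAXON entry by splitting the animal's name into words and matching the first two against genus and species. A minimal Python sketch of that matching step follows; the table layout, the `order` field, and the function names are assumptions made for the example, and the taxa listed are just two well-known rodent species.

```python
from typing import List, Optional

# Illustrative stand-in for the TAXON source.
TAXA = [
    {"genus": "Rattus", "species": "norvegicus", "order": "Rodentia"},
    {"genus": "Mus", "species": "musculus", "order": "Rodentia"},
]

def link_taxon(animal_name: str, taxa: List[dict]) -> Optional[dict]:
    """Mirror words(N,[W1,W2|_]): match the first two words to genus and species."""
    words = animal_name.split()
    if len(words) < 2:
        return None  # the rule needs at least two words to fire
    w1, w2 = words[:2]
    for taxon in taxa:
        if taxon["genus"] == w1 and taxon["species"] == w2:
            return taxon
    return None

def same_order(name_a: str, name_b: str, taxa: List[dict]) -> bool:
    """Use the taxon link to decide whether two animals share an order."""
    a, b = link_taxon(name_a, taxa), link_taxon(name_b, taxa)
    return a is not None and b is not None and a["order"] == b["order"]

print(link_taxon("Rattus norvegicus albino", TAXA))
print(same_order("Rattus norvegicus", "Mus musculus", TAXA))
```

Once animals are linked to taxa, a question such as "how about other rodents?" becomes a walk up the rank hierarchy rather than a string comparison on animal names.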
Model-Based Mediation with DOMAIN MAPS (DMs)
Integrated-CM(Z1,...) :=
  get X1,... from Src1;
  get X2,... from Src2;
  LINK(Xi, Yj);
  Zj = CM-QL(X1,...,Y1,...)

LINK(X,Y):
  X.zip = Y.zip
  X.addr in Y.zip
  X.zip overlaps Y.county
  ...

• “Semantic Road Maps” for situating source data
  => navigational aid (browsing source classes at the conceptual level)
  => basis for integrated views across multiple worlds
  => link points (concepts) and labeled arcs (roles)
  => formal semantics (in FL and/or DLs)

Example: ANATOM DM
  = anatomical entities (concepts) + is_a, has_a, overlaps, ... (roles)
  => from syntactic equality to semantic joins
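The step from syntactic equality to semantic joins can be sketched by making the LINK predicate a parameter of the join. In the Python sketch below, `same_zip` is the plain equality case and `zip_in_county` stands for a link that needs outside knowledge; the records, the `ZIP_TO_COUNTY` lookup, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical records from two sources.
LABS = [{"id": "lab1", "zip": "92093"}, {"id": "lab2", "zip": "06520"}]
REGIONS = [
    {"id": "r1", "zip": "92093", "county": "San Diego"},
    {"id": "r2", "zip": "00000", "county": "New Haven"},
]

# Illustrative background knowledge playing the role of a domain map.
ZIP_TO_COUNTY = {"92093": "San Diego", "06520": "New Haven"}

def same_zip(x: dict, y: dict) -> bool:
    """Syntactic link: X.zip = Y.zip."""
    return x["zip"] == y["zip"]

def zip_in_county(x: dict, y: dict) -> bool:
    """Semantic link: X.zip lies in Y.county, decided via the lookup table."""
    return ZIP_TO_COUNTY.get(x["zip"]) == y["county"]

def semantic_join(xs: List[dict], ys: List[dict],
                  link: Callable[[dict, dict], bool]) -> List[Tuple[str, str]]:
    """Pair up objects from two sources whenever the LINK predicate holds."""
    return [(x["id"], y["id"]) for x in xs for y in ys if link(x, y)]

print(semantic_join(LABS, REGIONS, same_zip))
print(semantic_join(LABS, REGIONS, zip_in_county))
```

The equality join misses the second pair because the two records never share a value; only the link that consults background knowledge finds it.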
ANATOM Domain Map
[Figure: ANATOM]
ANATOM Domain Map with Registered Data
[Figure: ANATOM DATA]
Deductive Closure of “has_a” with “tc(is_a)” (YES -- Real Recursive Views!! ;-)
[Figure: ANATOM CLOSURE]
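One way to read this closure is as a fixpoint computation: has_a facts are inherited along the transitive closure of is_a and chained with each other until nothing new can be derived. The Python sketch below follows that reading, which is an assumption about the intended semantics; the small anatomy fragment is illustrative and not taken from the ANATOM map.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]

# Illustrative fragment of a domain map.
IS_A: Set[Edge] = {("purkinje_cell", "neuron")}
HAS_A: Set[Edge] = {
    ("cerebellar_cortex", "purkinje_cell_layer"),
    ("purkinje_cell_layer", "purkinje_cell"),
    ("neuron", "dendrite"),
    ("dendrite", "spine"),
}

def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    """Naive fixpoint: keep composing edges until no new pair appears."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

def has_a_star(has_a: Set[Edge], is_a: Set[Edge]) -> Set[Edge]:
    """Close has_a under inheritance along tc(is_a) and under chaining."""
    isa_tc = transitive_closure(is_a)
    derived = set(has_a)
    while True:
        inherited = {(x, z) for (x, y) in isa_tc for (y2, z) in derived if y == y2}
        chained = {(x, z) for (x, y) in derived for (y2, z) in derived if y == y2}
        new = (inherited | chained) - derived
        if not new:
            return derived
        derived |= new

closure = has_a_star(HAS_A, IS_A)
print(("cerebellar_cortex", "spine") in closure)
```

The derived pair (cerebellar_cortex, spine) exists in no single source fact; it depends on a Purkinje cell inheriting dendrites from neuron, which is what makes the view genuinely recursive.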
Example Query Evaluation (I)
• Example: protein_distribution
  – given: organism, protein, brain_region
  – ANATOM DM:
    • recursively traverse the has_a_star paths under brain_region and collect all anatomical_entities
  – Source PROLAB:
    • join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
  – Mediator:
    • aggregate over all parents up to brain_region
    • report distribution
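The three evaluation stages can be sketched end to end in a few lines of Python. Everything concrete here is assumed for illustration: the has_a fragment, the flattened PROLAB rows, the amounts, and the function names. The sketch also assumes the has_a structure under the region is a tree, so a simple recursive roll-up does not double count.

```python
from typing import Dict, List, Set

# Stage 1 input: illustrative has_a fragment of the domain map.
HAS_A: Dict[str, List[str]] = {
    "cerebellum": ["cerebellar_cortex"],
    "cerebellar_cortex": ["molecular_layer", "purkinje_cell_layer"],
}

# Stage 2 input: hypothetical PROLAB rows, already flattened by a wrapper.
PROLAB = [
    {"organism": "rat", "protein": "RyR", "anatom": "molecular_layer", "amount": 30.0},
    {"organism": "rat", "protein": "RyR", "anatom": "purkinje_cell_layer", "amount": 12.5},
    {"organism": "mouse", "protein": "RyR", "anatom": "molecular_layer", "amount": 7.0},
]

def parts_under(region: str) -> Set[str]:
    """Domain map step: collect all entities reachable via has_a paths."""
    seen, stack = set(), [region]
    while stack:
        for child in HAS_A.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def protein_distribution(organism: str, protein: str, region: str) -> Dict[str, float]:
    parts = parts_under(region) | {region}
    # Source step: select matching rows and join on the anatomical entity.
    direct: Dict[str, float] = {}
    for row in PROLAB:
        if (row["organism"], row["protein"]) == (organism, protein) and row["anatom"] in parts:
            direct[row["anatom"]] = direct.get(row["anatom"], 0.0) + row["amount"]

    # Mediator step: aggregate each entity's amounts up through its parents.
    def rollup(node: str) -> float:
        return direct.get(node, 0.0) + sum(rollup(c) for c in HAS_A.get(node, []))

    return {part: rollup(part) for part in parts}

print(protein_distribution("rat", "RyR", "cerebellum"))
```

In this toy run the cerebellum and the cerebellar cortex both report the sum of the two layer values, while the mouse row is filtered out by the organism condition.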
Interactive Queries (I)
[Figure: KIND]
Example Query Evaluation (II)
Query: "How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"

Query plan:
@SENSELAB: X1 := select output from parallel fiber;
@MEDIATOR: X2 := “hang off” X1 from Domain Map;
@MEDIATOR: X3 := subregion-closure(X2);
@NCMIR:    X4 := select PROT-data(X3, Ryanodine Receptors);
@MEDIATOR: X5 := compute aggregate(X4);
Interactive Queries (II)
[Figure: KIND01]
Resulting Sub DOMAIN MAP “Browser”
[Figure: PROTLOC]
Computed Protein Localization Data
[Figure: PROTLOC]
Client-Side Result Visualization (using AxioMap Viewer: Ilya Zaslavsky)
[Figure: PROTLOC-AxioMap]
Summary & Outlook: Federation of Brain Data
[Figure: federation of brain data through MODEL-BASED Mediation. Participating sources: CCB, Montana SU; Surface atlas, Van Essen Lab; NCMIR, UCSD; stereotaxic atlas, LONI; MCell, CNL, Salk. Other labels: ANATOM; PROTLOC; Result (VML); Result (XML/XSLT).]