i3master

110
Gio Wiederhold I3 1 Integration of Information from Heterogeneous Sources October 2001 Gio Wiederhold Stanford University I3 Master

Upload: gio-wiederhold

Post on 05-Dec-2014


DESCRIPTION

This is an oldish slide set on an engineering-based approach to sharing diverse and heterogeneous data. It complements a paper about to be published in a Springer collection by Tansel et al., as well as recent discussions of health care record systems.

TRANSCRIPT

Page 1: I3master

Gio Wiederhold I3 1

Integration of Information from

Heterogeneous Sources

October 2001

Gio Wiederhold

Stanford University

I3 Master

Page 2: I3master

Gio Wiederhold I3 2

Change is constant

Changes are imposed by

• Technology advance

• Local government

• Federal rules

• Competition

• Emerging standards

Systems must be designed and operated to recognize and adapt to change

Page 3: I3master

Gio Wiederhold I3 3

Information Leverage

Tactical

• Customers

• Inventory

• Suppliers

Strategic

• Planning

• Capabilities

• Opportunities

a variety of

internal sources

external and

imprecise sources

Page 4: I3master

Gio Wiederhold I3 4

Information overload Data starvation

• More databases

– public & corporate

• Faster communication

– digital

– packet switching: TCP/IP, ATM

• World-wide connectivity

– internet

– world-wide web

• Disintermediation

– ubiquitous publishing

Page 5: I3master

Gio Wiederhold I3 5

Focus on Information Systems

Computing Systems

• Processing as Analyses, Payroll, . . .

• Information Systems (on-line and distributed, . . . )

• Real-time control of processes, factories, . . .

Page 6: I3master

Gio Wiederhold I3 6

Data and Knowledge

Information is created at the confluence of data (the state) and knowledge (the ability to select and project the state into the future)

Knowledge Loop

Experience

Action

Data Loop

Storage Education

Selection

Integration

Abstraction

Decision-making

State changes

Recording

Page 7: I3master

Gio Wiederhold I3 7

Knowledge Manifestations

• Procedural

• system analysts

• programmers

• Declarative

• domain analysts

• knowledge engineers

• rule writers

• Creators

• faster

• Maintainers

• easier


Page 8: I3master

Gio Wiederhold I3 8

Transform Data to Information

Application Layer

Mediation Layer

Foundation Layer

data and simulation resources

value-added services

decision-makers at workstations

Page 9: I3master

Gio Wiederhold I3 9

Dealing With Heterogeneity

• Hardware platform: Hidden by operating system

• Operating system: Choices are reducing: NT, UNIX, ...

• Programming language: Fewer choices; irrelevant in remote access

• Database system model: Relational and E-R common

• Database system: Standards, convergence

• Coverage: Source dependent

– Attributes: documented, additive

– Scope: undocumented, intersecting

• Data representation: Conversion problems, nulls

• Data semantics: Requires knowledge

Page 10: I3master

Gio Wiederhold I3 10

Definition*

A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications.

It should be small and simple, so that it can be maintained by one expert or, at most, a small and coherent group of experts.

* Wiederhold: IEEE Computer March 1992

Page 11: I3master

Gio Wiederhold I3 11

Flow in mediation

• DELIVERY

• SUMMARIZATION

• INTEGRATION

• ABSTRACTION

• ACCESS

Page 12: I3master

Gio Wiederhold I3 12

Functions inside Mediation

Selection

Summarize

Transform

Heterogeneous resources

articulation

Page 13: I3master

Gio Wiederhold I3 13

Example in Health Care

Health Care Planner

Patient

Care domain

Investment

domain

Loan Interest Patient Volume Growth State Support

Bond Sales Service Operations Age Profile

Will the Clinic lose Money?

Gio Wiederhold. 1995

Page 14: I3master

Gio Wiederhold I3 14

Functional Layer

Service

interface

Resource access

interface

User interface

Real-world

interface

Human-computer

Interaction

Application-

specific code

Domain-

specific

code

Source-

specific

code

MEDIATION

Page 15: I3master

Gio Wiederhold I3 15

Function of Mediation

Apply Domain-specific Specialist Knowledge to add value

• to locate data sources

• to describe data for use

• to convert for consistency

• to abstract for insight / models

• to extrapolate to new situations

• to integrate from diverse sources

• to re-abstract for presentation

INFORMATION
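The functions listed on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the `Source` class, the `mediate` function, the field names, and the toy clinic records are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical illustration: none of these names or records come from the slides.


@dataclass
class Source:
    name: str
    fetch: Callable[[], Iterable[dict]]   # access: returns raw records
    convert: Callable[[dict], dict]       # convert: map to common terms and units


def mediate(sources: List[Source]) -> Dict[str, float]:
    """Access, convert, integrate, and abstract: annual visits per clinic."""
    totals: Dict[str, float] = {}
    for src in sources:
        for raw in src.fetch():
            rec = src.convert(raw)        # consistent names and units
            totals[rec["clinic"]] = totals.get(rec["clinic"], 0) + rec["visits"]
    return totals                         # abstraction handed to the application


# Two toy sources that disagree on field names and on granularity.
monthly = Source(
    "monthly_reports",
    fetch=lambda: [{"site": "North", "visits_per_month": 100}],
    convert=lambda r: {"clinic": r["site"], "visits": r["visits_per_month"] * 12},
)
yearly = Source(
    "yearly_reports",
    fetch=lambda: [{"clinic": "South", "visits": 300}, {"clinic": "East", "visits": 50}],
    convert=lambda r: r,
)

annual_visits = mediate([monthly, yearly])
print(annual_visits)
```

The domain knowledge (which field means "clinic", how to annualize a monthly figure) lives in the per-source `convert` functions, so a change in one source touches only its own converter.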

Page 16: I3master

Gio Wiederhold I3 16

Architectures & Communication

[Figure: the functions presentation, application, computation, information, aggregation, access and selection, and data source, allocated across four architectures: ‘mainframe’, smart file server, client server, and mediated. Labels shown include terminal, printed reports, I-O code to local storage, mini-computer with FTP selection from file storage, SQL for access and selection on a data base, CORBA with an object-structured server, and user workstation with SQL, ... for access and selection on distributed sources.]

Gio Wiederhold. 1995

Page 17: I3master

Gio Wiederhold I3 17

Current Methods

• Access: WWW with Mosaic – browsing, collection services: Harvest, ALIWEB, Fish

• SQL with Views – one verb, one database, one datatype

– predefined subsets

• Grouping: Objects with CORBA – predefined aggregation with methods

• View-Objects – created via extension of relational algebra

• Summarization – Tables from text documents; Exception search

Page 18: I3master

Gio Wiederhold I3 18

Central Solutions do not Scale

What works with 7 modules and one person in charge

fails when we have 100 and need a committee

Any change in resources affects the central module

Page 19: I3master

Gio Wiederhold I3 19

Evolution of mediation

[Figure: evolution of mediation in stages a. through e., showing data sources (D1 to D6), wrappers (W1 to W3), mediators (M1, M2), network, integrators (I1, I2), and applications (A1 to A6).]

Page 20: I3master

Gio Wiederhold I3 20

Domain-specific Mediation

• User application – Workstations

• Mediator – Expert-owned nodes

• Data sources – Remote primary and byproduct services

Page 21: I3master

Gio Wiederhold I3 21

Mediation for Quality

User Model

f(S,C,T)

Assessments: S1=.8 S2=.9 S3=8

BEST = low cost, rapid response, reliable delivery, trustworthiness

Estimates:

C1 = 5 ± 1, T1 = 100 ± 160

C2 = 8 ± 1, T2 = 70 ± 30

C3 = 10 ± 1, T3 = 50 ± 80

S1 S3 S2

S= source

reliability

C= confidence

T=

Page 22: I3master

Gio Wiederhold I3 22

Allocation Flexibility

User Interfaces

Databases

Provider of Mediator M

Copy if high intensity of interaction with

1. Application (M2)

2. Resources (N1,2)

3. Processing (M1)

Provider of mediator N

N

M

HPC

DB P

M1

Application C Application B Application I

M2

N 1

N 2 DBS R DB

Q

Mediators are only code

Page 23: I3master

Gio Wiederhold I3 23

Features of Mediation

• Domain-specific partitioning for Creation and Maintenance

• Network-basing for easy Reconfiguration

• Caching to deal with Asynchronicity

• Replication for Performance

E

A1’ A1

A B

C

D


Page 25: I3master

Gio Wiederhold I3 25

Central Solutions do not Scale

What works with 7 modules and one person in charge

fails when we have 100 and need a committee

Changes in resources affect the intermediary modules

Page 26: I3master

Gio Wiederhold I3 26

Integration at two levels

Application

• Informal, pragmatic

• User-control

Mediation

• Formal service

• Domain-Expert control

Gio Wiederhold. 1995

Page 27: I3master

Gio Wiederhold I3 27

Status of Mediation Technology

Today

• Handcrafted

• Expert consults with programmer

• Programmer codes the knowledge needed

• Resource changes require advice, program update

Future

• Generated from models

• Domain Expert maintains models

• Specification determines functions

• Resource changes trigger regeneration

Page 28: I3master

Gio Wiederhold I3 28

Facilitators

Facilitators Procure Linkages

• search for suitable resources

• resolve terminological mappings

• build system configurations

• issue subqueries, as needed

• combine results from subqueries

perform these tasks dynamically

without human intervention

depend greatly on ontologies

• can call on mediators for value added services

Another Module Type in

Information Systems

Page 29: I3master

Gio Wiederhold I3 29

Facilitators and Mediators

designed dynamic

accessible ontology

Page 30: I3master

Gio Wiederhold I3 30

Available Technology/Science

Caching

User Models

Uncertainty algebras

Spatial abstractions

Temporal Algebras

Active Databases

Agents Deductive Databases

Security Filters

Object Bases

Knobots

Wrappers DB Views

High Perf.Comm. Simulation Access Database Models

Human Lang. Proc.

Circumscription

Geographic Models

Constraint Management

Case-based Reasoning

Distributed Storage Systems

Domain Ontologies


Page 32: I3master

Gio Wiederhold I3 32

Databases / Web / Text / Simulation

Coverage of Current I3 Efforts

Facilitation (auto linking)

Maintenance (rule technology?)

Discovery (web,schema searching)

Wrapping (syntactical heterogeneity)

Integration over sources

Abstraction for relevance to customer

Mediators for multiple domains

Caching / History

Security for cooperation

Legend: good progress / active research / related work / poor coverage

[Chart: on the slide, each row above is rated per source type with emoticons.]

Page 33: I3master

Gio Wiederhold I3 33

Building Stovepipes

Scaffolding

Similar functions, different assignments to modules

Scaffolding

Mismatched assumptions

Gio Wiederhold. 1995

Page 34: I3master

Gio Wiederhold I3 34

Middleware

• CORBA (Common Object Request Broker Architecture)

– IBM SOM, DSOM

• DOE (Distributed Objects Everywhere)

– SunSoft

• DOME

• EZ-bridge

– System Strategies Inc.

• ILU (Inter-Language Unification), Xerox

• ISIS

• KQML (Knowledge Query & Manipulation Language)

• MQM (Message Queuing Middleware)

– IBM (for mainframe connections)

• OLE (Object Linking and Embedding)

• OpenDoc (Apple)

• PDES (Product Data Exchange using STEP)

• TIB (Teknekron Information Bus)

Shared specification

Many standards by many vendor groups

Page 35: I3master

Gio Wiederhold I3 35

New Tools

From the ARPA-Sponsored Knowledge Sharing Effort

• KQML: Knowledge Query & Manipulation Language

More Verbs: Performatives

Multi-source, Multi-mediator, Multi-content

• KIF: Knowledge Interchange Format

Exchange complex data, rules, . . .

among Expert Systems and Subsystems

• LOOM: Classification-based Expert System

• Ontolingua: Repository for Domain Terminologies

Page 36: I3master

Gio Wiederhold I3 36

KQML KNOWLEDGE QUERY & MANIPULATION LANGUAGE

= Ontology

= Representation

• Get,

• Put,

• Infer,

• Subscribe,

• Advertise,

• . . .

}

speak KIF, objects,

tuples, equations
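As a rough illustration of the message shape, the sketch below renders a KQML-style message as a parenthesized performative followed by `:keyword value` pairs. The helper `kqml`, the agent names, and the KIF content string are hypothetical; the performative name `ask-one` and the parameter keywords follow the published KQML drafts rather than this slide, which lists the verbs only informally.

```python
def kqml(performative: str, **params: str) -> str:
    """Render a KQML-style message: (performative :key value ...)."""
    fields = " ".join(f":{key.replace('_', '-')} {value}" for key, value in params.items())
    return f"({performative} {fields})"


# Hypothetical agents and content, for illustration only.
msg = kqml(
    "ask-one",
    sender="planner",
    receiver="clinic-mediator",
    language="KIF",              # the representation of the content
    ontology="health-care",      # the vocabulary the content draws on
    reply_with="q1",
    content='"(patient-volume clinic-1 ?v)"',
)
print(msg)
```

The performative carries the intent of the message, while `:language` and `:ontology` tell the receiver how to read the content.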

Hq97

Page 37: I3master

Gio Wiederhold I3 37

KQML APIs

Several suppliers Multiple platforms

Fat and thin versions Mainly to Internet (TCP/IP)

Not (yet) shrinkwrapped, require interaction

– Univ. of Maryland, Baltimore County, with UNISYS

– Stanford Design Projects ABSE [Genesereth et al.]

– Crystalliz (Cambridge MA), transmits PDES, SQL on PC

– BBN for planning, rapid assembly of joint task forces

– ISX (Westlake Village, CA) Demonstration tools

– Toronto Univ. Enterprise Integration Laboratory

– EITech Servicemail (uses email to go across firewalls)

FAT THIN

Page 38: I3master

Gio Wiederhold I3 38

KIF -- Knowledge Interchange

Transmits among

Expert Systems

• LOOM

• Ontolingua

• others

ANSI X3T2 evaluation

Compatible with Conceptual Graphs

Used by KQML to describe choices

Page 39: I3master

Gio Wiederhold I3 39

Two Design Phases

1. Resource Integration

2. Customer Focusing

Common Model

Page 40: I3master

Gio Wiederhold I3 40

Mediator Design Principle

Transform Data into Information

Match

Customer Model

Hierarchical

to

Resource Model

General network

(and maintain models)

Page 41: I3master

Gio Wiederhold I3 41

Fat versus thin mediators

• Too broad: hard to maintain, needs a committee

• Too thin: insufficient added value

• Too fat: hard to compose

• Too narrow: few customers

domain scope

service

scope

Just right

Page 42: I3master

Gio Wiederhold I3 42

Heterogeneity among Domains

If interoperation involves distinct

domains mismatch ensues

• Autonomy conflicts with consistency,

– Local Needs have Priority,

– Outside uses are a Byproduct

Heterogeneity must be addressed

• Platform and Operating Systems

• Representation and Access Conventions

• Naming and Ontology

Page 43: I3master

Gio Wiederhold I3 43

Unsolved problem in Interoperation

Common assumption in assembling and integrating distributed information resources

• The language used by the resources is the same

• Sublanguages used by the resources are subsets of a globally consistent language

This assumption is provably false.

Working towards the goal of global consistency is

1. naïve -- the goal cannot be achieved

2. inefficient -- languages are efficient in local contexts

Page 44: I3master

Gio Wiederhold I3 44

Ontology: components .

We represent the contents and structure of a language by its ontology:

• a set of well-defined terms, which delimit the domain of discourse

• relationships among those terms, chosen from a limited set

a formalizable subset of expert knowledge

Page 45: I3master

Gio Wiederhold I3 45

SKC’s grounded definition .

• Ontology:

a set of terms and their relationships

• Term:

a reference to real-world and abstract objects

• Relationship:

a named and typed set of links between objects

• Reference:

a label that names objects

• Real-world object:

an entity instance with a physical manifestation

• Abstract object:

a concept which refers to other objects
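These definitions map naturally onto a small data structure. The sketch below is one possible encoding, not SKC's actual representation: the class names mirror the slide's vocabulary, while the fields, the `relate` helper, and the shoe example are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Term:
    label: str               # the reference: a label that names objects
    abstract: bool = False   # False: real-world object, True: abstract object


@dataclass(frozen=True)
class Relationship:
    name: str                               # relationships are named ...
    kind: str                               # ... and typed
    links: Tuple[Tuple[Term, Term], ...]    # links between objects


@dataclass
class Ontology:
    terms: Set[Term] = field(default_factory=set)
    relationships: List[Relationship] = field(default_factory=list)

    def relate(self, name: str, kind: str, *pairs: Tuple[Term, Term]) -> None:
        """Add a relationship and register every term it mentions."""
        for a, b in pairs:
            self.terms.update((a, b))
        self.relationships.append(Relationship(name, kind, tuple(pairs)))


shoe = Term("shoe")
size = Term("size", abstract=True)
store = Ontology()
store.relate("has-size", "attribute", (shoe, size))
print(sorted(t.label for t in store.terms))
```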

Page 46: I3master

Gio Wiederhold I3 46

Where are Ontologies found?

Ontologies allow communication among partners in enterprises (rarely in machine-readable form)

Relationships determine meaning - parent, school, company

Variable and Class names in Software

Databases use ontologies during design in their E-R diagrams (implicitly) and to represent the leaf nodes in their schemas.

Knowledge-bases use term ontologies (often explicitly), add class definitions (to hold instances), constraints, and operations among the terms.

Page 47: I3master

Gio Wiederhold I3 47

Establishing Ontologies

Top-down:

–Commonly acceptable UPPER layers

Domain-specific

–Analysis and Sharing tools –Model and Object-type based

Bottom-up

–Wordlist creation from task-specific collections

–Database models, schemas, and contents

Page 48: I3master

Gio Wiederhold I3 48

Large Ontologies: good or bad?

Have all the Knowledge together

+ simple for customers of KBs

– hard for owners of KBs, must synchronize with many others

– in the limit -- everybody must be globally consistent

Large KB will cover multiple / all domains

created by a committee -- slow

maintained by a committee -- costly

Differences in level of abstraction -- efficiency

homeowner: nail

carpenter: sinker, brad, boxnail, . . .

Page 49: I3master

Gio Wiederhold I3 49

Domain ontology assumption .

• a domain will contain known objects

• the object configuration is consistent

• within a domain all terms are consistent &

• relationships among objects are consistent

• context is implicit in use

• explicit context is needed

for external use

No committee is needed to forge compromises * within a domain

Compromises hide valuable details

Domain Ontology

Page 50: I3master

Gio Wiederhold I3 50

SKC Objective

Provide for Maintainable Ontologies

• devolve maintenance onto many domain-specific experts / authorities

• provide an algebra to compute composed ontologies that are limited to their articulation terms

• enable interpretation within the source contexts

SKC

Page 51: I3master

Gio Wiederhold I3 51

Conservative assumption !

When dealing with multiple ontologies one can never be sure that identically or similarly spelled words mean the same thing, i.e., refer to exactly the same set of real-world objects under all current and future conditions

• Common, optimistic assumption: Meaning is identical

– Gets worse when terms are stemmed

• SKC, conservative or pessimistic assumption: Meaning never matches, unless there is a match rule

– number of matching rules is reduced by focusing on the articulation
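The conservative assumption can be stated as a short predicate: terms from different ontologies match only when an explicit rule links them. This is a minimal sketch; the `(ontology, term)` tuple encoding, the `same_meaning` function, and the sample rules are hypothetical.

```python
from typing import Set, Tuple

# A qualified term is (ontology name, term); rules are explicit pairs.
QTerm = Tuple[str, str]
match_rules: Set[Tuple[QTerm, QTerm]] = {
    (("store", "size"), ("factory", "size")),
    (("store", "style"), ("factory", "style")),
}


def same_meaning(a: QTerm, b: QTerm) -> bool:
    """Terms from different ontologies match only if a rule says so."""
    if a[0] == b[0]:
        return a[1] == b[1]          # within one domain, terms are consistent
    return (a, b) in match_rules or (b, a) in match_rules


print(same_meaning(("store", "size"), ("factory", "size")))      # a rule exists
print(same_meaning(("anatomy", "nail"), ("hardware", "nail")))   # same spelling, no rule
```

Identical spelling alone never produces a match here, which is exactly the difference from the optimistic assumption.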

Page 52: I3master

Gio Wiederhold I3 52

An Ontology Algebra

A knowledge-based algebra for ontologies

The Articulation Ontology (AO) consists of matching rules that link domain ontologies

Intersection: create a subset ontology, keep sharable entries

Union: create a joint ontology, merge entries

Difference: create a distinct ontology, remove shared entries
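A minimal sketch of the three operations follows, assuming each ontology is reduced to a set of qualified term names and the articulation ontology to a dictionary of matching rules. The real SKC algebra works on richer structures; every name and term below is hypothetical.

```python
from typing import Dict, Set

Rules = Dict[str, str]   # term of the first ontology -> matching term of the second


def intersection(a: Set[str], b: Set[str], rules: Rules) -> Set[str]:
    """Keep sharable entries: terms of `a` whose rule target exists in `b`."""
    return {t for t in a if rules.get(t) in b}


def difference(a: Set[str], b: Set[str], rules: Rules) -> Set[str]:
    """Remove shared entries: what stays fully under local control of `a`."""
    return a - intersection(a, b, rules)


def union(a: Set[str], b: Set[str], rules: Rules) -> Set[str]:
    """Merge entries: matched terms of `b` collapse onto their partner in `a`."""
    matched_in_b = {rules[t] for t in intersection(a, b, rules)}
    return a | (b - matched_in_b)


store = {"store.shoes", "store.size", "store.color", "store.customers"}
factory = {"factory.shoes", "factory.size", "factory.colcode", "factory.machinery"}
rules = {"store.size": "factory.size", "store.color": "factory.colcode"}

shared = intersection(store, factory, rules)
local = difference(store, factory, rules)
joint = union(store, factory, rules)
print(sorted(shared), sorted(local), len(joint))
```

Note that `store.shoes` and `factory.shoes` stay distinct in the union because no rule links them, in line with the conservative assumption.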

Page 53: I3master

Gio Wiederhold I3 53

Sample Operation: INTERSECTION

Source Domain 1:

Owned and maintained

by Store

Result contains

shared terms

Source Domain 2:

Owned and maintained

by Factory

Terms useful

for purchasing

Page 54: I3master

Gio Wiederhold I3 54

INTERSECTION support

Store

Ontology

Articulation ontology

Matching

rules that use

terms from the

2 source domains

Factory

Ontology

Terms useful

for purchasing

Page 55: I3master

Gio Wiederhold I3 55

Sample Intersections

Shoe Store

• Shoes { . . . }

• Customers { . . . }

• Employees { . . . }

Shoe Factory

• Material inventory { . . . }

• Employees { . . . }

• Machinery { . . . }

• Processes { . . . }

• Shoes { . . . }

Articulation ontology matching rules:

size = size

color = table(colcode)

style = style

Other ontologies shown: Anatomy { . . . }, Hardware, Department Store

foot = foot

Employees / Employees

Nail (toe, foot) / Nail (fastener)

. . .

Page 56: I3master

Gio Wiederhold I3 56

Other Basic Operations

UNION: merging entire ontologies

typically prior intersections

DIFFERENCE: material fully under local control

Articulation ontology

Page 57: I3master

Gio Wiederhold I3 57

Features of an algebra

Operations can be composed

Operations can be rearranged

Alternate arrangements can be evaluated

Optimization is enabled

The record of past operations can be

kept and reused

Page 58: I3master

Gio Wiederhold I3 58

Knowledge Composition

[Figure: knowledge resources A, B, C, D, and E, with articulation knowledge for (A ∩ B), (B ∩ C), (C ∩ D), and (C ∩ E). Composed knowledge for applications using A, B, C, E: articulation knowledge for (A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E).]

Legend: ∪ : union, ∩ : intersection

Page 59: I3master

Gio Wiederhold I3 59

Sample Processing in HPKB

• What is the most recent year an OPEC member nation was on the UN security council?

– Related to DARPA HPKB Challenge Problem

– SKC resolves 3 Sources

» CIA Factbook ‘96 (nation)

» OPEC (members, dates)

» UN (SC members, years)

– SKC obtains the Correct Answer

» 1996 (Indonesia)

– Other groups obtained more, but factually wrong answers

– Problems resolved by SKC

* Factbook has out of date OPEC & UN SC lists

• Indonesia not listed

• Gabon (left OPEC 1994)

* different country names

• Gambia => The Gambia

* historical country names

• Yugoslavia

» UN lists future security council members

• Gabon 1999

» intent of original question

• Temporal variants
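The temporal and naming problems on this slide can be made concrete with a toy query. The sketch below uses only the facts the slide itself mentions (Indonesia 1996, Gabon leaving OPEC in 1994, Gabon's 1999 council seat, the Gambia naming variant); the table layout, function names, and the `as_of` parameter are assumptions, and join dates are not modeled.

```python
from typing import Dict, List, Optional, Tuple

# Toy data limited to facts mentioned on the slide; this is not the SKC system.
ALIASES: Dict[str, str] = {"Gambia": "The Gambia"}           # naming variants
OPEC_LEFT: Dict[str, Optional[int]] = {
    "Indonesia": None,   # None: no departure recorded in this toy data
    "Gabon": 1994,       # left OPEC in 1994, per the slide
}
UN_SC_SEATS: List[Tuple[str, int]] = [("Indonesia", 1996), ("Gabon", 1999)]


def canonical(name: str) -> str:
    """Map a naming variant to one agreed spelling."""
    return ALIASES.get(name, name)


def latest_opec_year_on_sc(as_of: int) -> Optional[Tuple[int, str]]:
    """Most recent year, up to `as_of`, with an OPEC member on the council."""
    hits = []
    for country, year in UN_SC_SEATS:
        country = canonical(country)
        if year > as_of or country not in OPEC_LEFT:
            continue                      # skip future seats and non-members
        left = OPEC_LEFT[country]
        if left is None or year < left:   # still a member in that year
            hits.append((year, country))
    return max(hits) if hits else None


answer = latest_opec_year_on_sc(as_of=1996)
print(answer)
```

Without the membership-interval check, a later `as_of` year would let Gabon's 1999 seat through and produce the kind of "more, but factually wrong" answer the slide describes.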

Page 60: I3master

Gio Wiederhold I3 60

Tools to create articulations

Graph matcher

for

Articulation-

creating

Expert

Vehicle

ontology

Transport

ontology

Suggestions

for articulations

Page 61: I3master

Gio Wiederhold I3 61

continue from initial point

Also suggest similar terms

for further articulation:

• by spelling similarity,

• by graph position

• by term match repository

Expert response:

1. Okay

2. False

3. Irrelevant

to this articulation

All results are recorded

Okays are converted into articulation rules
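The suggest-and-confirm loop can be sketched with spelling similarity alone, using Python's standard `difflib`. This is an assumption-laden illustration: the slides do not say how SKC measures similarity, the term lists are invented, and the human expert is replaced by a fixed table of verdicts.

```python
import difflib
from typing import Dict, List, Tuple


def suggest(term: str, candidates: List[str], cutoff: float = 0.7) -> List[str]:
    """Propose candidate matches by spelling similarity only."""
    return difflib.get_close_matches(term, candidates, n=3, cutoff=cutoff)


vehicle_terms = ["truck", "car", "cargo"]
transport_terms = ["trucks", "cargo", "lorry"]

# Stand-in for the human expert: a fixed table of verdicts.
expert: Dict[Tuple[str, str], str] = {
    ("truck", "trucks"): "okay",
    ("car", "cargo"): "false",
    ("cargo", "cargo"): "okay",
}

log: List[Tuple[str, str, str]] = []   # all results are recorded
rules: List[Tuple[str, str]] = []      # only okays become articulation rules

for term in vehicle_terms:
    for candidate in suggest(term, transport_terms):
        verdict = expert.get((term, candidate), "irrelevant")
        log.append((term, candidate, verdict))
        if verdict == "okay":
            rules.append((term, candidate))

print(rules)
```

The rejected pair stays in `log`, so the same false suggestion need not be put to the expert again.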

Page 62: I3master

Gio Wiederhold I3 62

Candidate Match Repository

Term linkages automatically extracted from 1912 Webster’s dictionary *

* free, other sources have been processed.

Based on processing headwords definitions using algebra primitives

Notice presence of 2 domains: chemistry, transport

Page 63: I3master

Gio Wiederhold I3 63

Using the match repository

Page 64: I3master

Gio Wiederhold I3 64

Navigating the match repository

Page 65: I3master

Gio Wiederhold I3 65

Primitive Operations

Unary

• Summarize -- structure up

• Glossarize - list terms

• Filter - reduce instances

• Extract - circumscription

Binary

• Match - data corroboration

• Difference - distance measure

• Intersect - schema discovery

• Blend - schema extension

Constructors

• create object

• create set

Connectors

• match object

• match set

Editors

• insert value

• edit value

• move value

• delete value

Converters

• object - value

• object indirection

• reference indirection

Model and Instance

Page 66: I3master

Gio Wiederhold I3 66

Future: exploiting the result

Processing & query evaluation

is best performed within Source

Domains & by their engines

Result has links

to source

Avoid the n² problem of interpreter mapping, stated by Swartout as an issue in HPKB year 1

Page 67: I3master

Gio Wiederhold I3 67

SKC Synopsis

• Research: Reliable query answers from heterogeneous, imperfect

data sources

• Sources:

– General: CIA World Factbook ‘96, UN www, OPEC www

Webster’s Dictionary, Thesaurus, Oxford English Dictionary

– Topical: OPEC, BattleSpace Sensors, Logistics Servers

• Client: DARPA High Performance Knowledge Base

(HPKB) project

• Theory: Rule-based algebra

– Translation & Composition primitives

Page 68: I3master

Gio Wiederhold I3 68

Innovation in SKC

• No need to harmonize full ontologies

• Focus on what is critical for interoperation

• Rules specific for articulation

• Potentially many sets of articulation rules

• Maintenance is distributed

– to n sources

– to m articulation agents

is m < n²? Depends on architecture density; a research question

Page 69: I3master

Gio Wiederhold I3 69

Mega-programming Process

[Figure elements: megaprogrammer, megaprogram text, module / platform descriptions, CHAIMS compiler, feedback, megaprogram, wrappers / APIs around the modules to be composed, result, customer GUI.]

Page 70: I3master

Gio Wiederhold I3 70

Decomposing CALL statements

Copying

Code sharing

Parameterized computation

Objects with overloaded method names

Remote procedure calls to distributed modules

Constrained (black box) access to encapsulated data

progress in scale of computing

CHAIMS decomposes CALL functions: Extract, Invoke, Estimate, Inspect, Set Up
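To show why splitting a CALL into separate steps is useful, the sketch below lets a client compare cost estimates before committing to a module. The `Module` class and its method names are a hypothetical stand-in named after the five steps on the slide; this is not the CHAIMS interface, and the computation runs eagerly where a real service would run asynchronously.

```python
from typing import Callable, List, Optional


class Module:
    """Hypothetical stand-in for a remote service; not the CHAIMS interface."""

    def __init__(self, name: str, func: Callable[[List[int]], int], unit_cost: float):
        self.name, self._func, self._unit_cost = name, func, unit_cost
        self._ready = False
        self._result: Optional[int] = None

    def set_up(self) -> None:                       # establish the connection
        self._ready = True

    def estimate(self, data: List[int]) -> float:   # predicted cost, no work done
        return self._unit_cost * len(data)

    def invoke(self, data: List[int]) -> None:      # start the computation
        if not self._ready:
            raise RuntimeError("set_up() must be called first")
        self._result = self._func(data)             # eager here, for simplicity

    def inspect(self) -> bool:                      # is a result available yet?
        return self._result is not None

    def extract(self) -> int:                       # fetch the result
        if self._result is None:
            raise RuntimeError("no result yet")
        return self._result


data = [3, 1, 2]
modules = [Module("fast", sum, unit_cost=2.0), Module("cheap", sum, unit_cost=0.5)]
for m in modules:
    m.set_up()
chosen = min(modules, key=lambda m: m.estimate(data))   # decide before invoking
chosen.invoke(data)
total = chosen.extract() if chosen.inspect() else None
print(chosen.name, total)
```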

Page 71: I3master

Gio Wiederhold I3 71

Maintenance is good for you

[Chart: relative annual maintenance cost (0 to 100%) against lifetime in years, with depreciation = 1 / lifetime, comparing automobile, software, and hardware.]

Page 72: I3master

Gio Wiederhold I3 72

Growing Systems: n modules

Federated: to deal with many servers and clients

resource reuse

changes are difficult

affect many clients

Page 73: I3master

Gio Wiederhold I3 73

Systems with Mediators

Applications . . . .

Mediators . . . . . .

Data Resources . . .

Gio Wiederhold. 1995

Page 74: I3master

Gio Wiederhold I3 74

Growth through Reuse

New Application

Prior & Revised

Mediators

Extended Data

Resources

Gio Wiederhold. 1995

Page 75: I3master

Gio Wiederhold I3 75

Linear O(n) Cost of Growth (now O(n²))

• Data changes only affect some mediators; only in their domain

• Mediators can

1. supply old information to n-1 prior applications

2. provide better information to the new application

3. be partially or completely reused

• New applications, using the new data, can be developed and inserted dynamically


Page 76: I3master

Gio Wiederhold I3 76

A mediator Is not just static software

Software & People

Application

Interface

Resource Interfaces

Owner/ Creator

Maintainer

Lessor - Seller

Advertiser

Changes of

user needs

Domain

changes

Resource

changes

Models, programs,

rules, caches, . . .

Page 77: I3master

Gio Wiederhold I3 77

Assigning maintenance responsibility

a. Source data quality – supplier database, files, or web pages

b. Interface to the source – wrapper, supplier or vendor for supplier

c. Source selection – expert specialist in mediator

d. Source quality assessment – customer input to mediator

e. Semantic interoperation – specialist group providing input to the mediator

f. Consistency and metadata information – mediator service operation or warehouse

g. Informal, pragmatic integration – client services with customer input

h. User presentation formats – client services with customer input

Services

Sources

Customers

Page 78: I3master

Gio Wiederhold I3 78

Sample projects

• Tsimmis at Stanford

• E-Commerce in Digital Libraries

• INEEL: information integration for environmental restoration

• MIFT: feedback for training

• Civil Engineering and Architecture

• F-22

• SimQL

• Security

Page 79: I3master

Gio Wiederhold I3 79

Projects at Stanford DB group

Data Mining.

Mediator & Wrapper

Generation.

Warehousing.

Security Mediators.

Megaprogramming.

Simulation Access.

Changes, Consistency,

and Configurations.

TSIMMIS

CHAIMS SimQL

TIHI

C3

MIDAS

WHIPS

Page 80: I3master

Gio Wiederhold I3 80

The TSIMMIS Project Ramana Yerneni, Yannis Papakonstantinou, ...

• Objective: Support mediation technology

– integrated access to distributed, autonomous, heterogeneous data sources,

using object fusion

–wrapper toolkit to rapidly create wrappers, based on source specification, a uniform interface to heterogeneous sources

–mediator toolkit to rapidly construct mediators, based on a mediator specification, to integrate data from a set of wrappers

Page 81: I3master

Gio Wiederhold I3 81

Investors Need to Fuse Information

from Multiple Sources .

• group together information about

the same real-world entity

• remove redundancies

• resolve conflicts
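The three fusion steps can be sketched as a single pass over records from several sources. This is an illustrative assumption, not the TSIMMIS object-fusion algorithm: the `fuse` function, the priority-based conflict rule, and the sample records (including the ticker `XYZ`) are hypothetical.

```python
from typing import Dict, List


def fuse(records: List[dict], key: str, priority: List[str]) -> Dict[str, dict]:
    """Group by `key`, drop duplicates, and let higher-priority sources win conflicts."""
    rank = {src: i for i, src in enumerate(priority)}
    fused: Dict[str, dict] = {}
    # Process lowest-priority sources first so later updates overwrite them.
    for rec in sorted(records, key=lambda r: rank[r["source"]], reverse=True):
        entity = fused.setdefault(rec[key], {})      # same real-world entity
        entity.update({k: v for k, v in rec.items() if k != "source"})
    return fused


records = [
    {"source": "www", "ticker": "XYZ", "name": "XYZ Corp", "price": 10.0},
    {"source": "ticker_tape", "ticker": "XYZ", "price": 10.5},
    {"source": "personal_db", "ticker": "XYZ", "name": "XYZ Corp", "shares": 100},
]
portfolio = fuse(records, key="ticker", priority=["ticker_tape", "personal_db", "www"])
print(portfolio)
```

Here the repeated `name` collapses to one value, and the price conflict is resolved in favor of the ticker tape because it is listed first in `priority`.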

WWW

Ticker Tape Personal

database

Network

Page 82: I3master

Gio Wiederhold I3 82

An Integration Architecture

Client

Application

business reports

portfolios for each company

stock market prices

Wrapper Wrapper

Ticker

Tape Dialog

Mediator

Page 83: I3master

Gio Wiederhold I3 83

Additional Challenge: Sources Without a

Well-Structured Schema

• semistructured

– irregular

– deeply nested

• incomplete

schema knowledge

– autonomous

– dynamic

• World Wide Web

• SGML documents

• genome, chemical

structures

• bibliographic

information

• files

Examples

Page 84: I3master

Gio Wiederhold I3 84

Wrappers & Mediators from High-Level Specifications

Wrapper

Client

Mediator Specification

Interpreter

Declarative Mediator

Specification

Source Source

Declarative

Source

Specifications

Mediator

Wrapper

Wrapper Specification

Interpreter

Page 85: I3master

Gio Wiederhold I3 85

E-money

Services must be paid for

• Incentive for creation and improvement

• price proportional to value added, often small

• profit = f(cost, market, price, overhead)

• price low per item, so overhead must be low

Simple payment (no credit accounts, checks)

Enabled through secure signatures

yes

Page 86: I3master

Gio Wiederhold I3 86

E-Commerce in the Digital Library

Delivery: Cryptolope, DigiBox, HTTP, E-mail

Payment: CyberCash, DigiCash, First Virtual, SET

Shopping Models: Pay-per-view, Subscription, Session, Shareware, Auctions, Site License, Gift Certificate, Layaway, Pre-paid vouchers, …

Major Integration Problem

Steven Ketchpel & DL Economics Group

Page 87: I3master

Gio Wiederhold I3 87

Shopping model: merchant-independent

logic controlling flow of business model

Example shopping models:

Order, Pay, (Deliver 52 times)

(1 month; Order, Deliver) Pay

[Figure: Customer and Merchant, each with event handlers and state information, exchange the events Bill, Start Transfer $, Order Complete, and Payment Complete (numbered 1 to 4).]

Proxy event handlers translate from native applications to shopping model defined protocols

Abstract API allows application to interact with many different services in a consistent way

Payment/Delivery/Other Services

Page 88: I3master

Gio Wiederhold I3 88

TSIMMIS Status

• Mediator Specification Interpreter running on Ultrix, AIX, OSF.

• 9000 lines of C/C++ code

• 4000 C++ lines of Server/Client Support Libraries

• Integration of three disparate bibliographic sources

– legacy system

– flat BibTeX files

– relational DB

– wwWeb files

Page 89: I3master

Gio Wiederhold I3 89

Mediator Specification Interpreter Architecture

Inputs: Mediator Specification, Query

Query Rewriter → logical datamerge program → Cost-Based Optimizer → plan → Datamerge Engine → Result

Datamerge Engine: Queries to Wrappers, Results

Page 90: I3master

Gio Wiederhold I3 90

Environmental Restoration at INEL Undoing 50 years of messes

[Figure: a mediator accepting MQL [ISX], MSL [Stanford], OQL [ODMG], ...; QEM and OEM exchanged over CORBA with other mediators and with wrappers over many sources, including ERIS and IEDMS.]

Idaho National Engineering Laboratory

LOCKHEED MARTIN, ISX, Stanford Univ.

Many projects, many sources

Page 91: I3master

Gio Wiederhold I3 91

CHAIMS - software composition

Domain expert Client workstation

Computation Services

IO module

MEGA modules

IO module

a

b

c d

e

Data Resources

Sites R

T S U T

C

Page 92: I3master

Gio Wiederhold I3 92

Mediation to Implement Feedback in Training

David Maluf, Priya Panchapagesan, Ted Linden

Abstraction to match levels of granularity

Abstraction

Another task of mediators, prior to integration

MIFT

Page 93: I3master

Gio Wiederhold I3 93

Mediation Feedback:

Playback or Graph

Janus SimNet

Trainees

Observers

Commanders Training

Developers Analysts

Wrapped

Simulation

Resources

Mediation

Layers

Application

Layer

Mediators with

rules in CLIPS

Standards

in KQML

Wrappers

in C/C++

UI in

Java

User Interface

I.D.A

Stanford

Objectives

Tasks

Page 94: I3master

Gio Wiederhold I3 94

MIFT . Result .

Analyses:

• Force ratio

• Losses

• Area gain

Exercise

Simulator

Type

Page 95: I3master

Gio Wiederhold I3 95

Control Valve Sizing, Future

• Interpretation – Programmatic

• Analysis – Integrated

• Evaluation – Integrated

• Transformation – Automated

From Andrew Arnold: Civ. Eng. Qualification Exam

Page 96: I3master

Gio Wiederhold I3 96

F-22 IWSDB Phase 6

Integration Services User Interfaces

SQL

PD DS

Wrappers Databases

Domain Model

Match maker

Domain Matching

Change Notification

Query Reformulation

Provisioner

Engineer

Application PRIDE

IWSDB client

GUI

WAIS server

Index

Suppliers

Sybase

Page 97: I3master

Gio Wiederhold I3 97

Current state of DM Support

• Spreadsheets

• Planning of allocations

• Other simulations

various point assessments

past now future time

Data integration

distributed, heterogeneous

x17 @qbfera

ffga 67 .78 jjkl,a

nsnd nn 23.5a

Databases

Intuition +

organized support disjointed support

Page 98: I3master

Information Systems should also Project into the Future

time past now future

Msg

systems,

sensors

Databases,

accessed via SQL or

CORBA compliant

wrappers

Simulations,

accessed via SimQL and

compliant wrappers

Page 99: I3master

SimQL: Simulation Access Service

Decision-making requires dealing with the future, as well as the past

• Databases deal well with the past

• Sensors can provide current status

• Spreadsheets, simulations deal with the likely futures

Information systems should be able to combine all three

time past SQL now SimQL future

Information Systems should also deal

with the Future
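A minimal sketch of combining all three regimes, assuming a hypothetical `TimelineMediator` class (the names, the integer time steps, and the toy data are illustrative assumptions, not part of SimQL): queries about the past go to stored data, queries about now go to a sensor, and queries about the future go to a simulation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Answer:
    value: float
    source: str  # "database", "sensor", or "simulation"


class TimelineMediator:
    """Routes a query about time t to the kind of source that can answer it."""

    def __init__(self, history: Dict[int, float], sensor: Callable[[], float],
                 simulate: Callable[[int], float], now: int) -> None:
        self.history = history    # stands in for a database of past states
        self.sensor = sensor      # stands in for a current-status sensor
        self.simulate = simulate  # stands in for a simulation service
        self.now = now

    def value_at(self, t: int) -> Answer:
        if t < self.now:
            return Answer(self.history[t], "database")
        if t == self.now:
            return Answer(self.sensor(), "sensor")
        return Answer(self.simulate(t), "simulation")


# Toy setup: the quantity grows by 2.0 per time step; "now" is step 2.
mediator = TimelineMediator(
    history={0: 10.0, 1: 12.0},
    sensor=lambda: 14.0,
    simulate=lambda t: 14.0 + 2.0 * (t - 2),
    now=2,
)
past, present, future = (mediator.value_at(t) for t in (1, 2, 4))
```

The client asks one kind of question regardless of where on the time axis it falls; the routing is the mediator's concern.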

Page 100: I3master

Stanford experiment, supported by DARPA & NIST

Phase 1 Architectures

Spreadsheets Engineering

wrapper wrapper wrapper

Logistics

Application Manufacturing

Application

Weather

(short-, long-term)

wrapper

Test

Data

SimQL access SimQL access

SimQL access

SQL access

Page 101: I3master

Enabling Interoperation

Databases

• serve clients via SQL by

– sharing a model (the schema)

– a query language over the model

The SQL interface enables

• independence of application development and DBMS technology development

• reuse of infrastructure

Today

• most new systems use a DBMS for data storage

even with less performance and an inability to handle all problems; it handles enough of them well enough.

Simulations should

• serve clients via SimQL by

– sharing a model (a research question)

– a query language over the model

A SimQL interface will enable

• independence of application development and simulation technology development

• reuse of infrastructure

Objective

• build information systems combining DBMS and simulations

even with less performance, inability to handle all problems, but enough of them . . .
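The analogy can be sketched as follows. This is not SimQL syntax; `SimModel`, `SimEndpoint`, and the toy `trip` engine are hypothetical names chosen for illustration. The shared model declares the input and output parameters, the client queries against that model only, and the simulation engine behind it can be replaced without touching the client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class SimModel:
    """The shared model: what may be supplied and what may be asked for."""
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


class SimEndpoint:
    """Answers queries phrased against the model, hiding the engine."""

    def __init__(self, model: SimModel,
                 engine: Callable[..., Dict[str, float]]) -> None:
        self.model = model
        self._engine = engine

    def query(self, select: Iterable[str],
              where: Dict[str, float]) -> Dict[str, float]:
        select = list(select)
        if set(select) - set(self.model.outputs):
            raise ValueError("unknown output parameter requested")
        if set(where) != set(self.model.inputs):
            raise ValueError("input parameters do not match the model")
        results = self._engine(**where)
        return {name: results[name] for name in select}


def trip(distance_km: float, speed_kmh: float) -> Dict[str, float]:
    """Toy engine; any implementation honoring the model could replace it."""
    return {"travel_h": distance_km / speed_kmh, "fuel_l": distance_km * 0.25}


endpoint = SimEndpoint(
    SimModel(inputs=("distance_km", "speed_kmh"), outputs=("travel_h", "fuel_l")),
    trip,
)
answer = endpoint.query(["travel_h"], {"distance_km": 120.0, "speed_kmh": 60.0})
```

Here `answer` holds only the requested output, in the same way that an SQL client names the columns it wants without knowing how the DBMS stores them.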

Page 102: I3master

Internet requirements

• Ubiquitous access to simulations of a wide variety of types

• Rapid response to parameter changes

– often high-performance computation is needed

– distributed simulations with synchronization

• Rapid service composition

– High bandwidth among simulations

– Access to multiple services in parallel

Page 103: I3master

Even the present needs SimQL

time past now future

last recorded observations

simple simulations

to extrapolate data

• Is the delivery truck in X?

• Is the right stuff on the truck?

• Will the crew be at X?

• Will the forces be ready to accept delivery?

point-in-time for situational assessment

Not all data are current:
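A simple simulation of this kind can be as small as a linear extrapolation from the last recorded observation. The sketch below assumes a hypothetical `Observation` record with a position along a route and a constant speed; real situational assessment would need a richer model.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    time_h: float       # time of the last recorded observation, in hours
    position_km: float  # position along the route at that time
    speed_kmh: float    # speed reported at that time


def extrapolate_position(obs: Observation, now_h: float) -> float:
    """Estimate the current position, assuming constant speed since obs."""
    elapsed = now_h - obs.time_h
    if elapsed < 0:
        raise ValueError("observation lies in the future")
    return obs.position_km + obs.speed_kmh * elapsed


def is_at(obs: Observation, now_h: float, target_km: float,
          tolerance_km: float = 1.0) -> bool:
    """Answer 'is the truck in X?' from stale data plus extrapolation."""
    return abs(extrapolate_position(obs, now_h) - target_km) <= tolerance_km


# Last report at 08:00: kilometer 100, moving at 60 km/h. It is now 09:30.
last_seen = Observation(time_h=8.0, position_km=100.0, speed_kmh=60.0)
estimate = extrapolate_position(last_seen, now_h=9.5)  # 100 + 60 * 1.5 = 190.0
```

The database alone would still report kilometer 100; the extrapolation is what makes the answer about the present rather than about the last update.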

Page 104: I3master

Use of Simulation Results

Simulation results can be composed for alternative courses of action

Composition should be seamless, elegant, with computation and recomputation of likelihoods

Results change as "now" moves forward and eliminates earlier alternatives.
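One simple way to picture the recomputation of likelihoods, offered as an assumption rather than as the method used in the project, is to drop the alternatives that the passage of time has ruled out and renormalize the rest:

```python
from typing import Dict, Set


def renormalize(likelihoods: Dict[str, float],
                eliminated: Set[str]) -> Dict[str, float]:
    """Drop eliminated alternatives and rescale the rest to sum to 1."""
    remaining = {name: p for name, p in likelihoods.items()
                 if name not in eliminated}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no viable alternative remains")
    return {name: p / total for name, p in remaining.items()}


# Hypothetical courses of action with their initial likelihoods.
courses = {"route_a": 0.5, "route_b": 0.25, "route_c": 0.25}

# Time has moved on and route_a is no longer possible.
updated = renormalize(courses, eliminated={"route_a"})
```

After the update the two surviving routes are equally likely, each at 0.5.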

Page 105: I3master

Types of simulation services

1. Continuously executing: weather prediction – SimQL result reports best match samples

2. Execution specific to query: what-if assessment – may require HPC power for adequate response

3. Past simulations collect results in a base: materials – performs interpolations or extrapolations to match query parameters

4. Combinations, i.e., 2. + 3.: top layer simulation using stored partial lower level results: weapon performance in new setting

5. Human-in-the-loop (mediated by an agent program): SAFs

Note

• A simulation service program can be written in any language

• A simulation service must comply with the interface specification
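The third service type and the interface requirement can be sketched together. The `SimulationService` interface below is a hypothetical stand-in for the interface specification, and the stored values are made up; the service answers a query by linear interpolation between stored results, or by linear extrapolation from the nearest two when the query falls outside them.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left
from typing import Dict


class SimulationService(ABC):
    """Hypothetical interface that every simulation service would honor."""

    @abstractmethod
    def query(self, parameter: float) -> float:
        """Return the simulated result for the given parameter value."""


class StoredResultsService(SimulationService):
    """Type 3: answers from a base of results collected in past runs."""

    def __init__(self, results: Dict[float, float]) -> None:
        if len(results) < 2:
            raise ValueError("need at least two stored results")
        self.points = sorted(results.items())

    def query(self, parameter: float) -> float:
        xs = [x for x, _ in self.points]
        i = bisect_left(xs, parameter)
        if i < len(xs) and xs[i] == parameter:
            return self.points[i][1]        # exact match in the base
        i = min(max(i, 1), len(xs) - 1)     # clamp: extrapolate at the ends
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (y1 - y0) * (parameter - x0) / (x1 - x0)


# Made-up results of three earlier runs: parameter -> outcome.
service = StoredResultsService({0.0: 0.0, 10.0: 20.0, 20.0: 30.0})
inside = service.query(15.0)    # interpolated between the runs at 10 and 20
outside = service.query(30.0)   # extrapolated beyond the last stored run
```

A client holding a `SimulationService` cannot tell whether the answer came from stored results, a fresh run, or a continuously executing model, which is the point of requiring compliance with a common interface.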

Page 106: I3master

Tools for Managing Partitioning

Separate internals and interfaces, at many levels

• Object Libraries

• Product Design hierarchical standards (PDES)

• Domain-Specific Systems Analysis (DSSA)

• Ontology documentation (Ontolingua)

• Remote Object Access (CORBA 1.2, 2.0)

• Knowledge Interchange Formalism (KIF)

• Transport in / of heterogeneous situations (KQML specifies content repr., ontology)

Page 107: I3master

Moving to a Service Paradigm

• Server is an independent contractor, defines service

• Client selects service, and specifies parameters

• Server’s success depends on value provided

• Some form of payment received for services

x,y

Databases are a current example.

Simulations have the same potential.

Page 108: I3master

New Role for Consultants

Old

• Used at Design Time

and

• To Explain Failures

Future

• Available as a Service

• Responsible for Knowledge Maintenance

Page 109: I3master

Integration

Science

Artificial

Intelligence

knowledge mgmt

domain expertise

uncertainty

Systems

Engineering

analysis

documentation

costing

Databases

access

storage

algebras

Long Range Science Vision

Integration Methods

GIS

Spatial is special.

Page 110: I3master

Summary

• Mediation bridges Applications and Sources

• Mediator technology transforms data to information by applying an expert maintainer’s knowledge

• Abstraction reduces data further for decision making

• Must be integrated with sensors, simulation results

• Mediation permits incremental system growth (n log n)

• Mediators provide a service-model on the networks

New research

Recognition and resolution of semantic differences

Simulation access as a new service

more on http://www-db.stanford.edu/people/gio.html