from enterprise information integration to community-based mediation alin deutsch, yannis katsis,...

73
From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation by on joint works with CSE Department CSE Department

Upload: juliet-golden

Post on 05-Jan-2016

221 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

From Enterprise Information Integration to Community-Based Mediation

Alin Deutsch,

Yannis Katsis,

Michalis Petropoulos

Yannis Papakonstantinou

A presentation by

on joint works with

CSE DepartmentCSE Department

Page 2: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Data Integration Requirements & Desiderata (high level)

• Provide application with integrated database– single point of (query/update) access to the

data• Provide distribution and heterogeneity

transparency– heterogenous formats, heterogenous

interfaces, different rates of change (static versus dynamic), autonomous sources

• Decouple application logic from integration• Easily add/change sources• Customize the delivery of content

Page 3: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Integration Software

Most-Generic Integration System Architecture

. . .Information

SourceInformation

SourceInformation

Source

ClientApplication

ClientApplication

ClientApplication

Page 4: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Information+ Service

Source

SIGMOD Community’s Architecture forUnified Access to Data & Services

Local CommonLocal CommonModel (XML) View Model (XML) View

+ Services+ Services

Mediator

Integrated (XML) Global View / Ontology + ServicesIntegrated (XML) Global View / Ontology + Services

Wrapper

Cache &Replication

(Web) ClientApplication

Information+ Service

Source

Wrapper

(Web) ClientApplication

(Web) ClientApplication

Local CommonLocal CommonModel (XML) View Model (XML) View

+ Services+ Services

Page 5: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Approaches towards View-BasedData Integration

Local As View(LAV)

Global As View(GAV)

GLAV=GAV+LAV

IntegrationSpecification

Method

Info Model &Query Language

Relational (SQL)

XML (XQuery)

Object-Oriented

Warehousing(materialized

views)

On-Demand(virtual views)

Storage Method

Page 6: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Enterprise Information IntegrationReaches Maturity

• Materialized View (Warehousing) approach well-adopted since mid/late 90s– GAV function role played by Extract-Transform-Load tools– Human Intervention Occasionally Needed in Cleaning Up

• Concordance tables for Object Identification

• Virtual View (Mediation) approach at early adoption – many years of research

• Distributed db’s, federated db’s, mediators– moving well into mainstream

• BEA AquaLogic (XML, Virtual, GAV view)• IBM DB2

Enterprise

Page 7: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Current Enterprise InformationIntegration Deployments

IntegrationIntegrationAdminAdmin

Marketing

Local View MLocal View M

Integrated Global View V(M, S, E)Integrated Global View V(M, S, E)

Sales

Local View SLocal View S

Service

Local View ELocal View E

View Builder(design time)

Mediator QueryProcessor (run time)

GAV View V

Schemas Data

• Small Domain• Mostly Vertical Partition of Sources • Primarily Application-Driven View or Identity View• Integration Administrator/Developer in charge

Enterprise

Page 8: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Opportunities and Needs Presented by “Motivated” Communities

• Emerging Myriads of Internet Communities of– Myriads of sources and clients– Source owners motivated to participate

• EII does not address needs– Expensive– Bottleneck of Single Integration Admin

• Make building corresponding portals similar to starting and participating in newsgroups

• Appropriate tools needed to enable source owner and client participation

Communities

Page 9: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

InformationSource 1

Local XML View S1

Client Application 1

Integrated XML View G

InformationSource n

Local XML View Sn

GLAV: V1 GLAV: Vn

GAV: V1a GAV: Vm

a

Application View V1a (G) Application View Vm

a (G)

Client Application m

Integrated View Owner’s Domain

Source Owner’sDomain

SourceOwner’sDomain

DataServices 1

DataServices n

A Community-Based Information Modeling Architecture

Data Services

Page 10: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Visual Tools Matter! (example from the Enosys Query Builder)

C:\Enosys\projects\allPONS.qpr* - Enosys Query Builder

OPEN & VIEW OPEN & VIEW SOURCE SCHEMAS IN SOURCE SCHEMAS IN

XMLXML DRAG & DROP TO CREATE TARGET XML VIEW

TARGET SCHEMA

(XML VIEW)

AUTOMATICALLY GENERATED

MAPS

1

2

Page 11: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

C:\Enosys\projects\allPONS.qpr* - Enosys Query Builder

XML RESULTXQUERY BASED

ON DESIGN SPECS

RUN & TEST XQUERYRUN & TEST XQUERY3

Page 12: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Architecture for Large-Scale Data Integration System and Design Tools

How can the user query and Browse the integrated data?

QURSED

What queries can my app issue?What integrated view services can I build?

CLIDE

How do I export my databaseservices functionality?

RIDE-Services

SourceDomain

WebDomain

ApplicationDomain

IntegrationDomain

Application

DataSource

DataSource

MediatorGlobal View

Schema

Developer

IntegrationEngineer

SourceOwner

Application

Web Forms& Reports

SourceSchema

WebService

WebService

WebService

SourceSchema …

How do I export my data?

RIDE

WebServices

Cache(Metadata)

Page 13: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Dual Interactive Registration Problems

New App

?New Query

?

Source Services

Register Source Given GlobalRegister Source Given Global

Schema, Constraints &QueriesSchema, Constraints &QueriesGuide the client in

query/form writing

Apps

?Queries

?

Register Client Given SourcesRegister Client Given Sources

Guide the source owner in registering a new source and services

New Source and Services

Global ViewGlobal View

Page 14: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Source Data Registration

• How do my source attributes map to global attributes– mappers & automatic

matchers• How do my data relate to

queries & other sources– Inconsistencies?– What takes to

contribute to queries?– How much should I

clean up?• Multiple ways of dealing

with redundancy

Apps

?Queries

?

Server SideServer Side

New Source

Global View

Page 15: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

How to achieve this Goal

Apps

?Queries

?

BeforeBefore

New Source

Apps

?Queries

?

NowNow

New Source

Look at all sources & queries

Decide how to register

your source

Follow the suggestions

of the interface

Global View Global View

Source Registration Tool

Page 16: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Our Goal in Source Registration

Guide the source owner visually through the registration of the source

so as to avoid/warn about (potential) inconsistencies and contribute information to the answer of the queries

while exposing the minimum information possible and/or minimizing effort

Page 17: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

The Problem

?

Client Queries

? ?

Mediator(Global DB)

Sources (Actual Local DBs)

17

Page 18: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

The Contribution Problem

?

Client Queries

? ?

Sources (Actual Local DBs)

• What is the contribution of source S to the result of the query Q?

S

Q

18

Mediator(Global DB)

Page 19: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

The Problem

Client Queries

?

Sources (Actual Local DBs)

• What is the contribution of source S to the result of the query Q?

S

Q

19

Mediator(Global DB)

Q:Q: carscars

carsreviews

Q: cars JOIN reviewsQ: cars JOIN reviews

S is Self Sufficient w.r.t. Q

S is Now Complementary w.r.t. Q

Page 20: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Relational Schemas: Local and Global

attributes

relations

?

• Relational Schemas

• Visual Representation

makeS1

CarmakeOriginSales

autoS2

IdModel

detailIdEngineBaseprice

Source 1Business Magazine

Source 2Car Magazine

GlobalCar Portal

carG

ModelCarmake

Carmakebrand

Origin

DoorsBaseprice

Carmake

20

Page 21: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Source Registration using GLAV Mappings

• Source Registration:Source Registration: Correspondence between a source schema and the global schema = Set of Mapping Constraints of the form

(U V)

• Open WorldOpen World

• Global and Local As View (GLAV)Global and Local As View (GLAV)

?

CQ=

over source schema

CQ= over global schema

21

Page 22: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Target Constraints

• Constraints on the global schema = Set of Constraints of the form

(U V)

• Also Expresses Dependencies (PKs, Ref Integrity, …)Also Expresses Dependencies (PKs, Ref Integrity, …)

?

CQ=

over global schema

CQ= over global schema

22

Page 23: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Visual Representation of Mappings (1)

?

• Visual Representation (IBM Clio)

brandCO

Business Magazine: Provides Carmake and Origin

U1(C, O) :- make(C, O, S)V1(C, O) :- brand(C, O)

(U1 V1)

carG

ModelCarmake

Carmakebrand

Origin

DoorsBaseprice

makeS1

CarmakeOriginSales

OC

makeCOSOC

23

Page 24: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Visual Representation of Mappings (2)

?

• Visual Representation (IBM Clio)

Car Magazine: Provides Model, Carmake and Baseprice

autoIMC

detailIEB

carMC?B

autoS2

IdModel

detailIdEngineBaseprice

Carmake

carG

ModelCarmake

Carmakebrand

Origin

DoorsBaseprice

24

Page 25: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Query Semantics

• Queries in UCQ=

• Set of Possible Global InstancesSet of global instances that satisfy all constraints

• Query Answers = Set of Certain AnswersThe tuples appearing in the answer to Q for any possible global instance

Possible globalinstances

Answer to Qfor any of the

possible globalinstances

CertainCertainAnswersAnswers

to Qto Q

Answer to Qfor any of the

possible globalinstances

CertainCertainAnswersAnswers

to Qto Q

QQ

Possible globalinstances

?

26

Page 26: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Source Instance’s Contribution

Answer to Q - Answer to

Q

• For given instances of the sources

Contribution to Q of Source Instance

=

The tuples in answer of Q not provided by the other sources

27

Page 27: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Source Registration’s Contribution

• Source Registration: Source Mappings

• Degrees of Source Registration’s Contribution

Self Sufficient

Now Complementary

Later Complementary

Unusable

More contribution

Less contribution

28

Page 28: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Self Sufficient Registration: Example

?Baseprices of ModelsBaseprices of Models

carM3

BMW?

45K

carG

ModelCarmake

Carmakebrand

Origin

DoorsBaseprice

Green Registration isSelf Sufficient

carModel

CarmakeDoors

Baseprice

Example

29

Page 29: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Self Sufficient Registration: Definition

Answer to Q

Source instance

s.t.

The source has a non empty contribution in the absence

of the other sources

Answer to Q-

Self SufficientSelf Sufficient

X X X X

30

Page 30: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Now Complementary Registration: Example

?

Baseprices of ModelsBaseprices of Modelsby German manufacturersby German manufacturers

carM3

BMW?

45K

carG

ModelCarmake

Carmakebrand

Origin

DoorsBaseprice

brandBMW

Germany

Green Registration isNow Complementary

brandCarmakeOrigin =

‘Germany’

carModel

CarmakeDoors

Baseprice

Example

31

Page 31: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Now Complementary Registration: Definition

Answer to Q

Not Self Sufficient

&

Source instances

s.t.

The source has a non empty contribution in combination

with the other existing sources

Answer to Q-

Now ComplementaryNow Complementary

32

Page 32: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Later Complementary Registration: Example

?

Baseprices of ModelsBaseprices of Modelsby German manufacturersby German manufacturers

carM3

BMW?

45K

carG

ModelCarmake

Carmakebrand

Origin

DoorsBaseprice

brandBMW

Germany

Green Registration isLater Complementary

brandCarmakeOrigin =

‘Germany’

carModel

CarmakeDoors

Baseprice

Example

33

Page 33: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Later Complementary Registration: Definition

Answer to Q

Not Self Sufficient &Not Now Complementary

&

Potential future sources

& Source instances

s.t.

The source has a non empty contribution in combination

with the future sources

Answer to Q-

Later ComplementaryLater Complementary

34

Page 34: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Unusable Registration: Example

?car

G

ModelCarmake

Carmakebrand

Origin

DoorsBaseprice

brandCarmake

OriginOrigin of CarmakesOrigin of Carmakes

Green Registration isUnusable

Example

35

Page 35: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Unusable Registration: Definition

Answer to Q

Not Self Sufficient &Not Now Complementary &Not Later Complementary

The source has a empty contribution regardless of

what sources enter the system

Answer to Q- =

UnusableUnusable

36

Page 36: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Subtleties for Unusable Registrations

?

Baseprices and DoorsBaseprices and Doorsof Modelsof Models

carG

ModelCarmake

Carmakebrand

Origin

DoorsBaseprice

carM3

BMW?

45K

carM3

BMW2?

Green Registration isUnusable

carModel

CarmakeDoors

Baseprice

Example

37

Page 37: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

In presence of PK Unusable Example becomes Later Complementary

?

Baseprices and DoorsBaseprices and Doorsof Modelsof Models

carG

ModelCarmake

Carmakebrand

Origin

DoorsBaseprice

Green Registration isLater Complementary

carM3

BMW?

45K

carM3

BMW2?

M3BMW

M3BMW

carM3

BMW2

45K

carModel

CarmakeDoors

Baseprice

Example

38

Page 38: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Decidability Results

None Primary keys Primary keys + Referential Integrity Constraints

Self SufficientYes Yes No

Now complementary Yes Yes No

Later complementary Yes Yes ?

UnusableYes Yes ?

Target constraintsTarget constraints

DDeeggrreeeess

Overview: What is decidableOverview: What is decidable

39

Page 39: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Issues

UniqueUnique client query client query MultipleMultiple client queries client queriesVsVs

Contribute to: - all queries? - one query? - specific queries? - some queries based on some ranking?

DataData independence independence DataData dependence dependenceVsVs

M1: cars, refPricesM2: reviewsQ: cars JOIN reviews JOIN refPrices

e.g.

DB1: cars, refPrices (Audis)DB2: reviews (Hondas)

(M2, Q) now-complementarybut Certain Answers for Instances DB1, DB2 =

Page 40: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Putting it all together

ArchitectureArchitecture

?Query

Global Schema

LocalSchemas

Q

S1 Sn Sn+1

…M1 Mn Mn+1

S’

Guide the source owner visually through the registration of the source

so as to raise contribution to the answer of the queries

while exposing the minimum info possible and/or minimizing effort

4 categories:4 categories:

Self Sufficient / Now Complementary / Later Complementary / Unusable

Query Answering / Mappings / SchemasQuery Answering / Mappings / Schemas

Architecture

Goal

Registeredsources

Newsource

Mappings

Contribution

Page 41: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Example 1

carCommunity

price

modeldrive

usedAd*

modelreview*

vin

*

price

refPrice*model

model

condition

quality

Local SchemasLocal Schemas Global SchemaGlobal Schema

carAppQuery

price

modeldrive

usedAd*

modelreview*

vin

*

price

refPrice*model

model

condition

quality

QueryQuery

Without primary keys in the target

Unusable BLUE: Map at least one of the groups

carAutoTrader

vin

cmodel

price

ad*

carId

id*

Page 42: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Example 1

carCommunity

price

modeldrive

usedAd*

modelreview*

vin

*

price

refPrice*modelcondition

quality

Local SchemasLocal Schemas Global SchemaGlobal Schema QueryQuery

Unusable

= cmo

carAutoTrader

vin

cmodel

price

ad*

carId

id* car

AppQuery

price

modeldrive

usedAd*

modelreview*

vin

*

price

refPrice*model

model

condition

quality

model

Without primary keys in the target

Page 43: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Example 1

carCommunity

price

modeldrive

usedAd*

modelreview*

vin

*

price

refPrice*modelcondition

quality

Local SchemasLocal Schemas Global SchemaGlobal Schema QueryQuery

= cmo

carAutoTrader

vin

cmodel

price

ad*

carId

id* car

AppQuery

price

modeldrive

usedAd*

modelreview*

vin

*

price

refPrice*model

model

condition

quality

model= price

LaterComplementary

Without primary keys in the target

Page 44: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Example 2

carCommunity

price

modeldrive

usedAd*

modelreview*

vin

*

price

refPrice*model

model

condition

quality

Local SchemasLocal Schemas Global SchemaGlobal Schema

carAppQuery

price

modeldrive

usedAd*

modelreview*

vin

*

price

refPrice*model

model

condition

quality

QueryQuery

Unusable

carAutoTrader

vin

cmodel

price

ad*

carId

id*

With primary keys in the target

Page 45: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Example 2

carCommunity

price

modeldrive

usedAd*

modelreview*

vin

*

price

refPrice*model

model

condition

quality

Local SchemasLocal Schemas Global SchemaGlobal Schema

carAppQuery

price

modeldrive

usedAd*

modelreview*

vin

*

price

refPrice*model

model

condition

quality

QueryQuery

= price

= vin

carAutoTrader

vin

cmodel

price

ad*

carId

id*

LaterComplementary

With primary keys in the target

Page 46: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Lessons learned

To merge data with that of other sources (become complementary):

Pick a relation and provide…

…all its attributes asked by the query

…its primary key and one of its attributes asked by the query

In absence of primary keys

In presence of primary keys

The number of choices increases in presence of primary keys

Foreign keys on the target affect the suggestions

Target constraints make a differenceTarget constraints make a difference

Page 47: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Large-Scale Data Integration Systems

How can the user query and Browse the integrated data?

QURSED

What queries can the mediator answer for me?

CLIDE

How do I export my databaseservices functionality?

RIDE-Services

SourceDomain

WebDomain

ApplicationDomain

IntegrationDomain

Application

DataSource

DataSource

MediatorGlobal View

Schema

Developer

IntegrationEngineer

SourceOwner

Application

Web Forms& Reports

SourceSchema

WebService

WebService

WebService

SourceSchema …

How do I export my data?

RIDE

Page 48: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Running Example

SchemaComputers(cid, cpu, ram, price)NetCards(cid, rate, standard, interface)

Views

V1 ComByCpu(cpu) (Computer)*SELECT DISTINCT Com1.*FROM Computers Com1WHERE Com1.cpu=cpu

V2 ComNetByCpuRate(cpu, rate) (Computer,

NetCard)*SELECT DISTINCT Com1.*, Net1.*FROM Computers Com1, Network Net1WHERE Com1.cid=Net1.cidAND Com1.cpu=cpuAND Net1.rate=rate

Parameterized Views

DellDell CiscoCiscoSchema

Routers(rate, standard, price, type)

Views

V3 RouByTypeW() (Router)*

SELECT DISTINCT Rou1.*FROM Routers Rou1WHERE Rou1.type='Wired'

V4 RouByTypeWL() (Router)*

SELECT DISTINCT Rou1.*FROM Routers Rou1WHERE Rou1.type='Wireless'

Computersfor a given cpu

Computers & NetCardsfor a given cpu & rate

Wired Routers

Wireless Routers

Page 49: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Running Example

• Global schema puts togetherthe Dell and Cisco schemas

• Resembles the schema of CNET.com portal

Column Associations• (Computers.cid, NetCards.cid)• (NetCards.rate, Routers.rate)• (NetCards.standard, Routers.standard)

Global Schema

V1

Application

V3V2

Dell Cisco

MediatorGlobal Schema

Developer

V4

Page 50: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Sophisticated Mediators Make Feasibility Hard to Predict

Feasible Queries FQ• Equivalent CQ query rewritings using the views• Might involve more than one views• Order might matter

V4

Mediator

RouByTypeWL()

Routers.*

10.11b

50Wirele

ss

54.11g

120Wirele

ssA

B

V2

ComNetByCpuRate(‘P4’, ‘10’)

C

DComputers.* NetCards.*A123

P4512 400A123

10.11b

USB

B123

P41024

550B123

54.11g

USB

Feasible

ComNetByCpuRate(‘P4’, ‘54’)

Computers.* NetCards.* Routers.*A123

P4 512 400A123

10.11b

USB

10.11b

50Wireless

B123

P41024

550B123

54.11g

USB

54.11g

120Wireless

E

Query:Get all ‘P4’ Computers, together with their NetCards and their compatible ‘Wireless’ Routers

Query:Get all Computers

Infeasible

Page 51: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Problem

1. Large number of sources2. Large number of views3. Mediator capabilities

Developer formulates an application query Is an application query feasible? If not, how do I know which ones are feasible?

Previous options:– The developer had to browse the view definitions

and somehow formulate a feasible query– Or formulate queries until a feasible one is found

(trial-and-error)

No system-provided guidance

Page 52: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

The CLIDE Solution

A query formulation interface, which interactively guides the user toward feasible queries by employing a coloring scheme

CLIDE

V1

Application

V3V2

Dell Cisco

MediatorGlobal Schema

Developer

V4

Page 53: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

QBE-Like Interfaces

Microsoft SQL-Server

Page 54: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

CLIDE Interface

• Table, selection, projection and join actions• Color-based suggestions• Feasibility Flag

Projection BoxesTable Boxes

Selection Boxes

Feasibility FlagTable Alias

Page 55: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

CLIDE Interface

Yellow required action– All feasible queries require this action

White optional action– Feasible queries can be formulated

w/ or w/o these actions

Snapshot 1

Page 56: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

CLIDE Interface

Snapshot 2

Blue required choice of action– At least one feasible (next) query cannot

be formulated unless this action is performed

V1

Mediator

ComByCpu(‘P4’)

cidcpu

rampric

eA123 P4 512 400B123 P4 1024 550

rampric

e512 400

1024 550A B

C

Page 57: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

CLIDE Interface

Join Lines:• Only yellow and blue are displayed• Must appear in Column Associations

Snapshot 3

Page 58: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

CLIDE Interface

Snapshot 4

Page 59: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

CLIDE Interface

Snapshot 5

• * any other constant• Red prohibited action

– Does not appear in any feasible query

– Lead to “Dead End” state

Page 60: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

CLIDE Interface

Snapshot 6

V4

Mediator

RouByTypeWL()Routers.*

10.11b

512Wirele

ss

54.11g

1024

Wireless

A

B V2

ComNetByCpuRate(‘P4’, rate)

D

E

Computers.* NetCards.*A123

P4512 400A123

10.11b

50

B123

P41024

550B123

54.11g

120

ramprice

rateinterface

price

512 400 10 USB 501024

550 54 USB 120

F

Page 61: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

CLIDE Facts

• Rapid Convergence

– At every step, yellow and blue actions lead to a feasible query in a minimum number of steps

• Completeness of Suggestions

– Every feasible query can be formulated by performing yellow and blue actions at every step

• Minimality of Suggestions

– At every step, only a minimal number of actions are suggested, i.e., the ones that are needed to preserve completeness

Page 62: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Join ActionTableAction

SelectionAction

Interaction Graph

• Nodes are queries– One for each qCQ

• Edges are actions– Table, selection, projection and join actions

• Green nodes are feasible queries• Infinitely big structure

– All CQ queries– All possible combinations of actions formulating them

Com1.cid=Net1.cidCom1.cpu=‘P4’Com1 Com1.ram Rou1…… Com1.price ……

s

… … ………Net1 …

Page 63: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Interaction Graph: Colorable Actions

• Colorable actions AC labeloutgoing edges of the current node

Net1

Com1.cpu=*

Com1.price=*

Rou1

Com1.ram=*

Com1.cid=*

Com2

Com1.cid

……

……

Com1.cpu

……

……

Current Node

Page 64: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Interaction Graph: Colors

Com1.cpu=*

……

……

……

……

CurrentNode

Net1 Com1.cid=Net1.cid

Com2.cid=Net1.cid

Com2

Com2.cpu=‘P4’ Net1.rate=‘54Mbps’

Net1.rate=’54Mbps’…… … … …

… …

Com1.cpu=*Com1.cpu=* Rou1 Net1.rate=Rou1.rate ……… …

Net1.rate=’54Mbps’ …

Com1.cid=Net1.cid

Com1.cid=Net1.cid …Net1

Com1.cpu=*

Com1.price=*

Rou1

Com1.ram=*

Com1.cid=*

Com2

Com1.cid

Com1.cpu

• Yellow action – Every path from current node n to a

feasible node contains • Blue action

– At least one feasible query cannot be formulated unless this action is performed (minimality)

• Red action – No path to a feasible node contains

Page 65: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Color Determined By a Finite Set of Feasible Queries

• Start by considering the closest feasible queries FQC

• FQC is sufficient to color actions in AC • Theorem: Set of Closest Feasible Queries is Finite

• How far can closest feasible queries FQC be?

Based on Maximally Contained Queries FQMC?

n

ClosestFeasibleQueries FQC

Challenge: Infinitely Many Feasible Queries

Radius

Infinitely manyfeasible queries

?…

Page 66: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Color Algorithm

• Assuming fixed SELECT clause (projection list)• Covered extensively in literature

– MiniCon, Bucket, InverseRules

• FQMC is finite

Maximally Contained Query

Maximally Contained Queries FQMC

Query: Q1Get all Computers

Query: Q2Get all Computers with a given cpu

Query: Q3Get all Computerswith a given cpu & ram

Not Maximally ContainedMaximally Contained Query

Query: Q4Get all Computerswith a given ram

Page 67: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Color Algorithm

• Compute maximally contained queries FQMC

• The radius pL is the longest path to a node n’ such that q(n’) in FQMC

All FQC queries are reachable via a path of length p pL

ClosestFeasibleQueries FQC

MaximallyContainedQueries FQMCn

Maximally Contained Queries FQMC

pL Radius

Page 68: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Color Algorithm

• Theorem: All queries in FQMC are in FQC

• But not all queries in FQC are in FQMC

More on Closest Feasible Queries

ClosestFeasibleQueries FQC

MaximallyContainedFeasibleQueries FQMC

More feasible nodes

n

Page 69: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Color Algorithm

• Naïve Approach

– Start from n and explore paths up to length pL

More on Closest Feasible Queries

ClosestFeasibleQueries FQC

MaximallyContainedFeasibleQueries FQMC

n

Page 70: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Color Algorithm

• Collapse Aliases to compute FQC \ FQMC

• Check satisfiability

Collapse Aliases

ClosestFeasibleQueries FQC

MaximallyContainedFeasibleQueries FQMCn

Page 71: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Color Algorithm

Coloring Non-Projection Actions• No interaction graph materialization• Use of containment mapping from current query to the closest feasible

ones• An action is colored

– Yellow, if is mapped into all queries in FQC

– Red, if is not mapped into any query in FQC

– Blue, if is mapped into at least one query qF in FQC, no other action in AP is mapped into qF, and is neither yellow nor red

Coloring Projection Actions• Never colored yellow• Can be colored blue only if

– the current query is feasible– it is not colored red

• Which ones are red?– Bring all projection atoms from views such that feasibility is preserved

– If action is not mapped into any query in FQC, then is red

Page 72: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

Other Back-End

Parameterized Views Back-End

CLIDE Implementation

ActionCurrentQuery

ClosestFeasibleQueries

SchemasViews

MiniConContainment

TestCollapseAliases

ColorActions

Front-End

Developer

MaximallyContainedQueries

OptimalMaximallyContainedQueries

ColoredActions

ColumnAssociations

MiniCon• Outputs redundant and non-minimal queries• Affects CLIDE’s rapid convergence and minimality propertiesContainment Test• Well-known NP-complete problem• Polynomial when query is acyclicCollapse Aliases / Color Actions• Reuse containment mappings created by MiniCon

Page 73: From Enterprise Information Integration to Community-Based Mediation Alin Deutsch, Yannis Katsis, Michalis Petropoulos Yannis Papakonstantinou A presentation

CLIDE Performance

0

2

4

6

8

10

12

14

(# of Joins, # of Selections) in Current Query

Tim

e (

sec) 112 Views

168 Views

224 Views

280 Views

• Queries

A-span = 7B-span = 4Selections = 4,6,8,10

A

B1…

C1

B2 C1A

BK

B1…

C1

CL

…• Schema

Bi

… Ci

• Views

A

BK

B1…

C1

CL

… …

BiM

Bi1…

CiM

Ci1…

Chains of Stars