Page 1: A model driven approach to enterprise data integration.ppt

15 December 2010 Confidential

A model driven approach to enterprise data integration

Page 2: A model driven approach to enterprise data integration.ppt

Enterprise data management challenge

•A large enterprise has thousands of databases

– Not sure what kind of data exists where, and what kind of dependencies exist between them

•Different data stores are produced in different process contexts

– Not sure what their semantics are

– Not sure what their quality attributes are

•Lack of a unified view and poor quality severely dent confidence in data assets

– Affecting operational efficiency

– Affecting quality of decision making

•Ad hoc, point solutions only exacerbate the problem further

– They add more pieces to the puzzle

Need a more systematic approach

Page 3: A model driven approach to enterprise data integration.ppt

Model driven approach

Page 4: A model driven approach to enterprise data integration.ppt

A model driven approach to enterprise data integration

•How do we model enterprise data so that we can achieve a unified view of data at the enterprise level?

•Different data sources are produced in different semantic contexts

– Process, geographical, temporal,…

– E.g.

• Customer – current, past, potential?

• Profit – before tax or after?

• C1: Employees can work only on projects of their own department

• C2: Employees can be deputed to work on projects of other departments

• It is critical to model these contexts

– to interpret and integrate the data correctly

– to assess its quality

Page 5: A model driven approach to enterprise data integration.ppt

…A model driven approach to enterprise data integration

[Figure: each source (Source 1 ... Source N) holds raw data (objects, properties, events, ...) described by its own conceptual model plus rules, which capture context specific knowledge. Mappings connect these source models to a unified conceptual model plus rules, which captures global knowledge.]

Page 6: A model driven approach to enterprise data integration.ppt

Core infrastructure

•Models

– Conceptual models

• EER or a restricted version of OWL

• Rule language

– Similar to SWRL, but extended with aggregation operators

– Physical models

• Relational

•Mappings

– Simple attribute-to-attribute mappings

– Views – GAV, LAV, GLAV

– User-defined functions

– Informal correspondences (with textual description of mappings)
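To make the view-based mappings above a little more concrete, here is a minimal sketch of a GAV (global-as-view) mapping, in which a relation of the global model is defined as a view over source relations. The table and column names (`crm_customer`, `billing_client`, `customer`) are hypothetical, and SQLite is used only for convenience; the slides do not prescribe either.

```python
import sqlite3


def build_gav_example():
    """Create two hypothetical sources and a global view defined over them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE crm_customer (cust_id INTEGER, full_name TEXT);
        CREATE TABLE billing_client (client_no INTEGER, client_name TEXT);
        INSERT INTO crm_customer VALUES (1, 'Asha'), (2, 'Ben');
        INSERT INTO billing_client VALUES (2, 'Ben'), (3, 'Chen');

        -- GAV mapping: the global relation is a view over the sources.
        CREATE VIEW customer AS
            SELECT cust_id AS id, full_name AS name FROM crm_customer
            UNION
            SELECT client_no, client_name FROM billing_client;
    """)
    return conn


def query_global(conn):
    """Query the global relation; the view definition unfolds it to the sources."""
    return conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()


conn = build_gav_example()
rows = query_global(conn)
```

With GAV, query answering reduces to view unfolding, which is why it is the easiest of the three styles to execute. LAV and GLAV describe sources in terms of the global model instead and generally require query rewriting.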

Page 7: A model driven approach to enterprise data integration.ppt

…Core infrastructure

•Core reasoning tools

– Query translation

• Given a query on one model, translate it into a semantically equivalent one on another mapped model

– Mapping composition

• Given Model1 <-m1-> Model2 and Model2 <-m2-> Model3, compose m1 and m2 to produce a mapping between Model1 and Model3.

•Component architecture

– Modeling, mapping, query translation, query-to-DFG, DFG-to-query, etc.

– Purpose specific data management tools are composed from these components

• E.g. integration tool, data migration tool, impact analysis tool, etc.
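The two core reasoning tools can be illustrated for the simplest mapping kind only, attribute-to-attribute mappings. In the sketch below a mapping is a plain dictionary and a query is a dictionary with `select` and `where` parts; both representations, and all model and attribute names, are assumptions made for illustration. Composing view-based mappings (GAV, LAV, GLAV) is considerably harder and is not covered here.

```python
def compose(m1, m2):
    """Compose Model1->Model2 and Model2->Model3 attribute mappings."""
    # Attributes of Model1 whose image has no counterpart in Model3 are dropped.
    return {a: m2[b] for a, b in m1.items() if b in m2}


def translate(query, mapping):
    """Rewrite a query on one model into the vocabulary of the mapped model."""
    def rename(attr):
        if attr not in mapping:
            raise KeyError(f"no mapping for attribute {attr!r}")
        return mapping[attr]

    return {
        "select": [rename(a) for a in query["select"]],
        "where": {rename(a): v for a, v in query["where"].items()},
    }


# Hypothetical mappings: conceptual model -> source model -> physical schema.
m1 = {"Customer.name": "Client.full_name", "Customer.city": "Client.town"}
m2 = {"Client.full_name": "CLIENTS.NAME", "Client.town": "CLIENTS.CITY"}

m13 = compose(m1, m2)
query = {"select": ["Customer.name"], "where": {"Customer.city": "Pune"}}
translated = translate(query, m13)
```

Because the composed mapping is just another mapping, the same `translate` function serves both the one-step and the multi-step case.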

Page 8: A model driven approach to enterprise data integration.ppt

Some tools built using modeling approach

Page 9: A model driven approach to enterprise data integration.ppt

Data integration – a unified view of data

•Provide query interface in terms of enterprise conceptual model

•Generate platform specific implementations

– DB specific queries

– ETL specifications

• Own engine

• Informatica ETL bridge, ..

[Figure: a query is posed against the enterprise conceptual model, which is mapped to the conceptual model of each source (Source 1 ... Source N); each source conceptual model sits on top of its own data store. Platform specific implementations are generated from the query.]

Components: modeling, mapping, query translation, query-to-DFG

Page 10: A model driven approach to enterprise data integration.ppt

Data integration – hybrid architecture

• Specify warehouse schema as a view over the unified enterprise conceptual model

• Generate ETL to populate warehouse from sources

• Provide a unified query interface in terms of the enterprise conceptual model

– Go to warehouse for data that exists there and to sources for other data

• Pick optimal set of aggregate views

– As requirements change, this mix can change transparently

[Figure: a query is posed against the enterprise conceptual model, which is mapped to both the warehouse and the data source; generated ETL populates the warehouse from the data source.]

Components: modeling, mapping, mapping composition, query translation, query-to-DFG
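The routing idea on this slide (go to the warehouse for data that exists there and to the sources for the rest) can be sketched as a small planning function. The function name `route`, the plan layout, and the attribute names are all hypothetical; a real planner would also have to consider joins, freshness, and cost.

```python
def route(requested, warehouse_attrs):
    """Split the requested attributes between the warehouse and the sources."""
    plan = {"warehouse": [], "sources": []}
    for attr in requested:
        target = "warehouse" if attr in warehouse_attrs else "sources"
        plan[target].append(attr)
    return plan


requested = ["Customer.name", "Order.total", "Order.notes"]

# Initially only two attributes are materialized in the warehouse.
plan_before = route(requested, {"Customer.name", "Order.total"})

# After the warehouse view is extended, the same request is routed differently.
plan_after = route(requested, {"Customer.name", "Order.total", "Order.notes"})
```

The request itself is unchanged between the two calls, which is the sense in which the warehouse/source mix can change transparently to the user.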

Page 11: A model driven approach to enterprise data integration.ppt

Data migration

• Map source data model to target data model

• Generate ETL to migrate data from source to target

• Generate migration programs for data access code migration

– Query migration

[Figure: a mapping connects the source data model to the target data model; generated ETL moves data from the source data store to the target data store.]

Components: modeling, mapping, query translation, query-to-DFG
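As a rough illustration of generating ETL from a mapping, the sketch below turns a simple column-to-column mapping into an `INSERT ... SELECT` statement and runs it on SQLite. The tables `emp_old` and `employee`, their columns, and the function `generate_etl` are hypothetical. Identifiers are interpolated directly into the SQL string, so this is only suitable for trusted, model-derived names.

```python
import sqlite3


def generate_etl(source_table, target_table, column_map):
    """Build an INSERT ... SELECT statement from a target->source column map."""
    target_cols = ", ".join(column_map)
    source_cols = ", ".join(column_map.values())
    return (f"INSERT INTO {target_table} ({target_cols}) "
            f"SELECT {source_cols} FROM {source_table}")


def migrate():
    """Run the generated statement against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE emp_old (eno INTEGER, ename TEXT);
        CREATE TABLE employee (id INTEGER, name TEXT);
        INSERT INTO emp_old VALUES (1, 'Asha'), (2, 'Ben');
    """)
    sql = generate_etl("emp_old", "employee", {"id": "eno", "name": "ename"})
    conn.execute(sql)
    rows = conn.execute("SELECT id, name FROM employee ORDER BY id").fetchall()
    return sql, rows


sql, rows = migrate()
```

Query migration would use the same mapping in the opposite role: instead of moving rows, it rewrites queries written against the source model so that they run against the target model.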

Page 12: A model driven approach to enterprise data integration.ppt

Open Issues

Page 13: A model driven approach to enterprise data integration.ppt

Complexity vs. expressive power

•Query complexity vs. data complexity

– We wanted to keep data complexity polynomial

– SQL reducibility is a goal

•But SQL reducible subsets are not expressive enough

– E.g. OWL Lite

•Partial solutions vs. complete solutions

– Completeness sacrificed in favor of expressiveness

– But safety guaranteed

– E.g. comparison predicates, aggregation, functional dependencies

• Data complexity is known to be NP

• We use SQL reduction, so obviously there is no completeness guarantee

Can we state conditions on the query and models that tell us when a given rewriting is guaranteed to be complete?

Page 14: A model driven approach to enterprise data integration.ppt

Static vs. dynamic context

•We have just one fixed context model per data source

– But context can vary from instance to instance even within the same entity

– E.g. model: Company, Department, Employee, Project

• Company X: Employees can work only on projects of their own department

• Company Y: Employees can be deputed to work on projects of other departments

•To implement this, we need two levels of rules

– Rules to select applicable context for a given instance

– Application of context specific rules

How to do this efficiently is an open issue.
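The two levels of rules can be sketched as follows, using the Company X and Company Y example from this slide. The `allows_deputation` flag, the context names, and the record layout are assumptions made for illustration. The sketch shows only the structure of the two levels and says nothing about doing this efficiently at scale.

```python
def select_context(company):
    """Level 1: choose the context that applies to this particular instance."""
    if company["allows_deputation"]:
        return "deputation_allowed"
    return "own_department_only"


# Level 2: one rule per context, checking an employee-project assignment.
CONTEXT_RULES = {
    "own_department_only": lambda emp, proj: emp["dept"] == proj["dept"],
    "deputation_allowed": lambda emp, proj: True,
}


def assignment_is_valid(company, emp, proj):
    """Apply the rule of whichever context was selected for the company."""
    return CONTEXT_RULES[select_context(company)](emp, proj)


company_x = {"name": "X", "allows_deputation": False}
company_y = {"name": "Y", "allows_deputation": True}
employee = {"name": "Asha", "dept": "Sales"}
project = {"name": "Portal", "dept": "Engineering"}

valid_in_x = assignment_is_valid(company_x, employee, project)
valid_in_y = assignment_is_valid(company_y, employee, project)
```

The same employee-project pair is a violation in Company X and perfectly valid in Company Y, so a single fixed context model per data source would misjudge one of them.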

Page 15: A model driven approach to enterprise data integration.ppt

Integrating quality attribute processing

•How do we deal with quality issues such as uncertainty?

•Plenty of research on probabilistic databases and uncertainty management

– E.g. Stanford TRIO - uncertainty and provenance management on top of the relational model

– They assume confidence values are already recorded in the database

– But assessing uncertainty of ‘base facts’ itself is a complex reasoning task requiring context knowledge

– Can this be integrated into the rest of the query processing framework?

•Can uncertainty management be used to resolve conflicts better?

– In integration, quite often we end up with conflicting values

– Uncertain which value to choose, but forced to choose one

– Can uncertainty management techniques provide a better solution?
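One possible direction, offered only as a sketch and not as an answer to the open question, is to keep every conflicting value together with a normalized confidence instead of forcing a single choice. The per-source weights below are invented for illustration; as the slide notes, assessing such confidences is itself a hard problem.

```python
from collections import defaultdict


def resolve(candidates, source_weight):
    """Return all candidate values with normalized confidence, best first."""
    totals = defaultdict(float)
    for source, value in candidates:
        # Sources that agree on a value pool their weight.
        totals[value] += source_weight[source]
    norm = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: -kv[1])
    return [(value, weight / norm) for value, weight in ranked]


# Hypothetical trust weights per source and conflicting values for one attribute.
weights = {"crm": 2, "billing": 1, "support": 1}
candidates = [("crm", "Pune"), ("billing", "Mumbai"), ("support", "Pune")]

ranked_values = resolve(candidates, weights)
```

A consumer that must pick one value can take the first entry, while downstream reasoning can still see that the choice was uncertain.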

Page 16: A model driven approach to enterprise data integration.ppt

Thank you!