Enterprise Information Integration: Successes, Challenges, and Controversies
By: Alon Y. Halevy, Naveen Ashish, Dina Bitton, Michael Carey, Denise Draper, Jeff Pollock, Arnon Rosenthal, Vishal Sikka
Presented by: Shiwoong Kim
November 23, 2005

Posted on 20-Dec-2015

EII – A New Industry (Introductory remarks, Alon Y. Halevy, UW)

Enterprise Information Integration: a new industry
What is EII?
  Integrating data from multiple sources without a central warehouse
  Called a data integration system in the research community
Factors in the development of the EII industry
Expected growth:

EII – Architecture & Applications

Data integration scenario
  Identify data sources
  Build a virtual schema
  The virtual schema is queried by users
Query processing
  Reformulate the query (from the virtual schema to queries over the sources)
  Execute the query efficiently using an engine
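The query-processing steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a global-as-view style mapping (each virtual attribute is defined by one source column) and a shared join key; the sources and the names `MAPPING`, `reformulate`, and `execute` are hypothetical, not part of any EII product, and real systems use far richer mapping languages and optimizers.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Two hypothetical sources, each with its own local schema.
CRM_SOURCE: List[Row] = [
    {"cust_id": 1, "full_name": "Ada"},
    {"cust_id": 2, "full_name": "Grace"},
]
BILLING_SOURCE: List[Row] = [
    {"account": 1, "balance_usd": 120.0},
    {"account": 2, "balance_usd": 15.5},
]
SOURCES = {"crm": CRM_SOURCE, "billing": BILLING_SOURCE}

# Assumed global-as-view mapping: virtual attribute -> (source, local column).
MAPPING = {
    "id": ("crm", "cust_id"),
    "name": ("crm", "full_name"),
    "balance": ("billing", "balance_usd"),
}
# Column in each source that identifies the same customer.
JOIN_KEYS = {"crm": "cust_id", "billing": "account"}


def reformulate(attrs: List[str]) -> Dict[str, List[str]]:
    """Rewrite a projection on the virtual schema into per-source projections."""
    plan: Dict[str, List[str]] = {}
    for attr in attrs:
        source, column = MAPPING[attr]
        plan.setdefault(source, [JOIN_KEYS[source]])  # always fetch the join key
        if column not in plan[source]:
            plan[source].append(column)
    return plan


def execute(attrs: List[str], where: Callable[[Row], bool]) -> List[Row]:
    """Fetch only the planned columns, join on the key, rename to the virtual schema."""
    plan = reformulate(attrs)
    joined: Dict[object, Row] = {}
    for source, columns in plan.items():
        key = JOIN_KEYS[source]
        fetched = [{c: row[c] for c in columns} for row in SOURCES[source]]
        for row in fetched:
            record = joined.setdefault(row[key], {})
            for attr in attrs:
                src, col = MAPPING[attr]
                if src == source:
                    record[attr] = row[col]
    return [r for r in joined.values() if where(r)]
```

For example, `execute(["name", "balance"], lambda r: r["balance"] > 100)` asks the CRM source only for `cust_id` and `full_name`, asks billing only for `account` and `balance_usd`, and returns `[{"name": "Ada", "balance": 120.0}]`.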

XML data model & XQuery
  A double problem: XML research was still in its infancy
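To illustrate XML as a common data model, the sketch below wraps rows from a relational cursor and a delimited file in one XML tree and then queries them together. Python's standard library has no XQuery engine, so the query uses the limited XPath subset supported by `xml.etree.ElementTree` as a stand-in; the element names and sample data are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical source 1: rows as they might come back from an RDBMS cursor.
db_rows = [("C1", "Ada", "London"), ("C2", "Grace", "New York")]
# Hypothetical source 2: a delimited file.
csv_text = "id,name,city\nC3,Alan,London\n"


def to_xml(records, source_name):
    """Wrap flat records (dicts) as <customer> elements tagged with their source."""
    root = ET.Element("customers", source=source_name)
    for rec in records:
        cust = ET.SubElement(root, "customer", id=rec["id"])
        for field in ("name", "city"):
            ET.SubElement(cust, field).text = rec[field]
    return root


db_xml = to_xml([dict(zip(("id", "name", "city"), row)) for row in db_rows], "rdbms")
csv_xml = to_xml(list(csv.DictReader(io.StringIO(csv_text))), "csv")

# Both sources now share one XML data model under a single root.
integrated = ET.Element("integrated")
integrated.extend([db_xml, csv_xml])

# ElementTree XPath subset: customers whose <city> child is exactly "London".
london = [c.get("id") for c in integrated.findall(".//customer[city='London']")]
print(london)  # ['C1', 'C3']
```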

First applications
  Customer-relationship management
  Digital dashboards

EII - Challenges

Scale-up & performance
Competition vs. warehouse (ETL)
Horizontal vs. vertical
EII, EAI, other middleware
Metadata management
Semantic heterogeneity

Cost-Effective & Scalable EII (Naveen Ashish, NASA Ames Lab)

Primary concern: scalability & cost-effectiveness

At NASA, a schema-centric approach is overkill
  Investment in schema management for each new source
  Return on investment is low (not linear)
  Heavy-weight middleware

Diverse needs of data integration applications
  One-size-fits-all approach unsuitable
  Nimble & adaptive approach necessary

EII @ NASA

Schema-less system
  Eliminates schema & DB management
  Challenges some basic assumptions:
    Data must be stored/managed in a DBMS
    A DB must have a formal schema
Business documents as the interface to information
Clients are not too light-weight to do processing
Schema imposed by client applications
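A schema-less store pushes schema decisions to the client. As a minimal sketch, assuming documents are plain Python dicts, the store below accepts records with inconsistent field names, and one client application imposes its own schema (`MissionView`, a hypothetical type) only when it reads them. The mission records are invented sample data.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical "business documents" kept without a formal schema:
# sources use different field names, and some fields are missing or extra.
documents: List[Dict[str, Any]] = [
    {"mission": "Alpha", "launch_year": 2004, "notes": "nominal"},
    {"MissionName": "Beta", "year": "2005"},
    {"mission": "Gamma"},  # no year recorded
]


@dataclass
class MissionView:
    """Schema imposed by one client application at read time (hypothetical)."""
    name: str
    year: Optional[int]


def read_as_mission(doc: Dict[str, Any]) -> MissionView:
    # The client, not the store, decides which fields it needs and how to coerce them.
    name = doc.get("mission") or doc.get("MissionName") or "unknown"
    raw_year = doc.get("launch_year", doc.get("year"))
    year = int(raw_year) if raw_year is not None else None
    return MissionView(name=name, year=year)


views = [read_as_mission(d) for d in documents]
```

A different client could read the same documents through a different view without any change to the store, which is the flexibility the schema-less approach trades against upfront schema management.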

EII Will Not Replace Data Warehouse (Dina Bitton, CTO of Callixa)

Data warehousing technologies (ETL)
  ETL – Extract, Transform, Load
  Far more sophisticated now than before

Demands for low-cost, real-time delivery (EII)
  Need for a “virtual data warehouse”
  On-demand integration without moving data
  Single interface to multiple sources: SQL, XQuery

However, EII will not replace ETL

Inefficiency in EII

Performance
  Inefficient to transform source data to XML and move it across the network to the (XQuery) processor
  Join operations not optimized

How to improve it?
  Maximize parallelism in query processing
  Minimize the amount of data transferred across the network
  Learn from parallel DB servers
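The second improvement, minimizing data moved across the network, can be illustrated with selection pushdown. In this sketch `RemoteSource` is a hypothetical stand-in that counts the rows it ships; the same query runs once by filtering inside the integration engine and once by pushing the predicate down to the source.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


class RemoteSource:
    """Hypothetical remote source that counts the rows it ships to the engine."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.rows_shipped = 0

    def scan(self, predicate: Callable[[Row], bool] = lambda r: True) -> List[Row]:
        # The predicate is evaluated at the source; only matching rows are shipped.
        result = [r for r in self._rows if predicate(r)]
        self.rows_shipped += len(result)
        return result


def naive_query(src: RemoteSource, threshold: int) -> List[Row]:
    # Ship everything, then filter inside the integration engine.
    return [r for r in src.scan() if r["amount"] > threshold]


def pushdown_query(src: RemoteSource, threshold: int) -> List[Row]:
    # Push the selection down so the source filters before shipping.
    return src.scan(lambda r: r["amount"] > threshold)


orders = [{"id": i, "amount": i * 10} for i in range(1000)]
naive_src, pushdown_src = RemoteSource(orders), RemoteSource(orders)
assert naive_query(naive_src, 9900) == pushdown_query(pushdown_src, 9900)
print(naive_src.rows_shipped, pushdown_src.rows_shipped)  # 1000 9
```

Both plans return the same nine rows, but the naive plan ships all 1000 while the pushdown plan ships 9. Distributed database systems apply the same idea to joins, for example with semijoin reduction.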

Current solutions rushed, simplistic

Persistence vs. Virtualization

Data warehouses persist data to:
  Keep a history
  Serve data when access to the source is denied

EII virtualizes…
  …across data warehouses
  …when adding a new source to a data warehouse
  …for one-time projects or prototypes
  …data that must be kept fresh
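The trade-off between persistence and virtualization can be made concrete with two toy classes. This is a sketch under simplifying assumptions (an in-memory dict as the source, a deep copy as the "ETL load"); `Source`, `Warehouse`, and `VirtualView` are hypothetical names. The warehouse keeps a history and still answers when the source is unavailable, while the virtual view always reflects the current source.

```python
import copy
from typing import Dict, List


class Source:
    """A hypothetical operational source whose data changes over time."""

    def __init__(self) -> None:
        self.prices: Dict[str, float] = {"widget": 1.00}
        self.available = True


class Warehouse:
    """Persistence: each ETL load stores a copy, so history accumulates."""

    def __init__(self) -> None:
        self.snapshots: List[Dict[str, float]] = []

    def load(self, src: Source) -> None:
        self.snapshots.append(copy.deepcopy(src.prices))

    def current(self) -> Dict[str, float]:
        # Answers from the latest copy, even if the source is unreachable.
        return self.snapshots[-1]


class VirtualView:
    """Virtualization: nothing is stored; each query reads the live source."""

    def __init__(self, src: Source) -> None:
        self.src = src

    def current(self) -> Dict[str, float]:
        if not self.src.available:
            raise ConnectionError("source unavailable")
        return dict(self.src.prices)


src = Source()
wh, view = Warehouse(), VirtualView(src)
wh.load(src)                 # e.g. a nightly ETL run
src.prices["widget"] = 1.25  # the source changes after the load
print(wh.current()["widget"], view.current()["widget"])  # 1.0 1.25
wh.load(src)
print([s["widget"] for s in wh.snapshots])  # [1.0, 1.25]
src.available = False
print(wh.current()["widget"])  # 1.25; view.current() would now raise
```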

EII solves a different problem; the two will co-exist

EAI vs. EII (Michael Carey, BEA Systems)

EAI (WebLogic Integration)
  Supports business process management
  Connectivity to different applications & sources
  Update, access, integrate, orchestrate procedurally

EII (Liquid Data)
  Access/integrate data from many sources & apps
  Same connectivity options as WebLogic Integration
  Also RDBMS, XML files, delimited files, etc.
  Declarative integration using XQuery

What’s the problem?

Separation of EAI vs. EII is artificial
  Which technology should be used when?
  Both can be used to solve the same problem

EII is more efficient at integrating data
  However, EAI is needed for updates

Best practice: use both
  There will be redundancy

EII & EAI die and are reborn as “EI”

Experience at Nimble Technology (Denise Draper, Microsoft)

EII is a “gratifyingly hard” problem
  Hard technical problems, yet with a clear and elegant outline:
    Mapping between data models in an extensible fashion
    Decomposing queries across multiple sources
    Being able to optimize queries

Nimble System
  XML-based data model & query language
  Views as a central metaphor for the system
  Vendor/DB-specific SQL adapters
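Vendor/DB-specific SQL adapters let one engine target sources with different SQL dialects. The sketch below shows the adapter idea for a single dialect difference, row limiting; `Select` and the adapter classes are hypothetical, not Nimble's design, and the sketch does no identifier quoting or escaping, so it is not safe for untrusted input. The three syntaxes (`LIMIT n`, T-SQL `TOP n`, and standard `FETCH FIRST n ROWS ONLY`) are real, but which one a given source accepts should be checked against its documentation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Select:
    """A source-neutral query produced by the integration engine (hypothetical)."""
    table: str
    columns: List[str]
    limit: int


class SqlAdapter(ABC):
    @abstractmethod
    def render(self, q: Select) -> str: ...


class LimitAdapter(SqlAdapter):
    """Dialects using LIMIT n, e.g. PostgreSQL, MySQL, SQLite."""

    def render(self, q: Select) -> str:
        return f"SELECT {', '.join(q.columns)} FROM {q.table} LIMIT {q.limit}"


class TopAdapter(SqlAdapter):
    """SQL Server (T-SQL) style: SELECT TOP n."""

    def render(self, q: Select) -> str:
        return f"SELECT TOP {q.limit} {', '.join(q.columns)} FROM {q.table}"


class FetchFirstAdapter(SqlAdapter):
    """SQL-standard FETCH FIRST, e.g. DB2 and Oracle 12c or later."""

    def render(self, q: Select) -> str:
        return (f"SELECT {', '.join(q.columns)} FROM {q.table} "
                f"FETCH FIRST {q.limit} ROWS ONLY")


query = Select(table="orders", columns=["id", "total"], limit=5)
for adapter in (LimitAdapter(), TopAdapter(), FetchFirstAdapter()):
    print(adapter.render(query))
```

Keeping the engine's query representation dialect-neutral means supporting a new source only requires a new adapter, not changes to query decomposition or optimization.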

EII in Business

Acceptable raw performance
Graphical query builder
Client-side SDKs
Still, it was difficult to grow the business. Why?

EII not recognized as an independent product
Competition: data warehouses (ETL) vs. EII

EII has to either:
  Be cheaper, or
  Solve a different problem, or
  Solve the same problem better

EII vs. ETL

Pull (EII) vs. push (ETL) technology
  Customers didn’t value live data very much

Dynamic source discovery and querying
  Automatically map/extract data from sources
  Would make EII more valuable, but it’s hard to do
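Automatic mapping is hard partly because naive matching on column names only goes so far. As a minimal sketch (not how any particular product does it), the matcher below pairs each column of a hypothetical virtual schema with the most similar source column name using `difflib.SequenceMatcher`, and leaves it unmapped below an assumed similarity threshold of 0.7.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Ignore case and common separators when comparing column names."""
    return name.lower().replace("_", "").replace("-", "")


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_columns(source_cols: List[str], target_cols: List[str],
                  threshold: float = 0.7) -> Dict[str, Optional[str]]:
    """Map each target column to its most similar source column, or None."""
    mapping: Dict[str, Optional[str]] = {}
    for target in target_cols:
        best = max(source_cols, key=lambda s: similarity(s, target))
        mapping[target] = best if similarity(best, target) >= threshold else None
    return mapping


source = ["CUST_NAME", "cust_phone", "zip"]
virtual = ["customer_name", "phone", "postal_code"]
print(match_columns(source, virtual))
```

On these hypothetical columns, `customer_name` maps to `CUST_NAME` and `phone` to `cust_phone`, but `postal_code` stays unmapped because nothing in the names connects it to `zip`; closing that gap needs semantics, not string similarity.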

EII has a cost advantage (no warehouse needed)
  However, this is not well understood
The cost of ETL (warehousing) is at least predictable
ETL & EII both have a high human cost of setup/admin

Where Should EII Go?

Is EII a lost cause or dead end?
ETL isn’t the final answer

Can’t copy all relevant data to single repository

EII queries migrated to high-performance ETL
  Same data model, query language

Data modeling standards for interoperability & metadata management

EII will be in the picture (as part of portfolio)

Enterprise Information Interoperability (Jeff Pollock, Network Inference)

The notion of integration is flawed
Interoperability: loose coupling, using formal semantics (no EII tools do this)

The hard part of EII is still the same, regardless of syntax/structure
  Federated, shared info must account for meaning (semantics), within a context

What do today’s EII systems do well?
  Very efficient federated queries

EII Systems – What are they good for?
  Internal management of common data views
  Management of metadata for transformation, query

Three flavors of EII systems
  Relational-based, XML-based, object-based

But the structure of data contains no semantics
  Semantics are contained in proprietary metadata, or in the code itself

EII will either…
  …adopt Semantic Web technologies and survive, or
  …see its core folded into a new technology

The author is betting on OWL and RDF (W3C)
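The Semantic Web approach keeps meaning in shared, declarative metadata rather than in code. The sketch below hand-rolls a tiny triple set in plain Python instead of using an RDF library: two hypothetical sources use different property names (`hr:pay`, `fin:salary`), and one triple using the real OWL term `owl:equivalentProperty` tells the query layer they mean the same thing. A real system would use an RDF store and an OWL reasoner; this only illustrates the idea.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Facts from two hypothetical sources, using different vocabularies.
triples: Set[Triple] = {
    ("emp:1", "hr:pay", "50000"),
    ("emp:2", "fin:salary", "62000"),
    # Shared semantics stated once, as data, using an OWL term.
    ("hr:pay", "owl:equivalentProperty", "fin:salary"),
}


def equivalent_properties(prop: str) -> Set[str]:
    """Close owl:equivalentProperty (symmetric and transitive) around prop."""
    result, frontier = {prop}, [prop]
    while frontier:
        p = frontier.pop()
        for s, pred, o in triples:
            if pred != "owl:equivalentProperty":
                continue
            for a, b in ((s, o), (o, s)):
                if a == p and b not in result:
                    result.add(b)
                    frontier.append(b)
    return result


def values_of(prop: str) -> Set[Tuple[str, str]]:
    """All (subject, value) pairs for prop or any property equivalent to it."""
    props = equivalent_properties(prop)
    return {(s, o) for s, p, o in triples if p in props}


print(sorted(values_of("fin:salary")))  # [('emp:1', '50000'), ('emp:2', '62000')]
```

Because the equivalence is data rather than code, adding a third source with yet another property name requires only one more triple.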

EII - Embracing & Guiding Chaos (Arnon Rosenthal, MITRE Corp.)

Long term, EII is not viable as a stand-alone product
“It’s the metadata, stupid!”
  However, EI metadata is unintegrated
  Semantic Web (OWL) is promising
EII n-tier architectures hide data in different tiers
  Needs better Service Oriented Architectures

EII at runtime is unattractive in the military
  Ease of change gets lost in the gov’t acquisition process
  How do you measure data integration ‘agility’?

Future Direction of EI

EI has been passive
  How to exchange data that’s been supplied
However, enterprises are proactive

Semantics integration
Semantics management
Enterprise models – Managers have partial influence
Community services creating/influencing new standards

Need for tools managing data supply chains

Role of the User in EII (Vishal Sikka, SAP)

SAP views EII as a goal, not a technology:
  Ensuring consistency of information
  Providing connectivity/accessibility across multiple platforms/DBs
  A timely and complete view of critical entities/events

Information integration from the user’s perspective
  Work triggered by tasks/events
  Actions not covered by traditional business processes
  Use of e-mails/documents commonplace
  Decisions based on multiple sources

Role of User in EI

Actions taken in multiple applications

SAP’s NetWeaver provides:
  Management of master data
  Virtual data federation
    Distributed query processing, metadata mapping
  ETL in the Business Info Warehouse and EAI

Representative problem: Enterprise Search
  Enable search across all data formats (documents, business objects, structured data) in enterprise apps
  EII technologies are still very basic
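Enterprise search across formats can be approximated, very roughly, by flattening each kind of item to text and building one inverted index. In this sketch the item ids, the three formats (a string for a document, a dict for a business object, a tuple for a structured row), and the AND semantics of `search` are all assumptions for illustration, not how any SAP product works.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical items of different kinds, each with a stable id.
items: Dict[str, object] = {
    "doc:memo-17": "Supplier Acme delayed the turbine shipment",
    "bo:PO-4711": {"type": "PurchaseOrder", "supplier": "Acme", "item": "turbine"},
    "row:inventory/42": ("turbine", 3, "Hamburg"),
}


def text_of(item: object) -> str:
    """Flatten any supported format to plain text for indexing."""
    if isinstance(item, str):
        return item
    if isinstance(item, dict):
        return " ".join(str(v) for v in item.values())
    if isinstance(item, tuple):
        return " ".join(str(v) for v in item)
    raise TypeError(f"unsupported item type: {type(item).__name__}")


def build_index(items: Dict[str, object]) -> Dict[str, Set[str]]:
    """Inverted index: lowercase token -> ids of items containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for item_id, item in items.items():
        for token in re.findall(r"\w+", text_of(item).lower()):
            index[token].add(item_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return ids of items containing every query term (AND semantics)."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


index = build_index(items)
print(search(index, "Acme turbine"))  # ['bo:PO-4711', 'doc:memo-17']
```

The inventory row mentions the turbine but not the supplier, so it drops out of the two-term query. Flattening also throws away structure (which field matched, how items relate), one reason such basic techniques fall short of real enterprise search.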

Open Issues in EII

Open problems
Diverse data types, diverse needs

Need for common semantic framework integrating results

User interaction issues
Need for common metadata
Security issues
Performance issues

EII techniques for real-time access met with skepticism
  Query optimization, query execution time prediction
  When is real-time access necessary?
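Predicting query execution time is one prerequisite for deciding whether real-time access is affordable. A deliberately crude cost model, with every parameter assumed, estimates each source's time as connection latency plus transfer time, then combines sources sequentially (sum) or in parallel (max). `SourceEstimate` and `predict_s` are hypothetical names; real optimizers also model join and processing costs and must cope with poor statistics about remote sources.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceEstimate:
    """Assumed per-source statistics; real values would come from a catalog or history."""
    name: str
    latency_s: float              # time to connect and start returning rows
    rows: int                     # estimated rows returned (after pushdown)
    bytes_per_row: int
    bandwidth_bytes_per_s: float

    def fetch_time_s(self) -> float:
        return self.latency_s + self.rows * self.bytes_per_row / self.bandwidth_bytes_per_s


def predict_s(sources: List[SourceEstimate], parallel: bool) -> float:
    """Parallel fetches finish with the slowest source; sequential fetches add up."""
    times = [s.fetch_time_s() for s in sources]
    return max(times) if parallel else sum(times)


plan = [
    SourceEstimate("crm", 0.05, 10_000, 200, 10_000_000),
    SourceEstimate("erp", 0.10, 50_000, 100, 5_000_000),
]
print(f"sequential ~{predict_s(plan, False):.2f} s, parallel ~{predict_s(plan, True):.2f} s")
```

With these sample numbers the CRM fetch takes about 0.05 + 0.2 = 0.25 s and the ERP fetch about 0.1 + 1.0 = 1.1 s, so sequential fetching is predicted at about 1.35 s versus about 1.1 s in parallel.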