Enterprise Information Integration: Successes, Challenges, and Controversies
By: Alon Y. Halevy, Naveen Ashish, Dina Bitton, Michael Carey, Denise Draper, Jeff Pollock, Arnon Rosenthal, Vishal Sikka
Presented by: Shiwoong Kim
November 23, 2005
EII – A New Industry (Introductory remarks, Alon Y. Halevy, UW)
Enterprise Information Integration: a new industry
What is EII?
  Integrating multi-source data without a central warehouse
  Research community: data integration system
Factors in the development of the EII industry
Expected growth:
EII – Architecture & Applications
Data integration scenario
  Identify data sources
  Build virtual schema
  Queried by users
Query processing
  Reformulate query (virtual schema to sources)
  Execute query efficiently using engine
XML data model & XQuery
  Double problems: XML research in infancy
First applications
  Customer-relationship management
  Digital dashboards
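The query-processing steps on this slide (reformulate a query over the virtual schema into source queries, then execute and combine) can be sketched roughly as follows. This is only an illustration under assumptions: the two in-memory sources, the wrapper functions, and the names `ATTRIBUTE_SOURCE`, `reformulate`, and `answer` are hypothetical and do not come from the talk or from any EII product.

```python
from itertools import product

# Hypothetical sources, each with its own local field names.
CRM_ROWS = [
    {"cust_id": 1, "full_name": "Ada Lovelace"},
    {"cust_id": 2, "full_name": "Alan Turing"},
]
BILLING_ROWS = [(1, 120.0), (1, 30.0), (2, 75.5)]  # (customer id, amount)


def scan_crm():
    """Wrapper: expose CRM rows in virtual-schema terms."""
    for row in CRM_ROWS:
        yield {"customer_id": row["cust_id"], "name": row["full_name"]}


def scan_billing():
    """Wrapper: expose billing rows in virtual-schema terms."""
    for cust_id, amount in BILLING_ROWS:
        yield {"customer_id": cust_id, "amount": amount}


# Virtual schema (assumed): which wrapper provides each virtual attribute.
ATTRIBUTE_SOURCE = {"name": scan_crm, "amount": scan_billing}


def reformulate(wanted):
    """Turn a list of virtual attributes into the source scans needed."""
    scans = []
    for attr in wanted:
        scan = ATTRIBUTE_SOURCE[attr]
        if scan not in scans:
            scans.append(scan)
    return scans


def answer(wanted, customer_id):
    """Run the reformulated query and join per-source rows on customer_id."""
    per_source = [
        [row for row in scan() if row["customer_id"] == customer_id]
        for scan in reformulate(wanted)
    ]
    results = []
    # All rows share the same customer_id, so the join is a plain product.
    for combo in product(*per_source):
        merged = {}
        for row in combo:
            merged.update(row)
        results.append({attr: merged[attr] for attr in wanted})
    return results


print(answer(["name", "amount"], 1))
```

The user only names virtual attributes; the data stays in the sources and is combined at query time, with no central warehouse involved.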
EII - Challenges
Scale-up & performance
Competition vs. warehouse (ETL)
Horizontal vs. vertical
EII, EAI, other middleware
Metadata management
Semantic heterogeneity
Cost-Effective & Scalable EII (Naveen Ashish, NASA Ames Lab)
Primary concern: scalability & cost-effectiveness
At NASA: schema-centric approach overkill
  Investment in schema management per new source
  Return on investment is low (not linear)
  Heavy-weight middleware
Diverse needs of data integration applications
  One-size-fits-all approach unsuitable
  Nimble & adaptive approach necessary
EII @ NASA
Schema-less system
  Eliminates schema & DB management
  Challenges some basic assumptions
    Data must be stored/managed in DBMS
    DB must have formal schema
Business documents: interface to information
  Clients not too light-weight to do processing
  Imposition of schema by client applications
EII Will Not Replace Data Warehouse (Dina Bitton, CTO of Callixa)
Data warehousing technologies (ETL)
  ETL – Extract, Transform, Load
  Far more sophisticated now than before
Demands for low cost, real-time delivery (EII)
  Need for “virtual data warehouse”
  On-demand integration without moving data
  Single interface to multiple sources: SQL, XQuery
However, EII will not replace ETL
Inefficiency in EII
Performance
  Inefficient to transform source data to XML and move across network to (XQuery) processor
  Join operation not optimized
How to improve it?
  Maximize parallelism in query processing
  Minimize amount of data transferred across network
  Learn from parallel DB servers
Current solutions rushed, simplistic
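One way to picture "minimize amount of data transferred across network" is to push the filter and the join keys down to the sources instead of shipping every row to the integration engine. The sketch below is a toy under stated assumptions: `Source` is a hypothetical stand-in for a remote system that just counts rows it ships, and the customer/order data is invented for illustration.

```python
class Source:
    """Hypothetical remote source that counts rows shipped over the network."""

    def __init__(self, rows):
        self.rows = rows
        self.shipped = 0

    def fetch(self, predicate=None):
        # The predicate is evaluated at the source, before anything is shipped.
        out = [r for r in self.rows if predicate is None or predicate(r)]
        self.shipped += len(out)
        return out


def make_sources():
    customers = Source([
        {"id": 1, "region": "EU"}, {"id": 2, "region": "US"},
        {"id": 3, "region": "US"}, {"id": 4, "region": "EU"},
    ])
    orders = Source([
        {"order_id": 10, "customer_id": 1}, {"order_id": 11, "customer_id": 2},
        {"order_id": 12, "customer_id": 2}, {"order_id": 13, "customer_id": 3},
        {"order_id": 14, "customer_id": 4}, {"order_id": 15, "customer_id": 3},
    ])
    return customers, orders


def ship_everything(customers, orders, region):
    """Naive plan: pull both tables, then filter and join centrally."""
    all_customers = customers.fetch()
    all_orders = orders.fetch()
    keys = {c["id"] for c in all_customers if c["region"] == region}
    return [o for o in all_orders if o["customer_id"] in keys]


def push_down(customers, orders, region):
    """Pushdown plan: filter customers at the source, then fetch only matching orders."""
    matching = customers.fetch(lambda c: c["region"] == region)
    keys = {c["id"] for c in matching}
    return orders.fetch(lambda o: o["customer_id"] in keys)


for plan in (ship_everything, push_down):
    customers, orders = make_sources()
    result = plan(customers, orders, "EU")
    print(plan.__name__, len(result), customers.shipped + orders.shipped)
```

Both plans return the same orders; the pushdown plan simply moves fewer rows. Real engines make this choice with a cost-based optimizer, which this toy does not attempt to model.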
Persistence vs. Virtualization
Data warehouses
  Persist data to:
    Keep a history
    When access to source is denied
EII
  Virtualize…
    …across data warehouses
    …when adding new source to data warehouse
    …for one-time projects or prototypes
    …data that must be kept fresh
EII solves different problem, will co-exist
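The persistence vs. virtualization trade-off on this slide can be illustrated with a small sketch, assuming a hypothetical in-memory source. `Warehouse` and `VirtualView` are illustrative names only: the warehouse copies data at load time (so it keeps history but can go stale), while the virtual view reads the source on every access (fresh, but with no history).

```python
import copy


class Warehouse:
    """Persists a copy of the source at each load (ETL-style)."""

    def __init__(self):
        self.snapshots = []

    def load(self, source):
        self.snapshots.append(copy.deepcopy(source))

    def latest(self, key):
        return self.snapshots[-1][key]

    def history(self, key):
        return [snap[key] for snap in self.snapshots]


class VirtualView:
    """Reads through to the live source on every access (EII-style)."""

    def __init__(self, source):
        self.source = source

    def get(self, key):
        return self.source[key]


stock = {"widget": 10}        # hypothetical operational source
warehouse = Warehouse()
view = VirtualView(stock)

warehouse.load(stock)         # first load
stock["widget"] = 7           # the source changes after the load

stale = warehouse.latest("widget")   # still the value from the last load
fresh = view.get("widget")           # reflects the current source

warehouse.load(stock)         # next load
trend = warehouse.history("widget")  # one value per load

print(stale, fresh, trend)
```

Neither side subsumes the other here, which is in the spirit of the slide's conclusion that the two approaches will co-exist.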
EAI vs. EII (Michael Carey, BEA Systems)
EAI (WebLogic Integration)
  Supports business process management
  Connectivity to different applications & sources
  Update, access, integrate, orchestrate procedurally
EII (Liquid Data)
  Access/integrate data from many sources & apps
  Same connectivity options as WebLogic Integration
  Also, RDBMS, XML files, delimited files, etc.
  Declarative integration using XQuery
What’s the problem?
Separation of EAI vs. EII is artificial
Which technology is used when?
Can use both to solve same problem
  EII more efficient at integrating data
  However, need EAI for updates
Best practice: use both
  There will be redundancy
EII & EAI die, reborn as “EI”
Experience at Nimble Technology (Denise Draper, Microsoft)
EII a “gratifyingly hard” problem
  Hard technical problems, yet clear and elegant outline
  Mapping between data models in extensible fashion
  Decompose queries across multiple sources
  Be able to optimize queries
Nimble System
  XML-based data model & query language
  Views as a central metaphor for the system
  Vendor/DB specific SQL adapters
EII in Business
Acceptable raw performance
Graphical query builder
Client-side SDKs
Still, difficult to grow business. Why?
  EII not recognized as an independent product
  Competition: data warehouses (ETL) vs. EII
EII has to either:
  Be cheaper, or
  Solve a different problem, or
  Solve the same problem better
EII vs. ETL
Pull (EII) vs. push (ETL) technology
  Customers didn’t value live data very much
Dynamic source discovery and querying
  Automatically map/extract data from sources
  Would make EII more valuable, but it’s hard to do
EII has cost advantage (no warehouse needed)
  However, not well understood
  Cost of ETL (warehousing) is at least predictable
  ETL & EII both have high human cost of setup/admin
Where Should EII Go?
Is EII a lost cause or dead end?
ETL isn’t the final answer
  Can’t copy all relevant data to single repository
EII queries migrated to high-performance ETL
  Same data model, query language
Data modeling standards for interoperability & metadata management
EII will be in the picture (as part of portfolio)
Enterprise Information Interoperability (Jeff Pollock, Network Inference)
Notion of integration flawed
  Interoperability: loose coupling, using formal semantics (no EII tools do this)
Hard part of EII still the same
  Regardless of syntax/structure
  Federated, shared info must…
    …account for meaning (semantics), within a context
What do today’s EII systems do well?
  Very efficient federated queries
EII Systems – What are they good for?
Internal management of common data views
Management of metadata for transformation, query
Three flavors of EII systems
  Relational-based, XML-based, object-based
But structure of data contains no semantics
  Contained in proprietary metadata, or code itself
EII will either…
  …adopt Semantic Web technologies and survive
  …or have its core folded into new technology
Author is betting on OWL, RDF (W3C)
EII - Embracing & Guiding Chaos (Arnon Rosenthal, MITRE Corp.)
Long term, EII not viable as stand-alone product
“It’s the metadata, stupid!”
  However, EI metadata unintegrated
  Semantic Web (OWL) promising
EII n-tier architectures hide data in different tiers
  Needs better Service Oriented Architectures
EII at runtime unattractive in military
  Ease of change gets lost in gov’t acquisition process
  How do you measure data integration “agility”?
Future Direction of EI
EI has been passive
  How to exchange data that’s been supplied
  However, enterprises are proactive
Semantics integration
Semantics management
  Enterprise models – managers have partial influence
  Community services creating/influencing new standards
Need for tools managing data supply chains
Role of the User in EII (Vishal Sikka, SAP)
SAP views EII as a goal, not technology
  Ensuring consistency of information
  Providing connectivity/accessibility across multiple platforms/DBs
  Timely and complete view of critical entities/events
Information integration from user’s perspective
  Work triggered by tasks/events
  Actions not covered by traditional business processes
  Use of e-mails/documents commonplace
  Decisions based on multiple sources
Role of User in EI
Actions taken in multiple applications
SAP’s NetWeaver provides:
  Management of master data
  Virtual data federation
    Distributed query processing, metadata mapping
  ETL in the Business Info Warehouse and EAI
Representative problem: Enterprise Search
  Enable search across all data formats (documents, business objects, structured data) in enterprise apps
  EII technologies are still very basic
Open Issues in EII
Open problems
  Diverse data types, diverse needs
  Need for common semantic framework integrating results
User interaction issues
Need for common metadata
Security issues
Performance issues
EII techniques for real-time access met with skepticism
  Query optimization, query execution time prediction
  When is real-time access necessary?