1
MDM Architecture Deep Dive: Implementing Real-time
Bi-Directional Synchronization
Ron Matusof, VP MDM Solutions
Informatica
Dmitri Korablev, VP Strategy and Planning
Informatica
2
Agenda
• Introduction
• Understanding Synchronization Requirements
• Real World Use Cases
• Best Practices
• Summary
4
Who Are We?
Dmitri Korablev
VP, Strategy & Planning
Almost 20 years designing,
architecting and building
enterprise solutions
Favorite quote:
“Simple things should be
simple and complex things
should be possible.”
Alan Kay
Ron Matusof
VP, Solution Architecture
Almost 30 years architecting
and integrating complex
systems
Favorite quote:
“Nothing is as simple as it
seems.”
Corollary to Murphy’s Law
5
Why is Synchronization an issue?
It may be complex, but it should be possible.
…and it is not as simple as it seems.
Theme for Today’s Presentation
“Make everything as simple as possible, but not simpler.”
Albert Einstein
6
Synchronization Design Tradeoffs
[Figure: tradeoffs among Consistency, Performance, and Quality. Labeled attributes: Coherence, Correlation, Timeliness, Throughput, Latency, Accuracy, Fidelity]
7
Complex MDM Scenario
[Diagram: Hub connected to Data Warehouse, Legacy Systems, Data Marts, Data Steward, Business Users, Call Center, Legacy CIF, and Policy DB. Labeled flows: B2B Transforms, Federated Query, Bi-Directional Synchronization, Read/Update Hub, GSDN Integration. Numbers 1 to 5 mark the five cases.]
8
Case 1: Hub as System of Record
[Diagram: region 1 of the scenario, showing Data Warehouse, Data Marts, Data Steward, Business Users, Call Center, Legacy CIF, and Policy DB]
9
Case 1: Hub as System of Record
[Diagram: region 1 of the scenario, showing Data Warehouse, Data Marts, Data Steward, Business Users, Call Center, Legacy CIF, and Policy DB]
Questions you should ask
• Consistency
  • Do I Synch or do I Access in the Hub?
• Performance
  • Is Real-time Always Better?
• Quality
  • Do I want a Batch Window?
Solutions
• Look at Consumption Scenarios
  • Synthesize appropriate keys for downstream use
• Leverage Existing Interfaces and Transport
  • Don’t reinvent the infrastructure
• Understand the Full Data Supply Chain
  • Propagation, Syndication, or Synchronization?
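One possible reading of "synthesize appropriate keys for downstream use" is to give each consumer a stable key derived from the hub record identifier, so downstream systems do not depend on the hub's internal keys. The sketch below is only an illustration under that assumption; the namespace, the record ID "PARTY-0001", and the consumer names are hypothetical and do not come from the presentation.

```python
import uuid

# Hypothetical namespace for this illustration; any fixed UUID would do.
HUB_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "mdm-hub.example.com")

def downstream_key(hub_record_id: str, consumer: str) -> str:
    """Derive a stable, consumer-specific key from a hub record ID.

    The same inputs always yield the same key, so the key can be
    regenerated on demand instead of being stored in a lookup table.
    """
    return str(uuid.uuid5(HUB_NAMESPACE, f"{consumer}:{hub_record_id}"))

# Two consumers receive different keys for the same hub record.
key_dw = downstream_key("PARTY-0001", "data_warehouse")
key_cc = downstream_key("PARTY-0001", "call_center")
```

Deterministic keys are one design choice among several; a cross-reference table maintained in the hub is an equally valid alternative when keys must be reassigned later.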
10
Synchronizing Multiple Systems
Classical approach focuses on data latency
No emphasis on data correlation
Limited metrics for “what is good enough”
[Three identical copies of the following table are shown, one per system:]
Party Name                 Home Office
ACME Products              Taos
Spacely Space Sprockets    Orbit City
Dunder Mifflin             Scranton
11
Synchronizing Multiple Systems
Latency causes discrepancy in targets
At t = 1, System 1 is synchronized with Hub
At t = 3, System 2 is synchronized with Hub
Between t = 1 and t = 3, no downstream correlation
[Three copies of the table are shown, one per system, listed here in slide order:]
Party Name                 Home Office (copy 1)   Home Office (copy 2)   Home Office (copy 3)
ACME Products              Flagstaff              Flagstaff              Taos
Spacely Space Sprockets    Orbit City             Orbit City             Orbit City
Dunder Mifflin             Scranton               Scranton               Scranton
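The timing on this slide can be replayed in a few lines of Python. This is a minimal sketch, not the presenters' tooling: the `System` class, the `sync_delay` field, and the assumption that the hub change happens at t = 0 are all invented for illustration. It shows the two targets disagreeing between t = 1 and t = 3 even though each one is individually "in sync" on its own schedule.

```python
from dataclasses import dataclass, field

@dataclass
class System:
    name: str
    sync_delay: int                       # time units before a hub change arrives (assumed)
    data: dict = field(default_factory=dict)

def replay(hub_change_time, key, new_value, systems, now):
    """Apply the hub change to every system whose delay has elapsed by `now`."""
    for s in systems:
        if now >= hub_change_time + s.sync_delay:
            s.data[key] = new_value

def correlated(systems, key):
    """True when all downstream systems hold the same value for `key`."""
    return len({s.data.get(key) for s in systems}) == 1

system1 = System("System 1", sync_delay=1, data={"ACME Products": "Taos"})
system2 = System("System 2", sync_delay=3, data={"ACME Products": "Taos"})
targets = [system1, system2]

# Assume the hub changes ACME Products' home office to Flagstaff at t = 0.
timeline = {}
for t in range(0, 4):
    replay(0, "ACME Products", "Flagstaff", targets, now=t)
    timeline[t] = correlated(targets, "ACME Products")
```

In this run the targets agree at t = 0 (both still show Taos), disagree at t = 1 and t = 2, and agree again at t = 3.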
12
Case 2: R/T Synch with an application
[Diagram: region 2 of the scenario, showing Legacy Systems, Data Steward, Business Users, Legacy CIF, and Policy DB, with Federated Query and Bi-Directional Synchronization flows]
13
Case 2: R/T Synch with an application
Questions you should ask
• Consistency
  • How do I correlate with other systems?
• Performance
  • What do I do about feedback loops?
• Quality
  • Can I catch DQ Issues at source?
Solutions
• Develop Appropriate Logical Data Models
  • Consider handling changes through abstraction & insulation
• Understand both direct and indirect feedback loops
  • Use Delta Detection and/or CDC for loop suppression
• Integrate with the application business workflow
  • Event Driven vs. Process Driven (or both)
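Loop suppression by delta detection can be sketched as follows. This is a hedged illustration, not the product's mechanism: the `DeltaDetector` class and the record ID "P1" are hypothetical. The idea is to remember a fingerprint of the last payload seen for each record and to drop any incoming update whose content is unchanged, which is what an echo from the other side of a bi-directional link looks like.

```python
import hashlib
import json

class DeltaDetector:
    """Remembers a fingerprint of the last payload seen for each record."""

    def __init__(self):
        self._last_seen = {}

    @staticmethod
    def _fingerprint(payload: dict) -> str:
        # Sorting keys makes the fingerprint independent of field order.
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def is_change(self, record_id: str, payload: dict) -> bool:
        """Return True only if the payload differs from the last one seen."""
        fp = self._fingerprint(payload)
        if self._last_seen.get(record_id) == fp:
            return False          # identical content: treat as an echo and suppress
        self._last_seen[record_id] = fp
        return True

detector = DeltaDetector()
update = {"party_name": "ACME Products", "home_office": "Flagstaff"}
first = detector.is_change("P1", update)            # genuine change
echo = detector.is_change("P1", dict(update))       # same content coming back
later = detector.is_change("P1", {**update, "home_office": "El Paso"})
```

Content-based detection suppresses echoes regardless of how many systems the update passed through, but it cannot tell who made a change; tagging updates with their originating system is a complementary technique.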
14
Synchronizing Multiple Systems
Bi-Directional Synch causes race conditions
Need to characterize the latency, throughput and
correlation impacts and tolerances.
[Three copies of the table are shown, one per system, listed here in slide order:]
Party Name                 Home Office (copy 1)   Home Office (copy 2)   Home Office (copy 3)
ACME Products              Flagstaff              El Paso                Taos
Spacely Space Sprockets    Orbit City             Orbit City             Orbit City
Dunder Mifflin             Scranton               Scranton               Scranton
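The slide does not prescribe a way to detect such races. One common technique, shown here purely as an assumption, is a version stamp on each record: an update is accepted only if it was based on the current version. The `HubRecord` and `ConflictError` names are hypothetical.

```python
class ConflictError(Exception):
    """Raised when an update was based on a stale version of the record."""

class HubRecord:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def update(self, new_value, based_on_version):
        # Reject the write if another update landed since the caller read the record.
        if based_on_version != self.version:
            raise ConflictError(
                f"update based on version {based_on_version}, "
                f"but record is at version {self.version}"
            )
        self.value = new_value
        self.version += 1

record = HubRecord("Taos")

# Two systems read the record at the same version...
version_seen_by_a = record.version
version_seen_by_b = record.version

# ...and both try to change the home office.
record.update("Flagstaff", based_on_version=version_seen_by_a)
try:
    record.update("El Paso", based_on_version=version_seen_by_b)
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

Detecting the conflict is only the first step; deciding which value wins remains a business rule tied to the tolerances the slide asks you to characterize.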
16
Case 4: Global Data Synchronization
Questions you should ask
• Consistency
  • How do I transform the data?
• Performance
  • How do I handle simultaneous updates?
• Quality
  • How do I handle End-To-End Lineage?
Solutions
• Use composite services to handle transformations
  • Use Hub Data Objects with the B2B transform as a facade.
• Put complexity in the transform & not the data model
  • Watch for the costs of serialization/deserialization and IO
• Integrate with the application business workflow
  • Event Driven vs. Process Driven (or both)
18
Case 5: Global Distribution of Hubs
Questions you should ask
• Consistency
  • How do I handle one hub being off-line?
• Performance
  • How do I handle simultaneous updates?
• Quality
  • How do I govern data across the Hubs?
Solutions
• Develop connection agnostic processes
  • Implement Queued Updates
• Design appropriate Replication and Failover strategy
  • Use embedded or external CDC to generate change lists
• Develop integrated workflow across the Hubs
  • Event Driven vs. Process Driven (or both)
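Queued updates for an off-line hub can be sketched as a store-and-forward link. This is a minimal in-memory illustration under stated assumptions: the `Hub` and `QueuedLink` classes and the hub name "EMEA" are hypothetical, and a real deployment would use a durable message queue rather than a Python `deque`.

```python
from collections import deque

class Hub:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.data = {}

class QueuedLink:
    """Store-and-forward link: holds updates while the target hub is off-line."""

    def __init__(self, target):
        self.target = target
        self.pending = deque()

    def send(self, key, value):
        # Always enqueue first so ordering is preserved, then try to deliver.
        self.pending.append((key, value))
        self.flush()

    def flush(self):
        """Deliver queued updates in order for as long as the target is on-line."""
        while self.pending and self.target.online:
            key, value = self.pending.popleft()
            self.target.data[key] = value

emea = Hub("EMEA")
link = QueuedLink(emea)

emea.online = False
link.send("ACME Products", "Flagstaff")
link.send("ACME Products", "El Paso")
queued_while_offline = len(link.pending)

emea.online = True
link.flush()
```

The sending process never needs to know whether the remote hub is reachable, which is one way to read "connection agnostic processes".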
19
Complex MDM Scenario
[Diagram: Hub connected to Data Warehouse, Legacy Systems, Data Marts, Data Steward, Business Users, Call Center, Legacy CIF, and Policy DB. Labeled flows: B2B Transforms, Federated Query, Bi-Directional Synchronization, Read/Update Hub, GSDN Integration. Numbers 1 to 5 mark the five cases.]
21
Application Integration
• Design for Application Data Consumption Pattern(s)
  • Analyze and understand the business use of the data
  • Socialize the application use of Master Data
• Model for the Business Usage of the Master Data
  • Consider using the application data model as a starting point
  • Reconcile with the data models for contributing sources
  • Optimize the model for application use cases
• Architect for application security/performance
  • Review regulatory constraints for distributing and sharing data
  • Develop methodology for creating test data and testing the app
  • Architect to achieve application performance requirements
22
Bi-Directional Synchronization
• Optimize the Synchronization Architecture
  • Determine Requirements Based on Consumption Patterns
  • Evaluate the Cost/Benefit of overachieving on the SLA
  • Architect solution to meet the optimal cost/benefit
• Analyze Round trips and Feedback Loops
  • Consider loops that pass through more than two systems
  • Use CDC/Delta detection to suppress loops
• Create Transactionality/Compensation Strategies
  • Architect solutions for handling cross platform transactions
  • Consider how to compensate for discrepancies
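A compensation strategy for cross-platform updates can be sketched as "apply everywhere, and undo what was applied if any target fails". The example below is an assumption-laden illustration: the `Target` class, its `fail` flag, and the system names "CRM" and "Billing" are hypothetical stand-ins for real platforms that do not share a transaction manager.

```python
class Target:
    """Stand-in for a platform that cannot join a shared transaction."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail          # simulate a platform that rejects writes
        self.data = {}

    def write(self, key, value):
        if self.fail:
            raise RuntimeError(f"{self.name} rejected the update")
        previous = self.data.get(key)
        self.data[key] = value
        return previous           # returned so the caller can undo the write

def apply_with_compensation(targets, key, value):
    """Write to every target; on failure, restore the targets already written."""
    applied = []                  # (target, previous value) pairs, in write order
    try:
        for target in targets:
            applied.append((target, target.write(key, value)))
        return True
    except RuntimeError:
        # Compensate in reverse order. A previous value of None is treated
        # as "key was absent", which is a simplification of this sketch.
        for target, previous in reversed(applied):
            if previous is None:
                target.data.pop(key, None)
            else:
                target.data[key] = previous
        return False

crm = Target("CRM")
crm.data["ACME Products"] = "Taos"
billing = Target("Billing", fail=True)

ok = apply_with_compensation([crm, billing], "ACME Products", "Flagstaff")
```

Here the CRM write succeeds, the Billing write fails, and the CRM value is restored to Taos.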
23
Global Distribution
• Architect to a Common Core Data Model
  • Ensure that synchronized attributes are formatted identically
  • Account for local differences only in non-synchronized fields
• Account for Data Differences in 3rd party software
  • Date/time
  • Currency
  • Unique processing approaches
• Make adding additional countries/regions easy
  • Ensure the core data model considers future requirements
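Formatting synchronized attributes identically can be illustrated for the date/time and currency cases. The sketch assumes, as one possible convention not stated in the presentation, that timestamps are synchronized as ISO 8601 strings in UTC and that amounts travel as a decimal string paired with an upper-case currency code. The function names and sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

def canonical_timestamp(local: datetime) -> str:
    """Convert a timezone-aware datetime to an ISO 8601 string in UTC."""
    if local.tzinfo is None:
        raise ValueError("timestamps must carry a timezone before synchronization")
    return local.astimezone(timezone.utc).isoformat()

def canonical_amount(amount: str, currency: str) -> dict:
    """Represent money as an exact decimal string plus a currency code."""
    # Decimal avoids the rounding surprises of binary floating point.
    return {"amount": str(Decimal(amount)), "currency": currency.upper()}

# A hub in a UTC+9 region records a local time; other hubs receive it in UTC.
utc_plus_9 = timezone(timedelta(hours=9))
ts = canonical_timestamp(datetime(2011, 6, 1, 9, 30, tzinfo=utc_plus_9))
amt = canonical_amount("1250.50", "usd")
```

Local display formats then stay in non-synchronized fields or in the presentation layer, in line with the bullets above.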
24
Summary
Do not over-engineer the solution.
• Architect the Synchronization Design to meet the SLA.
Design performance for the whole solution, not the individual parts.
• Synchronization has the potential for introducing additional bottlenecks.
Design the synchronization solution to maintain appropriate correlation between target systems.
• Perceived synchronization issues are typically related to correlation between downstream systems.
25
Marketplace Overview: A Trusted, Open Ecosystem
• Virtual Marketplace for Data Integration Apps
• Solutions across all technology areas – DI, DQ, MDM, Cloud, etc.
• Open Ecosystem – Apps from Partners, ISVs, Consultants, and Developers
• Seal of Approval ensures App quality
• More than 600 Apps, over 200 Free!
• 15k visits per month, 2k downloads
http://marketplace.informatica.com