OWB Data Quality – Best Practices
Jean-Pierre Dijcks
December 2008
Agenda
• Building a data quality firewall
• The importance of data rules
• The difference between profiling and auditing
Basic system architecture
[Diagram: Data Sources (Siebel CRM, Oracle EBS, PeopleSoft, SAP/R3, Other Sources, Message Queues); Staging Data Layer; Operational data layer; Performance data layer]
Building a data quality firewall
[Diagram: Data Sources (Siebel CRM, Oracle EBS, PeopleSoft, SAP/R3, Other Sources, Message Queues); Staging Data Layer; Operational data layer; Performance data layer; annotated with Data Profiling, Schema & Data Type Correction, Stage 2 Data Correction, Data Audits (shown twice), and Data Governance]
Building a data quality firewall
[Diagram: Data Sources (Siebel CRM, Oracle EBS, PeopleSoft, SAP/R3, Other Sources, Message Queues); Staging Data Layer; Operational data layer; Performance data layer; Profile Workspace, with the step "Move Sample Data to Profile Workspace"]
Schema and Data Type Correction
• Leverage data profiling for
  • Generating the staging area tables
  • Schema corrections
  • Data Type corrections (enforce real data types)
[Diagram: Oracle EBS; Staging Data Layer; Schema & Data Type Correction, with the steps "Profile data", "Discuss with business users", and "Untangle for lookups or recoding"]
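Sketch of "enforce real data types": a minimal Python illustration, not OWB's profiler. It scans string-typed source columns and proposes the narrowest type that fits every non-empty value. The function names, the three candidate types, and the single date format (YYYY-MM-DD) are assumptions made for this example.

```python
import re
from datetime import datetime

# Assumption: plain integers and decimals only; no exponents or thousands separators.
_NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")


def _is_number(value):
    return _NUMBER.fullmatch(value) is not None


def _is_date(value, fmt="%Y-%m-%d"):
    # Assumption: one fixed date format; real sources often mix several.
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Propose a column type that fits every non-empty value."""
    non_empty = [v.strip() for v in values if v is not None and v.strip()]
    if not non_empty:
        return "UNKNOWN"
    if all(_is_number(v) for v in non_empty):
        return "NUMBER"
    if all(_is_date(v) for v in non_empty):
        return "DATE"
    return "VARCHAR2"


def profile_columns(rows):
    """Map each column name to its proposed type, given a list of dict rows."""
    columns = rows[0].keys() if rows else []
    return {col: infer_type([row.get(col) for row in rows]) for col in columns}
```

The proposed types are only a starting point for the discussion with business users; a column that looks numeric today may still be a code that should stay a string.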
Anatomy of the operational data layer
Goal:
• Create lowest grain data for reporting
• Create a schema to service all applications with correct data
• Act as source for performance layer

Characteristics:
• De-normalized but still close to 3-NF
• Relationships established and enforced
• Data corrected and de-duplicated
• Permanent data

[Diagram: Operational data layer]
Loading the operational layer
• Leverage in-database architecture
  • Do all the hard work here!
  • Load between schemas – not databases
  • Huge performance gains through OWB architecture
• Embed data quality into the loads
  • Create a data quality firewall
  • Strictly enforce all required rules
  • Document all erroneous data and correct if desired
• Do matching and merging to create uniqueness from many data flows
  • Create master data records
  • Re-code as necessary
  • Re-key as necessary
  • Keep cross references
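Sketch of matching and merging with cross references: a minimal Python illustration, not the OWB Match-Merge operator. It uses an exact match on a normalized name and postcode, where a real implementation would typically use fuzzy rules. The record layout, the match rule, and the function names are assumptions made for this example.

```python
def match_key(record):
    """Hypothetical match rule: same name and postcode after normalization."""
    name = " ".join(record["name"].lower().split())
    postcode = record["postcode"].replace(" ", "").upper()
    return (name, postcode)


def match_and_merge(records):
    """Return (master_records, cross_references) for records from many sources.

    Each cross reference is (source_system, source_id, master_id), so every
    source row can still be traced after it has been re-keyed.
    """
    masters = {}  # match key -> master record
    xref = []
    for rec in records:
        key = match_key(rec)
        if key not in masters:
            masters[key] = {
                "master_id": len(masters) + 1,       # new surrogate key
                "name": " ".join(rec["name"].split()),
                "postcode": key[1],
            }
        xref.append((rec["source"], rec["source_id"], masters[key]["master_id"]))
    return list(masters.values()), xref
```

The first record seen for a key supplies the master values here; choosing the surviving value per attribute is a separate merge rule and is left out of the sketch.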
Data Quality Firewall
Cleanse:
• De-duplicate incoming data
• Fix data issues
  • Name and address
  • String comparisons

Protect:
• Enforce referential integrity
• Enforce data rules
• Enforce data types and conversions

Report:
• Data issues
• Quality levels
• Quality trends

[Diagram: Operational data layer with Cleanse, Protect, Report]
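Sketch of the Protect and Report steps: a minimal Python illustration in which each data rule is a named predicate. Rows that pass every rule go on to the operational layer, failing rows are documented with the rules they broke, and the pass rate serves as a simple quality level. The rule names, the row layout, and `apply_firewall` are assumptions made for this example, not OWB objects.

```python
def apply_firewall(rows, rules):
    """Split rows into clean and erroneous sets and report a quality level.

    rules maps a rule name to a function that returns True for a valid row.
    """
    clean, errors = [], []
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        if failed:
            # Document the erroneous row together with every rule it broke.
            errors.append({"row": row, "failed_rules": failed})
        else:
            clean.append(row)
    quality_level = len(clean) / len(rows) if rows else 1.0
    return clean, errors, quality_level


# Hypothetical rules: a not-null rule and a domain rule.
RULES = {
    "customer_id_not_null": lambda r: r.get("customer_id") is not None,
    "country_in_domain": lambda r: r.get("country") in {"NL", "US", "UK"},
}
```

Storing the quality level of every load gives the trend that the Report step asks for.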
Feeding non-DW systems
• Always load from the operational layer
• Delivers flexibility and lowest grain to external systems
• Aggregate on the way out if required (not typical)
• Delivers clean data, with measured service levels for DQ
[Diagram: Operational data layer with Cleanse, Protect, Report]
Data Quality
The importance of data rules
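[Diagram: numbered steps around Data Rules; the pairing of labels below is reconstructed from the slide layout]
1) Profile: Know your data
2) Generate: Correction Mappings, Data Auditors
3) Operate: Coherent Data
4) Monitor: Audit Results and trends
5) Report: Information
Other labels: Trust your data; Fear your data; Ignorance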
Data Quality
Data Profiling – Unique Capabilities
• Complete offering
• Two usage modes:
  • Use to investigate unknown data
  • Use to validate known business rules against real data
Data Profiling vs. Data Auditing
Data Profiling:
• Ad-hoc when required
• Discovery in search of unknowns
• Time consuming
• Resource intensive

Data Auditing:
• Continuous processes
• Planned to be done repetitively
• Gathers information over time
• Small tasks

Both serve the same purpose through different means
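Sketch of the contrast: a minimal Python illustration in which profiling discovers which values a column actually holds, and auditing then checks one known rule against every new batch and keeps the results over time. The function names, the `status` column, and the batch labels are assumptions made for this example.

```python
from collections import Counter


def profile_domain(rows, column):
    """Profiling: discover the values a column holds and how often each occurs."""
    return Counter(row[column] for row in rows)


def audit_domain(rows, column, allowed):
    """Auditing: measure the fraction of rows that comply with a known domain rule."""
    if not rows:
        return 1.0
    compliant = sum(1 for row in rows if row[column] in allowed)
    return compliant / len(rows)


def audit_history(batches, column, allowed):
    """Run the same small audit on each (label, rows) batch, in the order given."""
    return [(label, audit_domain(rows, column, allowed)) for label, rows in batches]
```

A typical flow is to profile once, agree on the allowed values with the business, and then run only the audit on each subsequent load.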
Demonstration
Performance Tips for Data Profiling
• Data Profiling is a highly processor and I/O intensive process
• Run large profiles (>10M Rows in a table) on multi-processor machines
• Use parallel:
  • OWB uses /*+ PARALLEL(<TBL>) */ hints in DP queries
  • Default degree of parallelism is picked up from database
• Balance your configuration
  • Stripe data across disks using ASM
  • Make sure I/O and CPU ratios are remotely correct
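Sketch of a hinted profiling-style query: a minimal Python illustration that builds a column-statistics statement carrying a PARALLEL hint. This is a hand-written query for illustration only, not the SQL that OWB generates; the function name, the chosen aggregates, and the alias `t` are assumptions. Omitting the degree leaves the choice to the database defaults.

```python
import re

# Assumption: only simple, unquoted Oracle identifiers are accepted.
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")


def profile_query(table, column, degree=None):
    """Build a column-statistics query with a PARALLEL hint on the table alias."""
    for name in (table, column):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"not a simple identifier: {name!r}")
    # Without an explicit degree the database decides the degree of parallelism.
    hint = f"PARALLEL(t, {int(degree)})" if degree is not None else "PARALLEL(t)"
    return (
        f"SELECT /*+ {hint} */ COUNT(*) AS row_count, "
        f"COUNT(DISTINCT t.{column}) AS distinct_count, "
        f"COUNT(*) - COUNT(t.{column}) AS null_count "
        f"FROM {table} t"
    )
```

The hint has to reference the alias used in the FROM clause, which is why the sketch hints `t` rather than the table name.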
Performance Tips for Data Profiling
• When loading the workspace you are moving lots of data => optimize this:
  • Place the profile workspace in the same database as the source data
  • Enable the source tables for parallel reads
  • Consider moving the data with regular OWB maps first, or use Transportable Tablespaces or Data Pump
• Memory:
  • SGA should be no less than 500 MB, preferably around 2-3 GB for most profiles
  • Buffer cache hit ratio >95%
  • Library cache hit ratio >99%
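Sketch of checking the memory targets: a minimal Python illustration that computes the two hit ratios from raw counters and compares them with the thresholds above. The slide does not say how the ratios are measured; the formulas below are the commonly documented ones, based on the "physical reads", "db block gets", and "consistent gets" statistics in V$SYSSTAT and on PINHITS and PINS in V$LIBRARYCACHE. The function names and the idea of passing the counters in as plain numbers are assumptions made for this example.

```python
def buffer_cache_hit_ratio(physical_reads, db_block_gets, consistent_gets):
    """1 - physical reads / logical reads, where logical = block gets + consistent gets."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        return 0.0
    return 1.0 - physical_reads / logical_reads


def library_cache_hit_ratio(pin_hits, pins):
    """Share of library cache pins that found the object already in memory."""
    if pins == 0:
        return 0.0
    return pin_hits / pins


def meets_guidelines(sga_mb, buffer_ratio, library_ratio):
    """Compare measured values with the thresholds from the slide."""
    return {
        "sga": sga_mb >= 500,               # no less than 500 MB
        "buffer_cache": buffer_ratio > 0.95,
        "library_cache": library_ratio > 0.99,
    }
```

Both ratios are cumulative since instance startup when read from these views, so it is usually more telling to sample the counters before and after a profiling run and compute the ratios on the differences.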
Further Reference Material
• http://blogs.oracle.com/warehousebuilder
  • Data Quality posts about:
    • Using data rules for Referential Integrity
    • Key Quality Indicators
    • Match and Merge
• Demonstrations on OTN
  • Data Profiling and Corrections
  • Fuzzy match and merging
  • Name and address cleansing
• Training
  • Extending your Knowledge (data profiling hands-on)
New Features for DQ
Beta Program for 11gR2
If you are interested in the beta please contact the OWB product management team:
• Michelle Bird ([email protected])
Or go directly to: http://otnbeta.oracle.com/bpo/prospects/index.htm
Make sure to mention Michelle as sponsor.
Questions
High-Level Data Integration Roadmap
Natural Upgrade Path for Existing Solutions
• OWB/ODI Investments are Fully Protected
• No Forced Migrations
• Natural Upgrade Path
• Unified Platform aims to be a Superset of Existing Products – no regression
[Diagram: timeline labeled Spring 2009, Summer 2009, CY2010, CY2011, with milestones Unified Team and Unified Platform]