Maximize Efficiency of “Data” with Data Integration
Pramote Trinaksit, Senior Technical Consultant
May 10, 2016
Agenda
o Why Integration?
o Today's Data Integration Solutions
o Simplify Data Consolidation
o Master Your Data with MDM
Silos
Data Stores
Business Applications
Today's world is one of heterogeneity.
We use different technologies.
We operate on different platforms.
Organizations and enterprises of every kind generate large amounts of data every day.
And we do have problems with data.
Why Data Integration?
HAVE… Data in Disparate Sources: ERP, CRM, Legacy, Best-of-breed Applications
NEED… Information How and Where You Want It: Business Intelligence, Corporate Performance Management, Business Activity Monitoring, Business Process Management
Data Integration: Data Warehousing, Master Data Management, Real-Time Messaging, Federation, Migration, Data Synchronization
Conventional ETL Design
Sources: Legacy Application; Oracle, MSSQL, DB2; Text, Excel, CSV
Steps 1 to 5, each hand-built with Coding, ODBC, SQL Loader, or SQL procedures and functions (including a Lookup), move the data through a Staging Area
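To make the conventional approach concrete, here is a minimal sketch in which every step (create tables, extract, stage, look up, load) is coded by hand. It uses Python's built-in `sqlite3` and `csv` modules; the table names (`stg_emp`, `dept`, `fact_emp`) and the CSV layout are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract (the "Text, Excel, CSV" source)
CSV_DATA = """emp_id,name,dept_code,salary
1,Alice,D10,5000
2,Bob,D20,4000
3,Carol,D99,4500
"""

def run_etl(conn: sqlite3.Connection, csv_text: str) -> list:
    """Hand-coded ETL: each step below is written and maintained by the developer."""
    cur = conn.cursor()
    # Step 1: create the staging and target tables
    cur.execute("CREATE TABLE stg_emp (emp_id INTEGER, name TEXT, dept_code TEXT, salary REAL)")
    cur.execute("CREATE TABLE fact_emp (emp_id INTEGER, name TEXT, dept_name TEXT, salary REAL)")
    # Step 2: extract rows from the flat file and load them into staging
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    cur.executemany("INSERT INTO stg_emp VALUES (:emp_id, :name, :dept_code, :salary)", rows)
    # Step 3: lookup; staged rows with no matching department are rejected
    rejected = [r[0] for r in cur.execute(
        "SELECT s.emp_id FROM stg_emp s LEFT JOIN dept d ON s.dept_code = d.dept_code "
        "WHERE d.dept_code IS NULL")]
    # Step 4: load the matched rows into the target table
    cur.execute(
        "INSERT INTO fact_emp SELECT s.emp_id, s.name, d.dept_name, s.salary "
        "FROM stg_emp s JOIN dept d ON s.dept_code = d.dept_code")
    conn.commit()
    return rejected

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_code TEXT PRIMARY KEY, dept_name TEXT)")
conn.executemany("INSERT INTO dept VALUES (?, ?)", [("D10", "Sales"), ("D20", "Finance")])
rejected = run_etl(conn, CSV_DATA)
print(rejected)
```

Every new source or rule means editing this code by hand, which is the maintenance burden the next slide contrasts with declarative design.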
Differentiator: Declarative Design
Conventional ETL Design: Specify the ETL Data Flow Graph
• Developer must define every step of complex ETL flow logic
• Traditional approach requires specialized ETL skills
• And significant development and maintenance effort
Declarative Set-based Design
• Reduces the number of steps
• Automatically generates the data flow, whatever the source and target databases
Benefits
• Significantly reduces the learning curve
• Shorter implementation times
• Streamlined access for non-IT professionals
ODI Declarative Design
1. Define What You Want
2. Define How: Built-in Templates
3. Automatically Generate Dataflow
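The declarative idea can be illustrated with a small, hypothetical generator: the developer states *what* (target columns mapped to source expressions, plus joins and a filter), and a template decides *how* by emitting one set-based SQL statement. This is not ODI's actual code generator; the `Mapping` structure and the template string are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Mapping:
    """Declarative spec: WHAT the target should contain (hypothetical structure)."""
    target: str
    columns: dict        # target column -> source SQL expression
    sources: str         # FROM clause, including joins
    condition: str = ""  # optional WHERE condition

# HOW: a hypothetical bulk-load template that turns any Mapping into one set-based statement
BULK_LOAD_TEMPLATE = "INSERT INTO {target} ({cols})\nSELECT {exprs}\nFROM {sources}{where}"

def generate_sql(m: Mapping, template: str = BULK_LOAD_TEMPLATE) -> str:
    """Generate the data flow (a single INSERT ... SELECT) from the declarative mapping."""
    where = f"\nWHERE {m.condition}" if m.condition else ""
    return template.format(
        target=m.target,
        cols=", ".join(m.columns),
        exprs=", ".join(m.columns.values()),
        sources=m.sources,
        where=where,
    )

emp_mapping = Mapping(
    target="dw_emp",
    columns={"emp_id": "e.empno", "emp_name": "UPPER(e.ename)", "dept_name": "d.dname"},
    sources="emp e JOIN dept d ON e.deptno = d.deptno",
    condition="e.sal > 0",
)
sql = generate_sql(emp_mapping)
print(sql)
```

Swapping the template (for example, for an incremental-update variant) would change the generated statement without touching the mapping, which is the separation of "what" from "how" that the slide describes.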
Design Transformations
Oracle Data Integrator “Interface”
Declarative Design
1. Define What You Want
2. Define How to Do It: Select a Template (Bulk Load • Changed Data Capture • Incremental Update • Slowly Changing Dimension)
3. Automatically Generate Data Flows
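Of the templates listed, Slowly Changing Dimension is the least self-explanatory. A minimal pure-Python sketch of the common Type 2 variant follows: when a tracked attribute changes, the current dimension row is closed and a new current row is added, so history is preserved. The row layout (`key`, `attrs`, `valid_from`, `valid_to`, `is_current`) is a hypothetical choice, not ODI's internal format.

```python
from datetime import date

def scd2_apply(dim: list, incoming: dict, load_date: date) -> list:
    """Apply one batch of source rows (business key -> attributes) to a Type 2 dimension."""
    current = {row["key"]: row for row in dim if row["is_current"]}
    for key, attrs in incoming.items():
        old = current.get(key)
        if old is not None and old["attrs"] == attrs:
            continue  # unchanged: nothing to do
        if old is not None:
            # changed: close the existing version
            old["valid_to"] = load_date
            old["is_current"] = False
        # new or changed: add the new current version
        dim.append({"key": key, "attrs": attrs, "valid_from": load_date,
                    "valid_to": None, "is_current": True})
    return dim

dim = []
scd2_apply(dim, {"C1": {"city": "Bridgeport"}}, date(2016, 1, 1))
scd2_apply(dim, {"C1": {"city": "New York"}}, date(2016, 5, 10))
for row in dim:
    print(row["attrs"]["city"], row["valid_from"], row["valid_to"], row["is_current"])
```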
Orchestrate Data Flows
1. Sequence Transformations
2. Leverage ODI Tools
• Data Quality Processes
• Files/Archives Management
• Send/Receive Emails
• Web Services Invocation
• Event Detection
• Create Your Own Tools
3. Use Control Structures
• Loops
• Conditions
• Error Handling
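The control structures above (sequence, loop, condition, error handling) can be sketched in plain Python. The step names and the retry policy are hypothetical and only mirror the slide; ODI packages are built graphically rather than written this way.

```python
import time

def run_package(steps, max_retries=2, delay_s=0.0):
    """Run steps in sequence; retry a failed step, stop the package if it still fails."""
    log = []
    for name, step in steps:                          # sequence
        for attempt in range(1, max_retries + 2):     # loop: first try plus retries
            try:
                step()
                log.append((name, "OK", attempt))
                break
            except Exception as exc:                  # error handling
                if attempt > max_retries:             # condition: out of retries, stop
                    log.append((name, f"FAILED: {exc}", attempt))
                    return log
                time.sleep(delay_s)
    return log

calls = {"n": 0}

def flaky_load():
    """Hypothetical step that fails on its first attempt only."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("target locked")

steps = [("load_staging", lambda: None), ("load_target", flaky_load), ("send_email", lambda: None)]
result = run_package(steps)
print(result)
```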
Flexible Deployment Models
Unidirectional: Query Offloading, DR Site, Migration
Bi-Directional: Active-Active for HA
Peer-to-Peer: Load Balancing, Multi-Master
Broadcast: Data Distribution
Integration/Consolidation: Data Warehouse
Cascading: Data Marts
Data Consolidation & Distribution Architecture
[Diagram: HQ (Central Site, with DR Site and Reporting) linked to Regions 1 to 4 (Remote Sites, each holding customer data) by real-time data replication]
Real-Time Data Integration Platform
[Diagram: Oracle GoldenGate captures changes from the on-disk logs of Source 1 and Source 2 (EMP, DEPT tables) into the ODS schema; Oracle Data Integrator transforms them into the DW schema (fact and dimension tables)]
Oracle GoldenGate
• Real-time extracts from transactional systems
• Non-invasive on sources
• Continuous streaming load into the ODS schema of the target
• Latency in seconds
Oracle Data Integrator EE
• High-performance E-LT on the target data warehouse
• Periodic mini-batches (15 min)
• Transform in the database
• Never go back to the sources
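The division of work (continuous change capture into an ODS, then periodic set-based transformation on the target) can be sketched as a toy model in plain Python. The change-record format `(op, key, row)` and the aggregation are assumptions, not GoldenGate's trail format or ODI's generated code, and the 15-minute schedule is not modeled.

```python
from collections import defaultdict

def apply_changes(ods: dict, changes: list) -> None:
    """Streaming side: apply captured inserts/updates/deletes to the ODS copy, in order."""
    for op, key, row in changes:
        if op in ("I", "U"):
            ods[key] = row
        elif op == "D":
            ods.pop(key, None)

def mini_batch_transform(ods: dict) -> dict:
    """Batch side: rebuild a fact-style aggregate (salary per department) from the ODS only."""
    fact = defaultdict(float)
    for row in ods.values():
        fact[row["deptno"]] += row["sal"]
    return dict(fact)

ods = {}
apply_changes(ods, [("I", 1, {"deptno": 10, "sal": 5000.0}),
                    ("I", 2, {"deptno": 20, "sal": 4000.0}),
                    ("U", 2, {"deptno": 10, "sal": 4200.0}),
                    ("I", 3, {"deptno": 20, "sal": 3000.0}),
                    ("D", 1, None)])
print(mini_batch_transform(ods))
```

Because the batch step reads only the ODS copy, the transformation never queries the source systems again.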
Core Problem: Data Degrades
Companies: in one hour…
• 240 businesses will change addresses
• 150 business telephone numbers will change or be disconnected
• 112 directorship (CEO, CFO, etc.) changes will occur
• 20 corporations will fail
• 12 new businesses will open their doors
Individuals: in one hour…
• 5,769 individuals in the US will change jobs
• 2,748 individuals will change address
• 515 individuals will get married
• 263 individuals will get divorced
• 186 individuals will declare personal bankruptcy
Products: in one year…
• On average, 20% duplicates in product data
• 90% of new product introductions fail
• Retailers lose 40 billion, or 3.5% of total sales, each year due to item information inefficiencies
• 60% error rate for all invoices generated
• Global Data Sync will realize 30% lower IT costs
Source: D&B, US Census Bureau, US Department of Health and Human Services, Administrative Office of the US Courts, Bureau of Labor Statistics, Gartner, A.T. Kearney, GMA Invoice Accuracy Study
Data changes at a rate of 2% per month.
Compounded, a 2% monthly change amounts to 27% per year, 61% in two years, and 104% in three years!
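The compounded figures correspond to (1 + 0.02)^n - 1 for n = 12, 24 and 36 months, which a few lines of Python can confirm.

```python
def compounded_change(monthly_rate: float, months: int) -> float:
    """Cumulative change when a monthly rate compounds: (1 + r)^n - 1."""
    return (1 + monthly_rate) ** months - 1

for years in (1, 2, 3):
    pct = compounded_change(0.02, 12 * years) * 100
    print(f"{years} year(s): {pct:.0f}%")
```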
Profiling & Discovery: Column Profiling (New!)
Visual of Nulls, Unique & Non-Unique Values
Column & Rule Profiling
Drill-Down Results
Value & Pattern Frequencies
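Column profiling of the kind listed (nulls, unique versus non-unique values, value and pattern frequencies) can be sketched in a few lines. The pattern encoding used here (letters become `A`, digits become `9`) is a common convention but an assumption on our part, not necessarily how any particular profiling tool encodes patterns.

```python
import re
from collections import Counter

def profile_column(values: list) -> dict:
    """Basic column profile: nulls, distinct/unique counts, value and pattern frequencies."""
    # None and empty strings are both treated as nulls
    non_null = [v for v in values if v is not None and v != ""]
    value_freq = Counter(non_null)
    # Pattern: every letter -> 'A', every digit -> '9', other characters kept as-is
    pattern_freq = Counter(re.sub(r"\d", "9", re.sub(r"[A-Za-z]", "A", str(v))) for v in non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(value_freq),
        "unique": sum(1 for c in value_freq.values() if c == 1),  # values seen exactly once
        "value_freq": value_freq.most_common(),
        "pattern_freq": pattern_freq.most_common(),
    }

zips = ["55365", "55365", None, "10018", "10018-5402", ""]
report = profile_column(zips)
print(report["nulls"], report["distinct"], report["unique"])
print(report["pattern_freq"])
```

The mixed `99999` and `99999-9999` patterns are exactly the kind of inconsistency a profiling pass is meant to surface before standardization.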
System Components - Intelligent Data Warehouse/Data Mart & Business Analytics Platform
Front Channels & Data Sources
End-to-End Security Management
Why Consolidate Databases?
1. Reduce complexity: servers running standard configurations
2. Improve efficiency: drive up hardware utilization rates
3. Lower costs: hardware/software (license and maintenance), energy and floor space
4. Simplify maintenance: fewer servers in fewer locations
5. Better security: smaller security perimeter
Data Consolidation Strategy
1. Resource Management
2. High Availability
3. Testing
4. Operation
5. Secure Data
6. Separation of Duties
Consolidate with Oracle Multitenant
New architecture for consolidating databases and simplifying operations
Example pluggable databases (PDBs): ERP, CRM, DW
• Self-contained PDB for each application: applications run unchanged; rapid provisioning (via clones); portability (via pluggability)
• Shared memory and background processes: more applications per server
• Common operations performed at container database (CDB) level: manage many as one (upgrade, HA, backup), with granular control when appropriate
• Complementary to VMs
Mixed Workloads
• OLTP peak time is day; DWH peak time is night
[Charts, day/night/day: OLTP load and DWH load separately, and combined resource usage Pre-12c vs. 12c]
• Pre-12c DBs: high resource allocation for peak use, low resource usage during non-peak activity
• OLTP + DWH in the same CDB on the same hardware = efficient use of resources
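The consolidation argument rests on the peaks not coinciding: capacity for a combined workload must cover the peak of the sum, which can be well below the sum of the individual peaks. The load figures below are invented purely to illustrate the arithmetic; they are not read from the charts.

```python
# Hypothetical resource usage (arbitrary units) sampled at 00:00, 04:00, ..., 20:00
oltp = [10, 10, 80, 100, 90, 30]   # peaks during the day
dwh = [100, 90, 20, 10, 20, 70]    # peaks at night (batch loads, reports)

separate = max(oltp) + max(dwh)                    # each database sized for its own peak
combined = max(o + d for o, d in zip(oltp, dwh))   # one CDB sized for the joint peak

print(f"separate: {separate}, consolidated: {combined}, saved: {separate - combined}")
```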
High Availability
[Diagram: Production Site and Active Replica]
• Active Data Guard: data protection, DR, query offload
• GoldenGate: active-active, heterogeneous
• RMAN, Oracle Secure Backup: backup to tape / cloud
• Edition-based Redefinition, Online Redefinition, Data Guard, GoldenGate: minimal-downtime maintenance, upgrades, migrations
• RAC: scalability, server HA
• Flashback: human error correction
• Application Continuity: application HA
• Global Data Services: service failover / load balancing
Data Breach
• 76%: breached using weak or stolen credentials
• 97%: preventable with basic controls
• 67%: records breached from servers
• 69%: discovered by an external party
From MISTAKES to MALICIOUS
Social Engineering
Sophisticated Attacks
Business Data Theft
Reputation Loss
• Privilege Abuse • Curiosity • Data Leakage
• Accidents • Disclosures
Basic security is not enough for today's business.
Why Are Databases Vulnerable?
Network Security
SIEM
Endpoint Security
Email Security
Authentication & User Security
Database Security
80% of IT Security Programs Don’t Address Database Security
Um… How to Start?
6 Steps for Comprehensive Database Security
1. Understanding what data needs to be protected
2. Understanding applicable regulatory compliance requirements
3. Performing an inventory of all databases, including non-production
4. Discovering and classifying databases based on the sensitivity of their data
5. Establishing security policies for all databases
6. Taking appropriate security measures
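Step 4 (discovering and classifying data by sensitivity) is often bootstrapped by scanning sample values for recognizable patterns. The sketch below is deliberately simple: the regular expressions, the labels (`RESTRICTED`, `CONFIDENTIAL`, `PUBLIC`), the 80% threshold, and the use of the Luhn checksum to cut false positives on card numbers are all our assumptions, and real discovery tools use far richer rules.

```python
import re

def luhn_ok(number: str) -> bool:
    """Luhn checksum, used here to cut false positives on 13-19 digit strings."""
    total = 0
    for i, d in enumerate(int(c) for c in reversed(number)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# (label, hypothetical sensitivity level, matcher) checked in order
RULES = [
    ("EMAIL", "CONFIDENTIAL", lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}", v) is not None),
    ("CARD_NUMBER", "RESTRICTED", lambda v: re.fullmatch(r"\d{13,19}", v) is not None and luhn_ok(v)),
    ("PHONE", "CONFIDENTIAL", lambda v: re.fullmatch(r"\(?\d{3}\)?[ -]?\d{3}[ -]\d{4}", v) is not None),
]

def classify_column(samples: list, threshold: float = 0.8) -> tuple:
    """Label a column if at least `threshold` of its non-empty samples match one rule."""
    values = [s.strip() for s in samples if s and s.strip()]
    for label, level, matches in RULES:
        if values and sum(matches(v) for v in values) / len(values) >= threshold:
            return label, level
    return "UNCLASSIFIED", "PUBLIC"

print(classify_column(["4111111111111111", "5500000000000004"]))
print(classify_column(["123 555 5789", "(212) 755-2551"]))
print(classify_column(["Mobile40 4GB", "HomeFone Std"]))
```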
What Is Cloud Computing?
• On-demand network access
• Shared pool of configurable computing resources
• Provisioned by the service provider
Real Life
Meet Adam
A long-serving, high-ARPU customer of service provider QuadPlay
My name is Adam Jiminez, and I live in Bridgeport.
I've been with my service provider, QuadPlay, for 8 years now.
They provide me and my family with mobile, broadband, and home phone services.
They keep trying to sell me broadband, which I already have, but what I really want is a discounted TV package.
Data records for Adam from the CRM database at QuadPlay:
Name | Address | DoB | Email Address | Contact Number | Products | Customer Since | Marketing Opt-in | Billing Date | Avg Mth Spend | Credit Card
Adam Jiminez | 716 Mollis Street, Bridgeport, USA 55365 | 10/20/1970 | | 123 555 5789 | Mobile40 4GB | 10/10/2012 | | Day 10 | $200 | 4921545764673439
Mr A Jiminez | 716 Mollis Street, Bridgeport, USA | 10/20/1970 | [email protected] | 438 555 1516 | HomeFone Std | 27/03/2007 | Yes | Day 27 | $50 |
Adam Jimminez | Mollis Street, Bridgeport, USA 55365 | 10/20/1970 | [email protected] | | BBUnlimited | 03/07/2010 | | Day 3 | $50 |
Adam N Jiminez | 716 Mollis Street, Bridgeport, USA 55365 | | [email protected] | 123 555 8854 | PrePay Freedom | 14/09/2015 | | N/A | $10 | 20101970
Issues highlighted: Name & Address Variations; Missing Data?; Non-Standard Format; $10 ARPU or $310 ARPU?
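Seeing through variations like those in Adam's records (Jiminez vs. Jimminez, a title, a middle initial) is the job of record matching. Below is a minimal sketch using Python's standard `difflib.SequenceMatcher` for fuzzy name comparison plus an exact date-of-birth check. The normalization rules, the title list, and the 0.85 threshold are illustrative assumptions, not how any particular MDM product scores matches; we also read the value 20101970 in the last record as a date of birth (20 October 1970) entered in a non-standard DDMMYYYY format.

```python
import re
from difflib import SequenceMatcher

# Adam's four CRM records: (name, date of birth as stored, average monthly spend in $)
RECORDS = [
    ("Adam Jiminez", "10/20/1970", 200),
    ("Mr A Jiminez", "10/20/1970", 50),
    ("Adam Jimminez", "10/20/1970", 50),
    ("Adam N Jiminez", "20101970", 10),
]
TITLES = {"mr", "mrs", "ms", "dr"}  # hypothetical title list

def name_key(name: str) -> str:
    """Lower-case, drop titles, keep first initial + last name ('Mr A Jiminez' -> 'a jiminez')."""
    tokens = [t for t in re.findall(r"[a-z]+", name.lower()) if t not in TITLES]
    return f"{tokens[0][0]} {tokens[-1]}"

def dob_key(dob: str) -> str:
    """Normalize 'MM/DD/YYYY' and the non-standard 'DDMMYYYY' to 'YYYY-MM-DD'."""
    if "/" in dob:
        month, day, year = dob.split("/")
    else:
        day, month, year = dob[:2], dob[2:4], dob[4:]
    return f"{year}-{month}-{day}"

def is_match(a, b, threshold: float = 0.85) -> bool:
    """Same normalized DoB and sufficiently similar normalized names."""
    similarity = SequenceMatcher(None, name_key(a[0]), name_key(b[0])).ratio()
    return dob_key(a[1]) == dob_key(b[1]) and similarity >= threshold

cluster = [r for r in RECORDS if is_match(RECORDS[0], r)]
total = sum(r[2] for r in cluster)
print(f"{len(cluster)} records, combined average monthly spend ${total}")
```

Once the four records resolve to one customer, Adam's value to QuadPlay is the combined $310 rather than the $10 seen on any single prepaid record.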
Implementation Steps
Profile, Parse, Correct, Standardize, Match, Merge, Enrich
Input: james smith 1008 6th avenue suite 7 nyc new york 10018
Parsed: First Name: james; Last Name: smith; AddressL1: 1008 6th avenue; AddressL2: suite 7; City: nyc; State: new york; Zip Code: 10018
Corrected: First Name: james; Last Name: smith; AddressL1: 1008 Avenue of the Americas; AddressL2: Suite 7; City: New York; State: new york; Zip Code: 10018
Standardized: First Name: James; Last Name: Smith; AddressL1: 1008 Avenue of the Americas; AddressL2: Suite 7; City: New York; State: NY; Zip Code: 10018
System X record: Jim J. Smyth, New York, NY, [email protected], (212) 755-2551
Matched and merged: First Name: Jim; Mid Name: J.; Last Name: Smyth; AddressL1: 1008 Avenue of the Americas; AddressL2: Suite 7; City: New York; State: NY; Zip Code: 10018; Phone: (212) 755-2551; Email: [email protected]
Enriched: First Name: Jim; Mid Name: J.; Last Name: Smyth; AddressL1: 1008 Avenue of the Americas; AddressL2: Suite 7; City: New York; State: NY; Zip Code: 10018-5402; Latitude: 40.7325525; Longitude: -74.004970; Phone: (212) 755-2551; Email: [email protected]; C_Category: Affluent Couples & Families; C_Group: Affluent Families
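The Parse, Correct and Standardize steps for the James Smith record can be sketched as successive functions over a dictionary. The parser assumes the simple single-line layout of this example (name, street, suite, one-word city, state, 5-digit ZIP), and the alias tables (`6th avenue` to `Avenue of the Americas`, `nyc` to `New York`, `new york` to `NY`) are tiny hypothetical reference lists; production data-quality tools rely on postal reference data instead.

```python
import re

# Hypothetical reference data for the Correct and Standardize steps
STREET_ALIASES = {"6th avenue": "Avenue of the Americas"}
CITY_ALIASES = {"nyc": "New York"}
STATE_CODES = {"new york": "NY"}

def parse(raw: str) -> dict:
    """Split 'first last number street suite N city state zip' into named fields."""
    m = re.fullmatch(
        r"(?P<first>\w+) (?P<last>\w+) (?P<addr1>\d+ .+?) (?P<addr2>suite \d+) "
        r"(?P<city>\w+) (?P<state>[a-z ]+) (?P<zip>\d{5})", raw)
    if m is None:
        raise ValueError(f"unrecognized layout: {raw!r}")
    return m.groupdict()

def correct(rec: dict) -> dict:
    """Replace known aliases with the official street and city names."""
    rec = dict(rec)
    number, street = rec["addr1"].split(" ", 1)
    rec["addr1"] = f"{number} {STREET_ALIASES.get(street, street)}"
    rec["addr2"] = rec["addr2"].capitalize()
    rec["city"] = CITY_ALIASES.get(rec["city"], rec["city"])
    return rec

def standardize(rec: dict) -> dict:
    """Apply casing and code conventions (proper-case names, 2-letter state code)."""
    rec = dict(rec)
    rec["first"], rec["last"] = rec["first"].title(), rec["last"].title()
    rec["state"] = STATE_CODES.get(rec["state"], rec["state"])
    return rec

record = standardize(correct(parse("james smith 1008 6th avenue suite 7 nyc new york 10018")))
print(record)
```

Keeping each step a separate function mirrors the slide's sequence: the output of each stage can be inspected, and Match, Merge and Enrich would consume the standardized record.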