2010.03.16 Pollock.EDW2010.Modern DI for Warehousing


Uploaded by: jeffrey-t-pollock

Posted on 18-Nov-2014


DESCRIPTION

This presentation describes a modern alternative to conventional hub-based ETL and replication for data warehousing.

TRANSCRIPT


Page 2

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 3

Modern Data Integration for Data Warehousing

Oracle Fusion Middleware

Page 4

Agenda

• Data Warehouse Problem Space (Data Integration Focus)
– Ancient Pre-History of Data Warehouse
– “The Good Old Days” of Data Warehouse
– Revival Period for Data Warehouse
• Data Integration for Modern Data Warehousing
– Old Generation: Hub & Spoke with Invasive Capture
– New Generation: Agent-based with Non-invasive Capture
• Drive Business Value with Data Integration
• Why Replace? Isn’t my Old _____ Good Enough?
• The Oracle Solution for Data Integration
– Oracle GoldenGate
– Oracle Data Integrator
– Oracle Data Quality

Page 5

Data Warehousing

PROBLEM SPACE

Page 6

Data Warehouse Ancient History

• 1985 – 1995 “Controlled Chaos”

• Fragmented Strategy for Marts vs. Warehouse

• No practical notion of “Enterprise Data Warehouse”

• Data Integration:

• Hand-coded Scripts (External to DB)

• Not Optimized


• Procedural Transformations (PL/SQL, etc.)

• Few Data Integration Tools

• No Formal Methodology, Metrics or Governance

Page 7

Data Warehouse Good Old Days

• 1995 – 2005 “Formal Methods and Discipline”

• Strategy Choices for Marts vs. Warehouse

• Top-down (Inmon) vs. Bottom-up (Kimball)

• Formal notion of “Enterprise Data Warehouse”

• Data Integration:

• Tool-based Data Integration Solutions


• Optimized, Parallel Server-based Transforms

• Formal Methodology, Metrics and Governance

• Reduced Reliance on Hand-coded Scripts and Procedural Transformations (PL/SQL, etc.)

Page 8

Data Warehouse Revival Period

• 2005 – 2015 “Specialized Warehouse Solutions”

• Technology-driven Choices for High-end DWs

• Commodity H/W vs. Optimized Appliances

• Relational/Star vs. Columnar (vs. Cubes/OLAP)

• Database + BI vs. Distributed Analytic Apps (Hadoop, etc.)

• EDW as a “source of truth” vision morphs and expands to MDM as a distinct problem domain

• Data Integration is still stuck in the “Good Old Days”

Good Old Days             | Modern Alternative
Hub-based Runtime         | Agent-based Runtime
Centralized ETL Server    | Optimized E-LT (DW Appliance)
Mainly Batch              | Mainly Real Time / Trickle Feed

Page 9

Data Warehousing with

MODERN DATA INTEGRATION


Page 10

Modern Data Integration Approach: Heterogeneous, Real-time, Non-Invasive, High Performance E-LT

Traditional ETL + CDC
• Invasive Capture on OLTP systems using complex Adapters
• Transformations in ETL engine on expensive middle tier servers
• Bulk load to the data warehouse with large nightly/daily batch

Modern E-LT + Real-time
• Continuous feeds from operational systems
• Non-invasive data capture
• Thin middle tier with transformations on the database platform (target)
• Mini-batches throughout the day or bulk processing nightly

[Diagram labels: Heterogeneous sources; Extract, Staging, Xform, Bulk Load, Lookup Data (traditional path); Agent, Trickle, Load, Xform, Lookup Data (modern path).]

Page 11

Good Old Days of ETL Batch Integration

• Good Tools, but:
– Expensive Environments, Performance Bottlenecks, Too Many Data Hops, Proprietary Skills w/Vendor Lock-in, and Heavy Optimization in Complex Situations
– Won’t scale w/new Generation of DWs

[Diagram: Data Sources, ETL Engine(s) with ETL Metadata and Lookup Data, Stage, Prod, repeated across Development, QA, System (etc.) Environments. Process steps: Extract, Transform, Load, Lookups/Calcs, Transform, Load. Callout: “ETL engines require BIG H/W and heavy parallel tuning.”]

Page 12

Modern Agent-based E-LT Processing

• Same Good Tools you Expect, plus:
– Reduce Data Center Costs, De-commission Servers
– Open Frameworks, Non-Proprietary SQL Skills
– Deploys Seamlessly Alone or within SOA Servers
– Scales Linearly with Modern DW Appliances

[Diagram: Sources, E-LT Agent (Data Movement), Stage and Prod with Meta and Lookup Data (Data Transformation), across Development, QA, System (etc.) Environments. Process steps: Extract, Transform, Load, Lookups/Calcs, Transform, Load. Callouts: “Set-based SQL transforms typically faster”; “SQL Load inside DB is always faster.”]
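The E-LT idea on this slide (load raw rows first, then transform with set-based SQL inside the target database rather than row by row in a middle-tier engine) can be sketched with Python's standard `sqlite3` module standing in for the warehouse. This is a minimal illustration under stated assumptions, not how Oracle Data Integrator generates code: the table names (`stage_orders`, `lookup_customers`, `prod_sales`) and the row-by-row contrast function are hypothetical.

```python
import sqlite3


def run_elt(orders, customers):
    """E-LT sketch: load raw rows as-is, then transform with one set-based SQL statement in the database."""
    con = sqlite3.connect(":memory:")  # stand-in for the target warehouse
    try:
        cur = con.cursor()
        # Hypothetical staging, lookup and production tables.
        cur.execute("CREATE TABLE stage_orders (order_id INTEGER, cust_code TEXT, amount REAL)")
        cur.execute("CREATE TABLE lookup_customers (cust_code TEXT PRIMARY KEY, region TEXT)")
        cur.execute("CREATE TABLE prod_sales (order_id INTEGER PRIMARY KEY, region TEXT, amount_cents INTEGER)")

        # Extract + Load: bulk-insert source rows without transforming them.
        cur.executemany("INSERT INTO stage_orders VALUES (?, ?, ?)", orders)
        cur.executemany("INSERT INTO lookup_customers VALUES (?, ?)", customers)

        # Transform: one INSERT ... SELECT; the database engine does the lookup join and the calculation.
        cur.execute(
            """
            INSERT INTO prod_sales (order_id, region, amount_cents)
            SELECT s.order_id,
                   COALESCE(c.region, 'UNKNOWN'),
                   CAST(ROUND(s.amount * 100) AS INTEGER)
            FROM stage_orders AS s
            LEFT JOIN lookup_customers AS c ON c.cust_code = s.cust_code
            """
        )
        con.commit()
        return cur.execute(
            "SELECT order_id, region, amount_cents FROM prod_sales ORDER BY order_id"
        ).fetchall()
    finally:
        con.close()


def run_row_by_row_etl(orders, customers):
    """Contrast: a middle-tier engine looks up and transforms each row itself before loading."""
    lookup = dict(customers)
    rows = [
        (order_id, lookup.get(cust_code, "UNKNOWN"), int(round(amount * 100)))
        for order_id, cust_code, amount in orders
    ]
    return sorted(rows)


if __name__ == "__main__":
    orders = [(1, "C1", 19.99), (2, "C2", 5.0), (3, "C9", 12.5)]
    customers = [("C1", "EMEA"), ("C2", "APAC")]
    print(run_elt(orders, customers))
    print(run_row_by_row_etl(orders, customers))
```

Both functions return the same rows; the difference is where the work happens. In the E-LT version the join and the arithmetic run inside the database engine as a single statement, which is the property the slide's callouts refer to.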

Page 13

Good Old Days of Real Time Replication

• Good Tools, but:
– Arcane capture process, sometimes invasive
• Okay for Data Integration (Changed Data Capture), but:
– not used for Active-Active / Zero Downtime (ZDT) Migrations
– not used for High Availability or Disaster Recovery

[Diagram: Data Sources, CDC Hub(s), Transaction Apply, Mgmt Server, ETL Engine(s), Stage, Prod, Lookup Data.]

Page 14

Agent-based Real Time Replication

• Same Good Tools you Expect, but:
– Not dependent on hardware for replication
– Capable of Heterogeneous, Active-Active Deployments
– Suitable for Zero Downtime Migrations
– Point-in-time Recovery

[Diagram: Sources, Capture Agent, Data Movement, Replicat Agent, Stage, Prod, Lookup Data.]

Page 15

Data Capture Architecture Options

• Next Generation Capabilities
– Non-invasive, heterogeneous, disk-based log access
– Suitable for CDC + High Availability & Active-Active
– Bi-directional and high performance
– Check-pointing and Simple Trail/Queue Management

[Diagram: capture options On-Disk Logs, Log Tables and Triggers, for Inserts, Updates and Deletes, across Oracle, IBM DB2, MSFT SQL Server, Sybase, Teradata and Enscribe.]
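Log-based capture, the first option in this diagram, reads changes from the database's own log instead of adding triggers or log tables to the source. The sketch below is a simplified, hypothetical model of that idea with check-pointing: `LogRecord` and `LogCapture` are invented names, the log is an in-memory list, and only committed transactions are emitted. A production reader would also persist its checkpoint and restart from the oldest open transaction; that is deliberately omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogRecord:
    lsn: int        # log sequence number (position in the log)
    txid: int       # transaction that wrote the record
    op: str         # "INSERT", "UPDATE", "DELETE", "COMMIT" or "ROLLBACK"
    table: str = ""
    row: tuple = ()


@dataclass
class LogCapture:
    """Hypothetical log-based capture: tails an append-only log; no triggers or log tables on the source."""
    position: int = 0                              # checkpoint: index of the next unread record
    open_txns: dict = field(default_factory=dict)  # txid -> buffered change records

    def poll(self, log):
        """Read records appended since the checkpoint and return committed transactions in commit order."""
        committed = []
        for rec in log[self.position:]:
            if rec.op == "COMMIT":
                committed.append((rec.txid, self.open_txns.pop(rec.txid, [])))
            elif rec.op == "ROLLBACK":
                self.open_txns.pop(rec.txid, None)  # rolled-back work is never shipped downstream
            else:
                self.open_txns.setdefault(rec.txid, []).append(rec)
        self.position = len(log)  # advance the checkpoint past everything read
        return committed


if __name__ == "__main__":
    log = [
        LogRecord(1, 10, "INSERT", "orders", (1, "new")),
        LogRecord(2, 11, "UPDATE", "orders", (2, "x")),
        LogRecord(3, 11, "ROLLBACK"),
    ]
    cap = LogCapture()
    print(cap.poll(log))  # nothing committed yet
    log.append(LogRecord(4, 10, "COMMIT"))
    print(cap.poll(log))  # transaction 10 with its single INSERT
```

Buffering per transaction until its COMMIT is what keeps uncommitted or rolled-back work out of the downstream queue, which matters once the same stream feeds replication as well as CDC.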

Page 16

Good Old Days of Data Integration

• Monolithic & Expensive Environments

• Fragile, Hard to Manage

• Difficult to Tune or Optimize

[Diagram: Data Sources, CDC Hub(s), Transaction Apply, Mgmt Server, ETL Engine(s) with ETL Metadata and Lookup Data, Stage, Prod, across Development, QA, System (etc.) Environments. Process steps: Extract, Transform, Load, Lookups/Calcs, Transform, Load. Callout: “ETL engines require BIG H/W and heavy parallel tuning.”]

Page 17

Modern Data Integration Architecture

• Lightweight, Inexpensive Environments – Agents

• Resilient, Easy to Manage – Non-Invasive

• Easy to Optimize and Tune – uses DBMS power

[Diagram: Sources, Capture Agent, Replicat Agent, E-LT Agent (Bulk Data Movement), Stage and Prod with Meta and Lookup Data (Data Transformation), across Development, QA, System (etc.) Environments. Process steps: Extract, Transform, Load, Lookups/Calcs, Transform, Load. Callouts: “Set-based SQL transforms typically faster”; “SQL Load inside DB is always faster.”]

Page 18

Data Integration Drives

BUSINESS VALUE

Page 19

Business Drivers for Data Integration: Add Value to the Core Business Lines

1. Do More with Less
• Design metadata-driven integration
• Leverage skills & dictate patterns

2. Compete Globally 24x7
• Ensure continuous uptime
• Access data in real time

3. Use Data for Competitive Advantage
• Ensure the quality of your data
• Actively govern your most valuable asset

4. Automate and Adapt Business Processes
• Expose data services for reuse
• Orchestrate processes using SOA

Page 20

Project Drivers for Data Integration: Essential Ingredient for Information Agility

Strategic Value of Data Integration

• Consistency for major enterprise initiatives like BI, DW, & MDM

• Common technical foundation platform across data silos

• Central point for data governance, availability and controls


Key Data Integration Use Cases

• BI, DW, and OLTP Data Integration & Replication

• SOA, Enterprise Integration & Modernization

• Migrations and Master Data Management

Page 21

Modern Data Integration Alternatives:

WHY REPLACE _______?

Page 22

Why Replace _______?

• We often hear, “My company has already standardized on __________; why should I replace it?”

Answer:

• Save Money on Data Center Costs
• Accelerate Project Delivery / Time to Market (TTM)
• Supply Real Time Intelligence to the Business
• Reduce Batch Windows on Data Warehouse
• Unify Data Integration with SOA Plans

Page 23

Save Money on Hardware/Data Center: E-LT runs on Small Commodity Servers as an Agent Process

Conventional ETL Architecture
[Diagram: Extract, Transform on a separate ETL server, Load]

Typical: Separate ETL Server
• Proprietary ETL Engine, Poor Performance
• High Costs for Separate Standalone Server

Next Generation Architecture
[Diagram: E-LT at source and target; Extract, Load and Transform steps run on the source and target databases]

E-LT: No New Servers
• Lower Cost: Leverage Compute Resources & Partition Workload efficiently
• Efficient: Exploits Database Optimizer
• Fast: Exploits Native Bulk Load & Other Database Interfaces
• Scalable: Scales as you add Processors to Source or Target

Benefits
• Optimal Performance & Scalability
• Better Hardware Leverage
• Easier to Manage & Lower Cost

Page 24

Speed Project Delivery/Time to Market: E-LT uses Declarative SQL-style Design + Simple Runtime

• Development Productivity: 40% Efficiency Gains
• Environment Setup (ex: BI Apps): 33-50% Less Complex

Environment setup comparison:
• Number of setup steps: 7 vs. 10
• Number of servers: 1 vs. 3
• Number of connections: 3 vs. 7

Page 25

Supply Real Time Business Intelligence: Non-invasive Capture + E-LT Processing

[Diagram: Application Real Time BI (using Data Copy) and Analytic BI (Facts & Dims), fed by E-LT (Mini-Batch + Transforms), with the Consistency Window marked.]

Page 26

Reduce Consistency Windows w/E-LT: Fewer Steps, Faster Xform, and Faster Loads vs. typical ETL

Main driver for batch window is data integrity & consistency; once lookup & calc functions begin, DW typically goes offline.

[Diagram: ETL Batch Window vs. E-LT Batch Window. ETL path: Data Sources, ETL Engine(s) with ETL Metadata and Lookup Data, Stage, Prod; steps Extract, Transform, Load, Lookups/Calcs, Transform, Load; callout “ETL engines require BIG H/W and heavy parallel tuning.” E-LT path: Sources, E-LT Agent (Data Movement), Stage and Prod with Meta and Lookup Data; fewer Extract, Load, Transform steps; callouts “Set-based SQL transforms typically faster” and “SQL Load inside DB is always faster.” The shorter E-LT window is labeled “Uptime Gains”, with “DW is Online” outside the window.]

Page 27

What About “Pushdown Processing”?

• Pushdown Processing is what the ETL vendors do to compensate for bad performance: push the transformation processing to the Database
• Both Pushdown & E-LT have in common:
– use the power of your Data Warehouse for maximum performance
– can combine engine-based operations with DB-based transformations to accomplish any level of data transformation complexity
– can scale to any multi-TB level using parallel processing
• Only E-LT can claim:
– performance optimized for your Database, whichever DB you use
– can operate without any new IT Hardware costs
– 100% Java-based
– easily embedded within your existing or planned SOA infrastructure
– is not a glorified scheduler that relies on PL/SQL or other custom-coded DB scripts to achieve maximal performance
– can entirely eliminate needless network hops for remote data joins
– can operate with no additional energy drain in your Datacenter

Page 28

Unify E-LT Agent with SOA Runtime: Best of Breed Data Integration as a Shared SOA Service

Unified Management + Monitoring
• Common Runtime: 100% Java
• Common Monitoring

Example Use Cases
• Bulk Data Transformation (any2any)
• XML/EDI Large File Handling
• SOA-driven Business Intelligence
• Load DW from SOA
• Unified Data Steward Workflow (ETL Error Hospital w/BPEL PM)
• ERP Migration, Replication / Loading
• Query Offloading & Zero Downtime

[Diagram: High Performance ETL & Replication from Any Data Source to Data Warehouse & OLAP.]

E-LT Frameworks are optimal architectures for:
• Business Intelligence
• Performance Management
• Database & OLAP
• Embedded Applications
• Application Integration
• Middleware Servers

Page 29

Data Integration:

THE ORACLE SOLUTION

Page 30

Oracle Data Integration Solution: Best-in-class Heterogeneous Platform for Data Integration

[Diagram, top to bottom]
MDM Applications, SOA Platforms, Oracle Applications, Business Intelligence, Activity Monitoring, Custom Applications
SOA Abstraction Layer: Service Bus, Process Manager, Data Services, Data Federation
Comprehensive Data Integration Solution:
• Oracle GoldenGate: Log-based CDC, Bi-directional Replication, Real-time Data
• Oracle Data Integrator: ELT/ETL, Data Transformation, Bulk Data Movement, Data Lineage
• Oracle Data Quality: Data Profiling, Data Parsing, Data Cleansing, Match and Merge, Data Verification
Storage: OLTP System, Flat Files, Data Warehouse/Data Mart, OLAP Cube, Web 2.0, Web and Event Services, SOA

Page 31

Key Data Integration Products

• Comprehensive Integration

• ELT/ETL for Bulk Data

• Service Bus

• Process Orchestration

• Human Workflow

• Data Grid

• Heterogeneous E-LT & ETL

• High-speed Transformations

• OLAP Data Loading

• Data Warehouse Loading

• Real Time Data Replication

• Changed Data Capture

• DBMS High Availability

• Disaster Tolerance


• Business Data / Metadata

• Statistical Analysis

• Time Series Reporting

• Integrated Data Quality

• Cleansing & Parsing

• De-duplication

• High Performance

• Integrated w/ODI

• Data Service Modeling

• Query Federation

• Data Redaction

• Service Data Objects

Page 32

Oracle Data Integrator Enterprise Edition: Optimized E-LT for Improved Performance, Productivity and Lower TCO

• E-LT Transformation vs. E-T-L
• Declarative Set-based design
• Change Data Capture
• Hot-pluggable Architecture
• Pluggable Knowledge Modules

[Diagram: Legacy Sources, OLTP DB Sources and Application Sources feeding Any Data Warehouse and Any Planning System.]

Page 33

Oracle GoldenGate Overview: Enterprise-wide Solution for Real Time Data Needs

• Standardize on Single Technology for Multiple Needs
• Deploy for Continuous Availability and Real-time Data Access for Reporting / BI
• Highly Flexible
• Fast Deployments
• Lower TCO & Improved ROI

[Diagram: Log Based, Real-Time Change Data Capture from Heterogeneous Source Systems via OGG, supporting Disaster Recovery and Data Protection (Standby, Open & Active); Zero Downtime Migration and Upgrades; Operational Reporting and Query Offloading (Reporting Database); Data Distribution; and Real-time BI (EDW, ODS, with ETL).]

Page 34

How Oracle GoldenGate Works: Modular De-Coupled Architecture

Capture: committed transactions are captured (and can be filtered) as they occur by reading the transaction logs.

Trail: stages and queues data for routing.

Pump: distributes data for routing to target(s).

Route: data is compressed and encrypted for routing to target(s).

Delivery: applies data with transaction integrity, transforming the data as required.

[Diagram: Source Database(s), Capture, Trail, Pump, then over TCP/IP (LAN/WAN/Internet) to Trail, Delivery, Target Database(s); the flow can be bi-directional.]
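The Capture, Trail, Pump, Route and Delivery stages can be modeled as plain functions to show how the de-coupled pieces hand data to one another. This is a hypothetical sketch, not GoldenGate's implementation or configuration syntax: lists stand in for trail files, `zlib` compression stands in for the route step (encryption is omitted because the Python standard library has no symmetric cipher), and an in-memory SQLite table `kv` plays the target database, with each transaction applied atomically.

```python
import json
import sqlite3
import zlib


def capture(source_txns):
    # Capture: only committed transactions leave the source; uncommitted work is skipped.
    return [txn for txn in source_txns if txn["committed"]]


def pump(trail):
    # Pump + Route: serialize and compress each queued transaction for the network hop.
    return [zlib.compress(json.dumps(txn).encode("utf-8")) for txn in trail]


def deliver(packets, con):
    # Delivery: restore each transaction and apply all of its operations atomically.
    for packet in packets:
        txn = json.loads(zlib.decompress(packet).decode("utf-8"))
        with con:  # commits if the block succeeds, rolls back on an exception
            for op, key, value in txn["ops"]:
                if op == "upsert":
                    con.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
                elif op == "delete":
                    con.execute("DELETE FROM kv WHERE k = ?", (key,))
                else:
                    raise ValueError(f"unknown operation {op!r}")


def replicate(source_txns, con):
    source_trail = capture(source_txns)  # Trail: local staging queue on the source side
    target_trail = pump(source_trail)    # packets as they would arrive on the target side
    deliver(target_trail, con)


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    changes = [
        {"txid": 1, "committed": True, "ops": [["upsert", "a", "1"], ["upsert", "b", "2"]]},
        {"txid": 2, "committed": False, "ops": [["upsert", "c", "3"]]},
        {"txid": 3, "committed": True, "ops": [["delete", "a", None]]},
    ]
    replicate(changes, target)
    print(target.execute("SELECT k, v FROM kv ORDER BY k").fetchall())
```

Because each stage only consumes the previous stage's output, any one of them can be moved to another host or restarted independently, which is the point of the "modular de-coupled" label; the `with con:` block is what gives the delivery step its all-or-nothing behavior per transaction.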

Page 35

Govern Data Better with Data Quality

• Data Movement (Data Integration)
– E-LT & ETL
– Data Transformation
– Change Data Capture
– Data Access
– Data Services
• Data Profiling (Data Quality and Profiling)
– Statistical Analysis
– Rule-based Validation
– Monitoring & Timeslice
– Fine-grained Auditing
• Data Cleansing
– Data Validation during ETL
– Data Standardization
– Address Matching & Dedup
– Error Hospital / Workflow
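Validation during ETL, standardization, de-duplication and an error hospital can be combined in one small pipeline step. The sketch below is illustrative only and does not reflect Oracle Data Quality's API: the rules, the field names (`name`, `email`, `country`) and the exact-match dedup key (normalized email) are assumptions, and real match-and-merge tools use much fuzzier matching.

```python
import re

# Hypothetical validation rules: rule name -> predicate over a standardized record.
RULES = {
    "email": lambda r: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r["email"]) is not None,
    "country": lambda r: r["country"] in {"US", "GB", "DE"},
}


def standardize(record):
    """Standardization: trim and lower-case emails, collapse whitespace in names, upper-case country codes."""
    return {
        **record,
        "email": record.get("email", "").strip().lower(),
        "name": " ".join(record.get("name", "").split()).title(),
        "country": record.get("country", "").strip().upper(),
    }


def validate_and_dedup(records):
    """Return (clean, hospital); hospital holds (record, failed_rule_names) for data-steward review."""
    clean, hospital, seen = [], [], set()
    for rec in map(standardize, records):
        failed = [name for name, rule in RULES.items() if not rule(rec)]
        if failed:
            hospital.append((rec, failed))  # route to the error hospital instead of loading
            continue
        key = rec["email"]  # exact-match dedup key (an assumption; real matching is fuzzier)
        if key not in seen:
            seen.add(key)
            clean.append(rec)
    return clean, hospital


if __name__ == "__main__":
    records = [
        {"name": "ada  lovelace", "email": " Ada@Example.com ", "country": "GB"},
        {"name": "Ada Lovelace", "email": "ada@example.com", "country": "gb"},
        {"name": "bob", "email": "not-an-email", "country": "US"},
        {"name": "Eve", "email": "eve@example.org", "country": "XX"},
    ]
    clean, hospital = validate_and_dedup(records)
    print(clean)
    print([failed for _, failed in hospital])
```

Standardizing before validating and matching is the design choice that lets the two spellings of the same customer collapse into one record, while rows that fail a rule are kept with the reason attached rather than silently dropped.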

Page 36

CONCLUSION

Page 37

Modern Data Integration Approach: Heterogeneous, Real-time, Non-Invasive, High Performance E-LT

Traditional ETL + CDC
• Invasive Capture on OLTP systems using complex Adapters
• Transformations in ETL engine on expensive middle tier servers
• Bulk load to the data warehouse with large nightly/daily batch

Modern E-LT + Real-time
• Continuous feeds from operational systems
• Non-invasive data capture
• Thin middle tier with transformations on the database platform (target)
• Mini-batches throughout the day or bulk processing nightly

[Diagram labels: Heterogeneous sources; Extract, Staging, Xform, Bulk Load, Lookup Data (traditional path); Agent, Trickle, Load, Xform, Lookup Data (modern path).]

Page 38

Questions


Page 40

The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.