Replicate from Oracle to data warehouses and analytics
![Page 1: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/1.jpg)
© 2015 VMware Inc. All rights reserved.
VMware Continuent Replication
Replicate from Oracle to data warehouses and analytics
MC Brown, Senior Product Line Manager
October 22nd, 2015
![Page 2: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/2.jpg)
Agenda
1 Introduction to VMware Continuent
2 Understanding VMware Continuent Replication
3 Using Analytics and Data Warehouses
4 Wrap-up and Questions
![Page 3: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/3.jpg)
Introducing VMware Continuent
Database Clustering
• Business continuity for business-critical MySQL database applications
• Commercial-grade multi-site HA/DR
Data Replication
• Flexible, high-performance replication for Oracle and MySQL
• Simple data loading into analytics and big data
Oracle Oracle MySQL Oracle MySQL MySQL (+ MariaDB, Percona Server) Oracle Hadoop, Redshift, Vertica MySQL Hadoop, Redshift, Vertica
Products Products MySQL Single Site HA MySQL Multi-Site HA and DR
![Page 4: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/4.jpg)
Replication solves important problems for RDBMS users
• Real-time local copies in case the DBMS fails
• Real-time remote copies in case the site fails
• Loading data quickly into analytic systems
• Feeding edge applications from the Oracle mother ship
• Migrating from Oracle to:
– New Oracle versions
– Less expensive editions
– Non-Oracle DBMS
CONFIDENTIAL 4
![Page 5: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/5.jpg)
Agenda
1 Introduction to VMware Continuent
2 Understanding VMware Continuent Replication
3 Using Analytics and Data Warehouses
4 Wrap-up and Questions
![Page 6: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/6.jpg)
VMware Continuent implements flexible, high-performance replication for Oracle and MySQL
Replicator mySQL
DBMS Logs
mySQL
Replicator
THL
THL
Download transactions via network or from file system
Apply using JDBC (Transactions + metadata)
(Transactions + metadata)
Primary
Secondary
Source
Target
Low latency transfer
Low application impact
![Page 7: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/7.jpg)
VMware Continuent captures transactions directly from Oracle REDO logs
Replicator mySQL
REDO Logs
mySQL
THL
(Transactions + metadata)
Primary
(To secondary)
Capture data dictionary
Source
Capture raw transactions
Staging area for REDO log data
Replicator Host Oracle DBMS Host
Convert to serialized row changes and DDL
![Page 8: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/8.jpg)
Low-impact, high performance
• Source Oracle DBMS requirements:
– Supplemental logging
– Archive logs
– Replicator metadata stored in DBMS
– Replicator login with access to catalogs and flashback query
– Local process to read REDO logs
• Target Oracle DBMS requirements:
– Replicator metadata stored in DBMS
![Page 9: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/9.jpg)
Transaction Based Replication
Transaction Log (Row changes + Statements)
0 Create table db1.foo
1 Create table db2.foo
2 Insert into db1.foo values(1, …
3 Update db1.foo where id=1…
4 Insert into db2.foo values(5,…)
5 Insert into db1.foo values(3,…)
6 Delete from db2.foo where id=5
Source
Target
![Page 10: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/10.jpg)
Parallel Apply
THL Parallel queue (Transactions + metadata)
Target
Extract Filter Apply Extract Filter Apply
Extract Filter Apply
Extract Filter Apply
Extract Filter Apply
Stage Stage Stage
Replicator Pipeline
Source replicator
![Page 11: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/11.jpg)
Parallel Extraction for Provisioning
THL
(Transactions + metadata)
Extract Filter Apply Extract Filter Apply
Stage Stage
Replicator Pipeline
Source Multi-threaded data extraction using flashback queries
![Page 12: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/12.jpg)
Topologies
Replicator Replicator
Replicator
Fan-in
Replicator Replicator
Replicator
Fan-out
![Page 13: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/13.jpg)
Multiple Targets
Replicator Replicator
Replicator
Replicator
Source
Other RDBMS versions and OS platforms
Other RDBMS types
Non-relational DBMS
![Page 14: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/14.jpg)
We can even divide logs into transaction sequences on keys
Table=db1.foo, key=1
2 Insert into db1.foo values(1, …
3 Update db1.foo where id=1…
Table=db2.foo, key=5
4 Insert into db2.foo values(5,…)
6 Delete from db2.foo where id=5
Table=db1.foo, key=3
5 Insert into db1.foo values(3,…)
Source
Target
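The split shown on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the replicator's actual code: the tuple layout `(seqno, table, key, statement)` and the function name `split_by_key` are assumptions made for the example, and the statement text is kept elided as on the slide.

```python
from collections import defaultdict

# Hypothetical in-memory form of the transaction log from the slides:
# (seqno, table, key, statement). Statement text stays elided as in the deck.
log = [
    (2, "db1.foo", 1, "insert into db1.foo values(1, ..."),
    (3, "db1.foo", 1, "update db1.foo where id=1 ..."),
    (4, "db2.foo", 5, "insert into db2.foo values(5, ...)"),
    (5, "db1.foo", 3, "insert into db1.foo values(3, ...)"),
    (6, "db2.foo", 5, "delete from db2.foo where id=5"),
]


def split_by_key(entries):
    """Group log entries by (table, key), keeping seqno order within each group."""
    groups = defaultdict(list)
    # Sorting by seqno first guarantees each per-key sequence is in commit order.
    for seqno, table, key, _statement in sorted(entries, key=lambda e: e[0]):
        groups[(table, key)].append(seqno)
    return dict(groups)


sequences = split_by_key(log)
for (table, key), seqnos in sequences.items():
    print(f"Table={table}, key={key}: seqnos {seqnos}")
```

Each per-key sequence can then be handled independently, since changes to different keys do not depend on one another's ordering.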
![Page 15: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/15.jpg)
Ordering transactions around keys enables efficient data warehouse loading
Replicator
Source DBMS
CSV Files CSV Files CSV Files CSV Files
Load Script
HADOOP CLUSTER
Parallel loading
Map/Reduce View Generation
![Page 16: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/16.jpg)
Agenda
1 Introduction to VMware Continuent
2 Understanding Continuent Replication
3 Using Analytics and Data Warehouses
4 Wrap-up and Questions
![Page 17: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/17.jpg)
Data Warehouse Integration and Usage is Changing
• Traditional data warehouse usage was based on a dump from the transactional store, loaded into the data warehouse
• Data warehousing and analytics were done on the historical data loaded
• Data warehouses often use merged data from multiple sources, which was hard to handle
• Data warehouses are now frequently sources as well as targets for data, i.e.:
– Export data to data warehouse
– Analyze data
– Feed summary data back to application to display stats to users
![Page 18: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/18.jpg)
Modern Data Warehouse Sequences
![Page 19: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/19.jpg)
How do we cope with that model?
• Traditional Extract-Transform-Load (ETL) methods take too long
• Data needs to be replicated into a data warehouse in real time
• Continuous stream of information
• Replicate everything
• Use the data warehouse to provide joins and analytics
![Page 20: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/20.jpg)
Data Warehouse Choices
• Oracle
• Hadoop
– General purpose storage platform
– Map Reduce for data processing
– Front-end interfaces for interaction in SQL-like (Hive, HBase, Impala) and non-SQL (Pig, native, Spark) forms
– JDBC/ODBC interfaces improving
• Vertica
– Massive cluster-based column store
– SQL and ODBC/JDBC interface
• Amazon Redshift
– Highly flexible column store
– Easy to deploy
![Page 21: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/21.jpg)
VMware Continuent for Replication/Data Warehouses
(software formerly known as Tungsten Replicator) is a fast, open source database replication engine
Designed for speed and flexibility
Apache V2 license, 100% open source, find it on GitHub
![Page 22: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/22.jpg)
The Data Warehouse Impedance Mismatch
[Diagram: Transactional Store and Data Warehouse, connected by paths labeled Dump/Provision and Batch; the path labeled "Transactions?" is marked X]
![Page 23: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/23.jpg)
Transactional and Data Warehouse Metadata
• Replicating data is not just about the data
• Table structures must be replicated too
• ddlscan handles the translation
– Migrates an existing MySQL or Oracle schema into the target schema
– Template based
– Handles underlying data type matches
– Needs to be executed before replication starts
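To make the idea of data type matching concrete, here is a minimal Python sketch of translating column types from an Oracle schema into a target schema. It is not ddlscan and does not reflect its templates: the mapping table `ORACLE_TO_TARGET`, the function names, and the three type pairs are assumptions chosen for illustration, and a real migration has to cover many more cases.

```python
# Hypothetical, deliberately tiny type map; a real migration needs many more
# cases (precision, scale, LOBs, character sets) than shown here.
ORACLE_TO_TARGET = {
    "NUMBER": "NUMERIC",
    "VARCHAR2": "VARCHAR",
    "DATE": "TIMESTAMP",  # Oracle DATE also carries a time component
}


def translate_column(name, oracle_type):
    """Translate one column definition, keeping any (length) suffix unchanged."""
    base, paren, rest = oracle_type.partition("(")
    target = ORACLE_TO_TARGET.get(base.strip().upper())
    if target is None:
        raise ValueError(f"no mapping for Oracle type {oracle_type!r}")
    return f"{name} {target}{paren}{rest}"


def create_table_ddl(table, columns):
    """Build a CREATE TABLE statement for the target from (name, type) pairs."""
    cols = ", ".join(translate_column(n, t) for n, t in columns)
    return f"CREATE TABLE {table} ({cols})"


ddl = create_table_ddl(
    "foo",
    [("id", "NUMBER(10)"), ("msg", "VARCHAR2(80)"), ("created", "DATE")],
)
print(ddl)
```

Raising an error on an unmapped type, rather than guessing, matches the slide's point that the schema translation has to be settled before replication starts.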
![Page 24: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/24.jpg)
Replicating into Vertica
Replicator
Replicator
CSV
JS
JDBC
cpimport
staging
base
merge
![Page 25: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/25.jpg)
Replicating into Redshift
Replicator
Replicator
CSV
JS
JDBC
s3cmd
staging
base
merge
COPY
![Page 26: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/26.jpg)
Replicating into Hadoop
Replicator
Replicator
CSV
JS
hadoop fs
![Page 27: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/27.jpg)
Initial Materialization within Hadoop
load-reduce-check
Migrate staging/base DDL
Hive materialization
CSV
StagingTable
Base Table
![Page 28: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/28.jpg)
Ongoing Materialization within Hadoop
materialize
Hive materialization
CSV
StagingTable
Base Table
![Page 29: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/29.jpg)
Comparing Loading Methods for Hadoop

| | Manual via CSV | Sqoop | Tungsten Replicator |
|---|---|---|---|
| Process | Manual/Scripted | Manual/Scripted | Fully Automated |
| Incremental Loading | Possible with DDL changes | Requires DDL changes | Fully Supported |
| Latency | Full-load | Intermittent | Real-time |
| Extraction Requirements | Full table scan | Full and partial table scans | Low-impact CDC/binlog scan |
![Page 30: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/30.jpg)
Sqoop and Materialization within Hadoop
Hive materialization
CSV
StagingTable
Base Table
Sqoop
Replicate
![Page 31: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/31.jpg)
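How the Materialization Works

Staging table:

| Op | Seqno | ID | Msg |
|---|---|---|---|
| I | 1 | 1 | Hello World! |
| I | 2 | 2 | Meet MC |
| D | 3 | 1 | |
| I | 3 | 1 | Goodbye World |

Base table after materialization:

| Op | Seqno | ID | Msg |
|---|---|---|---|
| I | 2 | 2 | Meet MC |
| I | 3 | 1 | Goodbye World |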
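The reduction from staged change rows to the final rows on this slide can be sketched in plain Python. This is a stand-in for the Hive materialization step, not its implementation: the function name `materialize` and the tuple layout `(op, seqno, id, msg)` are assumptions, and the sketch relies on the rows arriving in the order shown, with the delete listed before the insert that shares its seqno.

```python
def materialize(staging_rows):
    """Replay staged change rows in order and return the surviving rows."""
    current = {}
    for op, seqno, row_id, msg in staging_rows:
        if op == "D":
            # A delete removes whatever version of the row is currently held.
            current.pop(row_id, None)
        elif op == "I":
            # An insert becomes the current version for that ID.
            current[row_id] = (op, seqno, row_id, msg)
        else:
            raise ValueError(f"unknown operation {op!r}")
    # Order the result by seqno, as in the slide's second table.
    return sorted(current.values(), key=lambda row: row[1])


staging = [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet MC"),
    ("D", 3, 1, None),
    ("I", 3, 1, "Goodbye World"),
]

base = materialize(staging)
for row in base:
    print(row)
```

In this representation an update appears as a delete followed by an insert with the same seqno, which is why ID 1 ends up holding "Goodbye World".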
![Page 32: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/32.jpg)
Data Warehouse Possibilities: Point in Time Tables
[Figure: sequence numbers 1 to 45 on a timeline, with points marked Monday, Wednesday, Friday]
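Because every staged change carries a sequence number, a table can be rebuilt as it stood at any earlier point by replaying only the changes up to a chosen seqno. The sketch below illustrates that idea under stated assumptions: the function name `materialize_as_of`, the row layout, and the cutoff values are hypothetical, and the mapping from a day such as Monday to a particular seqno is not given in the slides.

```python
def materialize_as_of(staging_rows, max_seqno):
    """Replay staged changes up to and including max_seqno; return {id: msg}."""
    current = {}
    for op, seqno, row_id, msg in staging_rows:
        if seqno > max_seqno:
            continue  # change happened after the requested point in time
        if op == "D":
            current.pop(row_id, None)
        elif op == "I":
            current[row_id] = msg
    return current


# Hypothetical staged rows in (op, seqno, id, msg) form.
rows = [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet MC"),
    ("D", 3, 1, None),
    ("I", 3, 1, "Goodbye World"),
]

snapshot_early = materialize_as_of(rows, 2)
snapshot_late = materialize_as_of(rows, 3)
print(snapshot_early)
print(snapshot_late)
```

The same staged data yields different tables depending on the cutoff, which is what makes several point-in-time views possible from one change stream.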
![Page 33: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/33.jpg)
Data Warehouse Possibilities: Time Series Generation

| Op | Seqno | ID | Date | Msg |
|---|---|---|---|---|
| I | 1 | 1 | 1/6/14 | Hello World! |
| I | 2 | 2 | 2/6/14 | Meet MC |
| I | 3 | 1 | 2/6/14 | Goodbye World |
| I | 4 | 1 | 3/6/14 | Hello Tuesday |
| I | 4 | 2 | 3/6/14 | Ruby Wednesday |
| I | 5 | 1 | 4/6/14 | Final Count |

| ID | Date | Msg |
|---|---|---|
| 1 | 1/6/14 | Hello World! |
| 1 | 2/6/14 | Goodbye World |
| 1 | 3/6/14 | Hello Tuesday |
| 1 | 4/6/14 | Final Count |
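The time series on this slide is every staged version of one row, kept rather than collapsed. A minimal Python sketch of that selection, using the slide's data, follows; the function name `time_series` and the tuple layout `(op, seqno, id, date, msg)` are assumptions made for the example.

```python
staging = [
    ("I", 1, 1, "1/6/14", "Hello World!"),
    ("I", 2, 2, "2/6/14", "Meet MC"),
    ("I", 3, 1, "2/6/14", "Goodbye World"),
    ("I", 4, 1, "3/6/14", "Hello Tuesday"),
    ("I", 4, 2, "3/6/14", "Ruby Wednesday"),
    ("I", 5, 1, "4/6/14", "Final Count"),
]


def time_series(rows, row_id):
    """Return (id, date, msg) for every staged version of one ID, in seqno order."""
    versions = [r for r in rows if r[0] == "I" and r[2] == row_id]
    versions.sort(key=lambda r: r[1])
    return [(rid, date, msg) for _op, _seqno, rid, date, msg in versions]


series = time_series(staging, 1)
for entry in series:
    print(entry)
```

Unlike materialization, nothing is discarded here: the history of ID 1 is the output.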
![Page 34: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/34.jpg)
Agenda
1 Introduction to VMware Continuent
2 Understanding Continuent Replication
3 Using Analytics and Data Warehouses
4 Wrap-up and Questions
![Page 35: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/35.jpg)
Wrap-up
• VMware Continuent Replication provides robust, flexible capabilities that have been battle-tested in demanding customer environments
• Replication features compare favorably to Oracle GoldenGate and Data Guard
• VMware Continuent handles HA/DR, data warehouse loading, and edge application use cases
![Page 36: Replicate from Oracle to data warehouses and analytics](https://reader031.vdocument.in/reader031/viewer/2022022414/587e21f31a28abbc2e8b7385/html5/thumbnails/36.jpg)
For more information, contact us:
Robert Noyes, Alliance Manager, AMER & LATAM, +1 (650) 575-0958
Philippe Bernard, Alliance Manager, EMEA & APJ, +41 79 347 1385
MC Brown, Senior Product Line Manager
Eero Teerikorpi, Sr. Director, Strategic Alliance, +1 (408) 431-3305
www.vmware.com/products/continuent