©Continuent 2017
Replicate from Oracle to Oracle, Oracle to MySQL and Oracle to Analytics
MC Brown, VP, Products
Introducing Continuent
• The leading provider of clustering and replication for open source DBMS
• Our Product: Continuent Tungsten
• Clustering - Commercial-grade HA, performance scaling and data management for MySQL
• Replication - Flexible, high-performance data movement
Quick Continuent Facts
• Largest Tungsten installation processes over 1000 million transactions daily on 225 terabytes of data
• Tungsten Replicator was application of the year at the 2011 MySQL User Conference
• Wide variety of topologies including MySQL, Oracle, Hadoop, Vertica, and MongoDB are in production now
• Kafka, Flume, and Elasticsearch are coming - come see the general challenges on Thursday
Replication Basics
Tungsten Master/Slave in Action
Master: DBMS Logs -> Replicator -> THL (Transactions + Metadata)
Slave: THL (Transactions + Metadata) -> Replicator
Download transactions via network
Apply using JDBC
Tungsten Master/Slave in Action
Master: DBMS Logs -> Redo Reader -> Replicator -> THL (Transactions + Metadata)
Slave: THL (Transactions + Metadata) -> Replicator
Download transactions via network
Apply using JDBC
Master Replication Pipeline
Pipeline: two stages, each running Extract, Filter, Apply
MySQL Master Binlog -> Stage -> In-Memory Queue -> Stage -> Transaction History Log -> tcp/ip -> Slave Replicators
Slave Replication Pipeline
Pipeline: three stages, each running Extract, Filter, Apply
Master Replicator -> tcp/ip -> Stage -> Transaction History Log -> Stage -> In-Memory Queue -> Stage -> Slave DBMS
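A minimal sketch of the staged pipeline above, assuming each stage simply extracts events, runs them through its filters, and applies the survivors to the next store. The names Event, run_stage, and drop_table are hypothetical and are not Tungsten Replicator APIs; plain Python lists stand in for the Transaction History Log, the in-memory queue, and the slave DBMS.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Event:
    seqno: int
    table: str
    data: dict


# A filter returns the (possibly modified) event, or None to drop it.
Filter = Callable[[Event], Optional[Event]]


def run_stage(source: Iterable[Event], filters: List[Filter], sink: List[Event]) -> None:
    """One stage: extract from source, run the filters, apply to sink."""
    for event in source:                      # extract
        current: Optional[Event] = event
        for f in filters:                     # filter
            current = f(current)
            if current is None:               # a filter dropped the event
                break
        if current is not None:
            sink.append(current)              # apply


def drop_table(name: str) -> Filter:
    """Hypothetical filter that discards every event for one table."""
    return lambda e: None if e.table == name else e


# Slave-side chain: master replicator -> THL -> in-memory queue -> slave DBMS
incoming = [
    Event(1, "orders", {"id": 1}),
    Event(2, "audit", {"id": 7}),
    Event(3, "orders", {"id": 2}),
]
thl: List[Event] = []
queue: List[Event] = []
dbms: List[Event] = []

run_stage(incoming, [], thl)                    # stage 1: network to THL
run_stage(thl, [], queue)                       # stage 2: THL to queue
run_stage(queue, [drop_table("audit")], dbms)   # stage 3: queue to DBMS

print([e.seqno for e in dbms])
```

Keeping every stage to the same extract, filter, apply shape is what lets the same mechanism serve both the master and the slave side; only the source and the sink change.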
Replication to MongoDB
Tungsten Master/Slave in Action
Master: DBMS Logs -> Redo Reader -> Replicator -> THL (Transactions + Metadata)
Slave: THL (Transactions + Metadata) -> Replicator
Download transactions via network
Apply using Mongo API
Replication to Vertica, Hadoop, and Redshift
The Data Warehouse Impedance Mismatch
Replication: Single Transactions
Dump/load: Batches
Buffered Transactions -> CSV Files
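The mismatch above (replication produces single transactions, warehouse loaders want batches) can be sketched as a small buffering step. This is an illustration only: the batches and to_csv helpers, the row layout, and the batch size are assumptions, not the replicator's actual batch format.

```python
import csv
import io
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str, int, str]  # (seqno, op, id, msg), an assumed layout


def batches(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Buffer single row changes and release them in groups of batch_size."""
    buf: List[Row] = []
    for row in rows:
        buf.append(row)
        if len(buf) >= batch_size:
            yield buf
            buf = []
    if buf:                      # flush whatever is left at the end
        yield buf


def to_csv(batch: List[Row]) -> str:
    """Render one batch as the text of a CSV file."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(batch)
    return out.getvalue()


changes: List[Row] = [
    (1, "I", 1, "Hello World!"),
    (2, "I", 2, "Meet MC"),
    (3, "D", 1, ""),
    (3, "I", 1, "Goodbye World"),
]

csv_files = [to_csv(b) for b in batches(changes, batch_size=2)]
for text in csv_files:
    print(text)
```

A real implementation would also need to avoid splitting one transaction across two files; the batch size of 2 here happens to keep the delete and insert for seqno 3 together.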
Column Store--Real-Time Batches
MySQL/Oracle -> Tungsten Master Replicator (Service ora2vr) -> Tungsten Slave Replicator (Service ora2vr) -> CSV Files
Special Filters
• pkey - Fill in pkey info
• colnames - Fill in names
• replicate - Ignore tables
Large transaction batches to leverage load parallelization
Batch Loading--The Gory Details
Replicator (Service ora2vr): Transactions from master -> CSV Files -> Staging Tables -> Base Tables
Merge Script: COPY to stage tables, SELECT to base tables
(or) COPY directly to base tables
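The stage-then-merge path above can be sketched with an in-memory SQLite database standing in for the column store. Everything here is an assumption made for illustration: the table names msgs and stage_msgs, the staging columns, and the merge SQL are hypothetical, and executemany stands in for the warehouse's COPY command.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msgs (id INTEGER PRIMARY KEY, msg TEXT)")
conn.execute("CREATE TABLE stage_msgs (seqno INTEGER, op TEXT, id INTEGER, msg TEXT)")
conn.execute("INSERT INTO msgs VALUES (1, 'Hello World!')")  # existing base row

# One batch of changes, as it would arrive in a CSV file.
batch = [
    (2, "I", 2, "Meet MC"),
    (3, "D", 1, None),
    (3, "I", 1, "Goodbye World"),
]

# Step 1: load the batch into the staging table (stand-in for COPY).
conn.executemany("INSERT INTO stage_msgs VALUES (?, ?, ?, ?)", batch)

# Step 2: merge. Remove every base row touched by the batch ...
conn.execute("DELETE FROM msgs WHERE id IN (SELECT id FROM stage_msgs)")
# ... then insert the newest inserted version of each id. An id whose
# latest change is only a delete gets no row back.
conn.execute(
    """
    INSERT INTO msgs (id, msg)
    SELECT s.id, s.msg
    FROM stage_msgs AS s
    WHERE s.op = 'I'
      AND s.seqno = (SELECT MAX(seqno) FROM stage_msgs WHERE id = s.id)
    """
)

# Step 3: clear the staging table for the next batch.
conn.execute("DELETE FROM stage_msgs")
conn.commit()

rows = conn.execute("SELECT id, msg FROM msgs ORDER BY id").fetchall()
print(rows)
```

The sketch assumes an update arrives as a delete followed by an insert with the same seqno, as in the materialization example later in the deck.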
Basic Hadoop Loading
MySQL/Oracle -> Tungsten Master Replicator (hadoop) -> Tungsten Slave Replicator (hadoop) -> CSV Files -> Hadoop Cluster
Master-Side Filtering
• pkey - Fill in pkey info
• colnames - Fill in names
• replicate - Subset tables to be replicated
Extract from source DBMS
Load raw CSV to HDFS (e.g., via LOAD DATA to Hive)
Access via Hive
How the Materialization Works
Op  Seqno  ID  Msg
I   1      1   Hello World!
I   2      2   Meet MC
D   3      1
I   3      1   Goodbye World

Op  Seqno  ID  Msg
I   2      2   Meet MC
I   3      1   Goodbye World
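The step from the change log to the materialized rows can be sketched as a replay: apply each insert and delete in seqno order and keep whatever survives. The materialize function and the tuple layout are hypothetical and only illustrate the idea; they are not the tooling shipped with the replicator.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, int, int, str]  # (op, seqno, id, msg)


def materialize(log: List[Change]) -> List[Change]:
    """Replay a change log and return the surviving row for each ID."""
    current: Dict[int, Change] = {}
    # sorted() is stable, so a delete listed before an insert with the
    # same seqno is still applied first.
    for change in sorted(log, key=lambda c: c[1]):
        op, _seqno, row_id, _msg = change
        if op == "D":
            current.pop(row_id, None)
        elif op == "I":
            current[row_id] = change
    return sorted(current.values(), key=lambda c: c[1])


log: List[Change] = [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet MC"),
    ("D", 3, 1, ""),
    ("I", 3, 1, "Goodbye World"),
]

for row in materialize(log):
    print(row)
```

With the four changes from the slide, the replay leaves the rows for seqno 2 and seqno 3.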
Heterogeneous Replication Needs Metadata
• Need to convert DDL using ddlscan
• Not all ddlscan conversions are equal
• Datatypes on some targets are different
• Remember we can convert data
• Inline DDL coming in 5.2 (beta)
• What about provisioning?
Provisioning plus Replication
Replication: MySQL/Oracle -> Tungsten Master (hadoop) -> Tungsten Slave (hadoop) -> CSV Files
• Low impact extraction
• Fast data filtering
• Buffered CSV
• Programmable load scripts
• Parallel apply
Provisioning: Apache Sqoop/ETL
• Parallel table dumps
Topologies: star, master-slave, heterogeneous, fan-in slave, all-masters
Databases shown: MySQL, Oracle
More Advanced Topologies
Oracle Master, Oracle Slave, MySQL Continuent Cluster
Primary Replication Path
Secondary Replication Path
Split DB
MySQL
MySQL
Mongo
More Advanced Topologies
ClientA
ClientB
ClientC
Concentration with Source ID
Sources: ClientA, ClientB, ClientC, each holding the same rows
Msg
Meet MC
Goodbye World

Concentrated table with source ID
Source   Msg
ClientA  Meet MC
ClientA  Goodbye World
ClientB  Meet MC
ClientB  Goodbye World
ClientC  Meet MC
ClientC  Goodbye World
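A minimal sketch of the concentration idea above: rows from several sources land in one table, and a source ID column keeps otherwise identical rows apart. The concentrate function and the dictionary input are hypothetical and only illustrate the tagging step.

```python
from typing import Dict, List, Tuple


def concentrate(sources: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Merge rows from every source, prefixing each row with its source ID."""
    return [
        (source_id, msg)
        for source_id, msgs in sources.items()
        for msg in msgs
    ]


sources = {
    "ClientA": ["Meet MC", "Goodbye World"],
    "ClientB": ["Meet MC", "Goodbye World"],
    "ClientC": ["Meet MC", "Goodbye World"],
}

combined = concentrate(sources)
for row in combined:
    print(row)
```

Without the source ID, the six rows would collapse into two indistinguishable messages.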
Tungsten Replicator & Hadoop
• Extract from MySQL or Oracle
• Hadoop Support
• Provision using Sqoop or parallel extraction
• Schema generation for Hive
• Tools for generating materialized views
• Parallel CSV file loading
• Partition loaded data by commit time
• Schema Change Notification
Change Data is Valuable
[Chart: values 1 to 45; axis labels Monday, Wednesday, Friday]
Time Series Generation
Op  Seqno  ID  Date    Msg
I   1      1   1/6/14  Hello World!
I   2      2   2/6/14  Meet MC
I   3      1   2/6/14  Goodbye World
I   4      1   3/6/14  Hello Tuesday
I   4      2   3/6/14  Ruby Wednesday
I   5      1   4/6/14  Final Count

ID  Date    Msg
1   1/6/14  Hello World!
1   2/6/14  Goodbye World
1   3/6/14  Hello Tuesday
1   4/6/14  Final Count
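The two tables above can be connected with a small sketch: for one ID, keep the last value seen on each date. The time_series function and the tuple layout are hypothetical; this is one plausible reading of how the second table follows from the first, not the replicator's own implementation.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, int, int, str, str]  # (op, seqno, id, date, msg)


def time_series(log: List[Change], row_id: int) -> List[Tuple[int, str, str]]:
    """Return one row per date for row_id, holding that date's last value."""
    latest_per_date: Dict[str, Tuple[int, str, str]] = {}
    for op, _seqno, rid, date, msg in sorted(log, key=lambda c: c[1]):
        if rid == row_id and op == "I":
            latest_per_date[date] = (rid, date, msg)  # later seqno wins
    # dicts keep insertion order, so dates come out in order of first arrival
    return list(latest_per_date.values())


log: List[Change] = [
    ("I", 1, 1, "1/6/14", "Hello World!"),
    ("I", 2, 2, "2/6/14", "Meet MC"),
    ("I", 3, 1, "2/6/14", "Goodbye World"),
    ("I", 4, 1, "3/6/14", "Hello Tuesday"),
    ("I", 4, 2, "3/6/14", "Ruby Wednesday"),
    ("I", 5, 1, "4/6/14", "Final Count"),
]

series = time_series(log, row_id=1)
for row in series:
    print(row)
```

Because the change log keeps every intermediate value, the history for ID 1 can be rebuilt even though the source table only ever holds the latest row.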
Where Next?
• Get Replicator from GitHub - github.com/continuent/tungsten-replicator
• Come on Thursday