©Continuent 2014.

From Dolphins to Elephants: Real-Time MySQL to Hadoop Replication with Tungsten

MC Brown, Director of Documentation

Linas Virbalas, Senior Software Engineer


About Tungsten Replicator

• Open source drop-in replacement for MySQL replication, providing:

  • Global transaction ID

  • Multiple masters

  • Multiple sources

  • Flexible topologies

  • Heterogeneous replication

  • Parallel replication


Tungsten Replicator

[Diagram: on the master, the Replicator reads the DBMS logs and writes transactions plus metadata to the THL. On the slave, a second Replicator downloads transactions via the network into its own THL (transactions plus metadata) and applies them using JDBC.]


How Tungsten Replicator Works

[Diagram: a pipeline of three stages, each performing Extract, Filter, and Apply. Data flows from the Master DBMS through the Transaction History Log and an In-Memory Queue to the Slave DBMS.]
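
The Extract, Filter, Apply pipeline can be sketched in Python. This is only an illustration under stated assumptions, not Tungsten's implementation: the `Stage` class, the event dictionaries, and the `drop_test_schema` filter are hypothetical, a plain list stands in for the Transaction History Log, a `deque` stands in for the in-memory queue, and the stages run one after another rather than concurrently.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

Event = dict  # hypothetical shape: {"seqno": int, "schema": str, "sql": str}
Filter = Callable[[Event], Optional[Event]]


@dataclass
class Stage:
    """One pipeline stage: extract events, filter them, apply the survivors."""
    extract: Callable[[], Iterable[Event]]
    apply: Callable[[Event], None]
    filters: List[Filter] = field(default_factory=list)

    def run(self) -> None:
        for event in self.extract():
            for event_filter in self.filters:
                event = event_filter(event)
                if event is None:  # a filter may drop an event entirely
                    break
            if event is not None:
                self.apply(event)


def drop_test_schema(event):
    """Example filter (hypothetical): discard changes to the 'test' schema."""
    return None if event["schema"] == "test" else event


def drain(queue):
    """Yield queued events in order until the queue is empty."""
    while queue:
        yield queue.popleft()


master_log = [
    {"seqno": 1, "schema": "sales", "sql": "INSERT INTO orders VALUES (1)"},
    {"seqno": 2, "schema": "test", "sql": "INSERT INTO scratch VALUES (1)"},
    {"seqno": 3, "schema": "sales", "sql": "DELETE FROM orders WHERE id = 1"},
]
thl, queue, slave = [], deque(), []

pipeline = [
    Stage(extract=lambda: master_log, apply=thl.append, filters=[drop_test_schema]),
    Stage(extract=lambda: thl, apply=queue.append),
    Stage(extract=lambda: drain(queue), apply=slave.append),
]
for stage in pipeline:
    stage.run()

print([event["seqno"] for event in slave])  # [1, 3]
```

Each stage only knows where to read from and where to write to, which is what lets the same three-step shape be reused between the master DBMS, the log, the queue, and the slave DBMS.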


Where we replicate

[Diagram: supported topologies: master-slave, fan-in slave, all-masters, star-schema, direct slave, and heterogeneous. Labels in the figure: MySQL, Oracle, Regular MySQL.]


Why Hadoop

• Customer driven

• Change in the air

• Environments are becoming heterogeneous

• NoSQL was the first

• We already support MongoDB

• Hadoop used for big analytics

• More frequently a live resource

• Big datasets require Map/Reduce


Tungsten Replicator and Hadoop

• Extract from MySQL or Oracle

• Works with base Hadoop and commercial distributions: compatible with Cloudera, Hortonworks, Amazon Elastic MapReduce, and IBM InfoSphere BigInsights

• Automatic replication of incremental changes

• Customizable formatting

• Hive Schema generation

• Materialized views in Hive for carbon-copy tables

• Sqoop and parallel extractor compatibility for provisioning


Applying Data into Hadoop

[Diagram: one Replicator extracts transactions from the DBMS logs into the THL; a second Replicator reads the THL and writes the changes as CSV into Hadoop.]


[Diagram: inside Hadoop, staged CSV data (example columns: ID, Message) feeds a Hive table and the materialized views.]


MySQL Configuration

• Use row-based replication

• Every table must have a primary key

• Replicator configured with:

  • Filters for metadata and primary key optimization

  • Extracts to standard THL
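
The two MySQL prerequisites can be checked before replication is set up. The following Python sketch is an illustration, not part of Tungsten: the function names and the sample data are hypothetical, and it assumes the server settings live in a `[mysqld]` section of a `my.cnf`-style file, where `binlog_format = ROW` (also accepted as `binlog-format`) selects row-based replication.

```python
import configparser
from typing import Dict, List


def uses_row_based_replication(my_cnf_text: str) -> bool:
    """Return True if the [mysqld] section sets binlog_format to ROW."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare flags such as "skip-name-resolve"
        strict=False,         # tolerate repeated options
        interpolation=None,   # treat "%" in values literally
    )
    parser.read_string(my_cnf_text)
    if not parser.has_section("mysqld"):
        return False
    for option in ("binlog_format", "binlog-format"):
        value = parser.get("mysqld", option, fallback=None)
        if value is not None:
            return value.strip().upper() == "ROW"
    return False


def tables_missing_primary_key(tables: Dict[str, List[str]]) -> List[str]:
    """Given {table name: primary key columns}, list tables with no key."""
    return sorted(name for name, key_columns in tables.items() if not key_columns)


SAMPLE_MY_CNF = """
[mysqld]
server-id = 1
log-bin = mysql-bin
binlog_format = ROW
"""

print(uses_row_based_replication(SAMPLE_MY_CNF))
print(tables_missing_primary_key({"orders": ["id"], "audit_log": []}))
```

In practice the primary key information would come from the database catalog; the dictionary here only stands in for that lookup.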


Configure Hadoop

• Data is stored in CSV format on HDFS

• Compatible with Cloudera, Hortonworks, Amazon Elastic MapReduce (EMR), and IBM InfoSphere BigInsights

• Compatible with Hive, HBase, and others

• Staging DDL can be automatically generated

• Live Table DDL can be automatically generated


DDL Generation

• Built-in Tool, part of Tungsten Replicator

• Handles staging and live table DDL generation

• Default mode maps source column types to their default Hive types

• Customizable for your needs

  • For example, BIGINT columns can be emitted as strings

• Data transformations possible through filters
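
To make the idea of DDL generation concrete, here is a minimal Python sketch of mapping MySQL column types to Hive types, with an option to emit BIGINT columns as strings. It does not reproduce the built-in Tungsten tool: the type mapping, function names, and sample table are assumptions chosen for illustration, and the real defaults may differ.

```python
import re
from typing import List, Tuple

# Illustrative mapping only; unknown types fall back to STRING.
MYSQL_TO_HIVE = {
    "tinyint": "TINYINT",
    "smallint": "SMALLINT",
    "int": "INT",
    "bigint": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "char": "STRING",
    "varchar": "STRING",
    "text": "STRING",
    "datetime": "TIMESTAMP",
    "timestamp": "TIMESTAMP",
}


def hive_type(mysql_type: str, bigint_as_string: bool = False) -> str:
    """Map a MySQL column type such as 'bigint(20)' to a Hive type."""
    match = re.match(r"[a-z]+", mysql_type.strip().lower())
    base = match.group(0) if match else ""
    if base == "bigint" and bigint_as_string:
        return "STRING"
    return MYSQL_TO_HIVE.get(base, "STRING")


def hive_ddl(table: str, columns: List[Tuple[str, str]],
             bigint_as_string: bool = False) -> str:
    """Build a CREATE TABLE statement for comma-delimited text data."""
    column_lines = ",\n".join(
        f"  `{name}` {hive_type(mysql_type, bigint_as_string)}"
        for name, mysql_type in columns
    )
    return (
        f"CREATE TABLE `{table}` (\n{column_lines}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
    )


print(hive_ddl("orders",
               [("id", "bigint(20)"), ("note", "varchar(255)")],
               bigint_as_string=True))
```

Keeping the mapping in a plain dictionary is one way to make it customizable: a site-specific override is just another entry.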


Replicator Hadoop Configuration

• Batch commit interval

  • By row count

  • By time interval

• CSV format

  • Predefined formats

  • Customizable by field and row characters

• Parallelization supported
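
The batch commit settings can be pictured with a small Python sketch. It is not the replicator's code: `CsvBatcher` and its parameter names are hypothetical, committed batches are collected in memory instead of being written to HDFS, and the time limit is only checked when a new row arrives.

```python
import csv
import io
import time


class CsvBatcher:
    """Buffer rows and commit them as CSV by row count or elapsed time."""

    def __init__(self, max_rows=1000, max_seconds=5.0,
                 delimiter=",", lineterminator="\n", clock=time.monotonic):
        self.max_rows = max_rows
        self.max_seconds = max_seconds
        self.delimiter = delimiter            # field separator character
        self.lineterminator = lineterminator  # row separator character
        self.clock = clock                    # injectable for testing
        self._rows = []
        self._started = None
        self.flushed = []                     # CSV text of each committed batch

    def add(self, row):
        now = self.clock()
        if not self._rows:
            self._started = now               # the batch window opens on its first row
        self._rows.append(row)
        too_many = len(self._rows) >= self.max_rows
        too_old = now - self._started >= self.max_seconds
        if too_many or too_old:
            self.flush()

    def flush(self):
        if not self._rows:
            return
        buffer = io.StringIO()
        writer = csv.writer(buffer, delimiter=self.delimiter,
                            lineterminator=self.lineterminator)
        writer.writerows(self._rows)
        self.flushed.append(buffer.getvalue())
        self._rows = []


batcher = CsvBatcher(max_rows=2, delimiter="|")
for row in [(1, "a"), (2, "b"), (3, "c")]:
    batcher.add(row)
batcher.flush()  # commit the final partial batch
print(batcher.flushed)
```

A larger row count or interval means fewer, bigger files on the Hadoop side at the cost of higher latency before changes become visible.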


Materialized Views

• Merges Data from Staging CSV into Hive Tables

• Processing separate from Replicator

• Allows individual table views to be generated independently

• Allows for custom materialization intervals

• Views based on 'live' data, or by point-in-time from CSV staging
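
The merge step can be illustrated with a short Python sketch. It is a simplification under stated assumptions rather than the actual Hive processing: each staged row is assumed to carry an operation code (`I` for insert, `D` for delete), a sequence number, and a row id that orders changes within one transaction; updates are assumed to be staged as a delete followed by an insert; the field names and the `materialize` function are hypothetical.

```python
def materialize(staging_rows, key="id", up_to_seqno=None):
    """Replay staged changes in order and return the resulting table.

    With up_to_seqno set, only changes up to that sequence number are
    applied, which gives a point-in-time view instead of the live one.
    """
    table = {}
    ordered = sorted(staging_rows, key=lambda r: (r["seqno"], r["row_id"]))
    for row in ordered:
        if up_to_seqno is not None and row["seqno"] > up_to_seqno:
            break
        if row["opcode"] == "D":
            table.pop(row[key], None)      # deleting a missing row is a no-op
        elif row["opcode"] == "I":
            table[row[key]] = row["message"]
    return table


staging = [
    {"opcode": "I", "seqno": 1, "row_id": 1, "id": 1, "message": "hello"},
    {"opcode": "I", "seqno": 2, "row_id": 1, "id": 2, "message": "world"},
    # An update to id 1, staged as delete + insert in the same transaction.
    {"opcode": "D", "seqno": 3, "row_id": 1, "id": 1, "message": "hello"},
    {"opcode": "I", "seqno": 3, "row_id": 2, "id": 1, "message": "hello again"},
    {"opcode": "D", "seqno": 4, "row_id": 1, "id": 2, "message": "world"},
]

live = materialize(staging)
snapshot = materialize(staging, up_to_seqno=2)
print(live)      # {1: 'hello again'}
print(snapshot)  # {1: 'hello', 2: 'world'}
```

Because the view is rebuilt purely from the staged rows, it can be generated per table and on whatever schedule suits the workload, independently of the replicator.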


Demo


Provisioning Data

• Sqoop

  • Start the replicator

  • Sqoop the data

  • Materialized views are idempotent

  • DDL generation is Hive compatible

• Parallel Extractor

  • Currently Oracle only

  • Will extract data in parallel and insert into THL


Replication Management

• Replication can be stopped, started, restarted at any time

• Enables MySQL or Hadoop maintenance windows

• DDL customizable

• Views regenerated at any time

• Schema changes can be handled by re-Sqooping and dematerialising views


Continuent Web Page: http://www.continuent.com

Tungsten Replicator 2.2 and 3.0 Preview: http://code.google.com/p/tungsten-replicator

Our Blogs:

http://scale-out-blog.blogspot.com

http://mcslp.wordpress.com

http://flyingclusters.blogspot.com

http://www.continuent.com/news/blogs

560 S. Winchester Blvd., Suite 500 San Jose, CA 95128 Tel +1 (866) 998-3642 Fax +1 (408) 668-1009 e-mail: [email protected]