ramp-tutorial for mysql cluster - scaling with continuous availability

René Cannaò Senior DBA PalominoDB [email protected] @rene_cannao MySQL Cluster (NDB) Tutorial

Upload: pythian

Post on 06-May-2015


DESCRIPTION

Rene Cannao's Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability. Rene, a Senior Operational DBA at PalominoDB.com, will guide attendees through a hands-on experience in the installation, configuration management and tuning of MySQL Cluster.

Agenda:
- MySQL Cluster Concepts and Architecture: we will review the principles of a fault-tolerant shared-nothing architecture, and how it is implemented in NDB;
- MySQL Cluster processes: attendees will understand the various roles and interactions between Data Nodes, API Nodes and Management Nodes;
- Installation: we will install a minimal HA solution with MySQL Cluster on 3 virtual machines;
- Configuration of a basic system: after describing the most important configuration parameters, Data/API/Management nodes will be configured and the Cluster launched;
- Loading data: the "world" schema will be imported into NDB using "in memory" and "disk based" storage; attendees will experience how data changes are visible across API Nodes;
- Understand the NDB Storage Engine: internal implementation details will be explained, such as synchronous replication, transaction coordinator, heartbeat, communication, failure detection and handling, checkpoints, etc.;
- Query and schema design: attendees will understand the execution plan of queries with NDB, how SQL and Data Nodes communicate, how indexes and partitions are implemented, condition pushdown, join pushdown, query cache;
- Management and Administration: attendees will test the High Availability of NDB when a node becomes unavailable, and will learn how to read log files, how to stop/start any component of the Cluster to perform a rolling restart with no downtime, and how to handle a degraded setup;
- Backup and Recovery: attendees will be guided through the procedure of using NDB-native online backup and restore, and how this differs from mysqldump;
- Monitor and improve performance: attendees will learn how to boost performance by tuning variables according to hardware configuration and application workload.

TRANSCRIPT

Page 1: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

René Cannaò
Senior DBA, PalominoDB

[email protected]
@rene_cannao

MySQL Cluster (NDB)

Tutorial

Page 2: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Agenda

Introduction - Presentation
MySQL Cluster Overview
MySQL Cluster Components
Partitioning
Internal Blocks
General Configuration
Transactions
Checkpoints on disk
Handling failure
Disk data
Start Types and Start Phases
Storage Requirements
Backups
NDB and binlog
Native NDB Online Backups

Page 3: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

About myself

My name is René Cannaò

Working as a DBA since 2005
Started my experience with NDB in 2007
In 2008 I joined the MySQL Support Engineer Team at Sun/Oracle

Working as Senior DBA for PalominoDB
We have a booth: please come and visit us!

Page 4: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Questions

If you have questions stop me at any time!

If the answer is short, it will be answered immediately.

If the answer is long, we can discuss it at the end of the tutorial.

Page 5: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

What this course is Not

This course doesn't cover advanced topics like:

Advanced Setup
Performance Tuning
Configuration of multi-threads
Online add nodes
NDB API
noSQL access
Geographic Replication
BLOB/TEXT fields

Page 6: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

MySQL Cluster Overview

Page 7: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

MySQL Cluster Overview

High Availability Storage Engine for MySQL

Provides:
● High Availability
● Clustering
● Horizontal Scalability
● Shared Nothing Architecture
● SQL and noSQL access

Page 8: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

High Availability and Clustering

High Availability:
● survives routine failures or maintenance
○ hardware or network failure, software upgrade, etc
● provides continuous services
● no interruption of services

Clustering:
● parallel processing
● increase availability

Page 9: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Scalability

Ability to handle increased number of requests

Vertical:
● replace HW with a more powerful one

Horizontal:
● add components to the system

Page 10: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Shared Nothing Architecture

Characterized by:
● No single point of failure
● Redundancy
● Automatic Failover

Page 11: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

MySQL Cluster Summary

Shared Nothing Architecture
No Single Point Of Failure (SPOF)
Synchronous replication between nodes
Automatic failover with no data loss
ACID transactions
Durability with NoOfReplicas > 1
READ COMMITTED transaction isolation level
Row level locking

Page 12: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

MySQL Cluster Components

Page 13: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Components Overview

[Diagram of the cluster components: clients (mysql client, MySQL C API, PHP, Connector/J), NDB API applications, NDB Management Clients, NDB Management Servers, three mysqld processes, and four ndbd processes]

Page 14: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Cluster Nodes

3 Node Types are present in a MySQL Cluster:
● Management Node
● Data Node
● Access Node

Page 15: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Management Node

ndb_mgmd process

Administrative tasks:
● configure the Cluster
● start / stop other nodes
● run backup
● monitoring
● coordinator

Page 16: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Data Node

ndbd process ( or ndbmtd )

Stores cluster data
● mainly in memory
● also on disk ( limitations apply )

Coordinate transactions

Page 17: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Data Node ( ndbmtd )

What is ndbmtd ?

Multi-threaded equivalent of ndbd
File system compatible with ndbd
By default runs in single-thread mode
Configured via MaxNoOfExecutionThreads or ThreadConfig

Page 18: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

API Node

Process able to access data stored in Data Nodes:
● mysqld process with NDBCLUSTER support
● application that connects to the cluster through NDB API (without mysqld)

Page 19: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Partitioning

Page 20: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Vertical Partitioning

Page 21: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Horizontal Partitioning

Page 22: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Partitions example (1/2)

5681 Alfred 30 2010-11-10

8675 Lisa 34 1971-01-12

5645 Mark 21 1982-01-02

0965 Josh 36 2009-06-33

5843 Allan 22 2007-04-12

1297 Albert 40 2000-12-20

1875 Anne 34 1984-11-22

9346 Mary 22 1983-08-05

8234 Isaac 20 1977-07-06

8839 Leonardo 28 1999-11-08

7760 Donna 37 1997-03-04

3301 Ted 33 2005-05-23

Table

Page 23: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Partitions example (2/2)

5681 Alfred 30 2010-11-10

8675 Lisa 34 1971-01-12

5645 Mark 21 1982-01-02

0965 Josh 36 2009-06-33

5843 Allan 22 2007-04-12

1297 Albert 40 2000-12-20

1875 Anne 34 1984-11-22

9346 Mary 22 1983-08-05

8234 Isaac 20 1977-07-06

8839 Leonardo 28 1999-11-08

7760 Donna 37 1997-03-04

3301 Ted 33 2005-05-23

P1

P2

P3

P4

Table

Page 24: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Partitioning in MySQL Cluster

MySQL Cluster natively supports Partitioning

Distribute portions of individual tables in different locations

PARTITION BY (LINEAR) KEY is the only partitioning type supported
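As a rough illustration of key-based partitioning, the sketch below maps primary-key values to one of four partitions. It is a minimal sketch, not NDB's implementation: `partition_for` is a hypothetical helper, and `zlib.crc32` is only a stand-in for whatever hash function the storage engine really applies to the key.

```python
import zlib


def partition_for(primary_key: str, num_partitions: int) -> int:
    """Map a primary-key value to a partition number in [0, num_partitions)."""
    # Assumption: crc32 is a stand-in hash used only for illustration.
    return zlib.crc32(primary_key.encode("utf-8")) % num_partitions


# A few primary keys taken from the example table in the slides.
rows = ["5681", "8675", "5645", "0965", "5843", "1297"]
placement = {pk: partition_for(pk, 4) for pk in rows}

for pk, part in placement.items():
    print(f"row {pk} -> partition P{part + 1}")
```

The point is only that the partition is a pure function of the key, so any node can compute where a row lives without consulting a central directory.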

Page 25: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Partitions, Node Groups & Replicas

Partition:
portion of the data stored in the cluster

Node Groups:
set of data nodes that stores a set of partitions

Replicas:
copies of a cluster partition
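The relationship between data nodes, replicas and node groups can be sketched in a few lines. This assumes the usual rule that the number of node groups equals the number of data nodes divided by NoOfReplicas, and that nodes are grouped in the order they are listed; `node_groups` is a hypothetical helper, not part of any MySQL tool.

```python
def node_groups(data_nodes, no_of_replicas):
    """Split the data nodes into node groups of no_of_replicas nodes each."""
    if no_of_replicas < 1 or len(data_nodes) % no_of_replicas != 0:
        raise ValueError("number of data nodes must be a multiple of NoOfReplicas")
    return [
        data_nodes[i:i + no_of_replicas]
        for i in range(0, len(data_nodes), no_of_replicas)
    ]


groups = node_groups(["NDB1", "NDB2", "NDB3", "NDB4"], 2)
for number, members in enumerate(groups):
    print(f"Node Group {number}: {', '.join(members)}")
```

With four data nodes and two replicas this yields two node groups, matching the layout used in the diagrams of this section.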

Page 26: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Partitions in NDB

[Diagram: partitions P1, P2, P3 and P4 distributed across the data nodes NDB1, NDB2, NDB3 and NDB4]

Page 27: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Partitions and Replicas

[Diagram: primary (P) and secondary (S) replicas of partitions P1 to P4 distributed across the data nodes NDB1, NDB2, NDB3 and NDB4]

Page 28: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Node Groups

[Diagram: primary (P) and secondary (S) replicas of partitions P1 to P4 on the data nodes NDB1 to NDB4, with the data nodes split into Node Group 0 and Node Group 1]

Page 31: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Internal Blocks

Page 32: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Blocks in Data Node (1/2)

A data node is internally structured in several blocks of code with different functions:
● Transaction Coordinator (TC):
○ coordinates transactions
○ starts Two Phase Commit
○ distributes requests to LQH
● Local Query Handler (LQH):
○ allocates local operation records
○ manages the Redo Log

Page 33: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Blocks in Data Node (2/2)

● ACC
○ stores hash indexes
○ manages record locking
● TUP
○ stores row data
● TUX
○ stores ordered indexes
● BACKUP, CMVMI, DICT, DIH, SUMA, etc

Ref: http://dev.mysql.com/doc/ndbapi/en/ndb-internals-ndb-protocol.html

Page 34: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Blocks on ndbd (simplified)

Page 35: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Blocks on ndbmtd (simplified)

Page 36: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Where is data stored?

In memory!

IndexMemory:
● primary keys
● unique keys

DataMemory:
● data rows
● ordered indexes

[Diagram: the TC, LQH, ACC, TUP and TUX blocks of a data node and the memory areas they use]

Page 37: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

General Configuration

Page 38: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Global parameters

NoOfReplicas (mandatory): defines the number of replicas for each partition/table

DataDir: location for log/trace files and pid file

FileSystemPath: location of all files related to MySQL Cluster

Page 39: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Global boolean parameters

LockPagesInMainMemory (0): lock all memory in RAM ( no swap )

StopOnError (1): when enabled, a data node stops when it hits an error; set it to 0 to have the data node restart automatically after a failure

Page 40: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

IndexMemory and DataMemory

IndexMemory: defines the amount of memory for hash indexes ( PK and UNIQUE )

DataMemory: defines the amount of memory for:
● data
● ordered indexes
● UNDO information

Page 41: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Metadata Objects

MaxNoOfTables

MaxNoOfOrderedIndexes

MaxNoOfUniqueHashIndexes

Page 42: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Transactions

Page 43: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Transactions Overview

Two Phase Commit

ACID
● Durability with multiple copies
● Committed in memory

READ_COMMITTED

Page 44: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Transaction implementation

● The API node contacts a TC in one of the Data Nodes (round-robin fashion, or Distribution Awareness)

● The TC starts the transaction and the Two Phase Commit (2PC) protocol

● The TC contacts the LQH of the Data Nodes involved in the transaction

● LQH handles row level locking:
○ Exclusive lock
○ Shared lock
○ No lock

Page 45: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Two Phase Commit (1/2)

Phase 1, Prepare: node groups get their information updated

Phase 2, Commit: the change is committed
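The prepare/commit split can be illustrated with a generic, textbook-style two-phase commit. This is a minimal sketch under stated assumptions: `Participant` and `two_phase_commit` are hypothetical names, and the code does not model how NDB chains the prepare through primary and secondary fragments.

```python
class Participant:
    """A toy participant that either accepts or refuses the prepare request."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"

    def prepare(self):
        self.state = "prepared" if self.can_prepare else "refused"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit only if every participant prepared; otherwise roll everyone back."""
    # Phase 1, Prepare: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2, Commit: apply the change everywhere, or abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


ok_nodes = [Participant("NDB1"), Participant("NDB2")]
bad_nodes = [Participant("NDB3"), Participant("NDB4", can_prepare=False)]
print(two_phase_commit(ok_nodes), [p.state for p in ok_nodes])
print(two_phase_commit(bad_nodes), [p.state for p in bad_nodes])
```

The second run shows the property that matters: a single participant failing to prepare leaves no participant committed.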

Page 46: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Two Phase Commit (2/2)

Page 47: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Rows operations

Primary key operation ( read and write )

Unique index operation ( read )

Ordered index scans ( read only )

Full-table scans (read only)

Page 48: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Primary Key Write (1/4)

● API node connects to TC
● TC connects to LQH of the data node with the primary fragment
● LQH queries ACC to get the row id
● LQH performs the operation on TUP
● LQH connects to the LQH of the data node with the secondary fragment
● LQH on secondary performs the same operation
● LQH on secondary informs TC that the prepare phase is completed
● TC starts the commit phase

Page 49: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Primary Key Write (2/4)

Page 50: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Primary Key Write (3/4)

Page 51: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Primary Key Write (4/4)

Page 52: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Primary Key Read
Without distribution awareness

Page 53: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Primary Key Read
With distribution awareness

Page 54: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Unique Index Read (1/2)

A Unique Index is internally implemented as a hidden table with two columns, where the PK is the Unique Key itself and the second column is the PK of the main table.

To read a Unique index:
● A PK read is performed on the hidden table to get the value of the PK of the main table
● A PK read is performed on the main table
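The two-step lookup can be mimicked with two dictionaries, one playing the main table and one playing the hidden index table. This is only a sketch: the `email` column, the sample rows and the function name `read_by_unique_key` are invented for illustration.

```python
# Main table, keyed by its primary key (hypothetical sample rows).
main_table = {
    1875: {"id": 1875, "name": "Anne", "email": "anne@example.com"},
    9346: {"id": 9346, "name": "Mary", "email": "mary@example.com"},
}

# Hidden table: its PK is the unique key value, its second column is the main PK.
hidden_unique_index = {row["email"]: pk for pk, row in main_table.items()}


def read_by_unique_key(email):
    """Resolve a unique-key read as two primary-key reads."""
    # First PK read: hidden table, unique key value -> PK of the main table.
    pk = hidden_unique_index.get(email)
    if pk is None:
        return None
    # Second PK read: main table.
    return main_table[pk]


print(read_by_unique_key("mary@example.com"))
```

A unique-index read therefore costs roughly two primary-key reads, which is worth keeping in mind when choosing between a unique key and the primary key for hot lookups.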

Page 55: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Unique Index Read (2/2)

Page 56: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Full Table Scan

Page 57: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Ordered Index Scan

Page 58: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Transaction Parameters (1/3)

MaxNoOfConcurrentTransactions: number of simultaneous transactions that can run in a node. A suitable value is:

(maximum number of tables accessed in any single transaction + 1) * number of cluster SQL nodes
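The sizing formula for MaxNoOfConcurrentTransactions is simple enough to check by hand, but a small helper makes the inputs explicit. The function name and the sample figures (5 tables, 3 SQL nodes) are assumptions chosen for the example.

```python
def suggested_max_concurrent_transactions(max_tables_per_transaction, sql_nodes):
    """(max tables accessed in any single transaction + 1) * number of SQL nodes."""
    return (max_tables_per_transaction + 1) * sql_nodes


# Example: transactions touch at most 5 tables, and the cluster has 3 SQL nodes.
value = suggested_max_concurrent_transactions(5, 3)
print(f"MaxNoOfConcurrentTransactions = {value}")
```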

Page 59: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Transaction Parameters (2/3)

MaxNoOfConcurrentOperations: total number of records that can be updated or locked at any time in a data node, and UNDO. Besides the data node, the TC needs information about all the records involved in a transaction

MaxNoOfLocalOperations: for transactions that are not too large, it defaults to MaxNoOfConcurrentOperations x 1.1

Page 60: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Transaction Parameters (3/3)

MaxDMLOperationsPerTransaction
Limits the number of DML operations
Count is in # of records, not # of statements

TransactionInactiveTimeout
Idle transactions are rolled back

TransactionDeadlockDetectionTimeout
Time based deadlock and lock wait detection
Also possible that the data node is dead or overloaded

Page 61: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Checkpoints on disk

Page 62: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Checkpoints

MySQL Cluster is an in-memory storage engine:
● data is only in memory? WRONG!
● data is regularly saved on disk

When a data node writes to disk, it is performing a checkpoint

Exception:
MySQL Cluster can also run in DiskLess mode, with "reduced durability" and unable to run Native NDB Backup

Page 63: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Checkpoint types

Local Checkpoint (LCP):
● specific to a single node
● all data in memory is saved to disk
● can take minutes or hours

Global Checkpoint (GCP):
● transactions across nodes are synchronized
● redo log flushed on disk

Page 64: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Memory and disk checkpoints

[Diagram: a coordinator and two data nodes; on each node, DataMemory and IndexMemory are saved to disk by the LCP (files LCP0 and LCP1), and the RedoBuffer is flushed to the RedoLog Files by the GCP]

Page 65: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Undo and Redo Parameters

UndoIndexBuffer and UndoDataBuffer
Buffers used during a local checkpoint to take a consistent snapshot without blocking writes.

RedoBuffer
In-memory buffer for the REDO log (on disk)

Page 66: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Checkpointing parameters

TimeBetweenLocalCheckpoints
Doesn't define a time interval, but the amount of write operations ( as a base-2 logarithm of 4-byte words ). Examples:
20 ( default ) means 4 x 2^20 bytes = 4MB
24 means 4 x 2^24 bytes = 64MB

TimeBetweenGlobalCheckpoints
Defines how often commits are flushed to REDO logs
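Because TimeBetweenLocalCheckpoints is a base-2 logarithm of 4-byte words rather than a time, it helps to convert a setting into the write volume it stands for. The helper name below is hypothetical; the arithmetic follows the two examples on the slide.

```python
def lcp_write_threshold_bytes(time_between_local_checkpoints):
    """Convert the parameter (log2 of 4-byte words) into a number of bytes."""
    return 4 * 2 ** time_between_local_checkpoints


for setting in (20, 24):
    size_mb = lcp_write_threshold_bytes(setting) // (1024 * 1024)
    print(f"TimeBetweenLocalCheckpoints={setting} -> {size_mb}MB of writes")
```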

Page 67: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Fragment Log Files

NoOfFragmentLogFiles ( default 16 )
FragmentLogFileSize ( default 16 MB )

REDO logs are organized in a ring, where head and tail never meet. If checkpointing is slow, updating transactions are aborted. Total REDO log size is defined as:
NoOfFragmentLogFiles sets of 4 files, each of FragmentLogFileSize
Default: 16 x 4 x 16MB = 1024MB

InitFragmentLogFiles: SPARSE or FULL
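The total REDO log size follows directly from the two parameters. A minimal sketch of the calculation, with a hypothetical function name and sizes expressed in MB:

```python
def redo_log_size_mb(no_of_fragment_log_files=16, fragment_log_file_size_mb=16):
    """Total REDO log size: NoOfFragmentLogFiles sets of 4 files each."""
    files_per_set = 4
    return no_of_fragment_log_files * files_per_set * fragment_log_file_size_mb


default_size = redo_log_size_mb()
larger_size = redo_log_size_mb(no_of_fragment_log_files=64, fragment_log_file_size_mb=64)
print(f"default: {default_size}MB, 64 sets of 64MB files: {larger_size}MB")
```

Running the numbers before changing either parameter is useful because the REDO log must be large enough to hold all writes between local checkpoints, and the files also take up disk space on every data node.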

Page 68: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Checkpoint write control

ODirect: (off) O_DIRECT for LCP and GCP

CompressedLCP: (off) equivalent of gzip --fast

DiskCheckpointSpeed: speed of local checkpoints to disk ( default 10MB/s )

DiskCheckpointSpeedInRestart : during restart

Page 69: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Diskless mode

Diskless:
It is possible to configure the whole Cluster to run without using the disk.
When Diskless is enabled ( disabled by default ):
● LCP and GCP are not performed;
● NDB Native Backup is not possible;
● a complete cluster failure ( or shutdown ) = data lost

Page 70: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Handling Failure

Page 71: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Data node failure (1/2)

Failures happen!
HW failure, network issue, OS crash, SW bug, …

MySQL Cluster is constantly monitoring itself

Each data node periodically checks other data nodes via heartbeat

Each data node periodically checks API nodes via heartbeat

Page 72: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Data node failure (2/2)

If a node fails to respond to heartbeats, the data node that detects the failure communicates it to other data nodes

The data nodes stop using the failed node and switch to the secondary node for the same fragment
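Heartbeat-based failure detection can be sketched as a check on how long each peer has been silent. Everything here is an assumption made for illustration: the function name, the `missed_limit` threshold and the sample timings are hypothetical, and the real detection protocol between data nodes is more involved.

```python
def failed_nodes(last_heartbeat_ms, now_ms, interval_ms, missed_limit):
    """Return the nodes silent for more than missed_limit heartbeat intervals."""
    # Assumption: a node is considered failed once it has been silent for
    # longer than interval_ms * missed_limit.
    threshold = interval_ms * missed_limit
    return sorted(
        node for node, seen in last_heartbeat_ms.items()
        if now_ms - seen > threshold
    )


# Hypothetical timestamps (ms) of the last heartbeat received from each peer.
last_seen = {"NDB2": 9000, "NDB3": 1000, "NDB4": 9500}
suspects = failed_nodes(last_seen, now_ms=10000, interval_ms=1500, missed_limit=4)
print("nodes to report as failed:", suspects)
```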

Page 73: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Timeouts Parameters

TimeBetweenWatchDogCheck
The watchdog thread checks if the main thread got stuck

HeartbeatIntervalDbDb
Heartbeat interval between data nodes

HeartbeatIntervalDbApi
Heartbeat interval between data nodes and API nodes

Page 74: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Disk Data

Page 75: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Available since MySQL 5.1.6

Store non-indexed columns on disk

Doesn't support variable sized storage

On disk : tablespaces and undo logs

In memory : page buffer ( cache )

Disk Data

Page 76: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Objects:
● Undo log files: store information to rollback transactions
● Undo log group: a collection of undo log files
● Data files: store MySQL Cluster Disk Data table data
● Tablespace: a named collection of data files

Notes:
● Each tablespace needs an undo log group assigned
● currently, only one undo log group can exist in the same MySQL Cluster

Disk Data Objects

Page 77: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Disk Data - data flow

[Diagram: data flow among the Disk Page Buffer, Shared Memory, Metadata, Undo Buffer, Pages Requests Queues, ThreadIOPool, Tablespace 1, Tablespace 2 and the Undo Log Group]

Page 78: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Disk Data Configuration Parameters

DiskPageBufferMemory : cache for disk pages

SharedGlobalMemory : shared memory for
● Disk Data UNDO buffer
● Disk Data metadata ( tablespace / undo / files )
● Buffers for disk operations

DiskIOThreadPool : number of threads for I/O

Page 79: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Using SQL statement:

CREATE LOGFILE GROUP logfile_group
    ADD UNDOFILE 'undo_file'
    [INITIAL_SIZE [=] initial_size]
    [UNDO_BUFFER_SIZE [=] undo_buffer_size]
    ENGINE [=] engine_name

Create Undo Log Group

Page 80: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

UNDOFILE : filename on disk
INITIAL_SIZE : 128MB by default
UNDO_BUFFER_SIZE : 8MB by default
ENGINE : { NDB | NDBCLUSTER }

Create Undo Log Group

Page 81: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

ALTER LOGFILE GROUP logfile_group
    ADD UNDOFILE 'undo_file'
    [INITIAL_SIZE [=] initial_size]
    ENGINE [=] engine_name

Alter Undo Log Group

Page 82: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

CREATE LOGFILE GROUP lg1
    ADD UNDOFILE 'undo1.log'
    INITIAL_SIZE = 134217728
    UNDO_BUFFER_SIZE = 8388608
    ENGINE = NDBCLUSTER;

ALTER LOGFILE GROUP lg1
    ADD UNDOFILE 'undo2.log'
    INITIAL_SIZE = 134217728
    ENGINE = NDBCLUSTER;

Create Undo Log Group : example
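The SQL examples give sizes in bytes, while the defaults are quoted in MB. A small converter shows how the two line up. `size_to_bytes` is a hypothetical helper that assumes the K, M and G suffixes mean powers of 1024.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(text):
    """Convert a size such as '128M' into bytes; plain digits pass through."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


for label, size in (("INITIAL_SIZE", "128M"), ("UNDO_BUFFER_SIZE", "8M")):
    print(f"{label} = {size_to_bytes(size)}  # {size}")
```

So `INITIAL_SIZE = 134217728` is 128MB and `UNDO_BUFFER_SIZE = 8388608` is 8MB, the default values listed for the statement.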

Page 83: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

In [NDBD] definition:

InitialLogFileGroup = [name=name;] [undo_buffer_size=size;] file-specification-list

Example:
InitialLogFileGroup = name=lg1; undo_buffer_size=8M; undo1.log:128M;undo2.log:128M

Create Undo Log Group

Page 84: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Using SQL statement:

CREATE TABLESPACE tablespace_name
    ADD DATAFILE 'file_name'
    USE LOGFILE GROUP logfile_group
    [INITIAL_SIZE [=] initial_size]
    ENGINE [=] engine_name

Create Tablespace

Page 85: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

logfile_group : mandatory! The log file group must be specified
DATAFILE : filename on disk
INITIAL_SIZE : 128MB by default
ENGINE : { NDB | NDBCLUSTER }

Create Tablespace

Page 86: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Using SQL statement:

ALTER TABLESPACE tablespace_name
    {ADD|DROP} DATAFILE 'file_name'
    [INITIAL_SIZE [=] size]
    ENGINE [=] engine_name

Alter Tablespace

Page 87: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

CREATE TABLESPACE ts1
    ADD DATAFILE 'datafile1.data'
    USE LOGFILE GROUP lg1
    INITIAL_SIZE = 67108864
    ENGINE = NDBCLUSTER;

ALTER TABLESPACE ts1
    ADD DATAFILE 'datafile2.data'
    INITIAL_SIZE = 134217728
    ENGINE = NDBCLUSTER;

Create Tablespace : example

Page 88: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

In [NDBD] definition:

InitialTablespace = [name=name;] [extent_size=size;] file-specification-list

Example:
InitialTablespace = name=ts1; datafile1.data:64M; datafile2.data:128M

Note: no need to specify the Undo Log Group

Create Tablespace

Page 89: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Start Types and Start Phases

Page 90: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Start Types

● (IS) Initial start
● (S) System restart
● (N) Node restart
● (IN) Initial node restart

Page 91: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Initial start (IS)

All the data nodes start with clean filesystem.

During the very first start, or when all data nodes are started with --initial

Note: Disk Data files are not removed from the filesystem when a data node is started with --initial

Page 92: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

System restart (S)

The whole Cluster is restarted after it has been shutdown .

The Cluster resumes operations from where it was left before shutdown. No data is lost.

Page 93: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Node restart (N)

The Node is restarted while the Cluster is running.

This is necessary during upgrade/downgrade, changing configuration, performing HW/SW maintenance, etc

Page 94: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Initial node restart (IN)

Similar to node restart (the Cluster is running) , but the data node is started with a clean filesystem.

Use --initial to clean the filesystem.
Note: --initial doesn't delete Disk Data files

Page 95: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Start Phases ( 1/4 )

Phase -1 : setup and initialization
Phase 0 : filesystem initialization
Phase 1 : communication between blocks and between nodes is initialized
Phase 2 : status of all nodes is checked
Phase 3 : DBLQH and DBTC set up communication between them

Page 96: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Start Phases ( 2/4 )

Phase 4:
● IS or IN : Redo Log Files are created
● S : reads schemas, reads data from the last Local Checkpoint, reads and applies all the Global Checkpoints from the Redo Log
● N : finds the tail of the Redo Log

Page 97: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Start Phases ( 3/4 )

Phase 5 : for IS or S, a LCP is performed, followed by a GCP
Phase 6 : node groups are created
Phase 7 : arbitrator is selected, data nodes are marked as Started, and API/SQL nodes can connect
Phase 8 : for S, all indexes are rebuilt
Phase 9 : internal variables are reset

Page 98: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Start Phases ( 4/4 )

Phase 100 : ( obsolete )
Phase 101 :
● data node takes responsibility for delivering its primary data to subscribers
● for IS or S : transaction handlers are enabled
● for N or IN : new node can act as TC

Page 99: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Storage requirements

Page 100: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

In general, storage requirements for MySQL Cluster are the same as storage requirements for other storage engines.

But many extra factors need to be considered when calculating storage requirements for MySQL Cluster tables.

Storage requirements

Page 101: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Each individual column is 4-byte aligned.

CREATE TABLE ndb1 (
  id INT NOT NULL PRIMARY KEY,
  type_id TINYINT,
  status TINYINT,
  name BINARY(14),
  KEY idx_status ( status ),
  KEY idx_name ( name )
) ENGINE = ndbcluster;

Bytes used per row: 4+4+4+16 = 28 ( storage )

4-byte alignment

Page 102: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

If the table definition allows NULL for some columns, MySQL Cluster reserves 4 bytes for up to 32 nullable columns ( or 8 bytes for up to 64 columns )

CREATE TABLE ndb1 (
  id INT NOT NULL PRIMARY KEY,
  type_id TINYINT,
  status TINYINT,
  name BINARY(14),
  KEY idx_status ( status ),
  KEY idx_name ( name )
) ENGINE = ndbcluster;

Bytes used per row: 4 ( NULL ) + 28 ( storage )

NULL values

Page 103: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Each record has a 16 bytes overhead

Each ordered index has a 10 bytes overhead per record

Each page is 32KB , and each record must be stored in just one page

Each page has a 128 byte page overhead

DataMemory overhead

Page 104: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Each primary/unique key requires 25 bytes for the hash index plus the size of the field

8 more bytes are required if the size of the field is larger than 32 bytes

IndexMemory overhead

Page 105: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Storage requirement for test table

CREATE TABLE ndb1 (
  id INT NOT NULL PRIMARY KEY,
  type_id TINYINT,
  status TINYINT,
  name BINARY(14),
  KEY idx_status ( status ),
  KEY idx_name ( name )
) ENGINE = ndbcluster;

Memory used per row = ~107 bytes
● 4 bytes for NULL values
● 28 bytes for storage
● 16 bytes for record overhead
● 30 bytes for ordered indexes (3)
● 29 bytes for primary key index ( 25 bytes + 4 bytes )
plus all the wasted memory in page overhead/alignment
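The per-row estimate for the `ndb1` test table can be reproduced step by step. The sketch below only restates the figures given in the slides (4-byte alignment, NULL bitmap, record overhead, 10 bytes per ordered index, 25 bytes plus key size for the hash index); the variable names are made up, and page overhead is deliberately left out.

```python
def aligned(size, alignment=4):
    """Round a column size up to the next multiple of the alignment."""
    return -(-size // alignment) * alignment


# Raw column sizes in bytes: INT, TINYINT, TINYINT, BINARY(14).
column_sizes = {"id": 4, "type_id": 1, "status": 1, "name": 14}

storage = sum(aligned(size) for size in column_sizes.values())
null_bitmap = 4            # up to 32 nullable columns
record_overhead = 16       # per record
ordered_indexes = 3 * 10   # three ordered indexes, 10 bytes each per record
pk_hash_index = 25 + column_sizes["id"]  # hash index plus size of the key

total = storage + null_bitmap + record_overhead + ordered_indexes + pk_hash_index
print(f"storage={storage} bytes, estimated memory per row={total} bytes")
```

Multiplying the result by the expected row count gives a first-order figure for memory sizing, before page overhead and alignment waste are added.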

Page 106: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Hidden Primary Key

In MySQL Cluster every table requires a primary key.

A hidden primary key is created if a PK is not explicitly defined

The hidden PK uses 31-35 bytes per record

Page 107: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Backups

Page 108: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Why backup?

● Isn't MySQL Cluster fully redundant?
● Human mistakes
● Application bugs
● Point in time recovery
● Deploy of development/testing envs
● Deploy a DR site
● ... others

Page 109: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Backup methodologies

mysqldump

Native Online NDB Backup

Page 110: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Backup with mysqldump

storage engine agnostic

dumps other objects like triggers, views, SP

connects to any API node

Page 111: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Backup with mysqldump ( cont. )

Usage:
mysqldump [options] db_name [tbl_name ...]
mysqldump [options] --databases db_name ...
mysqldump [options] --all-databases

shell> mysqldump world > world.sql

Question: What about --single-transaction ?

Page 112: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Backup with mysqldump ( cont. )

Bad news:
No write traffic to MySQL Cluster
○ SINGLE USER MODE
○ reduced availability

Page 113: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Backup with mysqldump ( cont. )

Check the content of the dump for:

● CREATE TABLE
● INSERT INTO
● CREATE LOGFILE
● CREATE TABLESPACE

Page 114: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Restore from mysqldump

Standard recovery procedure:

shell> mysql -e "DROP DATABASE world;"

shell> mysqladmin create world

shell> cat world.sql | mysql world

Page 115: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Restore from mysqldump ( cont. )

Point in time recovery?

● shell> mysqldump --master-data ...
● apply binlog : from which API node?

Page 116: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

NDB and binlog

Page 117: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

MySQL Cluster Overview ( reminder )

[Diagram: the same components overview shown on Page 13: clients, NDB API, NDB Management Clients and Servers, three mysqld processes and four ndbd processes]

Page 118: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

MySQL Cluster and binlog

[Diagram: four ndbd data nodes and three mysqld processes, each mysqld with the NDB Engine and its own binlog]

Page 119: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

NDB binlog injector thread

When connecting to the data nodes, the NDBCLUSTER storage engine subscribes to all the tables in the Cluster.

When any data changes in the Cluster, the NDB binlog injector thread is notified of the change and writes it to the local binlog.

binlog_format=ROW
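
As an illustration, the mysqld settings involved can be sketched as a my.cnf fragment. The connect string `mgm_host` and the server ID 101 are placeholders, and `mycnf_binlog_fragment` is a hypothetical helper that simply prints the fragment.

```shell
#!/bin/sh
# Sketch: print a my.cnf fragment for an API node that writes a binlog.
# $1 = management node connect string, $2 = server-id (both placeholders).
mycnf_binlog_fragment() {
    cat <<EOF
[mysqld]
ndbcluster
ndb-connectstring=$1
server-id=$2
log-bin=mysql-bin
binlog-format=ROW
EOF
}

mycnf_binlog_fragment mgm_host 101
```

Each mysqld that should record cluster changes needs its own binlog enabled and a distinct server-id.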

Page 120: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Asynchronous Replication to standard MySQL ( ex: InnoDB )

[Diagram: four ndbd data nodes; two mysqld nodes, each with the NDB Engine and a binlog; asynchronous replication to a standard mysqld.]

Page 121: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Asynchronous Replication to MySQL Cluster

[Diagram: four ndbd data nodes; two mysqld nodes, each with the NDB Engine and a binlog; asynchronous replication to a mysqld with the NDB Engine, connected to a second set of ndbd data nodes.]

Page 122: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Native Online NDB Backups

Page 123: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Native Online NDB Backup

Hot backup

Consistent backup

Initiated from a cluster management node

Page 124: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

NDB Backup Files

Three main components:

● Metadata : BACKUP-backup_id.node_id.ctl
● Table records : BACKUP-backup_id-0.node_id.data
  different nodes save different fragments during the backup
● Transaction log : BACKUP-backup_id.node_id.log
  different nodes save different logs because they store different fragments

Saved on all data nodes: each data node performs the backup of its own fragments

Page 125: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

START BACKUP

shell> ndb_mgm

ndb_mgm> START BACKUP

Waiting for completed, this may take several minutes
Node 1: Backup 4 started from node 12
Node 1: Backup 4 started from node 12 completed
 StartGCP: 218537 StopGCP: 218575
 #Records: 2717875 #LogRecords: 167
 Data: 1698871988 bytes Log: 99056 bytes

Page 126: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

START BACKUP (cont.)

Waiting for completed, this may take several minutes
Node Did: Backup Bid started from node Mid
Node Did: Backup Bid started from node Mid completed
 StartGCP: 218537 StopGCP: 218575
 #Records: 2717875 #LogRecords: 167
 Data: 1698871988 bytes Log: 99056 bytes

Did : data node coordinating the backup
Bid : unique identifier for the current backup
Mid : management node that started the backup

StartGCP / StopGCP : Global Checkpoints executed during backup
#Records / Data : number and size of records in backup
#LogRecords / Log : number and size of records in log
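
The fields above can be pulled out of the report with a few lines of shell. This is a sketch that assumes the output format shown on these slides; `backup_summary` is a hypothetical helper, and the sample text is the report from the earlier START BACKUP example.

```shell
#!/bin/sh
# Sketch: extract the backup ID and the global checkpoint span
# from the text printed by START BACKUP.
backup_summary() {
    output="$1"
    id=$(printf '%s\n' "$output" |
        sed -n 's/.*Backup \([0-9]*\) started from node [0-9]* completed.*/\1/p' | head -n 1)
    gcps=$(printf '%s\n' "$output" |
        sed -n 's/.*StartGCP: \([0-9]*\) StopGCP: \([0-9]*\).*/\1 \2/p' | head -n 1)
    start=${gcps% *}
    stop=${gcps#* }
    echo "backup_id=$id gcp_span=$((stop - start))"
}

sample_output='Waiting for completed, this may take several minutes
Node 1: Backup 4 started from node 12
Node 1: Backup 4 started from node 12 completed
 StartGCP: 218537 StopGCP: 218575
 #Records: 2717875 #LogRecords: 167
 Data: 1698871988 bytes Log: 99056 bytes'

backup_summary "$sample_output"
# prints: backup_id=4 gcp_span=38
```

The backup ID is the value later passed to ndb_restore with -b.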

Page 127: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

START BACKUP (cont.)

START BACKUP options:

● NOWAIT
● WAIT STARTED
● WAIT COMPLETED ( default )
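
For scripted backups the same options can be passed through `ndb_mgm -e`. The sketch below only builds and prints the command line; `start_backup_command` is a hypothetical helper that rejects anything other than the three options listed above.

```shell
#!/bin/sh
# Sketch: build a non-interactive START BACKUP command line.
start_backup_command() {
    wait_option="${1:-WAIT COMPLETED}"   # same default as the management client
    case "$wait_option" in
        "NOWAIT"|"WAIT STARTED"|"WAIT COMPLETED")
            echo "ndb_mgm -e \"START BACKUP $wait_option\""
            ;;
        *)
            echo "unknown wait option: $wait_option" >&2
            return 1
            ;;
    esac
}

start_backup_command "WAIT STARTED"
```

NOWAIT returns control immediately, WAIT STARTED returns once the backup has begun, and WAIT COMPLETED blocks until the backup has finished.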

Page 128: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Backup Parameters

BackupDataBufferSize: buffers data before it is written to disk

BackupLogBufferSize: buffers log records before they are written to disk

BackupMemory: BackupDataBufferSize + BackupLogBufferSize

Page 129: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Backup Parameters (cont.)

BackupWriteSize: default size of messages written to disk from backup buffers ( data and log )

BackupMaxWriteSize: maximum size of messages written to disk

CompressedBackup: equivalent to gzip --fast
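
To show how these parameters fit together, the sketch below prints a config.ini fragment for the data nodes, deriving BackupMemory as the sum of the two buffers. All sizes are illustrative values, not recommendations, and `backup_config_fragment` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the backup section of config.ini for the data nodes.
# $1 = data buffer size in MB, $2 = log buffer size in MB (illustrative values).
backup_config_fragment() {
    data_mb="$1"
    log_mb="$2"
    cat <<EOF
[ndbd default]
BackupDataBufferSize=${data_mb}M
BackupLogBufferSize=${log_mb}M
BackupMemory=$((data_mb + log_mb))M
BackupWriteSize=256K
BackupMaxWriteSize=1M
CompressedBackup=1
EOF
}

backup_config_fragment 16 16
```

With 16 MB for each buffer the fragment sets BackupMemory to 32M.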

Page 130: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Restore from Online backup

ndb_restore: program that performs the restore

Procedure:

● import the metadata from just one node
● import the data from each node

Page 131: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

ndb_restore

Important options:

-c string : specify the connect string
-m : restore metadata
-r : restore data
-b # : backup ID
-n # : node ID
-p # : restore in parallel
--no-binlog
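
Putting the procedure and the options together, a restore could be planned as in the sketch below. The connect string `mgm_host`, backup ID 4, data node IDs 1 and 2, and the backup directory are placeholders; `restore_plan` is a hypothetical helper that only prints the ndb_restore commands. It assumes the --backup_path option is used to point at the backup files.

```shell
#!/bin/sh
# Sketch: print one ndb_restore command per data node.
# Metadata (-m) is restored only with the first node, data (-r) with every node.
# Usage: restore_plan <connect string> <backup id> <backup dir> <node id>...
restore_plan() {
    connect="$1"
    backup_id="$2"
    backup_dir="$3"
    shift 3
    first=1
    for node_id in "$@"; do
        if [ "$first" = "1" ]; then
            flags="-m -r"
            first=0
        else
            flags="-r"
        fi
        echo "ndb_restore -c $connect -b $backup_id -n $node_id $flags --backup_path=$backup_dir"
    done
}

restore_plan mgm_host 4 /var/lib/mysql-cluster/BACKUP/BACKUP-4 1 2
```

Each data node wrote its own backup files, so every command has to be run where the files of that node ID are available.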

Page 132: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

ndb_restore (cont.)

--include-databases = db_lists
--exclude-databases = db_lists

--include-tables = table_lists
--exclude-tables = table_lists

--rewrite-database = olddb,newdb

... and more!

Page 133: Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability

Q & A