multi source replication with mysql 5.7 @ verisure
TRANSCRIPT
Multi-Source Replication With MySQL 5.7 @ Verisure
And How We Got There
1 / 46
Table of Contents
Verisure
Data Warehouse
Tungsten Replicator
MySQL 5.7
Multi-Source Replication
Issues Encountered
Compatibility
3 / 46
Europe's most popular home alarm
Verisure
4 / 46
Verisure
Verisure is Europe's leading provider of professionally monitored home alarms and services for the connected and protected home and business.
We believe it's a human right to feel safe and secure.
We connect and protect what really matters, our service brings peace of mind to families and small business owners.
Thanks to our strong focus on quality and service, our customers are among the most satisfied in the industry.
https://www.verisure.com/our-offer.html
5 / 46
Verisure
Data Warehouse
6 / 46
Data Warehouse
Why the Data Warehouse setup?
Troubleshooting tool for 3-line.
Not possible to have BI-optimized DDL in prod.
BI teams have their own deploy structure/schedule.
Heavy data mining to follow up on:
  Product quality
  GSM usage/costs
Stage for upgrade
7 / 46
Data Warehouse
Getting started
First iteration was easy
  Old prod hardware was kept as a data warehouse.
Then you add sharding
  And things got a bit harder
  Maybe we could use Tungsten?
8 / 46
Tungsten Replicator
Legacy
Operational Overhead
Direct Mode
Hardware Required
Replication Capacity
Bugs
9 / 46
Tungsten Replicator
Legacy
Initially: replicate a config database to shards
  needed "temporarily" during migration to sharding
Extra Tungsten instances added to replicate to DW
10 / 46
Tungsten Replicator
Legacy ... grew into...
11 / 46
Tungsten Replicator
... grew into...
12 / 46
Tungsten Replicator
Shard migration done
Down to one Tungsten per shard
13 / 46
Tungsten Replicator
Direct Mode
Due to legacy reasons, the direct mode of Tungsten is used.
A separate host was configured to serve as Tungsten host:
  ~0.15 ms extra round-trip time to the database
THL requires disk space:
  Replication lag = lots of disk space.
Ended up with several shard clusters with Tungsten instances
14 / 46
Tungsten Replicator
Replication Capacity
Parallel (per schema) replication was used:
  the heavier shards are the limit
Global Warming (Tungsten is very CPU intensive)
15 / 46
Tungsten Replicator
Bugs
Issue 960 (fixed in Tungsten Replicator 3.0):
When using statement based replication with temporary tables where a ROLLBACK of a commit is applied, the replicator would fail to execute the rollback statement.
... and just commit the transaction that should have been rolled back.
Before the fix, replication broke a lot and shards had to be rebuilt regularly.
16 / 46
Tungsten Replicator
Operational Overhead
Hard for non-DBAs such as on-call staff
Hard ... even for DBAs
Custom Percona Toolkit plugin for Tungsten Replicator:
https://github.com/grypyrg/percona-toolkit-plugin-tungsten-replicator
$ pt-table-checksum -u checksum --no-check-binlog-format \
    --recursion-method=dsn=D=p,t=dsns --plugin=pt-plugin-tungsten_replicator.pl
Created plugin from /vagrant/pt-plugin-tungsten_replicator.pl.
PLUGIN get_slave_lag: Using Tungsten Replicator to check replication lag
Tungsten Replicator status of host node3 is OFFLINE:NORMAL, waiting
Replica node3 is stopped. Waiting.
*Tungsten Replicator status of host node3 is OFFLINE:NORMAL, waiting
Replica lag is 119 seconds on node3. Waiting.
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
07-03T10:49:54      0      0  2097152       7       0 213.238 app.large_table
17 / 46
Move to MySQL 5.7
18 / 46
Move to MySQL 5.7
Why?
MSR to replace Tungsten Replicator:
  built-in solution, easy operationally
  replication capacity: parallel replication
  less infrastructure required
  easier to train on-call staff
A starting point to validate and gain experience with MySQL/Percona Server 5.7
19 / 46
Move to MySQL 5.7
Native replication replaces Tungsten
20 / 46
Compare Before - After
21 / 46
MySQL 5.7
Data Warehouse Queries
Collect queries (slow log)
Replay with pt-upgrade on 2 DW instances
22 / 46
MySQL 5.7
Data Warehouse Queries
A few queries were reported slower:
  the optimizer sometimes prefers a worse index
  to be further investigated
     table: alarms
partitions: p201401,p201603,p201604
      type: range
       key: alarm_insid_sid_time_ix
   key_len: 13
      rows: 165
     Extra: Using index condition; Using where; Using temporary; Using filesort

     table: alarms
partitions: p201401,p201603,p201604
      type: range
       key: alarm_insid_time_ix
   key_len: 9
      rows: 8089
     Extra: Using index condition; Using where
23 / 46
MySQL 5.7
Multi Source Replication
24 / 46
Multi Source Replication
Syntax
Create user
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.*
  TO 'repluser05'@'192.168.204.10' IDENTIFIED BY 'rFAQKARW8rLZ9b2Z';
Figure out where to start
$ cat xtrabackup_binlog_info
mysql-bin.203534 53973866
25 / 46
Multi Source Replication
Syntax
Requirements
SET GLOBAL master_info_repository = 'TABLE';
SET GLOBAL relay_log_info_repository = 'TABLE';
26 / 46
Multi Source Replication
Syntax
CHANGE MASTER TO
  MASTER_HOST='192.168.204.50',
  MASTER_USER='repluser05',
  MASTER_PASSWORD='rFAQKARW8rLZ9b2Z',
  MASTER_LOG_FILE='mysql-bin.203534',
  MASTER_LOG_POS=53973866
  FOR CHANNEL 'host05';
SHOW SLAVE STATUS FOR CHANNEL 'host05'\G
STOP SLAVE IO_THREAD FOR CHANNEL 'host05';
RESET SLAVE FOR CHANNEL 'host05';
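The two steps above (read the binlog coordinates, then point a named channel at them) can be tied together in a small helper. This is a minimal sketch, not the speakers' tooling: the function name change_master_sql and its argument order are made up for illustration, and the password is deliberately left out of the generated statement, so it would have to be supplied separately.

```shell
# Sketch: build a per-channel CHANGE MASTER TO statement from the binlog
# coordinates recorded by xtrabackup. The function name and argument order
# are illustrative assumptions, not part of any tool.
# usage: change_master_sql <binlog_info_file> <master_host> <repl_user> <channel>
change_master_sql() {
    info_file=$1; host=$2; user=$3; channel=$4
    log_file=; log_pos=
    # The first line of xtrabackup_binlog_info is "<binlog file> <position>";
    # any further column on that line is ignored here.
    read -r log_file log_pos _rest < "$info_file" || true
    if [ -z "$log_file" ] || [ -z "$log_pos" ]; then
        echo "could not read binlog coordinates from $info_file" >&2
        return 1
    fi
    # MASTER_PASSWORD is intentionally not emitted by this sketch.
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s FOR CHANNEL '%s';\n" \
        "$host" "$user" "$log_file" "$log_pos" "$channel"
}
```

Called as change_master_sql xtrabackup_binlog_info 192.168.204.50 repluser05 host05, it prints one statement that can be reviewed before it is piped to the mysql client.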
27 / 46
Multi Source Replication
Loading the data
At first you set up replication before the shards are used,
but sooner or later a reload is needed.
Challenges
  Physical backups can't be used to merge several instances
  TB-sized databases and mysqldump: not efficient
  Load of data must be fast, or replication will never catch up (based on past experience with Tungsten)
  Production is 5.6 and DW is 5.7
  Partitioned tables are not supported for IMPORT TABLESPACE
28 / 46
Multi Source Replication
Loading the data
Dump the data using xtrabackup
  --export --prepare
Dump the schema using mysqldump
  --no-data --triggers --routines
Restore the DDL
  mysql < ddl.sql
Load the data
  discard tablespace
  cp
  import tablespace
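The "load the data" step can be sketched per table as below. This is only an illustration under stated assumptions: the function import_steps, the directory layout, and the mysql:mysql file owner are hypothetical, and the function prints the commands instead of running them. It assumes the backup was prepared with --export, so that each table has an .ibd data file and a .cfg metadata file.

```shell
# Sketch: print (not execute) the per-table steps for loading an exported
# InnoDB tablespace. Paths, the mysql:mysql owner and the function itself
# are hypothetical.
# usage: import_steps <schema> <table> <backup_dir> <datadir>
import_steps() {
    schema=$1; table=$2; backup_dir=$3; datadir=$4
    src="$backup_dir/$schema/$table"
    dst="$datadir/$schema"
    # 1. Detach the empty tablespace that was created when the DDL was restored.
    printf 'mysql -e "ALTER TABLE %s.%s DISCARD TABLESPACE"\n' "$schema" "$table"
    # 2. Copy the data file and the metadata file into the schema directory.
    printf 'cp %s.ibd %s.cfg %s/\n' "$src" "$src" "$dst"
    # 3. Make sure the server can read the copied files.
    printf 'chown mysql:mysql %s/%s.ibd %s/%s.cfg\n' "$dst" "$table" "$dst" "$table"
    # 4. Attach the copied tablespace to the table definition.
    printf 'mysql -e "ALTER TABLE %s.%s IMPORT TABLESPACE"\n' "$schema" "$table"
}
```

Printing the commands first makes it possible to review them, or to feed them to a parallel runner, before anything touches the data directory.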
29 / 46
Multi Source Replication
Loading the data: Tips and Tricks
5.5 -> 5.6
  Tables with timestamps must be rebuilt to the new format
  Requires an extra machine to use for the rebuild.
    Load
    ALTER TABLE FORCE
    Dump and start the load
5.6 -> 5.7
  Tables must be created with row_format=COMPACT
  ALTER TABLE ROW_FORMAT=COMPACT
30 / 46
Multi Source Replication
Loading the data: Tips and Tricks
5.6: Partitioned tables
  Not supported, but
  Import each partition as a separate table
  Add to table using EXCHANGE PARTITION
Supported in 5.7, but no time to test yet...
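The 5.6 workaround can be sketched as a generator for the EXCHANGE PARTITION statements. It assumes each partition has already been imported into a standalone, non-partitioned table with the same structure, named <table>_<partition>; that naming scheme and the function exchange_partition_sql are assumptions made for this example.

```shell
# Sketch: emit one EXCHANGE PARTITION statement per partition. Assumes every
# partition was first imported into a standalone table named
# <table>_<partition> (a hypothetical naming convention).
# usage: exchange_partition_sql <table> <partition>...
exchange_partition_sql() {
    table=$1
    shift
    for part in "$@"; do
        printf 'ALTER TABLE %s EXCHANGE PARTITION %s WITH TABLE %s_%s;\n' \
            "$table" "$part" "$table" "$part"
    done
}
```

For example, exchange_partition_sql alarms p201401 p201603 prints two statements, one per partition.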
31 / 46
Multi Source Replication
Skipping a Trx, non-GTID:
mysql> SET GLOBAL sql_slave_skip_counter=1;
mysql> START SLAVE;
ERROR 3086 (HY000): When sql_slave_skip_counter > 0, it is not allowed to start more than one SQL thread by using 'START SLAVE [SQL_THREAD]'. Value of sql_slave_skip_counter can only be used by one SQL thread at a time. Please use 'START SLAVE [SQL_THREAD] FOR CHANNEL' to start the SQL thread which will use the value of sql_slave_skip_counter.
mysql> START SLAVE FOR CHANNEL 'one';
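Because the skip counter is global but may only be consumed by one SQL thread, the skip has to name the channel explicitly. A minimal sketch of that sequence follows; the helper skip_trx_sql is hypothetical and only prints the statements.

```shell
# Sketch: print the statements that skip <count> events on one channel only.
# The helper name is hypothetical; nothing is executed here.
# usage: skip_trx_sql <channel> [count]
skip_trx_sql() {
    channel=$1
    count=${2:-1}
    # Make sure only this channel's SQL thread will pick up the counter.
    printf "STOP SLAVE SQL_THREAD FOR CHANNEL '%s';\n" "$channel"
    printf "SET GLOBAL sql_slave_skip_counter=%s;\n" "$count"
    printf "START SLAVE SQL_THREAD FOR CHANNEL '%s';\n" "$channel"
}
```

The output of skip_trx_sql one could be reviewed and then piped to the mysql client on the replica.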
32 / 46
Multi Source Replication
Replication Filters
Replication filters cannot be configured per channel:
http://bugs.mysql.com/bug.php?id=80843
33 / 46
Replication Capacity Improvements
Tungsten:
channels=5
parallel-queue.maxSize=75000
# cat shard.list
shard01=0
shard02=1
shard03=2
shard04=3
shard05=4
MySQL 5.7 Parallel Replication (per source):
slave_parallel_type=DATABASE
slave_parallel_workers=5
slave_pending_jobs_size_max=32M
34 / 46
Replication Capacity Improvements
35 / 46
Replication Capacity Improvements
36 / 46
Replication Capacity Improvements
New environment has lower replication capacity with the largest shards.
Waiting for slave-parallel-type=LOGICAL_CLOCK
Waiting on the app to become ready for binlog_format=ROW
Need more in-depth analysis of the collected statistics
37 / 46
MySQL 5.7
Issues Encountered
38 / 46
seconds_behind_master bug
https://bugs.mysql.com/bug.php?id=66921
https://bugs.mysql.com/bug.php?id=80084 (still open)
39 / 46
Crash: innodb_open_files > open_files_limit
http://bugs.mysql.com/bug.php?id=78981
Fixed in 5.6.30, 5.7.12, 5.8.0
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| innodb_open_files | 16384 |
| open_files_limit  | 8510  |
+-------------------+-------+
2015-10-27 10:20:33 5535 [ERROR] InnoDB: Trying to do i/o to a tablespace which
2015-10-27 10:20:33 7fa725a05700 InnoDB: Error: trying to access tablespace 11015335
InnoDB: but the tablespace does not exist or is just being dropped.
2015-10-27 10:20:33 7fa725a05700 InnoDB: Operating system error number 24 in a file operation.
InnoDB: Error number 24 means 'Too many open files'.
InnoDB: Some operating system error numbers are described at...
2015-10-27 10:20:33 7fa725a05700 InnoDB: Assertion failure in thread 140355867531008 in file buf0buf.cc line 2740
InnoDB: We intentionally generate a memory trap.
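On versions without the fix, the risky configuration can be spotted with a simple comparison of the two values. The check below is a sketch; the function name check_open_files is made up, and the two numbers are passed in as arguments.

```shell
# Sketch: flag the configuration that triggered the crash above.
# The function name is hypothetical; both values are passed as arguments.
# usage: check_open_files <innodb_open_files> <open_files_limit>
check_open_files() {
    if [ "$1" -gt "$2" ]; then
        echo "WARNING: innodb_open_files ($1) > open_files_limit ($2)"
        return 1
    fi
    echo "OK: innodb_open_files ($1) <= open_files_limit ($2)"
}
```

One way to obtain the two values would be mysql -NBe "SELECT @@innodb_open_files, @@open_files_limit", whose output can be passed straight to the function.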
40 / 46
Crash: Upgrade from 5.6 to 5.7 MSR
Replication channels get the same name in MSR after the upgrade, which can also crash MySQL
https://bugs.mysql.com/bug.php?id=80302 -- Open :(
mysql> show slave status\G
*************************** 1. row ***************************
  Slave_IO_State: Waiting for master to send event
  Master_Host: 127.0.0.1
  Master_Port: 11204
  [..]
  Channel_Name: master1
  Master_TLS_Version:
*************************** 2. row ***************************
  Slave_IO_State: Waiting for master to send event
  Master_Host: 127.0.0.1
  Master_Port: 13358
  [..]
  Channel_Name: master1
  Master_TLS_Version:
2 rows in set (0.00 sec)
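A duplicated channel name like the one above can be detected mechanically from the \G output. The following is a sketch only; the function duplicate_channels is a hypothetical helper that reads the status output on stdin and prints each channel name that occurs more than once.

```shell
# Sketch: read SHOW SLAVE STATUS\G output on stdin and print every channel
# name that appears more than once. The helper name is hypothetical.
# usage: mysql -e 'SHOW SLAVE STATUS\G' | duplicate_channels
duplicate_channels() {
    # Print a name the second time it is seen, so each duplicate shows once.
    awk '$1 == "Channel_Name:" { if (++seen[$2] == 2) print $2 }'
}
```

An empty result means every channel reported a distinct name.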
41 / 46
MySQL 5.7 Multi Source
Compatibility
42 / 46
Percona Toolkit
Percona Toolkit is missing MSR support.
Slave lag: pt-heartbeat:
https://github.com/grypyrg/percona-toolkit-plugin-heartbeat
43 / 46
InnoTop Multi Source Support
Written by Johan Nilsson (Verisure)
Soon to be merged:
https://github.com/innotop/innotop/pull/129
[RO] Replication Status (? for help) 127.0.0.1, 3m, 1.93 QPS, 5/1/0 con/run/cac thds,
________________________________ Slave SQL Status ______________________________
Channel Master Master UUID On? TimeLag Catchup RPos Last
one localhost d7e93be0-0452-08002774c31b Yes 00:00 0.00 327
two localhost 5b9d58e4-0452-08002774c31b Yes 00:00 0.00 4
________________________________ Slave I/O Status _______________________________
Channel Master Master UUID On? File RSize Pos
two localhost 5b9d58e4-0472-08002774c31b No 57-co.bin.000003 154
one localhost d7e93be0-04b2-08002774c31b Yes 57-co.bin.000003 545
____________________________________________ Master Status ________________________
File Position Binlog Cache Executed GTID Set Server UUID
57-community-bin.000003 154 0.00% N/A b40426f3-045
44 / 46
Monitoring Tools
Our favorite things
Mytop
innotop
  Patch for channels
Icinga/Nagios
MRTG
  Some MySQL metrics that are important for us.
Grafana/Graphite/Collect
45 / 46
Kristofer
Kenny
Multi-Source Replication With MySQL 5.7 @ Verisure
And How We Got There (Almost :-)
Questions?
46 / 46