An issue of all slaves stop replication

An issue of all slaves stop replication Kentoku SHIBA

- All slaves stop replication with “Got fatal error 1236
from master when reading data from binary log: 'unknown error reading log event on the master; the first event 'binlog.004215' at 8610479, the last event read from './binlog.004226' at 154, the last byte read from './binlog.004226' at 154.‘”. We can see this error by “show slave status”. - When you execute “START SLAVE”, position slightly advances and replication stops with the same error again. - Recovery by executing “START SLAVE” on the slaves after binary log rotation on the master. - No HW error, Not disk full. Abstract of the issue

Don’t panic. Let’s explain about detail. Abstract of the issue

When a slave requested the latest binary log Master Slave
Request binary log The master referenced MYSQL_BIN_LOG::binlog_end_pos for checking sendable position of the latest binary log to the slave. Send binary log event MYSQL_BIN_LOG ::binlog_end_pos

A behavior of the COMMIT Master FLUSH stage A case
of SYNC_BINLOG=1, end position of the binlog is held by THD::m_trans_end_pos on FLUSH stage, then it’s copied to MYSQL_BIN_LOG::binlog_end_pos on SYNC stage. COMMIT stage THD:: m_trans_end_pos SYNC stage MYSQL_BIN_LOG ::binlog_end_pos

- FLUSH stage Write events from transactions to the binlog.
(Physical writes are not guaranteed on this stage) - SYNC stage Write the binlog events physically. - COMMIT stage Finalize of the COMMIT on each storage engine. (Transactions have PREPARE status before this stage) Each stage can work independently like a transaction is on COMMIT stage, next transaction is on SYNC stage, another transaction is on FLUSH stage. The abstract of stages at COMMIT

The condition of causing this issue Master FLUSH stage When
a binlog rotation is occurred before updating MYSQL_BIN_LOG::binlog_end_pos on SYNC stage, this issue occurs. COMMIT stage THD:: m_trans_end_pos SYNC stage MYSQL_BIN_LOG ::binlog_end_pos Rotate a binlog Caused by Previous commits at COMMIT stage Or Flush (binary) logs

- The binlog is not broken, the readable position is
wrong. So, replication can restart by command, read and send binarylog event if it is written after previous error position, then stop by causing same error. - When the binlog is rotated again, MYSQL_BIN_LOG::binlog_end_pos is resetted for new binlog file. If this time no transaction is on SYNC stage, the problem is gone. - If transaction has update tables that supports XA transaction, MYSQL_BIN_LOG::m_prep_xids is incremented on FLUSH stage. It makes binlog rotation waiting when it is decremented on COMMIT stage. So this case does not cause this issue. Behaviors after causing the issue

Base conditions - MySQL 5.7 or 8.0 (Include Percona Server
etc. Amazon RDS? Aurora? No source code available. MariaDB doesn’t have this issue) - SYNC_BINLOG=1 - Outputting binary logs The binlog is rotated at the same time of the following actions. (It includes rotation by FLUSH command) - Updating to tables that does not support XA transaction like MyISAM and MEMORY. - Executing DDL. (Except atomic DDL at MySQL 8.0) (“CREATE TABLE … SELECT …” does not include atomic DDL) - Executing commands like ANALYZE TABLE. The detail condition of causing the issue

MySQL 5.7 & MySQL 8.0 ANALYZE TABLE, REPAIR TABLE, OPTIMIZE
TABLE, FLUSH TABLES, FLUSH PRIVILEGES, FLUSH ENGINE LOGS, FLUSH ERROR LOGS, FLUSH GENERAL LOGS, FLUSH HOSTS, FLUSH OPTIMIZER_COSTS, FLUSH RELAY LOGS, FLUSH SLOW LOGS, FLUSH SLOW LOGS, FLUSH STATUS, FLUSH USER_RESOURCES, ALTER INSTANCE, CREATE TABLE ... SELECT ... MySQL 5.7 & MySQL 8.0 (Tables that does not support atomic DDL) CREATE TABLE, DROP TABLE, ALTER TABLE, RENAME TABLE, TRUNCATE, CREATE INDEX, DROP INDEX MySQL 5.7 & MySQL 8.0 (Tables that does not support XA transaction) INSERT, UPDATE, DELETE, REPLACE, LOAD DATA The statements of causing the issue (No.1)

MySQL 5.7 CREATE USER, DROP USER, RENAME USER, ALTER USER,
SET PASSWORD, GRANT, REVOKE (ALL), CREATE DATABASE, DROP DATABASE, ALTER DATABASE, CREATE VIEW, DROP VIEW, CREATE TABLESPACE, DROP TABLESPACE, ALTER TABLESPACE, CREATE FUNCTION, DROP FUNCTION, ALTER FUNCTION, CREATE PROCEDURE, DROP PROCEDURE, ALTER PROCEDURE, CREATE TRIGGER, DROP TRIGGER, CREATE EVENT, DROP EVENT, ALTER EVENT, FLUSH DES_KEY_FILE, FLUSH QUERY CACHE The statements of causing the issue (No.2)

It is possible to do like the following approaches. (Sometimes,
it is possible to do other approaches) 1. Execute “FLUSH BINARY LOGS” command on the master for rotating the binlog, then execute “START SLAVE (IO_THREAD)” on the slaves. If you get same error on the slave, re-execute “FLUSH BINARY LOGS” and “START SLAVE”. 2. Keep executing “START SLAVE (IO_THREAD)” until there are no more errors. Approach of recovering or avoiding from the issue

It is possible to do like the following approaches. (Sometimes,
it is possible to do other approaches) 3. Change “SYNC_BINLOG” from 1. Requires to take care of data inconsistency in case of master failure. 4. If you know in advance that problems do not occur during normal operation, execute “FLUSH BINARY LOGS” before executing DDL etc in maintenance for reducing the possibility of binary log rotation during DDL execution. Approach of recovering or avoiding from the issue

MySQL 5.7 fixed at 5.7.25 MySQL 8.0 fixed at 8.0.14
Current status of fixing

An issue of all slaves stop replication

An issue of all slaves stop replication

Kentoku SHIBA

More Decks by Kentoku SHIBA

Other Decks in Technology

Featured

Transcript

An issue of all slaves stop replication Kentoku SHIBA

- All slaves stop replication with “Got fatal error 1236

Don’t panic. Let’s explain about detail. Abstract of the issue

When a slave requested the latest binary log Master Slave

A behavior of the COMMIT Master FLUSH stage A case

- FLUSH stage Write events from transactions to the binlog.

The condition of causing this issue Master FLUSH stage When

- The binlog is not broken, the readable position is

Base conditions - MySQL 5.7 or 8.0 (Include Percona Server

MySQL 5.7 & MySQL 8.0 ANALYZE TABLE, REPAIR TABLE, OPTIMIZE

MySQL 5.7 CREATE USER, DROP USER, RENAME USER, ALTER USER,

It is possible to do like the following approaches. (Sometimes,

It is possible to do like the following approaches. (Sometimes,

MySQL 5.7 fixed at 5.7.25 MySQL 8.0 fixed at 8.0.14