Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Un-split brain MySQL

Shlomi Noach
February 02, 2019

Un-split brain MySQL

Is there a way to amend a MySQL split brain scenario? When two servers are diverged from each other, is it possible to identify and undo the conflicting changes?

We introduce gh-mysql-rewind, which combines multiple technologies to achieve auto-resolution of data divergence. This presentation explains how gh-mysql-rewind works, and how it is being tested in production to validate its operation.

Shlomi Noach

February 02, 2019
Tweet

More Decks by Shlomi Noach

Other Decks in Technology

Transcript

  1. About me @github/database-infrastructure Author of orchestrator, gh-ost, freno, ccql and

    others. Blog at http://openark.org 
 github.com/shlomi-noach
 @ShlomiNoach
  2. GitHub
 Built for developers Largest open source hosting 100M+ repositories,

    31M+ developers, 
 2.1M+ organizations Supplier of octocat T-Shirts and stickers
  3. ! ! ! ! ! ! ! ! ! !

    ! ! ! ! ! ! ! ! ! ! ! ! ! ! Failover & split brain
  4. gh-mysql-rewind Rewind a dirty server back in time and connect

    it as a healthy replica in a replication topology.
  5. Iteratively restore hosts into replication topology ! ! ! !

    ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
  6. GTID Each server keeps track of: • gtid_executed: all transactions

    ever executed:
 00020192-1111-1111-1111-111111111111:1-130541 • gtid_purged: of which, some may have been purged:
 00020192-1111-1111-1111-111111111111:1-120008
  7. ! ! ! Binary logs ! ! ! Binary logs

    Split brain: GTID contents ! ! ! ! ! ! Binary logs
  8. Row based replication With binlog_row_image=FULL: Each INSERT, DELETE, UPDATE is

    expressed in the binary log as the complete row image before/after change.
  9. Row based replication BINLOG ' L9MtXBMBAAAANgAAAJAWAAAAAG4AAAAAAAEABG1ldGEACWhlYXJ0YmVhdAACAxEBBgCk9AE9 L9MtXB8BAAAAPAAAAMwWAAAAAG4AAAAAAAEAAgAC///8AQAAAFwt0y4BQYr8AQAAAFwt0y8BWKiT IZN+ '/*!*/; ###

    UPDATE `meta`.`heartbeat` ### WHERE ### @1=1 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1546507054.082314 /* TIMESTAMP(6) meta=6 nullable=0 is_null=0 */ ### SET ### @1=1 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1546507055.088232 /* TIMESTAMP(6) meta=6 nullable=0 is_null=0 */ # at 5836 #190103 11:17:35 server id 1 end_log_pos 5867 CRC32 0x2cf60376 Xid = 114 COMMIT/*!*/;
  10. MariaDB, flashback Developed by Alibaba Contributed to MySQL and MariaDB

    Implemented in MariaDB’s mysqlbinlog: • mysqlbinlog --flashback
  11. flashback example, pseudo code insert(1, 'a') insert(2, 'b') insert(3, 'c')

    update(2, 'b')->(2, 'second') update(3, 'c')->(3, 'third') insert(4, 'd') delete(1, 'a') insert(1, 'a') delete(4, 'd') update(3, 'third')->(3, 'c') update(2, 'second')->(2, 'b') delete(3, 'c') delete(2, 'b') delete(1, 'a')
  12. rewind GTID provides information on “what diverged”. flashback provides the

    mechanics to undo changes. Now, what exactly do we need to rewind?
  13. ! ! ! Binary logs Finding bad transactions in binlogs

    mysql-bin.0000620 mysql-bin.0000621 mysql-bin.0000622 mysql-bin.0000623
  14. Finding bad transactions in binlogs # at 4 #190103 11:17:14

    server id 1 end_log_pos 123 CRC32 0x60730f6b Start: binlog v 4, server v 5.7.17-log created 190103 11:17:14 at startup ROLLBACK/*!*/; BINLOG ' GtMtXA8BAAAAdwAAAHsAAAAAAAQANS43LjE3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAa0y1cEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA AWsPc2A= '/*!*/; # at 123 #190103 11:17:14 server id 1 end_log_pos 194 CRC32 0x48f18c88 Previous-GTIDs # 00020192-1111-1111-1111-111111111111:1-59023 # at 194
  15. gh-mysql-rewind: preparation • Identifies the bad GTID transactions. • Identifies

    which binary logs contain those transactions. • Generates flashback per-binlog
  16. gh-mysql-rewind: inject GTID MariaDB/flashback do not speak MySQL-GTID. We use

    some good old awk hacking to inject dummy GTID entries. mysqlbinlog —flashback … | 
 awk 'BEGIN {"uuidgen -r" |& getline u} /^BEGIN/ {c += 1 ; print "SET @@SESSION.GTID_NEXT= \x27" u ":" c "\x27/*!*/;"} {print}' | sed -e s/', @@session.check_constraint_checks=1//g’
  17. gh-mysql-rewind: apply cat all this into the broken MySQL server.

    We have now reverted the server into some consistent point in history, at or (more likely) before the split. !
  18. gh-mysql-rewind: back in time Have we just made things even

    worse? Where are we? gtid_executed is a mess! What happens if we reconnect the server into the topology? ! ! ! Binary logs
  19. Limitations Cannot rewind DDL JSON, POINT not supported Full binlog

    rollback means more recovery time Local run on each server One server at a time • With careful planning, the operation could be applied on a single server and propagate to its replicas.
  20. Testing Ideally: checksum entire data, contaminate, rewind, checksum again. But…

    • Data set is too big, tests would take days • How can we predict where (in history) gh-mysql-rewind will drop us at?
  21. ! ! ! ! ! ! STOP SLAVE CHANGE MASTER

    TO MASTER_AUTO_POSITION=1 Rotates relay logs FLUSH BINARY LOGS Opens a new binary log We predict that gh-mysql-rewind will drop us back into current position. Testing, a dedicated replica
  22. START SLAVE IO_THREAD Sleep 30sec STOP SLAVE We aggregate 30sec

    of production traffic in a relay log. Testing: grab production data ! Relay log
  23. We parse the relay log and list affected tables. We

    use all “small” tables, And a single “large” table. We checksum all these tables. Testing: analyze production data ! Relay log
  24. ! START SLAVE SQL_THREAD; SELECT MASTER_POS_WAIT(<relaylog end coordinates>) We apply

    the entire content of the relay log Testing: apply production data Relay log Binary log
  25. ! For each table, we issue: DELETE FROM <t> ORDER

    BY id DESC LIMIT 10 DELETE FROM <t> LIMIT 10 And to make things worse: START SLAVE Sleep 30sec STOP SLAVE Testing: contaminate Relay log Binary log
  26. Testing: gh-mysql-rewind Connect to a healthy server. gh-mysql-rewind will rewind

    the dirty writes, and past those changes. Our choice of 30sec was intentional: it will rewind back at our FLUSH point. ! ! ! ! ! !
  27. Testing: iterate! The replica is back in the topology. Wait

    for it to catch up, run again, and again, and again. Complementary test exists to confirm our “point of arrival” calculation is correct, however, replication success makes it redundant. ! ! ! ! ! !
  28. Expectations To never use gh-mysql-rewind. To rewind & join replication

    within minutes as opposed to hours. Smaller binary logs can help.
  29. Status Shell script. To be released shortly to the community.

    At some stage integrate into orchestrator. But for that, we will need remote access.