Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Un-split brain MySQL

Shlomi Noach
February 02, 2019

Un-split brain MySQL

Is there a way to amend a MySQL split brain scenario? When two servers are diverged from each other, is it possible to identify and undo the conflicting changes?

We introduce gh-mysql-rewind, which combines multiple technologies to achieve auto-resolution of data divergence. This presentation explains how gh-mysql-rewind works, and how it is being tested in production to validate its operation.

Shlomi Noach

February 02, 2019
Tweet

More Decks by Shlomi Noach

Other Decks in Technology

Transcript

  1. MySQL Un-split brain

    aka Move Back in Time
    via gh-mysql-rewind
    Shlomi Noach
    GitHub
    FOSDEM 2019

    View Slide

  2. About me
    @github/database-infrastructure
    Author of orchestrator, gh-ost, freno, ccql
    and others.
    Blog at http://openark.org

    github.com/shlomi-noach

    @ShlomiNoach

    View Slide

  3. GitHub

    Built for developers
    Largest open source hosting
    100M+ repositories, 31M+ developers, 

    2.1M+ organizations
    Supplier of octocat T-Shirts and stickers

    View Slide

  4. Incentive
    Expected MySQL split brain, with unexpected requirement.

    View Slide

  5. ! !
    !
    !
    !
    !
    ! !
    !
    !
    !
    !
    ! !
    !
    !
    !
    !
    ! !
    !
    !
    !
    !
    Failover & split brain

    View Slide

  6. Incentive
    The time it took to restore data on the demoted topology

    View Slide

  7. Incentive
    Could we just roll back the changes and move back in time?

    View Slide

  8. gh-mysql-rewind
    Rewind a dirty server back in time and connect it as a healthy
    replica in a replication topology.

    View Slide

  9. Iteratively restore hosts into replication topology
    ! !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !
    !

    View Slide

  10. gh-mysql-rewind
    How?

    View Slide

  11. GTID
    The good parts

    View Slide

  12. GTID
    Each server keeps track of:
    • gtid_executed: all transactions ever executed:

    00020192-1111-1111-1111-111111111111:1-130541
    • gtid_purged: of which, some may have been purged:

    00020192-1111-1111-1111-111111111111:1-120008

    View Slide

  13. !
    !
    !
    Binary logs
    ! !
    !
    Binary logs
    Split brain: GTID contents
    ! !
    !
    !
    !
    !
    Binary logs

    View Slide

  14. 00020192-1111-1111-1111-111111111111:1-5042
    Split brain: gtid_executed
    gtid_executed on demoted master:

    View Slide

  15. 00020192-1111-1111-1111-111111111111:1-5000,

    00020193-2222-2222-2222-222222222222:1-200
    Split brain: gtid_executed
    gtid_executed on promoted master:

    View Slide

  16. 00020192-1111-1111-1111-111111111111:1-5042
    -
    00020192-1111-1111-1111-111111111111:5001-5042
    00020192-1111-1111-1111-111111111111:1-5000,

    00020193-2222-2222-2222-222222222222:1-200
    Identifying bad GTID transactions
    gtid_executed (demoted) - gtid_executed (promoted)

    View Slide

  17. Row based replication
    With binlog_row_image=FULL:
    Each INSERT, DELETE, UPDATE is expressed in the binary log
    as the complete row image before/after change.

    View Slide

  18. Row based replication
    BINLOG '
    L9MtXBMBAAAANgAAAJAWAAAAAG4AAAAAAAEABG1ldGEACWhlYXJ0YmVhdAACAxEBBgCk9AE9
    L9MtXB8BAAAAPAAAAMwWAAAAAG4AAAAAAAEAAgAC///8AQAAAFwt0y4BQYr8AQAAAFwt0y8BWKiT
    IZN+
    '/*!*/;
    ### UPDATE `meta`.`heartbeat`
    ### WHERE
    ### @1=1 /* INT meta=0 nullable=0 is_null=0 */
    ### @2=1546507054.082314 /* TIMESTAMP(6) meta=6 nullable=0 is_null=0 */
    ### SET
    ### @1=1 /* INT meta=0 nullable=0 is_null=0 */
    ### @2=1546507055.088232 /* TIMESTAMP(6) meta=6 nullable=0 is_null=0 */
    # at 5836
    #190103 11:17:35 server id 1 end_log_pos 5867 CRC32 0x2cf60376 Xid = 114
    COMMIT/*!*/;

    View Slide

  19. MariaDB, flashback
    Developed by Alibaba
    Contributed to MySQL and MariaDB
    Implemented in MariaDB’s mysqlbinlog:
    • mysqlbinlog --flashback

    View Slide

  20. flashback example, pseudo code
    insert(1, 'a')
    insert(2, 'b')
    insert(3, 'c')
    update(2, 'b')->(2, 'second')
    update(3, 'c')->(3, 'third')
    insert(4, 'd')
    delete(1, 'a')
    insert(1, 'a')
    delete(4, 'd')
    update(3, 'third')->(3, 'c')
    update(2, 'second')->(2, 'b')
    delete(3, 'c')
    delete(2, 'b')
    delete(1, 'a')

    View Slide

  21. rewind
    GTID provides information on “what diverged”.
    flashback provides the mechanics to undo changes.
    Now, what exactly do we need to rewind?

    View Slide

  22. Time axis
    !
    !

    View Slide

  23. Rewind to split point
    !
    !

    View Slide

  24. Rewind beyond split point
    !
    !

    View Slide

  25. ! !
    !
    Binary logs
    Finding bad transactions in binlogs
    mysql-bin.0000620
    mysql-bin.0000621
    mysql-bin.0000622
    mysql-bin.0000623

    View Slide

  26. Finding bad transactions in binlogs
    # at 4
    #190103 11:17:14 server id 1 end_log_pos 123 CRC32 0x60730f6b Start: binlog v 4, server v 5.7.17-log created 190103 11:17:14 at
    startup
    ROLLBACK/*!*/;
    BINLOG '
    GtMtXA8BAAAAdwAAAHsAAAAAAAQANS43LjE3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAa0y1cEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
    AWsPc2A=
    '/*!*/;
    # at 123
    #190103 11:17:14 server id 1 end_log_pos 194 CRC32 0x48f18c88 Previous-GTIDs
    # 00020192-1111-1111-1111-111111111111:1-59023
    # at 194

    View Slide

  27. gh-mysql-rewind: preparation
    • Identifies the bad GTID transactions.
    • Identifies which binary logs contain those transactions.
    • Generates flashback per-binlog

    View Slide

  28. gh-mysql-rewind: inject GTID
    MariaDB/flashback do not speak MySQL-GTID.
    We use some good old awk hacking to inject dummy GTID
    entries.
    mysqlbinlog —flashback … | 

    awk 'BEGIN {"uuidgen -r" |& getline u} /^BEGIN/
    {c += 1 ; print "SET @@SESSION.GTID_NEXT=
    \x27" u ":" c "\x27/*!*/;"} {print}' | sed -e s/',
    @@session.check_constraint_checks=1//g’

    View Slide

  29. gh-mysql-rewind: apply
    cat all this into the broken MySQL server.
    We have now reverted the server into some consistent point in
    history, at or (more likely) before the split.
    !

    View Slide

  30. gh-mysql-rewind: back in time
    Have we just made things even worse?
    Where are we?
    gtid_executed is a mess!
    What happens if we reconnect the
    server into the topology?
    ! !
    !
    Binary logs

    View Slide

  31. 00020192-1111-1111-1111-111111111111:1-5042
    -
    00020192-1111-1111-1111-111111111111:1-4805
    00020192-1111-1111-1111-111111111111:4806-5042
    Computing target GTID
    RESET MASTER

    SET GLOBAL gtid_purged :=

    View Slide

  32. Join topology as healthy replica
    CHANGE MASTER TO

    MASTER_HOST=,

    MASTER_AUTO_POSITION=1



    ! !
    !
    !
    !
    !

    View Slide

  33. Limitations
    Cannot rewind DDL
    JSON, POINT not supported
    Full binlog rollback means more recovery time
    Local run on each server
    One server at a time
    • With careful planning, the operation could be applied on a
    single server and propagate to its replicas.

    View Slide

  34. Testing
    Is this reliable?

    View Slide

  35. Testing
    Continuous testing in production!

    View Slide

  36. Testing
    Ideally: checksum entire data, contaminate, rewind, checksum
    again. But…
    • Data set is too big, tests would take days
    • How can we predict where (in history) gh-mysql-rewind will
    drop us at?

    View Slide

  37. ! !
    !
    !
    !
    !
    STOP SLAVE
    CHANGE MASTER TO MASTER_AUTO_POSITION=1
    Rotates relay logs
    FLUSH BINARY LOGS
    Opens a new binary log
    We predict that gh-mysql-rewind will drop us back into
    current position.
    Testing, a dedicated replica

    View Slide

  38. START SLAVE IO_THREAD
    Sleep 30sec
    STOP SLAVE
    We aggregate 30sec of production traffic in a relay log.
    Testing: grab production data
    !
    Relay log

    View Slide

  39. We parse the relay log and list affected tables.
    We use all “small” tables,
    And a single “large” table.
    We checksum all these tables.
    Testing: analyze production data
    !
    Relay log

    View Slide

  40. !
    START SLAVE SQL_THREAD;
    SELECT MASTER_POS_WAIT(coordinates>)
    We apply the entire content of the relay log
    Testing: apply production data
    Relay log
    Binary log

    View Slide

  41. !
    For each table, we issue:
    DELETE FROM ORDER BY id DESC LIMIT 10
    DELETE FROM LIMIT 10
    And to make things worse:
    START SLAVE
    Sleep 30sec
    STOP SLAVE
    Testing: contaminate
    Relay log
    Binary log

    View Slide

  42. ! Replication is likely to be broken. Data is corrupted.
    A mess.

    View Slide

  43. Testing: gh-mysql-rewind
    Connect to a healthy server.
    gh-mysql-rewind will rewind the dirty writes, and past those
    changes.
    Our choice of 30sec was intentional: it will rewind back at our
    FLUSH point.
    ! !
    !
    !
    !
    !

    View Slide

  44. Testing: checksum
    We checksum all the affected tables again. We expect 100%
    match.
    ! !
    !
    !
    !
    !

    View Slide

  45. Testing: iterate!
    The replica is back in the topology.
    Wait for it to catch up, run again, and again, and again.
    Complementary test exists to confirm our “point of arrival”
    calculation is correct, however, replication success makes it
    redundant.
    ! !
    !
    !
    !
    !

    View Slide

  46. Expectations
    To never use gh-mysql-rewind.
    To rewind & join replication within minutes as opposed to hours.
    Smaller binary logs can help.

    View Slide

  47. Status
    Shell script.
    To be released shortly to the community.
    At some stage integrate into orchestrator.
    But for that, we will need remote access.

    View Slide

  48. In development at MySQL
    Query & manipulate GTID over the client protocol:

    View Slide

  49. Questions?
    github.com/shlomi-noach
    @ShlomiNoach
    Thank you!

    View Slide