$30 off During Our Annual Pro Sale. View Details »

[ZendCon 2013] MySQL HA, Recovery and Load Balancing

[ZendCon 2013] MySQL HA, Recovery and Load Balancing

Using MySQL Server and Percona XtraDB Cluster we will take a look at several possible replication configurations for MySQL high-availability, disaster recovery and load-balancing. After comparing them for administration ease, scalability of reads and writes, robustness of the system for errors and time for data recovery as well as looking at how they impact your code-layer we will discuss the ins-and-outs of each setup, looking at the pros and cons, and what trade-offs you will need to make depending on your needs.

Davey Shafik

October 09, 2013
Tweet

More Decks by Davey Shafik

Other Decks in Programming

Transcript

  1. MySQL HA, DR and
    Load Balancing
    Thursday, October 10, 13

    View Slide

  2. •Community Engineer at Engine
    Yard
    •Author of Zend PHP 5
    Certification Study Guide,
    Sitepoints PHP Anthology: 101
    Essential Tips, Tricks & Hacks
    & PHP Master: Write Cutting
    Edge Code
    •A contributor to Zend
    Framework 1 & 2, phpdoc, and
    PHP internals
    •@dshafik
    Davey Shafik
    Thursday, October 10, 13

    View Slide

  3. High Availability
    Thursday, October 10, 13

    View Slide

  4. High Availability (HA)
    • Not: servers that don’t die
    • Ensures service availability, not server availability
    4
    Thursday, October 10, 13

    View Slide

  5. Disaster Recovery
    Thursday, October 10, 13

    View Slide

  6. Disaster Recovery (DR)
    • Process for recovering from different failure
    scenarios
    • Automated or not (or in-between)
    • Planning
    • Something to recover from
    • Something to recover to
    6
    Thursday, October 10, 13

    View Slide

  7. Load Balancing
    Thursday, October 10, 13

    View Slide

  8. Load Balancing
    • Balancing the load between multiple machines
    • Smart (based on system resources)
    • Dumb (round-robin)
    • ELB is easy, but slow
    • HAProxy is super faster (and still pretty easy)
    8
    Thursday, October 10, 13

    View Slide

  9. Why?
    Thursday, October 10, 13

    View Slide

  10. Why?
    • Curiosity — MySQL admin is fun
    • Goal: Make system predictable — then it can be
    automated (some)
    • Impacts of design decisions
    • Percona Cluster: oooh shiny!
    10
    Thursday, October 10, 13

    View Slide

  11. What
    Thursday, October 10, 13

    View Slide

  12. What
    • HA + Performance
    • Minimal code impact (drop-in system)
    • Simple (less moving parts)
    12
    Thursday, October 10, 13

    View Slide

  13. MySQL 5.5
    Thursday, October 10, 13

    View Slide

  14. Semi-Sync Replication
    • Improved Data Integrity
    • Master blocks until 1 slave has written to relay log,
    flushed to disk & acknowledged
    • Slower... a little.
    • Failover to asynchronous replication
    14
    Thursday, October 10, 13

    View Slide

  15. Thursday, October 10, 13

    View Slide

  16. Semi-Sync Master Setup
    INSTALL PLUGIN rpl_semi_sync_master SONAME
    'semisync_master.so';
    rpl_semi_sync_master_enabled = {0|1};
    rpl_semi_sync_master_timeout = N;
    Thursday, October 10, 13

    View Slide

  17. Semi-Sync Slave Setup
    INSTALL PLUGIN rpl_semi_sync_slave SONAME
    'semisync_slave.so';
    rpl_semi_sync_slave_enabled = {0|1};
    Thursday, October 10, 13

    View Slide

  18. Disaster Recovery
    Thursday, October 10, 13

    View Slide

  19. Disaster Recovery
    • Primary slave is pre-determined failover master
    • Will have the most up-to-date content (not necessarily ALL the
    content)
    • Slaves must re-connect to new master
    • CHANGE MASTER to correct (new) position
    • Switch app to new master
    19
    Thursday, October 10, 13

    View Slide

  20. Performance
    Thursday, October 10, 13

    View Slide

  21. Performance
    • Benchmarker: 1 XL AWS Instance
    • Master: 1 AWS Medium Instance
    • Slaves: 3 AWS Large Instances (one semi-sync)
    • HAProxy: 1 AWS Large Instance
    21
    Thursday, October 10, 13

    View Slide

  22. Benchmarks
    • Sysbench OLTP Benchmark
    • 1.5 Million Rows
    • 64 Connection
    • InnoDB Storage Engine
    • 50,000 Queries
    • Optionally: Read-only
    22
    Thursday, October 10, 13

    View Slide

  23. Performance
    • Writing 1.5 Million Records, with indexes:
    • 1m 13s seconds (replicated)
    • Sysbench OLTP Benchmark:
    • Read/Write (with semi-sync): 2850/qps
    • Read/Write (without semi-sync): 2900/qps
    • Read only: 10850/qps
    23
    Thursday, October 10, 13

    View Slide

  24. MySQL 5.6
    Thursday, October 10, 13

    View Slide

  25. Global Transaction IDs (GTIDs)
    • Unique identifier across all server (binlogs)
    • combines server UUID:Transaction ID
    • Transaction based
    • No need for log files or positions with CHANGE
    MASTER
    • Consistent group-wide
    • Easier failover
    25
    Thursday, October 10, 13

    View Slide

  26. Thursday, October 10, 13

    View Slide

  27. GTID Master & Slave Setup
    gtid-mode = ON
    log-bin
    enforce-gtid-consistency = ON
    log-slave-updates = ON
    Thursday, October 10, 13

    View Slide

  28. GTID Slave Setup
    • STOP SLAVE;
    • CHANGE MASTER TO MASTER_HOST=' ',
    MASTER_USER='',
    MASTER_PASSWORD='',
    MASTER_AUTO_POSITION=1;
    • START SLAVE;
    28
    Thursday, October 10, 13

    View Slide

  29. Disaster Recovery
    Thursday, October 10, 13

    View Slide

  30. Disaster Recovery: mysqlfailover
    • Performs replication health monitoring
    • handles automatic failover
    • automatic !== immediate (interval in seconds >= 5s)
    • Hook scripts for failed check, before/after failover,
    after complete recovery,
    30
    Thursday, October 10, 13

    View Slide

  31. Hooks
    • Failed check:
    • Alert somebody
    • Set app to maintenance
    • Before failover:
    • Alert somebody
    • Log Something
    • Take backups
    31
    Thursday, October 10, 13

    View Slide

  32. Hooks
    • After Failover:
    • Alert Somebody
    • After complete recovery:
    • Modify Application Config
    • Switchover elastic IPs (amazon)
    • Verify Application
    • Alert Somebody!
    32
    Thursday, October 10, 13

    View Slide

  33. Performance
    Thursday, October 10, 13

    View Slide

  34. Performance
    • Benchmarker: 1 XL AWS Instance
    • Master: 1 AWS Medium Instance
    • Slaves: 3 AWS Large Instances
    • HAProxy: 1 AWS Large Instance
    34
    Thursday, October 10, 13

    View Slide

  35. Performance
    • Writing 1.5 Million Records, with indexes:
    • 60 seconds (replicated)
    • Sysbench OLTP Benchmark:
    • Read/Write: 2950/qps
    • Read only: 9500/qps
    35
    Thursday, October 10, 13

    View Slide

  36. Percona XtraDB Cluster
    Thursday, October 10, 13

    View Slide

  37. Multi-Master Replication
    • Everything replicates to everything
    • Uses XtraDB table engine (innodb compatible)
    — not NDB
    • Automatically transfer state to newly introduced
    nodes
    • Still has a “primary” server
    • Based on Galera for multi-master writes
    37
    Thursday, October 10, 13

    View Slide

  38. Thursday, October 10, 13

    View Slide

  39. Primary Server Setup
    [mysqld]
    wsrep_cluster_address=gcomm://
    datadir=/var/lib/mysql
    user=mysql
    binlog_format=ROW
    wsrep_provider=/usr/lib/libgalera_smm.so
    wsrep_slave_threads=2
    wsrep_cluster_name=benchmarking
    wsrep_sst_method=xtrabackup
    wsrep_node_name=
    innodb_locks_unsafe_for_binlog=1
    innodb_autoinc_lock_mode=2
    Thursday, October 10, 13

    View Slide

  40. Other Server Setup
    [mysqld]
    wsrep_cluster_address=gcomm://
    datadir=/var/lib/mysql
    user=mysql
    binlog_format=ROW
    wsrep_provider=/usr/lib/libgalera_smm.so
    wsrep_slave_threads=2
    wsrep_cluster_name=benchmarking
    wsrep_sst_method=xtrabackup
    wsrep_node_name=
    innodb_locks_unsafe_for_binlog=1
    innodb_autoinc_lock_mode=2
    Thursday, October 10, 13

    View Slide

  41. Disaster Recovery
    Thursday, October 10, 13

    View Slide

  42. Disaster Recovery
    • Choose a new Primary server
    • Update config
    • Restart all servers
    42
    Thursday, October 10, 13

    View Slide

  43. Performance
    Thursday, October 10, 13

    View Slide

  44. Performance
    • Unable to test!
    • More than 4 concurrent “users” resulted in auto
    increment primary key collisions
    • Unable to complete the benchmark at all with >= 16
    threads
    • ALERT: failed to execute mysql_stmt_execute():
    Err1062 Duplicate entry
    '500519' for key 'PRIMARY'
    FATAL: database error, exiting...
    44
    Thursday, October 10, 13

    View Slide

  45. Other Issues
    Thursday, October 10, 13

    View Slide

  46. Other Issues
    • Replication was very brittle, restarting the daemon
    would sometimes break it
    • Restarting the server almost always break it
    • Adding new nodes very often failed
    • Completely unable to get the system up and running
    with the latest version for this talk.
    46
    Thursday, October 10, 13

    View Slide

  47. Read/Write Splitting
    mysqlnd_ms
    Thursday, October 10, 13

    View Slide

  48. Read/Write Splitting
    $ pecl install mysqlnd_ms
    mysqlnd_ms.enable=1
    mysqlnd_ms.config_file=
    /path/to/mysqlnd_ms.json
    Thursday, October 10, 13

    View Slide

  49. Read/Write Splitting
    {
    "myapp": {
    "master": {
    "master_0": {
    "host": "localhost",
    "socket": "\/tmp\/mysql.sock"
    }
    },
    "slave": {
    "slave_0": {
    "host": "192.168.2.27",
    "port": "3306"
    }
    }
    }
    }
    Thursday, October 10, 13

    View Slide

  50. Read/Write Splitting
    SELECT * FROM foo; -- Use Slave
    /* MYSQLND_MS_SLAVE_SWITCH */SELECT * FROM foo; -- Use Slave
    SELECT * FROM foo; INSERT INTO bar VALUES ('baz'); -- Use Slave!
    /* MYSQLND_MS_MASTER_SWITCH */SELECT * FROM foo; -- Use Master
    /* MYSQLND_MS_LAST_USED_SWITCH */SELECT * FROM foo; -- Use whatever
    -- was last used
    Thursday, October 10, 13

    View Slide

  51. Other Resources
    • Semi-Sync Replication:
    • http://ey.io/18PllQP
    • Replication with GTIDs:
    • http://ey.io/15nTo20
    • MySQL Utilities (include mysqlfailover):
    • http://ey.io/GGP2Kh
    • Percona XtraDB Cluster:
    • http://ey.io/1gqyyUn
    51
    http://ey.io/zc13-mysql
    Thursday, October 10, 13

    View Slide

  52. Feedback & Questions:
    Feedback: https://joind.in/9093
    Twitter: @dshafik
    Email: [email protected]
    http://ey.io/zc13-mysql
    Thursday, October 10, 13

    View Slide