Upgrade to Pro — share decks privately, control downloads, hide ads and more …

[php[tek] 2013] MySQL High Availability, Disaster Recovery and Load Balancing

[php[tek] 2013] MySQL High Availability, Disaster Recovery and Load Balancing

Using Oracle MySQL Server and Percona XtraDB Cluster we will take a look at several possible replication configurations for MySQL high-availability, disaster recovery and load-balancing.

After comparing them for administration ease, scalability of reads and writes, robustness of the system for errors and time for data recovery as well as looking at how they impact your code-layer we will discuss the ins-and-outs of each setup, looking at the pros and cons, and what trade-offs you will need to make depending on your needs.

Note: This talk was co-presented with Ligaya Turmelle

Davey Shafik

May 17, 2013
Tweet

More Decks by Davey Shafik

Other Decks in Programming

Transcript

  1. MySQL HA, DR and
    Load Balancing
    1
    Friday, May 17, 13

    View Slide

  2. •Engineer at Engine Yard on the
    Community Team
    •Author of Zend PHP 5 Certification
    Study Guide, Sitepoints PHP
    Anthology: 101 Essential Tips, Tricks
    & Hacks & PHP Master: Write
    Cutting Edge Code
    •A contributor to Zend Framework,
    phpdoc, FRAPI and PHP internals
    •@dshafik
    Davey Shafik
    2
    Friday, May 17, 13

    View Slide

  3. • PHP Women
    • MySQL Ace (10+ Years)
    • @lig
    • Full Disclosure
    Ligaya Turmelle
    3
    Friday, May 17, 13

    View Slide

  4. High Availability (HA)
    • Not: servers that don’t die
    • Ensures service availability, not server
    availability
    4
    Friday, May 17, 13

    View Slide

  5. Disaster Recovery
    (DR)
    • Process for recovering from different
    failure scenarios
    • Automated or not (or in-between)
    • Planning
    • Something to recover from
    • Something to recover to
    5
    Friday, May 17, 13

    View Slide

  6. Load Balancing
    • Balancing the load between multiple
    machines
    • Smart (based on system resources)
    • Dumb (round-robin)
    • ELB is easy, but slow
    • HAProxy is super faster (and still pretty
    easy)
    6
    Friday, May 17, 13

    View Slide

  7. Why?
    • Curiosity — MySQL admin is fun
    • Goal: Make system predictable — then it
    can be automated (some)
    • Impacts of design decisions
    • Percona Cluster: oooh shiny!
    7
    Friday, May 17, 13

    View Slide

  8. What
    • HA + Performance
    • Minimal code impact (drop-in system)
    • Simple (less moving parts)
    8
    Friday, May 17, 13

    View Slide

  9. MySQL 5.5
    9
    Friday, May 17, 13

    View Slide

  10. Semi-Sync Replication
    • Improved Data Integrity
    • Master blocks until 1 slave has written to
    relay log, flushed to disk & acknowledged
    • Slower... a little.
    • Failover to asynchronous replication
    10
    Friday, May 17, 13

    View Slide

  11. 11
    Friday, May 17, 13

    View Slide

  12. Semi-Sync Master Setup
    •INSTALL PLUGIN rpl_semi_sync_master
    SONAME 'semisync_master.so';
    •rpl_semi_sync_master_enabled = {0|1};
    •rpl_semi_sync_master_timeout = N;
    12
    Friday, May 17, 13

    View Slide

  13. Semi-Sync Slave Setup
    •INSTALL PLUGIN rpl_semi_sync_slave
    SONAME 'semisync_slave.so';
    •rpl_semi_sync_slave_enabled = {0|
    1};
    13
    Friday, May 17, 13

    View Slide

  14. Disaster Recovery
    • Primary slave is pre-determined
    failover master
    • Will have the most up-to-date content
    (not necessarily ALL the content)
    • Slaves must re-connect to new master
    • CHANGE MASTER to correct (new)
    position
    • Switch app to new master
    14
    Friday, May 17, 13

    View Slide

  15. Performance
    • Benchmarker: 1 XL AWS Instance
    • Master: 1 AWS Medium Instance
    • Slaves: 3 AWS Large Instances (one semi-
    sync)
    • HAProxy: 1 AWS Large Instance
    15
    Friday, May 17, 13

    View Slide

  16. Benchmarks
    • Sysbench OLTP Benchmark
    • 1.5 Million Rows
    • 64 Connection
    • InnoDB Storage Engine
    • 50,000 Queries
    • Optionally: Read-only
    16
    Friday, May 17, 13

    View Slide

  17. Performance
    • Writing 1.5 Million Records, with indexes:
    • 1m 13s seconds (replicated)
    • Sysbench OLTP Benchmark:
    • Read/Write (with semi-sync): 2900/qps
    • Read/Write (without semi-sync): 2850/qps
    • Read only: 10850/qps
    17
    Friday, May 17, 13

    View Slide

  18. MySQL 5.6
    18
    Friday, May 17, 13

    View Slide

  19. Global Transaction IDs
    (GTIDs)
    • Unique identifier across all server (binlogs)
    • combines server UUID:Transaction ID
    • Transaction based
    • No need for log files or positions with
    CHANGE MASTER
    • Consistent group-wide
    • Easier failover
    19
    Friday, May 17, 13

    View Slide

  20. 20
    Friday, May 17, 13

    View Slide

  21. GTID Master & Slave
    Setup
    • gtid-mode = ON
    • log-bin
    • enforce-gtid-consistency = ON
    • log-slave-updates = ON
    21
    Friday, May 17, 13

    View Slide

  22. GTID Slave Setup
    • STOP SLAVE;
    • CHANGE MASTER TO
    MASTER_HOST=' ',
    MASTER_USER='',
    MASTER_PASSWORD='',
    MASTER_AUTO_POSITION=1;
    • START SLAVE;
    22
    Friday, May 17, 13

    View Slide

  23. Disaster Recovery:
    mysqlfailover
    • Performs replication health monitoring
    • handles automatic failover
    • automatic !== immediate (interval in
    seconds >= 5s)
    • Hook scripts for failed check, before/after
    failover, after complete recovery,
    23
    Friday, May 17, 13

    View Slide

  24. Hooks
    • Failed check:
    • Alert somebody
    • Set app to maintenance
    • Before failover:
    • Alert somebody
    • Log Something
    • Take backups
    24
    Friday, May 17, 13

    View Slide

  25. Hooks
    • After Failover:
    • Alert Somebody
    • After complete recovery:
    • Modify Application Config
    • Switchover elastic IPs (amazon)
    • Verify Application
    • Alert Somebody!
    25
    Friday, May 17, 13

    View Slide

  26. Performance
    • Benchmarker: 1 XL AWS Instance
    • Master: 1 AWS Medium Instance
    • Slaves: 3 AWS Large Instances
    • HAProxy: 1 AWS Large Instance
    26
    Friday, May 17, 13

    View Slide

  27. Performance
    • Writing 1.5 Million Records, with indexes:
    • 60 seconds (replicated)
    • Sysbench OLTP Benchmark:
    • Read/Write: 2950/qps
    • Read only: 9500/qps
    27
    Friday, May 17, 13

    View Slide

  28. Percona XtraDB
    Cluster
    28
    Friday, May 17, 13

    View Slide

  29. Multi-Master
    Replication
    • Everything replicates to everything
    • Uses XtraDB table engine (innodb
    compatible) — not NDB
    • Automatically transfer state to newly
    introduced nodes
    • Still has a “primary” server
    29
    Friday, May 17, 13

    View Slide

  30. 30
    Friday, May 17, 13

    View Slide

  31. Primary Server Setup
    [mysqld]
    wsrep_cluster_address=gcomm://
    datadir=/var/lib/mysql
    user=mysql
    binlog_format=ROW
    wsrep_provider=/usr/lib/
    libgalera_smm.so
    wsrep_slave_threads=2
    wsrep_cluster_name=benchmarking
    wsrep_sst_method=xtrabackup
    wsrep_node_name=
    innodb_locks_unsafe_for_binlog=1
    innodb_autoinc_lock_mode=2
    31
    Friday, May 17, 13

    View Slide

  32. Other Server Setup
    [mysqld]
    wsrep_cluster_address=gcomm://
    datadir=/var/lib/mysql
    user=mysql
    binlog_format=ROW
    wsrep_provider=/usr/lib/
    libgalera_smm.so
    wsrep_slave_threads=2
    wsrep_cluster_name=benchmarking
    wsrep_sst_method=xtrabackup
    wsrep_node_name=
    innodb_locks_unsafe_for_binlog=1
    innodb_autoinc_lock_mode=2
    32
    Friday, May 17, 13

    View Slide

  33. Disaster Recovery
    • Choose a new Primary server
    • Update config
    • Restart all servers
    33
    Friday, May 17, 13

    View Slide

  34. Performance
    • Unable to test!
    • More than 4 concurrent “users” resulted in
    auto increment primary key collisions
    • Unable to complete the benchmark at all
    with >= 16 threads
    •ALERT: failed to execute
    mysql_stmt_execute(): Err1062
    Duplicate entry
    '500519' for key 'PRIMARY'
    FATAL: database error,
    exiting...
    34
    Friday, May 17, 13

    View Slide

  35. Other Issues
    • Replication was very brittle, restarting the
    daemon would sometimes break it
    • Restarting the server almost always break it
    • Adding new nodes very often failed
    • Completely unable to get the system up
    and running with the latest version for this
    talk.
    35
    Friday, May 17, 13

    View Slide

  36. Read/Write Splitting
    $ pecl install mysqlnd_ms
    mysqlnd_ms.enable=1
    mysqlnd_ms.config_file=
    /path/to/mysqlnd_ms.ini
    36
    Friday, May 17, 13

    View Slide

  37. Read/Write Splitting
    {
           "myapp":  {
                   "master":  {
                           "master_0":  {
                                   "host":  "localhost",
                                   "socket":  "\/tmp\/mysql.sock"
                           }
                   },
                   "slave":  {
                           "slave_0":  {
                                   "host":  "192.168.2.27",
                                   "port":  "3306"
                           }
                   }
           }
    }
    37
    Friday, May 17, 13

    View Slide

  38. Read/Write Splitting
    SELECT * FROM foo; -- Use Slave
    /* MYSQLND_MS_SLAVE_SWITCH */SELECT * FROM foo; -- Use Slave
    SELECT * FROM foo; INSERT INTO bar VALUES ('baz'); -- Use Slave!
    /* MYSQLND_MS_MASTER_SWITCH */SELECT * FROM foo; -- Use Master
    /* MYSQLND_MS_LAST_USED_SWITCH */SELECT * FROM foo; -- Use whatever
    -- was last used
    38
    Friday, May 17, 13

    View Slide

  39. Questions?
    39
    Friday, May 17, 13

    View Slide

  40. Thank You
    • https://joind.in/8170
    • @lig | [email protected]
    • @dshafik | [email protected]
    • http://daveyshafik.com/slides
    40
    Friday, May 17, 13

    View Slide

  41. Other Resources
    • Semi-Sync Replication: http://bit.ly/10uT3bc
    • Replication with GTIDs: http://bit.ly/
    THduPa
    • MySQL Utilities (include mysqlfailover):
    http://bit.ly/W98HGC
    • Percona XtraDB Cluster: http://bit.ly/
    RtK6aP
    41
    Friday, May 17, 13

    View Slide