[php[tek] 2013] MySQL High Availability, Disaster Recovery and Load Balancing

Using Oracle MySQL Server and Percona XtraDB Cluster, we will look at several possible replication configurations for MySQL high availability, disaster recovery and load balancing.

After comparing them on ease of administration, read and write scalability, robustness against errors, and time to recover data, as well as their impact on your code layer, we will discuss the ins and outs of each setup: the pros, the cons, and the trade-offs you will need to make depending on your needs.

Note: This talk was co-presented with Ligaya Turmelle

Davey Shafik

May 17, 2013
Transcript

  1. Davey Shafik
     • Engineer at Engine Yard on the Community Team
     • Author of the Zend PHP 5 Certification Study Guide, SitePoint's PHP Anthology: 101 Essential Tips, Tricks & Hacks, and PHP Master: Write Cutting Edge Code
     • A contributor to Zend Framework, phpdoc, FRAPI and PHP internals
     • @dshafik

  2. Ligaya Turmelle
     • PHP Women
     • MySQL Ace (10+ Years)
     • @lig
     • Full Disclosure

  3. High Availability (HA)
     • Not: servers that don’t die
     • Ensures service availability, not server availability

  4. Disaster Recovery (DR)
     • Process for recovering from different failure scenarios
     • Automated or not (or in-between)
     • Planning
     • Something to recover from
     • Something to recover to

  5. Load Balancing
     • Balancing the load between multiple machines
     • Smart (based on system resources)
     • Dumb (round-robin)
     • ELB is easy, but slow
     • HAProxy is super fast (and still pretty easy; sketch below)

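     For illustration, a minimal HAProxy sketch for dumb round-robin TCP balancing across read slaves; the listener name, check user and server IPs are placeholder assumptions, not values from the talk:

         # haproxy.cfg fragment: round-robin TCP balancing over three slaves
         listen mysql-read
             bind *:3306
             mode tcp
             balance roundrobin
             # requires a passwordless 'haproxy_check' user on each MySQL server
             option mysql-check user haproxy_check
             server slave1 10.0.0.11:3306 check
             server slave2 10.0.0.12:3306 check
             server slave3 10.0.0.13:3306 check
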
  6. Why?
     • Curiosity — MySQL admin is fun
     • Goal: make the system predictable — then it can be automated (somewhat)
     • Impacts of design decisions
     • Percona Cluster: oooh shiny!

  7. What
     • HA + Performance
     • Minimal code impact (drop-in system)
     • Simple (fewer moving parts)

  8. Semi-Sync Replication
     • Improved data integrity
     • Master blocks until 1 slave has written to its relay log, flushed to disk & acknowledged
     • Slower... a little.
     • Failover to asynchronous replication if no slave acknowledges in time (enabling sketch below)

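     A minimal sketch of turning semi-sync on with the stock MySQL plugins; the timeout value is an illustrative assumption:

         -- On the master:
         INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
         SET GLOBAL rpl_semi_sync_master_enabled = 1;
         SET GLOBAL rpl_semi_sync_master_timeout = 1000; -- ms before falling back to async

         -- On the semi-sync slave:
         INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
         SET GLOBAL rpl_semi_sync_slave_enabled = 1;
         STOP SLAVE IO_THREAD;
         START SLAVE IO_THREAD; -- reconnect so the slave registers as semi-sync
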
  9. Disaster Recovery
     • Primary slave is the pre-determined failover master
     • It will have the most up-to-date content (not necessarily ALL the content)
     • Slaves must re-connect to the new master
     • CHANGE MASTER to the correct (new) position (see the sketch below)
     • Switch the app to the new master

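     A sketch of repointing a surviving slave at the promoted master; the host, credentials, log file and position are placeholders:

         STOP SLAVE;
         CHANGE MASTER TO
             MASTER_HOST='new-master.example.com',
             MASTER_USER='repl',
             MASTER_PASSWORD='<password>',
             MASTER_LOG_FILE='mysql-bin.000042', -- from SHOW MASTER STATUS on the new master
             MASTER_LOG_POS=107;
         START SLAVE;
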
  10. Performance
      • Benchmarker: 1 XL AWS Instance
      • Master: 1 AWS Medium Instance
      • Slaves: 3 AWS Large Instances (one semi-sync)
      • HAProxy: 1 AWS Large Instance

  11. Benchmarks
      • Sysbench OLTP Benchmark (sample invocation below)
      • 1.5 Million Rows
      • 64 Connections
      • InnoDB Storage Engine
      • 50,000 Queries
      • Optionally: Read-only

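      A hypothetical sysbench 0.4-style invocation matching those parameters; the host and credentials are placeholders:

          # Create the 1.5M-row test table, then run the 64-thread OLTP test
          sysbench --test=oltp --oltp-table-size=1500000 \
                   --mysql-host=haproxy.internal --mysql-user=bench --mysql-password=secret \
                   prepare
          sysbench --test=oltp --oltp-table-size=1500000 \
                   --num-threads=64 --max-requests=50000 \
                   --mysql-host=haproxy.internal --mysql-user=bench --mysql-password=secret \
                   run
          # Add --oltp-read-only=on to the run step for the read-only variant
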
  12. Performance
      • Writing 1.5 million records, with indexes: 1m 13s (replicated)
      • Sysbench OLTP Benchmark:
        • Read/Write (with semi-sync): 2900 qps
        • Read/Write (without semi-sync): 2850 qps
        • Read only: 10850 qps

  13. Global Transaction IDs (GTIDs)
      • Unique identifier across all servers (binlogs)
      • Combines server UUID and transaction ID, e.g. 3E11FA47-71CA-11E1-9E33-C80AA9429562:23
      • Transaction based
      • No need for log files or positions with CHANGE MASTER
      • Consistent group-wide
      • Easier failover

  14. GTID Master & Slave Setup (my.cnf sketch below)
      • gtid-mode = ON
      • log-bin
      • enforce-gtid-consistency = ON
      • log-slave-updates = ON

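      The same settings collected into a my.cnf sketch for every server in the group (MySQL 5.6+); the log-bin basename is an assumption:

          [mysqld]
          gtid-mode                = ON
          enforce-gtid-consistency = ON
          log-bin                  = mysql-bin
          log-slave-updates        = ON
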
  15. GTID Slave Setup

      STOP SLAVE;
      CHANGE MASTER TO
          MASTER_HOST='<host>',
          MASTER_USER='<user>',
          MASTER_PASSWORD='<password>',
          MASTER_AUTO_POSITION=1;
      START SLAVE;

  16. Disaster Recovery: mysqlfailover
      • Performs replication health monitoring
      • Handles automatic failover
      • Automatic !== immediate (interval in seconds, minimum 5s)
      • Hook scripts for failed check, before/after failover, and after complete recovery (sample invocation below)

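      A sketch of a mysqlfailover invocation wiring up those hooks; hosts, credentials and script paths are placeholders:

          mysqlfailover \
              --master=repl:secret@master.internal:3306 \
              --discover-slaves-login=repl:secret \
              --interval=5 \
              --exec-fail-check=/usr/local/bin/on_fail_check.sh \
              --exec-before=/usr/local/bin/before_failover.sh \
              --exec-after=/usr/local/bin/after_failover.sh \
              --exec-post-failover=/usr/local/bin/post_failover.sh
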
  17. Hooks
      • Failed check:
        • Alert somebody
        • Set app to maintenance
      • Before failover:
        • Alert somebody
        • Log something
        • Take backups

  18. Hooks
      • After failover:
        • Alert somebody
      • After complete recovery:
        • Modify application config
        • Switchover elastic IPs (Amazon)
        • Verify application
        • Alert somebody!
      (a sample hook script follows)

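      For illustration, a hypothetical post-failover hook; the address, instance ID, allocation ID and the AWS CLI call are all placeholder assumptions:

          #!/bin/sh
          # Hypothetical --exec-post-failover hook: alert, then move the Elastic IP
          echo "MySQL failover completed at $(date)" | mail -s "MySQL failover" ops@example.com
          aws ec2 associate-address \
              --instance-id i-0123456789abcdef0 \
              --allocation-id eipalloc-0abc1234
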
  19. Performance
      • Benchmarker: 1 XL AWS Instance
      • Master: 1 AWS Medium Instance
      • Slaves: 3 AWS Large Instances
      • HAProxy: 1 AWS Large Instance

  20. Performance
      • Writing 1.5 million records, with indexes: 60 seconds (replicated)
      • Sysbench OLTP Benchmark:
        • Read/Write: 2950 qps
        • Read only: 9500 qps

  21. Multi-Master Replication
      • Everything replicates to everything
      • Uses the XtraDB table engine (InnoDB-compatible) — not NDB
      • Automatically transfers state to newly introduced nodes
      • Still has a “primary” server

  22. Primary Server Setup

      [mysqld]
      wsrep_cluster_address=gcomm://
      datadir=/var/lib/mysql
      user=mysql
      binlog_format=ROW
      wsrep_provider=/usr/lib/libgalera_smm.so
      wsrep_slave_threads=2
      wsrep_cluster_name=benchmarking
      wsrep_sst_method=xtrabackup
      wsrep_node_name=<name>
      innodb_locks_unsafe_for_binlog=1
      innodb_autoinc_lock_mode=2

  23. Other Server Setup

      [mysqld]
      wsrep_cluster_address=gcomm://<ip>
      datadir=/var/lib/mysql
      user=mysql
      binlog_format=ROW
      wsrep_provider=/usr/lib/libgalera_smm.so
      wsrep_slave_threads=2
      wsrep_cluster_name=benchmarking
      wsrep_sst_method=xtrabackup
      wsrep_node_name=<name>
      innodb_locks_unsafe_for_binlog=1
      innodb_autoinc_lock_mode=2

  24. Disaster Recovery
      • Choose a new Primary server
      • Update config (sketch below)
      • Restart all servers

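      Concretely, a sketch of the config change, mirroring the two setups above; the IP is a placeholder:

          # On the node being promoted to Primary: the empty bootstrap address
          wsrep_cluster_address=gcomm://
          # On every other node: point at the new Primary's IP
          wsrep_cluster_address=gcomm://10.0.0.5
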
  25. Performance
      • Unable to test!
      • More than 4 concurrent “users” resulted in auto-increment primary key collisions
      • Unable to complete the benchmark at all with >= 16 threads:

          ALERT: failed to execute mysql_stmt_execute(): Err1062 Duplicate entry '500519' for key 'PRIMARY'
          FATAL: database error, exiting...

  26. Other Issues
      • Replication was very brittle; restarting the daemon would sometimes break it
      • Restarting the server almost always broke it
      • Adding new nodes very often failed
      • Completely unable to get the system up and running with the latest version for this talk

  27. Read/Write Splitting (the mysqlnd_ms config file; loading sketch below)

      {
          "myapp": {
              "master": {
                  "master_0": {
                      "host": "localhost",
                      "socket": "\/tmp\/mysql.sock"
                  }
              },
              "slave": {
                  "slave_0": {
                      "host": "192.168.2.27",
                      "port": "3306"
                  }
              }
          }
      }

  28. Read/Write Splitting

      SELECT * FROM foo;                                    -- Use Slave
      /* MYSQLND_MS_SLAVE_SWITCH */SELECT * FROM foo;       -- Use Slave
      SELECT * FROM foo; INSERT INTO bar VALUES ('baz');    -- Use Slave!
      /* MYSQLND_MS_MASTER_SWITCH */SELECT * FROM foo;      -- Use Master
      /* MYSQLND_MS_LAST_USED_SWITCH */SELECT * FROM foo;   -- Use whatever was last used

  29. Other Resources
      • Semi-Sync Replication: http://bit.ly/10uT3bc
      • Replication with GTIDs: http://bit.ly/THduPa
      • MySQL Utilities (includes mysqlfailover): http://bit.ly/W98HGC
      • Percona XtraDB Cluster: http://bit.ly/RtK6aP