Slide 1

Slide 1 text

MySQL HA, DR and Load Balancing 1 Friday, May 17, 13

Slide 2

Slide 2 text

•Engineer at Engine Yard on the Community Team •Author of Zend PHP 5 Certification Study Guide, Sitepoints PHP Anthology: 101 Essential Tips, Tricks & Hacks & PHP Master: Write Cutting Edge Code •A contributor to Zend Framework, phpdoc, FRAPI and PHP internals •@dshafik Davey Shafik 2 Friday, May 17, 13

Slide 3

Slide 3 text

• PHP Women • MySQL Ace (10+ Years) • @lig • Full Disclosure Ligaya Turmelle 3 Friday, May 17, 13

Slide 4

Slide 4 text

High Availability (HA) • Not: servers that don’t die • Ensures service availability, not server availability 4 Friday, May 17, 13

Slide 5

Slide 5 text

Disaster Recovery (DR) • Process for recovering from different failure scenarios • Automated or not (or in-between) • Planning • Something to recover from • Something to recover to 5 Friday, May 17, 13

Slide 6

Slide 6 text

Load Balancing • Balancing the load between multiple machines • Smart (based on system resources) • Dumb (round-robin) • ELB is easy, but slow • HAProxy is super faster (and still pretty easy) 6 Friday, May 17, 13

Slide 7

Slide 7 text

Why? • Curiosity — MySQL admin is fun • Goal: Make system predictable — then it can be automated (some) • Impacts of design decisions • Percona Cluster: oooh shiny! 7 Friday, May 17, 13

Slide 8

Slide 8 text

What • HA + Performance • Minimal code impact (drop-in system) • Simple (less moving parts) 8 Friday, May 17, 13

Slide 9

Slide 9 text

MySQL 5.5 9 Friday, May 17, 13

Slide 10

Slide 10 text

Semi-Sync Replication • Improved Data Integrity • Master blocks until 1 slave has written to relay log, flushed to disk & acknowledged • Slower... a little. • Failover to asynchronous replication 10 Friday, May 17, 13

Slide 11

Slide 11 text

11 Friday, May 17, 13

Slide 12

Slide 12 text

Semi-Sync Master Setup •INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'; •rpl_semi_sync_master_enabled = {0|1}; •rpl_semi_sync_master_timeout = N; 12 Friday, May 17, 13

Slide 13

Slide 13 text

Semi-Sync Slave Setup •INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; •rpl_semi_sync_slave_enabled = {0| 1}; 13 Friday, May 17, 13

Slide 14

Slide 14 text

Disaster Recovery • Primary slave is pre-determined failover master • Will have the most up-to-date content (not necessarily ALL the content) • Slaves must re-connect to new master • CHANGE MASTER to correct (new) position • Switch app to new master 14 Friday, May 17, 13

Slide 15

Slide 15 text

Performance • Benchmarker: 1 XL AWS Instance • Master: 1 AWS Medium Instance • Slaves: 3 AWS Large Instances (one semi- sync) • HAProxy: 1 AWS Large Instance 15 Friday, May 17, 13

Slide 16

Slide 16 text

Benchmarks • Sysbench OLTP Benchmark • 1.5 Million Rows • 64 Connection • InnoDB Storage Engine • 50,000 Queries • Optionally: Read-only 16 Friday, May 17, 13

Slide 17

Slide 17 text

Performance • Writing 1.5 Million Records, with indexes: • 1m 13s seconds (replicated) • Sysbench OLTP Benchmark: • Read/Write (with semi-sync): 2900/qps • Read/Write (without semi-sync): 2850/qps • Read only: 10850/qps 17 Friday, May 17, 13

Slide 18

Slide 18 text

MySQL 5.6 18 Friday, May 17, 13

Slide 19

Slide 19 text

Global Transaction IDs (GTIDs) • Unique identifier across all server (binlogs) • combines server UUID:Transaction ID • Transaction based • No need for log files or positions with CHANGE MASTER • Consistent group-wide • Easier failover 19 Friday, May 17, 13

Slide 20

Slide 20 text

20 Friday, May 17, 13

Slide 21

Slide 21 text

GTID Master & Slave Setup • gtid-mode = ON • log-bin • enforce-gtid-consistency = ON • log-slave-updates = ON 21 Friday, May 17, 13

Slide 22

Slide 22 text

GTID Slave Setup • STOP SLAVE; • CHANGE MASTER TO MASTER_HOST=' ', MASTER_USER='', MASTER_PASSWORD='', MASTER_AUTO_POSITION=1; • START SLAVE; 22 Friday, May 17, 13

Slide 23

Slide 23 text

Disaster Recovery: mysqlfailover • Performs replication health monitoring • handles automatic failover • automatic !== immediate (interval in seconds >= 5s) • Hook scripts for failed check, before/after failover, after complete recovery, 23 Friday, May 17, 13

Slide 24

Slide 24 text

Hooks • Failed check: • Alert somebody • Set app to maintenance • Before failover: • Alert somebody • Log Something • Take backups 24 Friday, May 17, 13

Slide 25

Slide 25 text

Hooks • After Failover: • Alert Somebody • After complete recovery: • Modify Application Config • Switchover elastic IPs (amazon) • Verify Application • Alert Somebody! 25 Friday, May 17, 13

Slide 26

Slide 26 text

Performance • Benchmarker: 1 XL AWS Instance • Master: 1 AWS Medium Instance • Slaves: 3 AWS Large Instances • HAProxy: 1 AWS Large Instance 26 Friday, May 17, 13

Slide 27

Slide 27 text

Performance • Writing 1.5 Million Records, with indexes: • 60 seconds (replicated) • Sysbench OLTP Benchmark: • Read/Write: 2950/qps • Read only: 9500/qps 27 Friday, May 17, 13

Slide 28

Slide 28 text

Percona XtraDB Cluster 28 Friday, May 17, 13

Slide 29

Slide 29 text

Multi-Master Replication • Everything replicates to everything • Uses XtraDB table engine (innodb compatible) — not NDB • Automatically transfer state to newly introduced nodes • Still has a “primary” server 29 Friday, May 17, 13

Slide 30

Slide 30 text

30 Friday, May 17, 13

Slide 31

Slide 31 text

Primary Server Setup [mysqld] wsrep_cluster_address=gcomm:// datadir=/var/lib/mysql user=mysql binlog_format=ROW wsrep_provider=/usr/lib/ libgalera_smm.so wsrep_slave_threads=2 wsrep_cluster_name=benchmarking wsrep_sst_method=xtrabackup wsrep_node_name= innodb_locks_unsafe_for_binlog=1 innodb_autoinc_lock_mode=2 31 Friday, May 17, 13

Slide 32

Slide 32 text

Other Server Setup [mysqld] wsrep_cluster_address=gcomm:// datadir=/var/lib/mysql user=mysql binlog_format=ROW wsrep_provider=/usr/lib/ libgalera_smm.so wsrep_slave_threads=2 wsrep_cluster_name=benchmarking wsrep_sst_method=xtrabackup wsrep_node_name= innodb_locks_unsafe_for_binlog=1 innodb_autoinc_lock_mode=2 32 Friday, May 17, 13

Slide 33

Slide 33 text

Disaster Recovery • Choose a new Primary server • Update config • Restart all servers 33 Friday, May 17, 13

Slide 34

Slide 34 text

Performance • Unable to test! • More than 4 concurrent “users” resulted in auto increment primary key collisions • Unable to complete the benchmark at all with >= 16 threads •ALERT: failed to execute mysql_stmt_execute(): Err1062 Duplicate entry '500519' for key 'PRIMARY' FATAL: database error, exiting... 34 Friday, May 17, 13

Slide 35

Slide 35 text

Other Issues • Replication was very brittle, restarting the daemon would sometimes break it • Restarting the server almost always break it • Adding new nodes very often failed • Completely unable to get the system up and running with the latest version for this talk. 35 Friday, May 17, 13

Slide 36

Slide 36 text

Read/Write Splitting $ pecl install mysqlnd_ms mysqlnd_ms.enable=1 mysqlnd_ms.config_file= /path/to/mysqlnd_ms.ini 36 Friday, May 17, 13

Slide 37

Slide 37 text

Read/Write Splitting {        "myapp":  {                "master":  {                        "master_0":  {                                "host":  "localhost",                                "socket":  "\/tmp\/mysql.sock"                        }                },                "slave":  {                        "slave_0":  {                                "host":  "192.168.2.27",                                "port":  "3306"                        }                }        } } 37 Friday, May 17, 13

Slide 38

Slide 38 text

Read/Write Splitting SELECT * FROM foo; -- Use Slave /* MYSQLND_MS_SLAVE_SWITCH */SELECT * FROM foo; -- Use Slave SELECT * FROM foo; INSERT INTO bar VALUES ('baz'); -- Use Slave! /* MYSQLND_MS_MASTER_SWITCH */SELECT * FROM foo; -- Use Master /* MYSQLND_MS_LAST_USED_SWITCH */SELECT * FROM foo; -- Use whatever -- was last used 38 Friday, May 17, 13

Slide 39

Slide 39 text

Questions? 39 Friday, May 17, 13

Slide 40

Slide 40 text

Thank You • https://joind.in/8170 • @lig | [email protected] • @dshafik | [email protected] • http://daveyshafik.com/slides 40 Friday, May 17, 13

Slide 41

Slide 41 text

Other Resources • Semi-Sync Replication: http://bit.ly/10uT3bc • Replication with GTIDs: http://bit.ly/ THduPa • MySQL Utilities (include mysqlfailover): http://bit.ly/W98HGC • Percona XtraDB Cluster: http://bit.ly/ RtK6aP 41 Friday, May 17, 13