
Ensuring MySQL High Availability at Scale

This presentation was given by ProxySQL CEO René Cannaò on May 14, 2020 at the "MySQL Best Practices for High Performance, Scalability & High Availability" joint online meetup with GridGain Systems.

The talk abstract is as follows:
"High availability means that your data is properly replicated with its consistency and security in mind. Modern HA solutions also come with built-in failover scenarios. Clustering solutions for MySQL like Galera or Group Replication provide virtual synchronous replication. High availability is ensured on the database level yet the app isn’t aware of the best options for traffic routing (it may go to overloaded nodes which affects performance at scale).

ProxySQL is a high-performance protocol-aware proxy for MySQL which can balance your traffic load between the nodes of the cluster based on their current status. Serving as the middle layer between the application and MySQL clustering solutions from various vendors, ProxySQL can ensure HA even under the conditions of cluster degradation."

ProxySQL LLC

May 14, 2020


Transcript

  1. A bit about me… René Cannaò
     • CEO of ProxySQL LLC
     • Author of ProxySQL
     • MySQL DBA
  2. A bit about ProxySQL LLC
     We provide services to help build, support and improve the performance & reliability of your cloud-based or on-premise MySQL infrastructure:
     • ProxySQL Product Support
     • Consulting & DBRE
     • ProxySQL Training
     • ProxySQL Development
  3. What is ProxySQL?
     • A "Layer 7" database proxy
     • MySQL / ClickHouse protocol aware
     • High performance
     • High availability
     • Feature rich
  4. Architecture Overview
     • Clients connect to ProxySQL
     • Requests are evaluated
     • Actions are performed (e.g. RW split / sharding / etc.)
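
The "requests are evaluated" step is driven by query rules. As a minimal illustration only (the rule ids and regex patterns below are placeholders, and hostgroups 2/3 simply mirror the writer/reader ids used later in this deck), a classic read/write split can be sketched from the Admin interface:

```sql
-- Hypothetical minimal RW split: locking SELECTs match rule 1 first and
-- go to the writer hostgroup (2); remaining SELECTs go to readers (3).
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 2, 1),
       (2, 1, '^SELECT', 3, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```

Queries matching neither rule are routed to the user's default hostgroup, so writes end up on the writers without needing an explicit rule.
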
  5. ProxySQL Design Goals
     ProxySQL is built with very specific design goals in mind, and these are the factors which influence the product’s technical specifications:
     1. Maximum uptime
     2. Maximum scalability
  6. ProxySQL Feature Highlights
     1. High availability
     2. Seamless planned and unplanned failover
     3. Native support for Galera / PXC
     4. Native support for Group Replication
     5. Integration with Amazon RDS and Aurora
     6. MySQL session load balancing
     7. Supports millions of users, thousands of servers, hundreds of thousands of schemas
     8. Multiplexing
  7. ProxySQL Failover Goals
     The main goals for ProxySQL during a database failover are to:
     • Reduce service unavailability time during failover
     • Prevent errors from being sent to the application
     • Perform transparent planned database failovers: gracefully redirect traffic without the application knowing
     • Avoid having to rewrite existing applications to reconnect after a failover, by maintaining the connections
  8. Notes about failover with ProxySQL
     • ProxySQL itself does not manage MySQL topology changes, i.e. it is not responsible for handling promotion or re-slaving
       ◦ This allows a more flexible MySQL architecture, as you can choose the failover method that best suits your environment
     • Integrated HA mechanisms (Galera or Group Replication) or an external process such as MHA or Orchestrator is needed to perform failovers
  9. What is Galera Cluster?
     • Multi-master / active-active clustered MySQL solution
     • Synchronous replication (certification based)
     • Multi-threaded replication
     • InnoDB compliant
     • Suitable for LAN, WAN and cloud solutions
     • IST (incremental) & SST (full) for state transfer
     • Auto-reconnect mechanism for rejected nodes
  10. What is Oracle Group Replication?
     • Multi-master / active-active clustered MySQL solution
     • Single master by default (group_replication_single_primary_mode)
     • Synchronous replication (certification based)
     • Multi-threaded replication
     • InnoDB compliant
     • Suitable for LAN and low-latency networks
     • State transfer is based on GTID matching
  11. Galera Support in ProxySQL
     • Historically, in ProxySQL v1.x, support for Galera was based on “external” scripts
     • An “external” script is configured in the ProxySQL scheduler
     • proxysql_galera_checker.sh is provided as a “ready to use” script
     • ProxySQL 2.x provides native support for Galera
  12. A quick overview of pre-2.x support
     Typically the script is added to the “scheduler” table and loaded to runtime:
     Admin> select * from scheduler\G
              id: 1
     interval_ms: 10000
        filename: /var/lib/proxysql/proxysql_galera_checker.sh
            arg1: 127.0.0.1
            arg2: 6032
            arg3: 0
            arg4: /var/lib/proxysql/proxysql_galera_checker.log
            arg5: NULL
  13. A quick overview of pre-2.x support
     Sample log file during an outage (/var/lib/proxysql/proxysql_galera_checker.log):
     Sat Mar 14 18:42:14 UTC 2020 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
     ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
     Sat Mar 14 18:42:14 UTC 2020 Check server 10.10.10.12:3306 , status ONLINE , wsrep_local_state
     Sat Mar 14 18:42:14 UTC 2020 Changing server 10.10.10.12:3306 to status OFFLINE_SOFT
     Sat Mar 14 18:42:14 UTC 2020 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
     Sat Mar 14 18:42:25 UTC 2020 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
     ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
     Sat Mar 14 18:42:25 UTC 2020 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
     Sat Mar 14 18:42:25 UTC 2020 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
  14. A quick overview of pre-2.x support
     Changes are reflected in the “mysql_servers” and “runtime_mysql_servers” tables:
     Admin> SELECT hostgroup_id, hostname, port, status FROM mysql_servers;
     +--------------+-------------+------+--------------+
     | hostgroup_id | hostname    | port | status       |
     +--------------+-------------+------+--------------+
     | 0            | 10.10.10.11 | 3306 | ONLINE       |
     | 0            | 10.10.10.12 | 3306 | OFFLINE_SOFT |
     | 0            | 10.10.10.13 | 3306 | ONLINE       |
     +--------------+-------------+------+--------------+
  15. ProxySQL 2.0 Galera Support
     • In ProxySQL 2.0 the concept of the regular “mysql_replication_hostgroups” is extended
     • In addition to “reader_hostgroup” and “writer_hostgroup” we also have the following additional settings and hostgroup types:
       ◦ max_writers
       ◦ writer_is_also_reader
       ◦ max_transactions_behind
       ◦ backup_writer_hostgroup
       ◦ offline_hostgroup
  16. Galera Related Tables
     The key Galera tables in ProxySQL Admin are:
     +---------------------------------+
     | galera related tables           |
     +---------------------------------+
     | mysql_galera_hostgroups         |
     | runtime_mysql_galera_hostgroups |
     | mysql_server_galera_log         |
     +---------------------------------+
  17. Galera Configuration Table
     The key configuration table is “mysql_galera_hostgroups”:
     CREATE TABLE mysql_galera_hostgroups (
         writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY,
         backup_writer_hostgroup INT CHECK (backup_writer_hostgroup>=0 AND backup_writer_hostgroup<>writer_hostgroup) NOT NULL,
         reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND backup_writer_hostgroup<>reader_hostgroup AND reader_hostgroup>0),
         offline_hostgroup INT NOT NULL CHECK (offline_hostgroup<>writer_hostgroup AND offline_hostgroup<>reader_hostgroup AND backup_writer_hostgroup<>offline_hostgroup AND offline_hostgroup>=0),
         active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,
         max_writers INT NOT NULL CHECK (max_writers >= 0) DEFAULT 1,
         writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1,2)) NOT NULL DEFAULT 0,
         max_transactions_behind INT CHECK (max_transactions_behind>=0) NOT NULL DEFAULT 0,
         comment VARCHAR)
  18. Galera Configuration Table
     • writer_hostgroup
       ◦ the hostgroup id that will contain writer nodes (read_only=0)
     • backup_writer_hostgroup
       ◦ when multiple writers are active (read_only=0 on more than one node) but max_writers is defined, all nodes that exceed this value are moved to the backup writer group (standby nodes)
     • reader_hostgroup
       ◦ the hostgroup id that will contain reader nodes (read_only=1)
     • offline_hostgroup
       ◦ when ProxySQL's monitoring determines a node is offline, it will be moved to the offline_hostgroup
  19. Galera Configuration Table
     • active
       ◦ ProxySQL monitors the active hostgroups and makes use only of active hostgroups
       ◦ facilitates standby configurations, e.g. switching configuration in case of a DC failover
     • max_writers
       ◦ limits the number of nodes allocated to the writer hostgroup
       ◦ controls the behaviour of backup_writer_hostgroup
     • writer_is_also_reader
       ◦ determines if a node in the writer hostgroup will also belong to the reader hostgroup
     • max_transactions_behind
       ◦ determines the maximum number of writesets a node can fall behind before it is shunned, to prevent stale reads (this is determined by querying the wsrep_local_recv_queue Galera status variable)
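
These settings can be changed at runtime through the Admin interface. For example (the value is illustrative, not a recommendation), to allow a second active writer:

```sql
-- Allow two nodes in the writer hostgroup; any further read_only=0
-- nodes still move to the backup writer hostgroup.
UPDATE mysql_galera_hostgroups SET max_writers = 2;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```

Note that Galera hostgroup configuration is activated together with the server list by LOAD MYSQL SERVERS TO RUNTIME.
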
  20. Galera Related Global Variables
     The check timeouts are controlled in the “global_variables” table’s “xxx_galera_healthcheck_xxx” variables:
     ProxySQL Admin> select * from global_variables where variable_name like '%monitor_galera%';
     +----------------------------------------------------+----------------+
     | variable_name                                      | variable_value |
     +----------------------------------------------------+----------------+
     | mysql-monitor_galera_healthcheck_interval          | 5000           |
     | mysql-monitor_galera_healthcheck_timeout           | 800            |
     | mysql-monitor_galera_healthcheck_max_timeout_count | 3              |
     +----------------------------------------------------+----------------+
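
Like any other ProxySQL global variable, these are tuned from the Admin interface (the value below is illustrative, not a recommendation):

```sql
-- Check Galera node health every 2 seconds instead of every 5.
UPDATE global_variables SET variable_value = 2000
 WHERE variable_name = 'mysql-monitor_galera_healthcheck_interval';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
```
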
  21. Define the Galera Hostgroups
     In this example, we define the following:
     writer_hostgroup = 2
     reader_hostgroup = 3
     offline_hostgroup = 1
     backup_writer_hostgroup = 4
     max_writers = 1
     writer_is_also_reader = 0
     active = 1
     max_transactions_behind = 100
     ProxySQL Admin> INSERT INTO mysql_galera_hostgroups (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup, active, max_writers, writer_is_also_reader, max_transactions_behind) VALUES (2, 4, 3, 1, 1, 1, 0, 100);
  22. Define the Galera Servers
     • Add two servers to “writer_hostgroup = 2”
       ◦ 172.16.1.112 (read_only=0)
       ◦ 172.16.1.113 (read_only=0)
     • Add one server to “reader_hostgroup = 3”
       ◦ 172.16.1.114 (read_only=1)
     ProxySQL Admin> INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight) VALUES (2,'172.16.1.112',3306,100), (2,'172.16.1.113',3306,10), (3,'172.16.1.114',3306,100);
  23. Verify mysql_servers
     ProxySQL Admin> select hostgroup_id, hostname, port, status, weight, max_connections from mysql_servers;
     +--------------+--------------+------+--------+--------+-----------------+
     | hostgroup_id | hostname     | port | status | weight | max_connections |
     +--------------+--------------+------+--------+--------+-----------------+
     | 2            | 172.16.1.112 | 3306 | ONLINE | 100    | 1000            |
     | 2            | 172.16.1.113 | 3306 | ONLINE | 10     | 1000            |
     | 3            | 172.16.1.114 | 3306 | ONLINE | 100    | 1000            |
     +--------------+--------------+------+--------+--------+-----------------+
  24. Verify mysql_galera_hostgroups
     ProxySQL Admin> select * from mysql_galera_hostgroups\G
     *************************** 1. row ***************************
            writer_hostgroup: 2
     backup_writer_hostgroup: 4
            reader_hostgroup: 3
           offline_hostgroup: 1
                      active: 1
                 max_writers: 1
       writer_is_also_reader: 0
     max_transactions_behind: 100
                     comment: NULL
  25. Load / Save Configuration
     We must not forget to “load the configuration to runtime” and additionally “save the configuration to disk”:
     # The following command will LOAD both mysql_servers and mysql_galera_hostgroups to RUNTIME, making this the active ProxySQL configuration for runtime only (it does not persist across restarts):
     LOAD MYSQL SERVERS TO RUNTIME;
     # The following command will SAVE both mysql_servers and mysql_galera_hostgroups to DISK, making this the “persisted” configuration in case ProxySQL is restarted:
     SAVE MYSQL SERVERS TO DISK;
  26. Dynamic Reconfiguration
     • After loading the configuration to runtime, ProxySQL reconfigured “mysql_servers” based on the configuration of “mysql_galera_hostgroups” (specifically, due to “max_writers = 1”)
     • The host “172.16.1.113” was moved to the backup_writer_hostgroup
     ProxySQL Admin> select hostgroup, srv_host, status, ConnUsed, MaxConnUsed, Queries from stats.stats_mysql_connection_pool order by srv_host;
     +-----------+--------------+--------+----------+-------------+-----------+
     | hostgroup | srv_host     | status | ConnUsed | MaxConnUsed | Queries   |
     +-----------+--------------+--------+----------+-------------+-----------+
     | 2         | 172.16.1.112 | ONLINE | 3        | 4           | 930742390 |
     | 4         | 172.16.1.113 | ONLINE | 0        | 0           | 0         |
     | 3         | 172.16.1.114 | ONLINE | 1        | 1           | 233130345 |
     +-----------+--------------+--------+----------+-------------+-----------+
  27. Galera Monitoring Statistics
     The statistics related to the Galera cluster can be viewed in the “monitor.mysql_server_galera_log” table:
     ProxySQL Admin> select * from mysql_server_galera_log order by time_start_us desc limit 3;
     *************************** 1. row ***************************
                            hostname: 172.16.1.112
                                port: 3306
                       time_start_us: 1529510693289001
                     success_time_us: 1234
                   primary_partition: YES
                           read_only: NO
              wsrep_local_recv_queue: 0
                   wsrep_local_state: 4
                        wsrep_desync: NO
                wsrep_reject_queries: NO
     wsrep_sst_donor_rejects_queries: NO
                               error: NULL
     *************************** 2. row ***************************
                            hostname: 172.16.1.113
     ...
  28. OGR Support in ProxySQL
     • ProxySQL has had support for OGR since version 1.3.x
     • Native support has been available since version 1.4.x
     • Configuration of ProxySQL for OGR is very similar to the configuration for Galera
     • The differences are highlighted in the next slides:
       ◦ the “max_transactions_behind” variable
       ◦ the OGR healthcheck global variables
       ◦ a set of stored routines that must be added to MySQL for ProxySQL to support OGR
  29. OGR Related Tables
     The key OGR tables in ProxySQL Admin are:
     +--------------------------------------------+
     | ogr related tables                         |
     +--------------------------------------------+
     | mysql_group_replication_hostgroups         |
     | runtime_mysql_group_replication_hostgroups |
     | mysql_server_group_replication_log         |
     +--------------------------------------------+
  30. OGR Configuration Table
     The key configuration table is “mysql_group_replication_hostgroups”:
     CREATE TABLE mysql_group_replication_hostgroups (
         writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY,
         backup_writer_hostgroup INT CHECK (backup_writer_hostgroup>=0 AND backup_writer_hostgroup<>writer_hostgroup) NOT NULL,
         reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND backup_writer_hostgroup<>reader_hostgroup AND reader_hostgroup>0),
         offline_hostgroup INT NOT NULL CHECK (offline_hostgroup<>writer_hostgroup AND offline_hostgroup<>reader_hostgroup AND backup_writer_hostgroup<>offline_hostgroup AND offline_hostgroup>=0),
         active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,
         max_writers INT NOT NULL CHECK (max_writers >= 0) DEFAULT 1,
         writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1)) NOT NULL DEFAULT 0,
         max_transactions_behind INT CHECK (max_transactions_behind>=0) NOT NULL DEFAULT 0,
         comment VARCHAR,
         UNIQUE (reader_hostgroup),
         UNIQUE (offline_hostgroup),
         UNIQUE (backup_writer_hostgroup))
  31. OGR Config - max_transactions_behind
     • The only difference compared to the Galera configuration table is the meaning of “max_transactions_behind”:
       ◦ determines the maximum number of transactions a node can fall behind before it is shunned, to prevent stale reads
       ◦ this is determined by querying the sys.gr_member_routing_candidate_status view
       ◦ for OGR this relates to the number of GTID events rather than Galera writesets
  32. Galera Config - max_transactions_behind
     • The only difference compared to the Group Replication configuration table is the meaning of “max_transactions_behind”:
       ◦ determines the maximum number of writesets a node can fall behind before it is shunned, to prevent stale reads
       ◦ this is determined by querying the wsrep_local_recv_queue Galera status variable
       ◦ for Galera this relates to writesets rather than GTID events
  33. OGR Related Global Variables
     The check timeouts are controlled in the “global_variables” table’s “xxx_groupreplication_healthcheck_xxx” variables:
     ProxySQL Admin> select * from global_variables where variable_name like '%monitor_groupreplication%';
     +--------------------------------------------------------------+----------------+
     | variable_name                                                | variable_value |
     +--------------------------------------------------------------+----------------+
     | mysql-monitor_groupreplication_healthcheck_interval          | 5000           |
     | mysql-monitor_groupreplication_healthcheck_timeout           | 800            |
     | mysql-monitor_groupreplication_healthcheck_max_timeout_count | 3              |
     | mysql-monitor_groupreplication_max_transactions_behind_count | 3              |
     +--------------------------------------------------------------+----------------+
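
Tuning works the same way as for the Galera healthcheck variables (the value below is illustrative):

```sql
-- Require a node to exceed the lag threshold on more consecutive
-- checks before it is shunned.
UPDATE global_variables SET variable_value = 5
 WHERE variable_name = 'mysql-monitor_groupreplication_max_transactions_behind_count';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
```
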
  34. Group Replication Required Stored Procs
     The following view and helper functions also need to be added to MySQL in order for Group Replication to be compatible with ProxySQL’s monitoring:
     • CREATE VIEW gr_member_routing_candidate_status
     • CREATE FUNCTION gr_member_in_primary_partition
     • CREATE FUNCTION gr_applier_queue_length
     • CREATE FUNCTION GTID_COUNT
     • CREATE FUNCTION GTID_NORMALIZE
     • CREATE FUNCTION LOCATE2
     • CREATE FUNCTION IFZERO
     Available here: https://gist.github.com/lefred/77ddbde301c72535381ae7af9f968322
  35. OGR Required Stored Procs
     Once deployed, ProxySQL will query the view in order to retrieve the status of each of the Group Replication nodes (the view can also be used for troubleshooting purposes):
     ## Status of the primary node (mysql_node1)
     mysql> SELECT * FROM sys.gr_member_routing_candidate_status;
     +------------------+-----------+---------------------+----------------------+
     | viable_candidate | read_only | transactions_behind | transactions_to_cert |
     +------------------+-----------+---------------------+----------------------+
     | YES              | NO        | 0                   | 0                    |
     +------------------+-----------+---------------------+----------------------+
     ## Status of a secondary node (mysql_node2)
     mysql> SELECT * FROM sys.gr_member_routing_candidate_status;
     +------------------+-----------+---------------------+----------------------+
     | viable_candidate | read_only | transactions_behind | transactions_to_cert |
     +------------------+-----------+---------------------+----------------------+
     | YES              | YES       | 0                   | 0                    |
     +------------------+-----------+---------------------+----------------------+
  36. Group Replication Monitoring Statistics
     The statistics related to the Group Replication cluster can be viewed in the “monitor.mysql_server_group_replication_log” table:
     ProxySQL Admin> select * from mysql_server_group_replication_log order by time_start_us desc limit 3\G
     *************************** 1. row ***************************
                hostname: mysql_node1
                    port: 3306
           time_start_us: 1515079109821971
         success_time_us: 1582
        viable_candidate: YES
               read_only: NO
     transactions_behind: 0
                   error: NULL
     *************************** 2. row ***************************
                hostname: mysql_node2
                    port: 3306
           time_start_us: 1515079109822292
         success_time_us: 1845
        viable_candidate: YES
               read_only: YES
     transactions_behind: 0
                   error: NULL
     ...
  37. AWS Aurora Integration
     • Gets metrics and statuses from REPLICA_HOST_STATUS
     • Automatically detects writer/reader roles
     • Auto-discovery of new replicas
     • AZ awareness
     • Replication lag monitoring with millisecond granularity
  38. AWS Aurora Integration
     New configuration table:
     CREATE TABLE mysql_aws_aurora_hostgroups (
         writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY,
         reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>0),
         active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,
         aurora_port INT NOT NULL DEFAULT 3306,
         domain_name VARCHAR NOT NULL CHECK (SUBSTR(domain_name,1,1) = '.'),
         max_lag_ms INT NOT NULL CHECK (max_lag_ms >= 10 AND max_lag_ms <= 600000) DEFAULT 600000,
         check_interval_ms INT NOT NULL CHECK (check_interval_ms >= 100 AND check_interval_ms <= 600000) DEFAULT 1000,
         check_timeout_ms INT NOT NULL CHECK (check_timeout_ms >= 80 AND check_timeout_ms <= 3000) DEFAULT 800,
         writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1)) NOT NULL DEFAULT 0,
         new_reader_weight INT CHECK (new_reader_weight >= 0 AND new_reader_weight <= 10000000) NOT NULL DEFAULT 1,
         add_lag_ms INT NOT NULL CHECK (add_lag_ms >= 0 AND add_lag_ms <= 600000) DEFAULT 30,
         min_lag_ms INT NOT NULL CHECK (min_lag_ms >= 0 AND min_lag_ms <= 600000) DEFAULT 30,
         lag_num_checks INT NOT NULL CHECK (lag_num_checks >= 1 AND lag_num_checks <= 16) DEFAULT 1,
         comment VARCHAR,
         UNIQUE (reader_hostgroup))
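
A row in this table activates Aurora monitoring for a writer/reader hostgroup pair. A minimal, illustrative example (the hostgroup ids, cluster domain name and lag threshold are placeholders for your environment; note the leading dot required by the CHECK on domain_name):

```sql
INSERT INTO mysql_aws_aurora_hostgroups
       (writer_hostgroup, reader_hostgroup, active, aurora_port,
        domain_name, max_lag_ms, check_interval_ms, check_timeout_ms)
VALUES (10, 11, 1, 3306,
        '.example-cluster.us-east-1.rds.amazonaws.com', 2000, 1000, 800);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```
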
  39. AWS Aurora Integration
     New monitor table “mysql_server_aws_aurora_log”:
     mysql> SHOW CREATE TABLE monitor.mysql_server_aws_aurora_log\G
     *************************** 1. row ***************************
            table: mysql_server_aws_aurora_log
     Create Table: CREATE TABLE mysql_server_aws_aurora_log (
         hostname VARCHAR NOT NULL,
         port INT NOT NULL DEFAULT 3306,
         time_start_us INT NOT NULL DEFAULT 0,
         success_time_us INT DEFAULT 0,
         error VARCHAR,
         SERVER_ID VARCHAR NOT NULL DEFAULT '',
         SESSION_ID VARCHAR,
         LAST_UPDATE_TIMESTAMP VARCHAR,
         replica_lag_in_milliseconds INT NOT NULL DEFAULT 0,
         CPU INT NOT NULL DEFAULT 0,
         PRIMARY KEY (hostname, port, time_start_us, SERVER_ID))
  40. AWS Aurora Integration
     New monitor table “mysql_server_aws_aurora_check_status”:
     mysql> SHOW CREATE TABLE monitor.mysql_server_aws_aurora_check_status\G
     *************************** 1. row ***************************
            table: mysql_server_aws_aurora_check_status
     Create Table: CREATE TABLE mysql_server_aws_aurora_check_status (
         writer_hostgroup INT NOT NULL,
         hostname VARCHAR NOT NULL,
         port INT NOT NULL DEFAULT 3306,
         last_checked_at VARCHAR,
         checks_tot INT NOT NULL DEFAULT 0,
         checks_ok INT NOT NULL DEFAULT 0,
         last_error VARCHAR,
         PRIMARY KEY (writer_hostgroup, hostname, port))
  41. AWS Aurora Integration
     New monitor table “mysql_server_aws_aurora_failovers”:
     mysql> SHOW CREATE TABLE monitor.mysql_server_aws_aurora_failovers\G
     *************************** 1. row ***************************
            table: mysql_server_aws_aurora_failovers
     Create Table: CREATE TABLE mysql_server_aws_aurora_failovers (
         writer_hostgroup INT NOT NULL,
         hostname VARCHAR NOT NULL,
         inserted_at VARCHAR NOT NULL)
  42. MySQL HA at Scale
     • Potentially very complex database topologies
     • Hundreds or thousands of MySQL servers
     • Each cluster solution has its own HA mechanism
     • Service discovery
  43. ProxySQL - DBaaS layer
     • Supports millions of distinct users
     • Hundreds of thousands of concurrent connections
     • Thousands of backend database servers
     • Complex routing to support hundreds of thousands of schemas
  44. Clustered ProxySQL Architecture
     [Diagram: multiple APP instances each connect to a clustered proxy layer of PROXYSQL instances, coordinated by a DBMANAGER, in front of the backend DB servers]