ProxySQL High Availability 2019 - Rene Cannao (ProxySQL)

ProxySQL HA features explained by ProxySQL author and CEO Rene Cannao at multiple events in 2019.

ProxySQL LLC

January 21, 2019

Transcript

  1. ProxySQL LLC We provide services to help build, support and

    improve the performance & reliability of your Cloud-Based and On-Premise MySQL infrastructure. © ProxySQL 2013-2019. All rights reserved.
  2. ProxySQL LLC • ProxySQL Development • ProxySQL Support Services •

    ProxySQL, MySQL, DevOps & Outsourcing • Consulting Services © ProxySQL 2013-2019. All rights reserved.
  3. What is ProxySQL? MySQL protocol aware data gateway – Clients

    connect to ProxySQL – Requests are evaluated – Various actions are performed © ProxySQL 2013-2019. All rights reserved.
  4. • High Availability and infinite scalability • Seamless planned and

    unplanned failover • MySQL session load balancing • Launching multiple instances on the same port • Binding services on different ports ProxySQL features © ProxySQL 2013-2019. All rights reserved.
  5. • Connection pooling and multiplexing • Read caching outside of

    the database server • Complex query routing and read/write split • Query throttling, firewalling and mirroring • On-the-fly query rewrite • Advanced data masking ProxySQL features © ProxySQL 2013-2019. All rights reserved.
  6. • Dynamic runtime reconfiguration • Real time statistics and monitoring

    • Scheduling of external scripts • Causal reads by tracking GTIDs across pools of backend servers • Native ProxySQL Clustering ProxySQL features © ProxySQL 2013-2019. All rights reserved.
  7. • SSL support for frontend connections & TLS v1.2 • Native

    support for Galera / PXC and Group Replication • Integration with Amazon RDS and Aurora • MySQL, MariaDB, Percona Server and ClickHouse backends • Supports millions of users and tens of thousands of database servers ProxySQL features © ProxySQL 2013-2019. All rights reserved.
  8. Production Deployment © ProxySQL 2013-2019. All rights reserved. The two

    main approaches for deploying ProxySQL are either: - Deploy ProxySQL on your application servers - Deploy ProxySQL in a separate layer / standalone server Each approach has its own advantages and disadvantages
  9. App Server Deployment © ProxySQL 2013-2019. All rights reserved. ProxySQL

    is deployed locally • No network overhead • No single point of failure • Isolated configuration • Rolling upgrades
  10. App Server Deployment © ProxySQL 2013-2019. All rights reserved. ProxySQL

    is deployed locally • No network overhead • No single point of failure • Isolated configuration • Rolling upgrades • DB monitoring overhead • More backend connections • Configuration management effort • Query cache isolated per instance (QC1, QC2, QC3)
  11. ProxySQL Layer Deployment © ProxySQL 2013-2019. All rights reserved. ProxySQL

    is deployed on a standalone server • DB monitoring overhead • Fewer backend connections • Configuration management effort • Shared (global) Query Cache
  12. ProxySQL Layer Deployment © ProxySQL 2013-2019. All rights reserved. ProxySQL

    is deployed on a standalone server • DB monitoring overhead • Fewer backend connections • Configuration management effort • Shared (global) Query Cache • Additional round trip time • Single point of failure • Shared configuration • No rolling upgrades
  13. HA for ProxySQL © ProxySQL 2013-2019. All rights reserved. ProxySQL

    is the "first contact" for applications connecting to database services. • ProxySQL Service is CRITICAL – Deployment decisions affect uptime – Architectural considerations affect scalability – Additional layers introduce possible points of failure
  14. HA for ProxySQL © ProxySQL 2013-2019. All rights reserved. Common

    high availability architectures for ProxySQL include: • Keepalived (IP failover / TCP load balancer) • DNS / Consul DNS (other service discovery tools) • ProxySQL Cascading
  15. Keepalived © ProxySQL 2013-2019. All rights reserved. VRRP / LB

    VIP 10.10.10.1 • HA provided by Virtual Router Redundancy Protocol or load balancer
  16. Keepalived © ProxySQL 2013-2019. All rights reserved. VIP 10.10.10.1 •

    HA provided by Virtual Router Redundancy Protocol • Service check fails VRRP / LB
  17. Keepalived © ProxySQL 2013-2019. All rights reserved. VIP 10.10.10.1 •

    HA provided by Virtual Router Redundancy Protocol • Service check fails • Keepalived switches the VIP to a standby ProxySQL instance VRRP / LB
  18. Keepalived © ProxySQL 2013-2019. All rights reserved. VIP 10.10.10.1 •

    HA provided by Virtual Router Redundancy Protocol • Service check fails • Keepalived switches the VIP to a standby ProxySQL instance • All client connections are LOST and re-established by the application VRRP / LB
  19. DNS / Consul DNS © ProxySQL 2013-2019. All rights reserved.

    • HA provided by Consul agents running on each ProxySQL instance
  20. DNS / Consul DNS © ProxySQL 2013-2019. All rights reserved.

    • HA provided by Consul agents running on each ProxySQL instance • Application servers access the service via DNS
  21. DNS / Consul DNS © ProxySQL 2013-2019. All rights reserved.

    • HA provided by Consul agents running on each ProxySQL instance • Application servers access the service via DNS • Provides load balancing as well as HA
  22. DNS / Consul DNS © ProxySQL 2013-2019. All rights reserved.

    • HA provided by Consul agents running on each ProxySQL instance • Application servers access the service via DNS • Provides load balancing as well as HA
  23. DNS / Consul DNS © ProxySQL 2013-2019. All rights reserved.

    • HA provided by Consul agents running on each ProxySQL instance • Application servers access the service via DNS • Provides load balancing as well as HA • All client connections are LOST and re-established by the application
  24. ProxySQL Cascading © ProxySQL 2013-2019. All rights reserved. • ProxySQL

    is deployed on each application server and in a ProxySQL layer • Applications connect to the local ProxySQL server • Provides load balancing as well as HA ProxySQL
  25. ProxySQL Cascading © ProxySQL 2013-2019. All rights reserved. • ProxySQL

    is deployed on each application server and in a ProxySQL layer • Applications connect to the local ProxySQL server • Provides load balancing as well as HA ProxySQL
  26. ProxySQL Cascading © ProxySQL 2013-2019. All rights reserved. • ProxySQL

    is deployed on each application server and in a ProxySQL layer • Applications connect to the local ProxySQL server • Provides load balancing as well as HA • Open connections are held and retried on an available backend – no connections are lost ProxySQL
  27. ProxySQL Cascading © ProxySQL 2013-2019. All rights reserved. • ProxySQL

    is deployed on each application server and in a ProxySQL layer • Applications connect to the local ProxySQL server • Provides load balancing as well as HA • Open connections are held and retried on an available backend – no connections lost • If the PRIMARY MySQL instance is lost while multiplexing is disabled or a transaction is active – the statement will fail (this applies to the ProxySQL layer as well) ProxySQL
  28. ProxySQL Cascading © ProxySQL 2013-2019. All rights reserved. ProxySQL can

    be used to provide HA to itself as it communicates using the MySQL Protocol: • Application layer aware • Provides connection retry • Allows for scale up / scale down without connection loss (during planned maintenance) • Transactions can be lost during edge case unplanned failover
  29. Before looking at High Availability we need to first look

    into how ProxySQL monitors the availability of MySQL backends. • ProxySQL is designed to perform various checks against backend servers • The checks target various layers of communication • Checks are built with dependencies: if a lower-level check fails there is no need for a higher-level check Monitoring Availability © ProxySQL 2013-2019. All rights reserved.
  30. The Monitor Module is responsible for checking backends and ensuring

    that traffic is only sent to operational servers. Monitoring is performed at the following levels: • Connect – tests new connections to backend servers (MySQL connect) • Ping – pings backend servers (MySQL ping) • Replication Lag – checks Seconds_Behind_Master status • Read only – checks read_only status The Monitor Module © ProxySQL 2013-2019. All rights reserved.
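
    A minimal sketch of how these checks might be tuned from the ProxySQL admin interface; the credentials and intervals below are illustrative placeholders, not recommendations:

        -- credentials the Monitor Module uses to log into the backends
        UPDATE global_variables SET variable_value='monitor'
          WHERE variable_name IN ('mysql-monitor_username','mysql-monitor_password');
        -- how often (in milliseconds) each check runs
        UPDATE global_variables SET variable_value='2000'  WHERE variable_name='mysql-monitor_connect_interval';
        UPDATE global_variables SET variable_value='2000'  WHERE variable_name='mysql-monitor_ping_interval';
        UPDATE global_variables SET variable_value='1500'  WHERE variable_name='mysql-monitor_read_only_interval';
        UPDATE global_variables SET variable_value='10000' WHERE variable_name='mysql-monitor_replication_lag_interval';
        LOAD MYSQL VARIABLES TO RUNTIME;
        SAVE MYSQL VARIABLES TO DISK;
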
  31. Backend server states © ProxySQL 2013-2019. All rights reserved. The

    monitor module will determine the status of a backend server in the “runtime_mysql_servers” table (these states can also be set manually in “mysql_servers”): • ONLINE - backend server is fully operational (manually set / automatically set by monitor) • SHUNNED - backend server is temporarily taken out of use, either because it generated too many connection errors in too short a time, or because replication lag exceeded the allowed threshold (automatically set by monitor)
  32. Backend server states © ProxySQL 2013-2019. All rights reserved. •

    OFFLINE_SOFT - when a server is put into OFFLINE_SOFT mode, new incoming connections are no longer accepted, while existing connections are kept until they become inactive. In other words, connections are kept in use until the current transaction is completed. This allows a backend to be detached gracefully (manually set). • OFFLINE_HARD - when a server is put into OFFLINE_HARD mode, existing connections are dropped and no new incoming connections are accepted. This is equivalent to deleting the server from a hostgroup (indefinitely or for maintenance) and is done by ProxySQL on failover (manually / automatically set by monitor)
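
    These states can be inspected at any time from the admin interface; a brief sketch:

        -- the current state of each backend as seen by ProxySQL
        -- (ONLINE / SHUNNED / OFFLINE_SOFT / OFFLINE_HARD appear in the status column)
        SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;
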
  33. Async Replication Failover © ProxySQL 2013-2019. All rights reserved. The

    most common topology for MySQL deployments is traditional primary / replica async (or semi-sync): RW Primary for writes RO Replicas for reads
  34. Since there is no native support for failover in MySQL,

    external mechanisms are used to promote / demote servers: MHA for MySQL Replication Managers © ProxySQL 2013-2019. All rights reserved. RW Primary for writes RO Replicas for reads
  35. During a failover these tools always perform “roughly” the same

    actions, specifically: Failover Process © ProxySQL 2013-2019. All rights reserved. RW RO
  36. During a failover these tools always perform “roughly” the same

    actions, specifically: Failover Process © ProxySQL 2013-2019. All rights reserved. Switch Primary node to read-only RO RO
  37. During a failover these tools always perform “roughly” the same

    actions, specifically: Failover Process © ProxySQL 2013-2019. All rights reserved. Elect a Replica to become the new Primary and connect Replicas RO Replicas for reads RO RO RO
  38. During a failover these tools always perform “roughly” the same

    actions, specifically: Failover Process © ProxySQL 2013-2019. All rights reserved. Optionally set previous Primary as a Replica of the new Primary RO Replicas for reads RO RO
  39. During a failover these tools always perform “roughly” the same

    actions, specifically: Failover Process © ProxySQL 2013-2019. All rights reserved. RW RO Set the new Primary to read-write
  40. ProxySQL transparently detects and adapts to these topology changes via

    the monitor module's read-only check Replication Hostgroup ProxySQL Failover Detection © ProxySQL 2013-2019. All rights reserved. ProxySQL RO Checks: - select @@read_only RW RO Hostgroup 0: Primary (read_only=0) Hostgroup 1: Replicas (read_only=1)
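
    A replication hostgroup along these lines could be defined on the admin interface as follows; hostgroup numbers match the diagram, the comment is illustrative, and the check_type column is only present in ProxySQL 2.x (omit it on 1.4.x):

        -- pair hostgroup 0 (writer) with hostgroup 1 (reader); the monitor's
        -- read_only check moves servers between them automatically
        INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, check_type, comment)
          VALUES (0, 1, 'read_only', 'async replication cluster');
        LOAD MYSQL SERVERS TO RUNTIME;
        SAVE MYSQL SERVERS TO DISK;
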
  41. When the status of a Primary server changes to read-only

    ProxySQL will remove the server from the writer hostgroup (OFFLINE_HARD) Replication Hostgroup ProxySQL Failover Detection © ProxySQL 2013-2019. All rights reserved. ProxySQL RO Checks: - select @@read_only RO RO Hostgroup 0: Primary (read_only=0) Hostgroup 1: Replicas (read_only=1)
  42. Momentarily there are no RW nodes available; ProxySQL holds on

    to the connections for mysql-connect_timeout_server_max milliseconds Replication Hostgroup ProxySQL Failover Detection © ProxySQL 2013-2019. All rights reserved. ProxySQL RO Checks: - select @@read_only RO Hostgroup 0: Primary (read_only=0) Hostgroup 1: Replicas (read_only=1)
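
    The timeout can be checked or raised from the admin interface; the 20-second value below is only an example:

        SELECT variable_value FROM global_variables WHERE variable_name='mysql-connect_timeout_server_max';
        -- hold queued connections for up to 20 seconds while waiting for a new writer
        UPDATE global_variables SET variable_value='20000' WHERE variable_name='mysql-connect_timeout_server_max';
        LOAD MYSQL VARIABLES TO RUNTIME;
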
  43. Once a RW node is detected, it is added to the

    writer_hostgroup (ONLINE) and connections are routed to the new Primary server Replication Hostgroup ProxySQL Failover Detection © ProxySQL 2013-2019. All rights reserved. ProxySQL RO Checks: - select @@read_only RW RO Hostgroup 0: Primary (read_only=0) Hostgroup 1: Replicas (read_only=1)
  44. Extended Replication Support © ProxySQL 2013-2019. All rights reserved. ProxySQL

    was initially built to support regular “async” / “semi-sync” replication topologies (traditional primary / replica), later support for additional types of replication was introduced: • Support for custom replication topologies was available via external scheduler scripts since ProxySQL version 1.x • ProxySQL 1.3.x introduced support for Oracle Group Replication • Native support for Oracle Group Replication was added in ProxySQL 1.4.x • Native support for Galera Replication was added in ProxySQL 2.x • Native support for Amazon Aurora and RDS was added in ProxySQL 2.x
  45. What is Group Replication? © ProxySQL 2013-2019. All rights reserved.

    • Multi-master / Active-Active Clustered MySQL Solution – Single master by default (group_replication_single_primary_mode) • Synchronous Replication (certification based) • Multi-threaded Replication • InnoDB Compliant • Suitable for LAN and low latency networks • State transfer is based on GTID matching
  46. What is Galera Cluster? © ProxySQL 2013-2019. All rights reserved.

    • Multi-master / Active-Active Clustered MySQL Solution • Synchronous Replication (certification based) • Multi-threaded Replication • InnoDB Compliant • Suitable for LAN, WAN and Cloud Solutions • IST (incremental) & SST (full) for state transfer • Auto reconnect mechanism for rejected nodes
  47. ProxySQL 2.x MySQL Clustering © ProxySQL 2013-2019. All rights reserved.

    In ProxySQL 2.0 the concept of “mysql_replication_hostgroup” is extended to allow for multiple hostgroups. • In addition to “reader_hostgroup” and “writer_hostgroup” we also have the following additional concepts and hostgroup types: – max_writers – writer_is_also_reader – max_transactions_behind – backup_writer_hostgroup – offline_hostgroup
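
    For Galera / PXC these concepts map onto the “mysql_galera_hostgroups” table (mysql_group_replication_hostgroups is analogous); a sketch with illustrative hostgroup numbers and limits:

        INSERT INTO mysql_galera_hostgroups
          (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup,
           active, max_writers, writer_is_also_reader, max_transactions_behind)
        VALUES (0, 2, 1, 9999, 1, 1, 0, 100);
        LOAD MYSQL SERVERS TO RUNTIME;
        SAVE MYSQL SERVERS TO DISK;
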
  48. Planned Maintenance © ProxySQL 2013-2019. All rights reserved. Let's expand

    to operational activities involving failover: - Graceful failover - Failover considerations - Best practices
  49. Graceful Failover © ProxySQL 2013-2019. All rights reserved. As previously

    discussed: • ProxySQL will automatically re-route “writer” connections to the new Primary after a failover • The failover time is configurable based on the read only monitoring check • If the PRIMARY MySQL instance is lost while multiplexing is disabled or a transaction is active – the statement will fail since the full transaction should be retried.
  50. Graceful Failover © ProxySQL 2013-2019. All rights reserved. Although this

    is acceptable in an “emergency situation”, failovers can also be required for common maintenance activities, including: • Scaling up hardware and resources • Operating system updates • Upgrading MySQL to a newer release • Executing certain blocking ALTERs, e.g. converting a primary key from INT to BIGINT
  51. Graceful Failover © ProxySQL 2013-2019. All rights reserved. ProxySQL facilitates

    “planned failovers” as well • In the “mysql_servers” table each server can have a status of ONLINE, SHUNNED, OFFLINE_SOFT or OFFLINE_HARD configured • OFFLINE_SOFT is a special status that allows all currently executing connections (statements and transactions) to complete while preventing any new connections from being made • New connections are either sent to another available server in the same destination_hostgroup or held until a server is available
  52. The initial state of the healthy MySQL cluster Graceful Failover

    Example © ProxySQL 2013-2019. All rights reserved. Replication Hostgroup RW RO Hostgroup 0: Primary (read_only=0) - status=ONLINE Hostgroup 1: Replicas (read_only=1)
  53. First, remove the replication hostgroup to prevent automatic failover as

    a safety precaution Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. RW RO Hostgroup 0: Primary (read_only=0) - status=ONLINE Hostgroup 1: Replicas (read_only=1)
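
    A sketch of this step on the admin interface, using the writer hostgroup from the diagram:

        -- temporarily disable automatic writer/reader moves for this cluster
        DELETE FROM mysql_replication_hostgroups WHERE writer_hostgroup=0;
        LOAD MYSQL SERVERS TO RUNTIME;
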
  54. Next set the status of the Primary to OFFLINE_SOFT to

    allow current statements and transactions to complete their execution Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. RW RO Hostgroup 0: Primary (read_only=0) - status=OFFLINE_SOFT Hostgroup 1: Replicas (read_only=1)
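
    The same step as an admin command, with a placeholder hostname for the current Primary:

        -- drain the Primary: running statements and transactions finish, no new connections are made
        UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id=0 AND hostname='10.10.10.2';
        LOAD MYSQL SERVERS TO RUNTIME;
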
  55. Once completed, we’ll see the connection count on hostgroup=0 drop

    to zero Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. RW RO Hostgroup 0: Primary (read_only=0) - status=OFFLINE_SOFT Hostgroup 1: Replicas (read_only=1)
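
    The drain can be observed from the stats tables; for example:

        -- wait for ConnUsed on the writer hostgroup to reach 0
        SELECT hostgroup, srv_host, status, ConnUsed, ConnFree
          FROM stats_mysql_connection_pool WHERE hostgroup=0;
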
  56. Now it’s time to perform the failover, either manually or

    using a tool such as Orchestrator or MHA and promote a Replica to Primary Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. RW RO Hostgroup 0: Primary (read_only=0) - status=ONLINE Hostgroup 1: Replicas (read_only=1) Previous Primary (read_only=0) - status=OFFLINE_SOFT
  57. Traffic will continue to be served from the existing nodes

    and maintenance activities can be performed Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. RW RO Hostgroup 0: Primary (read_only=0) - status=ONLINE Hostgroup 1: Replicas (read_only=1) Maintenance can now be performed on the previous Primary
  58. Replication Hostgroup At this point the replication hostgroup can be

    reconfigured to allow for any unplanned failover that may occur Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. RW RO Hostgroup 0: Primary (read_only=0) - status=ONLINE Hostgroup 1: Replicas (read_only=1)
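
    A sketch of re-enabling the replication hostgroup, mirroring the earlier definition:

        INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
          VALUES (0, 1, 'async replication cluster');
        LOAD MYSQL SERVERS TO RUNTIME;
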
  59. Once maintenance is completed we can add the previous Primary

    to the Replica hostgroup=1 by dynamically updating the configuration Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. Replication Hostgroup RW RO Hostgroup 0: Primary (read_only=0) - status=ONLINE Hostgroup 1: Replicas (read_only=1)
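
    A sketch of this final step, again with a placeholder hostname for the previous Primary:

        -- move the old Primary out of the writer hostgroup and back in as a Replica
        DELETE FROM mysql_servers WHERE hostgroup_id=0 AND hostname='10.10.10.2';
        INSERT INTO mysql_servers (hostgroup_id, hostname, port, status) VALUES (1, '10.10.10.2', 3306, 'ONLINE');
        LOAD MYSQL SERVERS TO RUNTIME;
        SAVE MYSQL SERVERS TO DISK;
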
  60. Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. The

    commands to perform the previous example can be executed manually or scripted • Manual execution allows for more control, while scripts are more precise and likely to execute in a shorter time-frame • NOTE: The time between draining connections and promoting a replica must be shorter than “mysql-connect_timeout_server_max”, otherwise the new waiting connections will return an error – you can increase this before failover and revert it afterwards as needed
  61. Failover Considerations © ProxySQL 2013-2019. All rights reserved. • Ensure

    that the number of available nodes supports the maximum throughput required by applications • Whenever possible use same-sized servers for the Primary and any Replicas which are candidates for promotion (at least one pair should be equal) • Prepare scripts to execute “planned failover” to ensure a smooth and repeatable process • When integrating with MHA or Orchestrator consider implementing ProxySQL planned failover logic in the available hooks
  62. Failover Best Practices © ProxySQL 2013-2019. All rights reserved. A

    few failover best practices: • When building failover mechanisms for small environments it is still recommended to use a tool such as MHA for MySQL • For larger environments it is suggested to use Orchestrator, especially when multiple data centers are involved (consider the Raft implementation) or the replication hierarchy is complex (long chains of replication)
  63. Failover Best Practices © ProxySQL 2013-2019. All rights reserved. •

    Carefully consider under what circumstances automatic failover should occur; if failover is too frequent it can lead to unneeded downtime • Manual failover is acceptable if there is a 24/7 on-call team available – in fact it may even be preferred • Properly test failover time for both “local” and “cross-dc” failovers and calculate your “uptime loss” per failover to estimate the “number of failovers” that can be supported per year