Slide 1

Slide 1 text

ProxySQL HA Overview 2019

Slide 2

Slide 2 text

ProxySQL LLC
We provide services to help build, support and improve the performance & reliability of your Cloud-Based and On-Premise MySQL infrastructure.
© ProxySQL 2013-2019. All rights reserved.

Slide 3

Slide 3 text

ProxySQL LLC
• ProxySQL Development
• ProxySQL Support Services
• ProxySQL, MySQL, DevOps & Outsourcing
• Consulting Services

Slide 4

Slide 4 text

What is ProxySQL?
A MySQL protocol aware data gateway:
– Clients connect to ProxySQL
– Requests are evaluated
– Various actions are performed

Slide 5

Slide 5 text

ProxySQL features
• High Availability and infinite scalability
• Seamless planned and unplanned failover
• MySQL session load balancing
• Launching multiple instances on the same port
• Binding services on different ports

Slide 6

Slide 6 text

ProxySQL features
• Connection pooling and multiplexing
• Read caching outside of the database server
• Complex query routing and read/write split
• Query throttling, firewalling and mirroring
• On-the-fly query rewrite
• Advanced data masking
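As a sketch of how read/write split is typically configured on the ProxySQL admin interface (hostgroup numbers 0 for writers and 1 for readers are assumptions for illustration):

```sql
-- Route SELECT ... FOR UPDATE to the writer hostgroup (0),
-- all other SELECTs to the reader hostgroup (1).
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 0, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 1, 1);
-- Activate the rules and persist them.
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```

Statements that match no rule fall through to the default hostgroup of the connecting user.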

Slide 7

Slide 7 text

ProxySQL features
• Dynamic runtime reconfiguration
• Real-time statistics and monitoring
• Scheduling of external scripts
• Causal reads by tracking GTIDs across pools of backend servers
• Native ProxySQL Clustering

Slide 8

Slide 8 text

ProxySQL features
• SSL support for frontend connections & TLS v1.2
• Native support for Galera / PXC and Group Replication
• Integration with Amazon RDS and Aurora
• MySQL, MariaDB, Percona and ClickHouse backends
• Supports millions of users and tens of thousands of database servers

Slide 9

Slide 9 text

Deployment and HA for ProxySQL

Slide 10

Slide 10 text

Production Deployment
The two main approaches for deploying ProxySQL are either:
- Deploy ProxySQL on your application servers
- Deploy ProxySQL in a separate layer / standalone server
Each approach has its own advantages and disadvantages.

Slide 11

Slide 11 text

App Server Deployment
ProxySQL is deployed locally:
• No network overhead
• No single point of failure
• Isolated configuration
• Rolling upgrades

Slide 12

Slide 12 text

App Server Deployment
ProxySQL is deployed locally:
• No network overhead
• No single point of failure
• Isolated configuration
• Rolling upgrades
Disadvantages:
• DB monitoring overhead
• More backend connections
• Configuration effort (configuration management needed)
• Query cache isolated per instance
[Diagram: each app server has its own query cache: QC1, QC2, QC3]

Slide 13

Slide 13 text

ProxySQL Layer Deployment
ProxySQL is deployed on a standalone server:
• DB monitoring overhead
• Less backend connections
• Configuration effort
• Shared Query Cache
[Diagram: a single point of configuration management and a global query cache]

Slide 14

Slide 14 text

ProxySQL Layer Deployment
ProxySQL is deployed on a standalone server:
• DB monitoring overhead
• Less backend connections
• Configuration effort
• Shared Query Cache
Disadvantages:
• Additional Round Trip Time
• Single point of failure
• Shared configuration
• No rolling upgrades

Slide 15

Slide 15 text

HA for ProxySQL
ProxySQL is the “first contact” for applications connecting to a database service.
• The ProxySQL service is CRITICAL:
– Deployment decisions affect uptime
– Architectural considerations affect scalability
– Additional layers introduce possible points of failure

Slide 16

Slide 16 text

HA for ProxySQL
Common high availability architectures for ProxySQL include:
• Keepalived (IP failover / TCP load balancer)
• DNS / Consul DNS (or other service discovery tools)
• ProxySQL Cascading

Slide 17

Slide 17 text

Keepalived
• HA provided by the Virtual Router Redundancy Protocol (VRRP) or a load balancer
[Diagram: VRRP / LB fronting the ProxySQL instances behind VIP 10.10.10.1]

Slide 18

Slide 18 text

Keepalived
• HA provided by the Virtual Router Redundancy Protocol
• Service check fails
[Diagram: VRRP / LB, VIP 10.10.10.1]

Slide 19

Slide 19 text

Keepalived
• HA provided by the Virtual Router Redundancy Protocol
• Service check fails
• Keepalived switches the VIP to a standby ProxySQL instance
[Diagram: VRRP / LB, VIP 10.10.10.1]

Slide 20

Slide 20 text

Keepalived
• HA provided by the Virtual Router Redundancy Protocol
• Service check fails
• Keepalived switches the VIP to a standby ProxySQL instance
• All client connections are LOST and re-established by the application
[Diagram: VRRP / LB, VIP 10.10.10.1]
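A minimal keepalived configuration for this setup might look as follows; the VIP matches the diagram, while the interface name, router ID, priorities and the process-based health check are illustrative assumptions:

```conf
# /etc/keepalived/keepalived.conf (sketch)
vrrp_script chk_proxysql {
    script "killall -0 proxysql"   # succeeds only while a proxysql process is running
    interval 2                     # check every 2 seconds
}

vrrp_instance VI_1 {
    state MASTER                   # use BACKUP on the standby node
    interface eth0
    virtual_router_id 51
    priority 101                   # use a lower value (e.g. 100) on the standby
    virtual_ipaddress {
        10.10.10.1
    }
    track_script {
        chk_proxysql
    }
}
```

When the check fails on the MASTER, VRRP elects the standby and moves the VIP, which is what drops the established client connections mentioned above.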

Slide 21

Slide 21 text

DNS / Consul DNS
• HA provided by Consul agents running on each ProxySQL instance

Slide 22

Slide 22 text

DNS / Consul DNS
• HA provided by Consul agents running on each ProxySQL instance
• Application servers access the service via DNS

Slide 23

Slide 23 text

DNS / Consul DNS
• HA provided by Consul agents running on each ProxySQL instance
• Application servers access the service via DNS
• Provides load balancing as well as HA

Slide 24

Slide 24 text

DNS / Consul DNS
• HA provided by Consul agents running on each ProxySQL instance
• Application servers access the service via DNS
• Provides load balancing as well as HA

Slide 25

Slide 25 text

DNS / Consul DNS
• HA provided by Consul agents running on each ProxySQL instance
• Application servers access the service via DNS
• Provides load balancing as well as HA
• All client connections are LOST and re-established by the application

Slide 26

Slide 26 text

ProxySQL Cascading
• ProxySQL is deployed on each application server and in a ProxySQL layer
• Applications connect to the local ProxySQL server
• Provides load balancing as well as HA

Slide 27

Slide 27 text

ProxySQL Cascading
• ProxySQL is deployed on each application server and in a ProxySQL layer
• Applications connect to the local ProxySQL server
• Provides load balancing as well as HA

Slide 28

Slide 28 text

ProxySQL Cascading
• ProxySQL is deployed on each application server and in a ProxySQL layer
• Applications connect to the local ProxySQL server
• Provides load balancing as well as HA
• Open connections are held and retried on an available backend – no connections are lost

Slide 29

Slide 29 text

ProxySQL Cascading
• ProxySQL is deployed on each application server and in a ProxySQL layer
• Applications connect to the local ProxySQL server
• Provides load balancing as well as HA
• Open connections are held and retried on an available backend – no connections are lost
• If the PRIMARY MySQL instance is lost while multiplexing is disabled or a transaction is active, the statement will fail (this applies to the ProxySQL layer as well)

Slide 30

Slide 30 text

ProxySQL Cascading
ProxySQL can be used to provide HA to itself since it communicates using the MySQL protocol:
• Application layer aware
• Provides connection retry
• Allows for scale up / scale down without connection loss (during planned maintenance)
• Transactions can be lost during edge-case unplanned failovers

Slide 31

Slide 31 text

High Availability for MySQL

Slide 32

Slide 32 text

Monitoring Availability
Before looking at High Availability we first need to look at how ProxySQL monitors the availability of MySQL backends.
• ProxySQL is designed to perform various checks against backend servers
• The checks target various layers of communication
• Checks are built with dependencies: if a lower level check fails there is no need for a higher level check

Slide 33

Slide 33 text

The Monitor Module
The Monitor Module is responsible for checking backends and ensuring that traffic is only sent to operational servers. Monitoring is performed at the following levels:
• Connect – tests new connections to backend servers (MySQL connect)
• Ping – pings backend servers (MySQL ping)
• Replication Lag – checks the Seconds_Behind_Master status
• Read only – checks the read_only status
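These levels can be tuned through admin variables and inspected through the monitor schema. A sketch, with interval values that are purely illustrative (not recommendations):

```sql
-- Check intervals are in milliseconds.
UPDATE global_variables SET variable_value='2000'
  WHERE variable_name='mysql-monitor_ping_interval';
UPDATE global_variables SET variable_value='1500'
  WHERE variable_name='mysql-monitor_read_only_interval';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;

-- Recent check results are recorded in the monitor schema, e.g.:
SELECT hostname, port, ping_success_time_us, ping_error
FROM monitor.mysql_server_ping_log
ORDER BY time_start_us DESC LIMIT 3;
```

Corresponding log tables exist for the connect, replication lag and read-only checks as well.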

Slide 34

Slide 34 text

Backend server states
The Monitor Module determines the status of a backend server in the “runtime_mysql_servers” table (these states can also be set manually in “mysql_servers”):
• ONLINE - the backend server is fully operational (set manually or automatically by the monitor)
• SHUNNED - the backend server is temporarily taken out of use because it produced too many connection errors in too short a time, or its replication lag exceeded the allowed threshold (set automatically by the monitor)

Slide 35

Slide 35 text

Backend server states
• OFFLINE_SOFT - when a server is put into OFFLINE_SOFT mode, new incoming connections are no longer accepted, while existing connections are kept until they become inactive. In other words, connections are kept in use until the current transaction completes. This allows a backend to be gracefully detached (set manually).
• OFFLINE_HARD - when a server is put into OFFLINE_HARD mode, existing connections are dropped and no new incoming connections are accepted. This is equivalent to deleting the server from a hostgroup (indefinitely or for maintenance) and is done by ProxySQL on failover (set manually or automatically by the monitor).
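Setting these states by hand is a plain admin-interface update; the hostname below is a placeholder:

```sql
-- Gracefully drain one backend.
UPDATE mysql_servers SET status='OFFLINE_SOFT'
  WHERE hostname='10.10.10.11' AND port=3306;
LOAD MYSQL SERVERS TO RUNTIME;

-- The effective state is visible at runtime:
SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;

-- Bring the backend back once maintenance is done.
UPDATE mysql_servers SET status='ONLINE'
  WHERE hostname='10.10.10.11' AND port=3306;
LOAD MYSQL SERVERS TO RUNTIME;
```

Note that changes to `mysql_servers` only take effect after the `LOAD ... TO RUNTIME` step.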

Slide 36

Slide 36 text

Async Replication Failover
The most common topology for MySQL deployments is traditional primary / replica async (or semi-sync) replication:
[Diagram: an RW Primary for writes and RO Replicas for reads]

Slide 37

Slide 37 text

Replication Managers
Since there is no native support for failover in MySQL, external mechanisms such as MHA are used to promote / demote servers:
[Diagram: an RW Primary for writes and RO Replicas for reads]

Slide 38

Slide 38 text

Failover Process
During a failover these tools always perform “roughly” the same actions, specifically:
[Diagram: RW Primary and RO Replicas]

Slide 39

Slide 39 text

Failover Process
During a failover these tools always perform “roughly” the same actions, specifically:
• Switch the Primary node to read-only
[Diagram: all nodes are now RO]

Slide 40

Slide 40 text

Failover Process
During a failover these tools always perform “roughly” the same actions, specifically:
• Elect a Replica to become the new Primary and connect the Replicas to it
[Diagram: RO Replicas for reads]

Slide 41

Slide 41 text

Failover Process
During a failover these tools always perform “roughly” the same actions, specifically:
• Optionally set the previous Primary as a Replica of the new Primary
[Diagram: RO Replicas for reads]

Slide 42

Slide 42 text

Failover Process
During a failover these tools always perform “roughly” the same actions, specifically:
• Set the new Primary to read-write
[Diagram: RW Primary and RO Replicas]

Slide 43

Slide 43 text

ProxySQL Failover Detection
ProxySQL transparently detects and adapts to these topology changes via the Monitor Module's read-only check.
[Diagram: a Replication Hostgroup; ProxySQL runs RO checks (select @@read_only) against Hostgroup 0: Primary (read_only=0) and Hostgroup 1: Replicas (read_only=1)]
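This behavior is enabled by defining a replication hostgroup pair; the hostgroup numbers below match the diagram, the comment is arbitrary:

```sql
-- Hostgroup 0 holds servers with read_only=0 (writers),
-- hostgroup 1 holds servers with read_only=1 (readers).
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
VALUES (0, 1, 'async replication cluster');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```

Once configured, the monitor moves servers between the two hostgroups automatically based on the value of `@@read_only` it observes.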

Slide 44

Slide 44 text

ProxySQL Failover Detection
When the status of a Primary server changes to read-only, ProxySQL removes the server from the writer hostgroup (OFFLINE_HARD).
[Diagram: ProxySQL RO checks (select @@read_only); Hostgroup 0: Primary (read_only=0), Hostgroup 1: Replicas (read_only=1)]

Slide 45

Slide 45 text

ProxySQL Failover Detection
Momentarily there are no RW nodes available; ProxySQL holds on to the connections for up to mysql-connect_timeout_server_max milliseconds.
[Diagram: ProxySQL RO checks (select @@read_only); Hostgroup 0: Primary (read_only=0), Hostgroup 1: Replicas (read_only=1)]

Slide 46

Slide 46 text

ProxySQL Failover Detection
Once an RW node is detected it is added to the writer hostgroup (ONLINE) and connections are routed to the new Primary server.
[Diagram: ProxySQL RO checks (select @@read_only); Hostgroup 0: Primary (read_only=0), Hostgroup 1: Replicas (read_only=1)]

Slide 47

Slide 47 text

Extended Replication Support
ProxySQL was initially built to support regular “async” / “semi-sync” replication topologies (traditional primary / replica); support for additional types of replication was introduced later:
• Support for custom replication topologies has been available via external scheduler scripts since ProxySQL 1.x
• ProxySQL 1.3.x introduced support for Oracle Group Replication
• Native support for Oracle Group Replication was added in ProxySQL 1.4.x
• Native support for Galera Replication was added in ProxySQL 2.x
• Native support for Amazon Aurora and RDS was added in ProxySQL 2.x

Slide 48

Slide 48 text

What is Group Replication?
• Multi-master / Active-Active clustered MySQL solution
– Single master by default (group_replication_single_primary_mode)
• Synchronous replication (certification based)
• Multi-threaded replication
• InnoDB compliant
• Suitable for LAN and low latency networks
• State transfer is based on GTID matching

Slide 49

Slide 49 text

What is Galera Cluster?
• Multi-master / Active-Active clustered MySQL solution
• Synchronous replication (certification based)
• Multi-threaded replication
• InnoDB compliant
• Suitable for LAN, WAN and Cloud solutions
• IST (incremental) & SST (full) for state transfer
• Auto reconnect mechanism for rejected nodes

Slide 50

Slide 50 text

ProxySQL 2.x MySQL Clustering
In ProxySQL 2.0 the concept of “mysql_replication_hostgroup” is extended to allow for multiple hostgroups.
• In addition to “reader_hostgroup” and “writer_hostgroup” we also have the following additional concepts and hostgroup types:
– max_writers
– writer_is_also_reader
– max_transactions_behind
– backup_writer_hostgroup
– offline_hostgroup
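For Galera clusters these concepts come together in the `mysql_galera_hostgroups` table; the hostgroup numbers and limits below are illustrative assumptions:

```sql
-- One active writer (max_writers=1); surplus writers are parked in the
-- backup writer hostgroup (2); unhealthy nodes go to the offline hostgroup (9999).
INSERT INTO mysql_galera_hostgroups
  (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup,
   active, max_writers, writer_is_also_reader, max_transactions_behind)
VALUES (0, 2, 1, 9999, 1, 1, 0, 100);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```

`max_transactions_behind` shuns a reader whose certification queue (wsrep_local_recv_queue) grows beyond the threshold, which keeps stale reads off lagging nodes.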

Slide 51

Slide 51 text

Failover in sync & async replication environments without losing transactions

Slide 52

Slide 52 text

Planned Maintenance
Let's expand to operational activities involving failover:
- Graceful failover
- Failover considerations
- Best practices

Slide 53

Slide 53 text

Graceful Failover
As previously discussed:
• ProxySQL will automatically re-route “writer” connections to the new Primary after a failover
• The failover time is configurable via the read-only monitoring check
• If the PRIMARY MySQL instance is lost while multiplexing is disabled or a transaction is active, the statement will fail, since the full transaction should be retried

Slide 54

Slide 54 text

Graceful Failover
Although this is acceptable in an “emergency situation”, failovers can also be required for common maintenance activities including:
• Scaling up hardware and resources
• Operating system updates
• Upgrading MySQL to a newer release
• Executing certain blocking ALTERs, e.g. converting a primary key from INT to BIGINT

Slide 55

Slide 55 text

Graceful Failover
ProxySQL facilitates “planned failovers” as well:
• In the “mysql_servers” table each server can be configured with a status of ONLINE, SHUNNED, OFFLINE_SOFT or OFFLINE_HARD
• OFFLINE_SOFT is a special status that allows all currently executing connections (statements and transactions) to complete while preventing any new connections from being made
• New connections are either sent to another available server in the same destination_hostgroup or held until a server is available

Slide 56

Slide 56 text

Graceful Failover Example
The initial state of the healthy MySQL cluster.
[Diagram: Replication Hostgroup; RW Hostgroup 0: Primary (read_only=0, status=ONLINE), RO Hostgroup 1: Replicas (read_only=1)]

Slide 57

Slide 57 text

Graceful Failover Example
First, remove the replication hostgroup to prevent automatic failover, as a safety precaution.
[Diagram: RW Hostgroup 0: Primary (read_only=0, status=ONLINE), RO Hostgroup 1: Replicas (read_only=1)]

Slide 58

Slide 58 text

Graceful Failover Example
Next, set the status of the Primary to OFFLINE_SOFT to allow current statements and transactions to complete their execution.
[Diagram: RW Hostgroup 0: Primary (read_only=0, status=OFFLINE_SOFT), RO Hostgroup 1: Replicas (read_only=1)]

Slide 59

Slide 59 text

Graceful Failover Example
Once completed, we'll see the connection count on hostgroup=0 drop to zero.
[Diagram: RW Hostgroup 0: Primary (read_only=0, status=OFFLINE_SOFT), RO Hostgroup 1: Replicas (read_only=1)]
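The drain can be watched from the stats schema on the admin interface:

```sql
-- ConnUsed reaching 0 on hostgroup 0 means all writer
-- connections have completed and been released.
SELECT hostgroup, srv_host, status, ConnUsed, ConnFree
FROM stats_mysql_connection_pool
WHERE hostgroup = 0;
```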

Slide 60

Slide 60 text

Graceful Failover Example
Now it's time to perform the failover, either manually or using a tool such as Orchestrator or MHA, and promote a Replica to Primary.
[Diagram: RW Hostgroup 0: new Primary (read_only=0, status=ONLINE), RO Hostgroup 1: Replicas (read_only=1), previous Primary (read_only=0, status=OFFLINE_SOFT)]

Slide 61

Slide 61 text

Graceful Failover Example
Traffic will continue to be served from the existing nodes and maintenance activities can be performed.
[Diagram: RW Hostgroup 0: Primary (read_only=0, status=ONLINE), RO Hostgroup 1: Replicas (read_only=1); maintenance can now be performed on the previous Primary]

Slide 62

Slide 62 text

Graceful Failover Example
At this point the replication hostgroup can be reconfigured to allow for any unplanned failover that may occur.
[Diagram: Replication Hostgroup; RW Hostgroup 0: Primary (read_only=0, status=ONLINE), RO Hostgroup 1: Replicas (read_only=1)]

Slide 63

Slide 63 text

Graceful Failover Example
Once maintenance is completed we can add the previous Primary to the Replica hostgroup=1 by dynamically updating the configuration.
[Diagram: Replication Hostgroup; RW Hostgroup 0: Primary (read_only=0, status=ONLINE), RO Hostgroup 1: Replicas (read_only=1)]

Slide 64

Slide 64 text

Graceful Failover Example
The commands to perform the previous example can be executed manually or scripted.
• Manual execution allows for more control, while scripts are more precise and likely to execute in a smaller time-frame
• NOTE: The time between draining connections and promoting a replica must be shorter than “mysql-connect_timeout_server_max”, or the new waiting connections will return an error – you can increase this before failover and revert afterwards as needed
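A scripted version of the steps above might look like this sketch; hostgroup numbers follow the example, and the timeout value is an illustrative assumption:

```sql
-- 1. Disable automatic failover detection as a safety precaution.
DELETE FROM mysql_replication_hostgroups WHERE writer_hostgroup = 0;

-- 2. Give waiting writer connections more headroom (milliseconds).
UPDATE global_variables SET variable_value='20000'
  WHERE variable_name='mysql-connect_timeout_server_max';
LOAD MYSQL VARIABLES TO RUNTIME;

-- 3. Drain the current Primary.
UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id = 0;
LOAD MYSQL SERVERS TO RUNTIME;

-- ... promote a Replica externally (Orchestrator / MHA / manual) ...

-- 4. Restore the replication hostgroup so unplanned failover works again.
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
VALUES (0, 1, 'restored after maintenance');
LOAD MYSQL SERVERS TO RUNTIME;
```

Remember to revert `mysql-connect_timeout_server_max` to its previous value once the failover completes.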

Slide 65

Slide 65 text

Failover Considerations
• Ensure that the number of available nodes supports the maximum throughput required by applications
• Whenever possible use the same sized servers for the Primary and the Replicas that are candidates for promotion (at least one pair should be equal)
• Prepare scripts to execute a “planned failover” to ensure a smooth and repeatable process
• When integrating with MHA and Orchestrator, consider implementing ProxySQL planned failover logic in the available hooks

Slide 66

Slide 66 text

Failover Best Practices
A few failover best practices:
• When building failover mechanisms for small environments it is still recommended to use a tool such as MHA for MySQL
• For larger environments it is suggested to use Orchestrator, especially for environments where multiple data centers are involved (consider the Raft implementation) or the replication hierarchy is complex (long chains of replication)

Slide 67

Slide 67 text

Failover Best Practices
• Carefully consider under what circumstances automatic failover should occur; if failover is too frequent it can lead to unneeded downtime
• Manual failover is acceptable, in fact even preferred, if there is a 24/7 on-call team available
• Properly test failover time for both “local” and “cross-DC” failovers and calculate your “uptime loss” per failover to create estimates of the “number of failovers” that can be supported per year