ProxySQL High Availability 2019 - Rene Cannao (ProxySQL)

ProxySQL HA features explained by ProxySQL author and CEO Rene Cannao at multiple events in 2019.

ProxySQL LLC

January 21, 2019

Transcript

  1. ProxySQL LLC We provide services to help build, support and

    improve the performance & reliability of your Cloud-Based and On-Premise MySQL infrastructure. © ProxySQL 2013-2019. All rights reserved.
  2. ProxySQL LLC • ProxySQL Development • ProxySQL Support Services •

    ProxySQL, MySQL, DevOps & Outsourcing • Consulting Services © ProxySQL 2013-2019. All rights reserved.
  3. What is ProxySQL? MySQL protocol aware data gateway – Clients

    connect to ProxySQL – Requests are evaluated – Various actions are performed © ProxySQL 2013-2019. All rights reserved.
  4. • High Availability and infinite scalability • Seamless planned and

    unplanned failover • MySQL session load balancing • Launching multiple instances on the same port • Binding services on different ports ProxySQL features © ProxySQL 2013-2019. All rights reserved.
  5. • Connection pooling and multiplexing • Read caching outside of

    the database server • Complex query routing and read/write split • Query throttling, firewalling and mirroring • On-the-fly query rewrite • Advanced data masking ProxySQL features © ProxySQL 2013-2019. All rights reserved.
  6. • Dynamic runtime reconfiguration • Real time statistics and monitoring

    • Scheduling of external scripts • Causal reads by tracking GTIDs across pools of backend servers • Native ProxySQL Clustering ProxySQL features © ProxySQL 2013-2019. All rights reserved.
  7. • SSL support for frontend connections & TLS v1.2 • Native

    support for Galera / PXC and Group Replication • Integration with Amazon RDS and Aurora • MySQL, MariaDB, Percona Server and ClickHouse backends • Supports millions of users and tens of thousands of database servers ProxySQL features © ProxySQL 2013-2019. All rights reserved.
  8. Production Deployment © ProxySQL 2013-2019. All rights reserved. The two

    main approaches for deploying ProxySQL are either: - Deploy ProxySQL on your application servers - Deploy ProxySQL in a separate layer / standalone server Each approach has its own advantages and disadvantages
  9. App Server Deployment © ProxySQL 2013-2019. All rights reserved. ProxySQL

    is deployed locally • No network overhead • No single point of failure • Isolated configuration • Rolling upgrades
  10. App Server Deployment © ProxySQL 2013-2019. All rights reserved. ProxySQL

    is deployed locally • No network overhead • No single point of failure • Isolated configuration • Rolling upgrades • DB monitoring overhead • More backend connections • Configuration management effort • Query cache isolated per instance (QC1, QC2, QC3)
  11. ProxySQL Layer Deployment © ProxySQL 2013-2019. All rights reserved. ProxySQL

    is deployed on a standalone server • DB monitoring overhead • Fewer backend connections • Configuration management effort • Shared (global) Query Cache
  12. ProxySQL Layer Deployment © ProxySQL 2013-2019. All rights reserved. ProxySQL

    is deployed on a standalone server • DB monitoring overhead • Fewer backend connections • Configuration management effort • Shared (global) Query Cache • Additional round trip time • Single point of failure • Shared configuration • No rolling upgrades
  13. HA for ProxySQL © ProxySQL 2013-2019. All rights reserved. ProxySQL

    is the "first contact" for applications connecting to database services. • ProxySQL Service is CRITICAL – Deployment decisions affect uptime – Architectural considerations affect scalability – Additional layers introduce possible points of failure
  14. HA for ProxySQL © ProxySQL 2013-2019. All rights reserved. Common

    high availability architectures for ProxySQL include: • Keepalived (IP failover / TCP load balancer) • DNS / Consul DNS (other service discovery tools) • ProxySQL Cascading
  15. Keepalived © ProxySQL 2013-2019. All rights reserved. VRRP / LB

    VIP 10.10.10.1 • HA provided by Virtual Router Redundancy Protocol or load balancer
  16. Keepalived © ProxySQL 2013-2019. All rights reserved. VIP 10.10.10.1 •

    HA provided by Virtual Router Redundancy Protocol • Service check fails VRRP / LB
  17. Keepalived © ProxySQL 2013-2019. All rights reserved. VIP 10.10.10.1 •

    HA provided by Virtual Router Redundancy Protocol • Service check fails • Keepalived switches the VIP to a standby ProxySQL instance VRRP / LB
  18. Keepalived © ProxySQL 2013-2019. All rights reserved. VIP 10.10.10.1 •

    HA provided by Virtual Router Redundancy Protocol • Service check fails • Keepalived switches the VIP to a standby ProxySQL instance • All client connections are LOST and re-established by the application VRRP / LB
  19. DNS / Consul DNS © ProxySQL 2013-2019. All rights reserved.

    • HA provided by Consul agents running on each ProxySQL instance
  20. DNS / Consul DNS © ProxySQL 2013-2019. All rights reserved.

    • HA provided by Consul agents running on each ProxySQL instance • Application servers access the service via DNS
  21. DNS / Consul DNS © ProxySQL 2013-2019. All rights reserved.

    • HA provided by Consul agents running on each ProxySQL instance • Application servers access the service via DNS • Provides load balancing as well as HA
  22. DNS / Consul DNS © ProxySQL 2013-2019. All rights reserved.

    • HA provided by Consul agents running on each ProxySQL instance • Application servers access the service via DNS • Provides load balancing as well as HA
  23. DNS / Consul DNS © ProxySQL 2013-2019. All rights reserved.

    • HA provided by Consul agents running on each ProxySQL instance • Application servers access the service via DNS • Provides load balancing as well as HA • All client connections are LOST and re-established by the application
  24. ProxySQL Cascading © ProxySQL 2013-2019. All rights reserved. • ProxySQL

    is deployed on each application server and in a ProxySQL layer • Applications connect to the local ProxySQL server • Provides load balancing as well as HA ProxySQL
  25. ProxySQL Cascading © ProxySQL 2013-2019. All rights reserved. • ProxySQL

    is deployed on each application server and in a ProxySQL layer • Applications connect to the local ProxySQL server • Provides load balancing as well as HA ProxySQL
  26. ProxySQL Cascading © ProxySQL 2013-2019. All rights reserved. • ProxySQL

    is deployed on each application server and in a ProxySQL layer • Applications connect to the local ProxySQL server • Provides load balancing as well as HA • Open connections are held and retried on an available backend – no connections are lost ProxySQL
  27. ProxySQL Cascading © ProxySQL 2013-2019. All rights reserved. • ProxySQL

    is deployed on each application server and in a ProxySQL layer • Applications connect to the local ProxySQL server • Provides load balancing as well as HA • Open connections are held and retried on an available backend – no connections lost • If the PRIMARY MySQL instance is lost while multiplexing is disabled or a transaction is active – the statement will fail (this applies to the ProxySQL layer as well) ProxySQL
  28. ProxySQL Cascading © ProxySQL 2013-2019. All rights reserved. ProxySQL can

    be used to provide HA to itself as it communicates using the MySQL Protocol: • Application layer aware • Provides connection retry • Allows for scale up / scale down without connection loss (during planned maintenance) • Transactions can be lost during edge case unplanned failover
  29. Before looking at High Availability we need to first look

    into how ProxySQL monitors the availability of MySQL backends. • ProxySQL is designed to perform various checks against backend servers • The checks target various layers of communication • Checks are built with dependencies: if a lower-level check fails there is no need for a higher-level check Monitoring Availability © ProxySQL 2013-2019. All rights reserved.
  30. The Monitor Module is responsible for checking backends and ensuring

    that traffic is only sent to operational servers. Monitoring is performed at the following levels: • Connect – tests new connections to backend servers (MySQL connect) • Ping – pings backend servers (MySQL ping) • Replication Lag – checks Seconds_Behind_Master status • Read only – checks read_only status The Monitor Module © ProxySQL 2013-2019. All rights reserved.
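
    A minimal sketch of how these checks might be tuned from the ProxySQL admin interface; the credentials and intervals below are illustrative placeholders, not recommendations:

        -- credentials the Monitor Module uses to log into the backends
        UPDATE global_variables SET variable_value='monitor'
          WHERE variable_name IN ('mysql-monitor_username','mysql-monitor_password');
        -- how often (in milliseconds) each check runs
        UPDATE global_variables SET variable_value='2000'  WHERE variable_name='mysql-monitor_connect_interval';
        UPDATE global_variables SET variable_value='2000'  WHERE variable_name='mysql-monitor_ping_interval';
        UPDATE global_variables SET variable_value='1500'  WHERE variable_name='mysql-monitor_read_only_interval';
        UPDATE global_variables SET variable_value='10000' WHERE variable_name='mysql-monitor_replication_lag_interval';
        LOAD MYSQL VARIABLES TO RUNTIME;
        SAVE MYSQL VARIABLES TO DISK;
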
  31. Backend server states © ProxySQL 2013-2019. All rights reserved. The

    monitor module will determine the status of a backend server in the “runtime_mysql_servers” table (these states can also be set manually in “mysql_servers”): • ONLINE - backend server is fully operational (manually set / automatically set by monitor) • SHUNNED - backend server is temporarily taken out of use, either because it generated too many connection errors in too short a time, or because replication lag exceeded the allowed threshold (automatically set by monitor)
  32. Backend server states © ProxySQL 2013-2019. All rights reserved. •

    OFFLINE_SOFT - when a server is put into OFFLINE_SOFT mode, new incoming connections are no longer accepted, while existing connections are kept until they become inactive. In other words, connections are kept in use until the current transaction is completed. This allows a backend to be detached gracefully (manually set). • OFFLINE_HARD - when a server is put into OFFLINE_HARD mode, existing connections are dropped and no new incoming connections are accepted. This is equivalent to deleting the server from a hostgroup (indefinitely or for maintenance) and is done by ProxySQL on failover (manually / automatically set by monitor)
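
    These states can be inspected at any time from the admin interface; a brief sketch:

        -- the current state of each backend as seen by ProxySQL
        -- (ONLINE / SHUNNED / OFFLINE_SOFT / OFFLINE_HARD appear in the status column)
        SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;
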
  33. Async Replication Failover © ProxySQL 2013-2019. All rights reserved. The

    most common topology for MySQL deployments is traditional primary / replica async (or semi-sync): RW Primary for writes RO Replicas for reads
  34. Since there is no native support for failover in MySQL,

    external mechanisms are used to promote / demote servers: MHA for MySQL Replication Managers © ProxySQL 2013-2019. All rights reserved. RW Primary for writes RO Replicas for reads
  35. During a failover these tools always perform “roughly” the same

    actions, specifically: Failover Process © ProxySQL 2013-2019. All rights reserved. RW RO
  36. During a failover these tools always perform “roughly” the same

    actions, specifically: Failover Process © ProxySQL 2013-2019. All rights reserved. Switch Primary node to read-only RO RO
  37. During a failover these tools always perform “roughly” the same

    actions, specifically: Failover Process © ProxySQL 2013-2019. All rights reserved. Elect a Replica to become the new Primary and connect Replicas RO Replicas for reads RO RO RO
  38. During a failover these tools always perform “roughly” the same

    actions, specifically: Failover Process © ProxySQL 2013-2019. All rights reserved. Optionally set previous Primary as a Replica of the new Primary RO Replicas for reads RO RO
  39. During a failover these tools always perform “roughly” the same

    actions, specifically: Failover Process © ProxySQL 2013-2019. All rights reserved. RW RO Set the new Primary to read-write
  40. ProxySQL transparently detects and adapts to these topology changes via

    the monitor module's read-only check Replication Hostgroup ProxySQL Failover Detection © ProxySQL 2013-2019. All rights reserved. ProxySQL RO Checks: - select @@read_only RW RO Hostgroup 0: Primary (read_only=0) Hostgroup 1: Replicas (read_only=1)
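
    A replication hostgroup along these lines could be defined on the admin interface as follows; hostgroup numbers match the diagram, the comment is illustrative, and the check_type column is only present in ProxySQL 2.x (omit it on 1.4.x):

        -- pair hostgroup 0 (writer) with hostgroup 1 (reader); the monitor's
        -- read_only check moves servers between them automatically
        INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, check_type, comment)
          VALUES (0, 1, 'read_only', 'async replication cluster');
        LOAD MYSQL SERVERS TO RUNTIME;
        SAVE MYSQL SERVERS TO DISK;
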
  41. When the status of a Primary server changes to read-only

    ProxySQL will remove the server from the writer hostgroup (OFFLINE_HARD) Replication Hostgroup ProxySQL Failover Detection © ProxySQL 2013-2019. All rights reserved. ProxySQL RO Checks: - select @@read_only RO RO Hostgroup 0: Primary (read_only=0) Hostgroup 1: Replicas (read_only=1)
  42. Momentarily there are no RW nodes available; ProxySQL holds on

    to the connections for mysql-connect_timeout_server_max milliseconds Replication Hostgroup ProxySQL Failover Detection © ProxySQL 2013-2019. All rights reserved. ProxySQL RO Checks: - select @@read_only RO Hostgroup 0: Primary (read_only=0) Hostgroup 1: Replicas (read_only=1)
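
    The timeout can be checked or raised from the admin interface; the 20-second value below is only an example:

        SELECT variable_value FROM global_variables WHERE variable_name='mysql-connect_timeout_server_max';
        -- hold queued connections for up to 20 seconds while waiting for a new writer
        UPDATE global_variables SET variable_value='20000' WHERE variable_name='mysql-connect_timeout_server_max';
        LOAD MYSQL VARIABLES TO RUNTIME;
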
  43. Once a RW node is detected, it is added to the

    writer_hostgroup (ONLINE) and connections are routed to the new Primary server Replication Hostgroup ProxySQL Failover Detection © ProxySQL 2013-2019. All rights reserved. ProxySQL RO Checks: - select @@read_only RW RO Hostgroup 0: Primary (read_only=0) Hostgroup 1: Replicas (read_only=1)
  44. Extended Replication Support © ProxySQL 2013-2019. All rights reserved. ProxySQL

    was initially built to support regular “async” / “semi-sync” replication topologies (traditional primary / replica), later support for additional types of replication was introduced: • Support for custom replication topologies was available via external scheduler scripts since ProxySQL version 1.x • ProxySQL 1.3.x introduced support for Oracle Group Replication • Native support for Oracle Group Replication was added in ProxySQL 1.4.x • Native support for Galera Replication was added in ProxySQL 2.x • Native support for Amazon Aurora and RDS was added in ProxySQL 2.x
  45. What is Group Replication? © ProxySQL 2013-2019. All rights reserved.

    • Multi-master / Active-Active Clustered MySQL Solution – Single master by default (group_replication_single_primary_mode) • Synchronous Replication (certification based) • Multi-threaded Replication • InnoDB Compliant • Suitable for LAN and low latency networks • State transfer is based on GTID matching
  46. What is Galera Cluster? © ProxySQL 2013-2019. All rights reserved.

    • Multi-master / Active-Active Clustered MySQL Solution • Synchronous Replication (certification based) • Multi-threaded Replication • InnoDB Compliant • Suitable for LAN, WAN and Cloud Solutions • IST (incremental) & SST (full) for state transfer • Auto reconnect mechanism for rejected nodes
  47. ProxySQL 2.x MySQL Clustering © ProxySQL 2013-2019. All rights reserved.

    In ProxySQL 2.0 the concept of “mysql_replication_hostgroup” is extended to allow for multiple hostgroups. • In addition to “reader_hostgroup” and “writer_hostgroup” we also have the following additional concepts and hostgroup types: – max_writers – writer_is_also_reader – max_transactions_behind – backup_writer_hostgroup – offline_hostgroup
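
    For Galera / PXC these concepts map onto the “mysql_galera_hostgroups” table (mysql_group_replication_hostgroups is analogous); a sketch with illustrative hostgroup numbers and limits:

        INSERT INTO mysql_galera_hostgroups
          (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup,
           active, max_writers, writer_is_also_reader, max_transactions_behind)
        VALUES (0, 2, 1, 9999, 1, 1, 0, 100);
        LOAD MYSQL SERVERS TO RUNTIME;
        SAVE MYSQL SERVERS TO DISK;
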
  48. Planned Maintenance © ProxySQL 2013-2019. All rights reserved. Let's expand

    to operational activities involving failover: - Graceful failover - Failover considerations - Best practices
  49. Graceful Failover © ProxySQL 2013-2019. All rights reserved. As previously

    discussed: • ProxySQL will automatically re-route “writer” connections to the new Primary after a failover • The failover time is configurable based on the read only monitoring check • If the PRIMARY MySQL instance is lost while multiplexing is disabled or a transaction is active – the statement will fail since the full transaction should be retried.
  50. Graceful Failover © ProxySQL 2013-2019. All rights reserved. Although this

    is acceptable in an “emergency situation”, failovers can also be required for common maintenance activities, including: • Scaling up hardware and resources • Operating system updates • Upgrading MySQL to a newer release • Executing certain blocking ALTERs, e.g. converting a primary key from INT to BIGINT
  51. Graceful Failover © ProxySQL 2013-2019. All rights reserved. ProxySQL facilitates

    “planned failovers” as well • In the “mysql_servers” table each server can have a status of ONLINE, SHUNNED, OFFLINE_SOFT or OFFLINE_HARD configured • OFFLINE_SOFT is a special status that allows all currently executing connections (statements and transactions) to complete while preventing any new connections from being made • New connections are either sent to another available server in the same destination_hostgroup or held until a server is available
  52. The initial state of the healthy MySQL cluster Graceful Failover

    Example © ProxySQL 2013-2019. All rights reserved. Replication Hostgroup RW RO Hostgroup 0: Primary (read_only=0) - status=ONLINE Hostgroup 1: Replicas (read_only=1)
  53. First, remove the replication hostgroup to prevent automatic failover as

    a safety precaution Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. RW RO Hostgroup 0: Primary (read_only=0) - status=ONLINE Hostgroup 1: Replicas (read_only=1)
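
    A sketch of this step on the admin interface, using the writer hostgroup from the diagram:

        -- temporarily disable automatic writer/reader moves for this cluster
        DELETE FROM mysql_replication_hostgroups WHERE writer_hostgroup=0;
        LOAD MYSQL SERVERS TO RUNTIME;
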
  54. Next set the status of the Primary to OFFLINE_SOFT to

    allow current statements and transactions to complete their execution Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. RW RO Hostgroup 0: Primary (read_only=0) - status=OFFLINE_SOFT Hostgroup 1: Replicas (read_only=1)
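
    The same step as an admin command, with a placeholder hostname for the current Primary:

        -- drain the Primary: running statements and transactions finish, no new connections are made
        UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id=0 AND hostname='10.10.10.2';
        LOAD MYSQL SERVERS TO RUNTIME;
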
  55. Once completed, we’ll see the connection count on hostgroup=0 drop

    to zero Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. RW RO Hostgroup 0: Primary (read_only=0) - status=OFFLINE_SOFT Hostgroup 1: Replicas (read_only=1)
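
    The drain can be observed from the stats tables; for example:

        -- wait for ConnUsed on the writer hostgroup to reach 0
        SELECT hostgroup, srv_host, status, ConnUsed, ConnFree
          FROM stats_mysql_connection_pool WHERE hostgroup=0;
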
  56. Now it’s time to perform the failover, either manually or

    using a tool such as Orchestrator or MHA and promote a Replica to Primary Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. RW RO Hostgroup 0: Primary (read_only=0) - status=ONLINE Hostgroup 1: Replicas (read_only=1) Previous Primary (read_only=0) - status=OFFLINE_SOFT
  57. Traffic will continue to be served from the existing nodes

    and maintenance activities can be performed Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. RW RO Hostgroup 0: Primary (read_only=0) - status=ONLINE Hostgroup 1: Replicas (read_only=1) Maintenance can now be performed on the previous Primary
  58. Replication Hostgroup At this point the replication hostgroup can be

    reconfigured to allow for any unplanned failover that may occur Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. RW RO Hostgroup 0: Primary (read_only=0) - status=ONLINE Hostgroup 1: Replicas (read_only=1)
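
    A sketch of re-enabling the replication hostgroup, mirroring the earlier definition:

        INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
          VALUES (0, 1, 'async replication cluster');
        LOAD MYSQL SERVERS TO RUNTIME;
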
  59. Once maintenance is completed we can add the previous Primary

    to the Replica hostgroup=1 by dynamically updating the configuration Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. Replication Hostgroup RW RO Hostgroup 0: Primary (read_only=0) - status=ONLINE Hostgroup 1: Replicas (read_only=1)
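
    A sketch of this final step, again with a placeholder hostname for the previous Primary:

        -- move the old Primary out of the writer hostgroup and back in as a Replica
        DELETE FROM mysql_servers WHERE hostgroup_id=0 AND hostname='10.10.10.2';
        INSERT INTO mysql_servers (hostgroup_id, hostname, port, status) VALUES (1, '10.10.10.2', 3306, 'ONLINE');
        LOAD MYSQL SERVERS TO RUNTIME;
        SAVE MYSQL SERVERS TO DISK;
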
  60. Graceful Failover Example © ProxySQL 2013-2019. All rights reserved. The

    commands to perform the previous example can be executed manually or scripted • Manual execution allows for more control, while scripts are more precise and likely to execute in a shorter time-frame • NOTE: The time between draining connections and promoting a replica must be shorter than “mysql-connect_timeout_server_max”, otherwise the new waiting connections will return an error – you can increase this before failover and revert it afterwards as needed
  61. Failover Considerations © ProxySQL 2013-2019. All rights reserved. • Ensure

    that the number of available nodes supports the maximum throughput required by applications • Whenever possible use same-sized servers for the Primary and any Replicas which are candidates for promotion (at least one pair should be equal) • Prepare scripts to execute “planned failover” to ensure a smooth and repeatable process • When integrating with MHA or Orchestrator consider implementing ProxySQL planned failover logic in the available hooks
  62. Failover Best Practices © ProxySQL 2013-2019. All rights reserved. A

    few failover best practices: • When building failover mechanisms for small environments it is still recommended to use a tool such as MHA for MySQL • For larger environments it is suggested to use Orchestrator, especially when multiple data centers are involved (consider the Raft implementation) or the replication hierarchy is complex (long chains of replication)
  63. Failover Best Practices © ProxySQL 2013-2019. All rights reserved. •

    Carefully consider under what circumstances automatic failover should occur; if failover is too frequent it can lead to unneeded downtime • Manual failover is acceptable if there is a 24/7 on-call team available – in fact it may even be preferred • Properly test failover time for both “local” and “cross-dc” failovers and calculate your “uptime loss” per failover to estimate the “number of failovers” that can be supported per year