Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MySQL InnoDB ClusterSet - Disaster Recovery sol...

MySQL InnoDB ClusterSet - Disaster Recovery solution for InnoDB Cluster deployments

Disaster tolerance for MySQL InnoDB Cluster deployments

Olivier DASINI

January 20, 2022
Tweet

More Decks by Olivier DASINI

Other Decks in Technology

Transcript

  1. MySQL InnoDB ClusterSet Disaster Recovery solution for InnoDB Cluster deployments

    Olivier Dasini MySQL Cloud Principal Solutions Architect EMEA [email protected] Blogs : www.dasini.net/blog/en : www.dasini.net/blog/fr Linkedin: www.linkedin.com/in/olivier-dasini Twitter : @freshdaz
  2. Copyright © 2022, Oracle and/or its affiliates. All rights reserved.

    2 Me, Myself & I  MySQL Geek  Addicted to MySQL for 15+ years  Playing with databases for 20+ years  MySQL Writer, Blogger and Speaker  Also: DBA, Consultant, Architect, Trainer, ...  MySQL Cloud Principal Solutions Architect EMEA at Oracle  Stay up to date!  Blog: www.dasini.net/blog/en  Linkedin: www.linkedin.com/in/olivier-dasini/  Twitter: @freshdaz Olivier DASINI
  3. IT Disasters & Outages: Primary Causes Copyright © 2022, Oracle

    and/or its affiliates. All rights reserved. 3 On-site power failure is the biggest cause of significant outages
  4. IT Disasters & Outages: Costs are Rising Copyright © 2022,

    Oracle and/or its affiliates. All rights reserved. Over half had experienced an outage costing more than $100,000 4
  5. IT Disasters and Outages: Examples Copyright © 2022, Oracle and/or

    its affiliates. All rights reserved. 5-Hour computer outage cost $150 million. The airline eventually canceled about 1,000 flights on the day of the outage and grounded an additional 1,000 flights over the following days. Millions of websites offline after fire at French cloud services firm. The fire is expected to cost the company more than €105 million. Tens of thousands of passengers were stranded in cities around the world due to cancellation of about 130 flights and the delay of 200. Millions of bank customers were unable to access online accounts. The bank took almost 2 days to recover and get back to normal functioning. 5
  6. Copyright © 2022, Oracle and/or its affiliates. All rights reserved.

    6 MySQL Database Architectures 6 Past, Present & Future
  7. Concepts – RTO & RPO • RTO: Recovery Time Objective

    • How long does it take to recover from a single failure • RPO: Recovery Point Objective • How much data can be lost when a failure occurs Business Requirements Copyright © 2022, Oracle and/or its affiliates. All rights reserved. Types of Failure • High Availability: Single Server Failure, Network Partition • Disaster Recovery: Full Region/Network Failure • Human Error: Little Bobby Tables 7
  8. • Setting up Replication topology was usually done manually, taking

    many steps • Including user management, restoring backups, configuring replication... • MySQL only offered the technical pieces, leaving it up to the user to setup an (always customized) architecture • Even required other software ... bringing lot's of work for DBA's and experts, who spent their time automating and integrating their customized architecture MySQL Database Architectures – Past: Manual Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 8
  9. 2020 MySQL InnoDB ReplicaSet • Easy to use! • 'Classic',

    'asynchronous' Replication based Solution, fully integrated • MySQL Shell to provide a powerful interface that helps in automating and integrating all components • InnoDB CLONE to automatically provision members, fully integrated in InnoDB • MySQL Router • MySQL Server MySQL Database Architectures – Present: Solutions! RPO != 0 RTO = minutes (manual failover) Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 9
  10. 2016 MySQL InnoDB Cluster • Easy to use! • MySQL

    Group Replication: Automatic membership changes, network partition handling, consistency... • MySQL Shell to provide a powerful interface that helps in automating and integrating all components • InnoDB CLONE to automatically provision members, fully integrated in InnoDB • MySQL Router • MySQL Server MySQL Database Architectures – Present: Solutions! RPO = 0 RTO = seconds (automatic failover) Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 10
  11. Copyright © 2022, Oracle and/or its affiliates. All rights reserved.

    11 MySQL InnoDB ClusterSet 11 Disaster tolerance for InnoDB Cluster deployments
  12. One or more REPLICA MySQL InnoDB Clusters attached to a

    PRIMARY MySQL InnoDB Cluster High Availability (Failure within a Region) • RPO=0 • RTO=seconds (automatic failover) Disaster Recovery (Region Failure) • RPO !=0 • RTO = mintues or more (manual failover) • No write performance impact Features • Easy to use! • Familiar interface and usability mysqlsh, CLONE,… • Add/remove nodes/clusters online • Router integration, no need to reconfigure application if the topology changes MySQL InnoDB ClusterSet Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 12
  13. MySQL InnoDB ClusterSet – 3 Data Centers Copyright © 2022,

    Oracle and/or its affiliates. All rights reserved. 13
  14. MySQL InnoDB ClusterSet – Not every Cluster has to be

    3 nodes Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 14
  15. Copyright © 2022, Oracle and/or its affiliates. All rights reserved.

    15 MySQL InnoDB ClusterSet Configuration Commands
  16. Environment, 3 regions, 3 mysql databases each, listening on different

    ports: • BRU: – localhost:3331 – localhost:3332 – localhost:3333 • LIS: – localhost:4441 – localhost:4442 – localhost:4443 • ROM: – localhost:5551 – localhost:5552 – localhost:5553 ClusterSet Environment Example Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 16
  17. Start with setting up a regular MySQL InnoDB Cluster: $

    mysqlsh root@localhost:3331 JS> \sql create schema sbtest; JS> bru = dba.createCluster("BRU") JS> bru.addInstance('localhost:3332') JS> bru.addInstance('localhost:3333') JS> bru.status() Create MySQL InnoDB Cluster Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 17
  18. JS> clusterset = bru.createClusterSet('clusterset') A new ClusterSet will be created

    based on the Cluster 'BRU'. * Validating Cluster 'BRU' for ClusterSet compliance. * Creating InnoDB ClusterSet 'clusterset' on 'BRU'... * Updating metadata... ClusterSet successfully created. Use ClusterSet.createReplicaCluster() to add Replica Clusters to it. <ClusterSet:clusterset> Create ClusterSet Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 18
  19. JS> clusterset.status() { "clusters": { "BRU": { "clusterRole": "PRIMARY", "globalStatus":

    "OK", "primary": "127.0.0.1:3331" } }, "domainName": "clusterset", "globalPrimaryInstance": "127.0.0.1:3331", "primaryCluster": "BRU", "status": "HEALTHY", "statusText": "All Clusters available." } Check ClusterSet Status Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 19
  20. • Supports incremental recovery (binlog) & full recovery (CLONE) JS>

    lis = clusterset.createReplicaCluster('localhost:4441', 'LIS') JS> lis.addInstance('localhost:4442') JS> lis.addInstance('localhost:4443') JS> lis.status() Add Replica Cluster Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 20
  21. JS> clusterset.status() { "clusters": { "BRU": { "clusterRole": "PRIMARY", "globalStatus":

    "OK", "primary": "127.0.0.1:3331" }, "LIS": { "clusterRole": "REPLICA", "clusterSetReplicationStatus": "OK", "globalStatus": "OK" } }, "domainName": "clusterset", "globalPrimaryInstance": "127.0.0.1:3331", "primaryCluster": "BRU", "status": "HEALTHY", "statusText": "All Clusters available." } Check ClusterSet Status Copyright © 2022, Oracle and/or its affiliates. All rights reserved. Or, to get everything in one command: JS> clusterset.status({extended:1}) 21
  22. JS> rom = clusterset.createReplicaCluster( 'localhost:5551', 'ROM') JS> rom.addInstance('localhost:5552') JS> rom.addInstance('localhost:5553’)

    JS> rom.status() JS> clusterset.status() Add second Replica Cluster Copyright © 2022, Oracle and/or its affiliates. All rights reserved. { "clusters": { "ROM": { "clusterRole": "REPLICA", "clusterSetReplicationStatus": "OK", "globalStatus": "OK" }, "BRU": { "clusterRole": "PRIMARY", "globalStatus": "OK", "primary": "127.0.0.1:3331" }, "LIS": { "clusterRole": "REPLICA", "clusterSetReplicationStatus": "OK", "globalStatus": "OK" } }, "domainName": "clusterset", "globalPrimaryInstance": "127.0.0.1:3331", "primaryCluster": "BRU", "status": "HEALTHY", "statusText": "All Clusters available." } 22
  23. Copyright © 2022, Oracle and/or its affiliates. All rights reserved.

    23 MySQL InnoDB ClusterSet Router Integration
  24. Configure your application to connect to a local MySQL Router

    to connect to the ClusterSet Router Integration Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 24
  25. Router target modes: • Follow the PRIMARY cluster – Writes

    & Reads go to the PRIMARY Cluster • Connect to the configured target cluster – When target cluster is PRIMARY • Write port opens – When target cluster is not PRIMARY: • Only read traffic is open • Writes are blocked Features: • Configuration can be changed online using mysqlsh • Configurable per Router instance • Deploy 2 types of routers: – Target PRIMARY to send writes to the primary node – Define target cluster to keep read traffic local • INVALIDATED clusters can still be used for read traffic (configurable) Router Integration Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 25
  26. Router Integration – 3 Data Centers Copyright © 2022, Oracle

    and/or its affiliates. All rights reserved. 26
  27. Copyright © 2022, Oracle and/or its affiliates. All rights reserved.

    27 MySQL InnoDB ClusterSet Router Integration Commands
  28. $ mysqlrouter --bootstrap [email protected]:3331 --account=myRouter1 --name='<router name>' --force • [email protected]:3331

    is the URI-like connection string for any member server instance that is online in the InnoDB ClusterSet deployment • myRouter1 is the user name for a MySQL Router administrator account that was set up using the cluster.setupRouterAccount() command on the primary cluster • --name can be used to assign a non-default name to the MySQL Router instance, to make it easily identifiable in the output from InnoDB ClusterSet status commands • --force is required if you are bootstrapping MySQL Router again for an existing InnoDB Cluster where it was previously bootstrapped Integrating MySQL Router With InnoDB ClusterSet - Bootstrap Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 28 https://dev.mysql.com/doc/mysql-shell/8.0/en/innodb-clusterset-router.html
  29. JS> clusterset.listRouters() … JS> clusterset.routingOptions() { "domainName": "clusterset", "global": {

    "invalidated_cluster_policy": "drop_all", "target_cluster": "primary" }, "routers": { "<router name>": {} } } Integrating MySQL Router With InnoDB ClusterSet - Check Options Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 29 https://dev.mysql.com/doc/mysql-shell/8.0/en/innodb-clusterset-router.html
  30. Set the global routing policy for the target_cluster: JS> clusterset.setRoutingOption('target_cluster',

    'primary') Set a routing policy for an instance: JS> clusterset.setRoutingOption('<router name>', 'target_cluster', 'BRU') JS> clusterset.setRoutingOption('<router name>', 'target_cluster', 'primary') Change the invalidated_cluster_policy: When the target_cluster cluster is invalidated, should it still accept reads, knowing that they will be stale reads or should all traffic be dropped? JS> clusterset.setRoutingOption('<router name>', 'invalidated_cluster_policy', 'accept_ro') JS> clusterset.setRoutingOption('<router name>', 'invalidated_cluster_policy', 'drop_all') target_cluster is also configurable with mysqlrouter --bootstrap using --conf-target-cluster or –-conf-target-cluster-by-name Clear a routing policy for an instance: JS> clusterset.setRoutingOption('<router name>', 'target_cluster', null) Integrating MySQL Router With InnoDB ClusterSet - Configuration Options Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 30 https://dev.mysql.com/doc/mysql-shell/8.0/en/innodb-clusterset-router.html
  31. Copyright © 2022, Oracle and/or its affiliates. All rights reserved.

    31 MySQL InnoDB ClusterSet Managing Commands
  32. mysqlsh> bru.setPrimaryInstance('localhost:3332') Setting instance 'localhost:3332' as the primary instance of

    cluster 'BRU'... Instance '127.0.0.1:3331' was switched from PRIMARY to SECONDARY. Instance '127.0.0.1:3332' was switched from SECONDARY to PRIMARY. Instance '127.0.0.1:3333' remains SECONDARY. WARNING: The cluster internal session is not the primary member anymore. For cluster management operations please obtain a fresh cluster handle using dba.getCluster(). The instance 'localhost:3332' was successfully elected as primary. Change PRIMARY member in PRIMARY cluster Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 32
  33. mysqlsh> lis.setPrimaryInstance('localhost:4442') Setting instance 'localhost:4442' as the primary instance of

    cluster 'LIS'... Instance '127.0.0.1:4442' was switched from SECONDARY to PRIMARY. Instance '127.0.0.1:4443' remains SECONDARY. Instance '127.0.0.1:4441' was switched from PRIMARY to SECONDARY. WARNING: The cluster internal session is not the primary member anymore. For cluster management operations please obtain a fresh cluster handle using dba.getCluster(). The instance 'localhost:4442' was successfully elected as primary. Change PRIMARY member in REPLICA cluster Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 33
  34. mysqlsh> clusterset.setPrimaryCluster('LIS') Switching the primary cluster of the clusterset to

    'LIS' - Verifying clusterset status -- Checking cluster BRU - Cluster 'BRU' is available -- Checking cluster ROM - Cluster 'ROM' is available -- Checking cluster LIS - Cluster 'LIS' is available - Refreshing replication account of demoted cluster - Synchronizing transaction backlog at 127.0.0.1:4442 - Updating metadata - Updating topology -- Changing replication source of 127.0.0.1:3331 to 127.0.0.1:4442 -- Changing replication source of 127.0.0.1:3333 to 127.0.0.1:4442 -- Changing replication source of 127.0.0.1:3332 to 127.0.0.1:4442 - Acquiring locks in replicaset instances -- Pre-synchronizing SECONDARIES -- Acquiring global lock at PRIMARY & SECONDARIES - Synchronizing remaining transactions at promoted primary - Updating replica clusters -- Changing replication source of 127.0.0.1:5552 to 127.0.0.1:4442 -- Changing replication source of 127.0.0.1:5553 to 127.0.0.1:4442 -- Changing replication source of 127.0.0.1:5551 to 127.0.0.1:4442 Cluster 'LIS' was promoted to PRIMARY of the clusterset. The PRIMARY instance is '127.0.0.1:4442' Switchover - Changing PRIMARY Cluster - setPrimaryCluster() Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 34
  35. mysqlsh> \c root@localhost:3331 mysqlsh> clusterset=dba.getClusterSet() mysqlsh> clusterset.forcePrimaryCluster('BRU') Failing-over primary cluster

    of the clusterset to 'BRU' - Verifying primary cluster status None of the instances of the PRIMARY cluster 'LIS' could be reached. - Verifying clusterset status -- Checking cluster BRU Cluster 'BRU' is available -- Checking cluster ROM Cluster 'ROM' is available -- Checking whether target cluster has the most recent GTID set - Promoting cluster 'BRU' - Updating metadata -- Changing replication source of 127.0.0.1:5552 to 127.0.0.1:3331 -- Changing replication source of 127.0.0.1:5553 to 127.0.0.1:3331 -- Changing replication source of 127.0.0.1:5551 to 127.0.0.1:3331 PRIMARY cluster failed-over to 'BRU'. The PRIMARY instance is '127.0.0.1:3331' Former PRIMARY cluster was INVALIDATED, transactions that were not yet replicated may be lost. Failover to another Cluster Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 35
  36. mysqlsh> clusterset.removeCluster('LIS') Removing a Cluster from the ClusterSet Copyright ©

    2022, Oracle and/or its affiliates. All rights reserved. 36
  37. PRIMARY Cluster PRIMARY member Crash/Partition - Automatic Copyright © 2022,

    Oracle and/or its affiliates. All rights reserved. • When there is newly elected PRIMARY member in a cluster • Works on failures in PRIMARY and REPLICA clusters Automatic Handling of InnoDB Cluster state changes • Asynchronous replication is automatically reconfigured after primary change 38
  38. REPLICA Cluster PRIMARY member Crash/Partition - Automatic Copyright © 2022,

    Oracle and/or its affiliates. All rights reserved. • When there is newly elected PRIMARY member in a cluster • Works on failures in PRIMARY and REPLICA clusters Automatic Handling of InnoDB Cluster state changes • Asynchronous replication is automatically reconfigured after primary change 39
  39. Changing Primary - Change Primary Cluster on Healthy System Copyright

    © 2022, Oracle and/or its affiliates. All rights reserved. Switchover • One command that does it all: setPrimaryCluster() • Asynchronous replication channels between clusters are automatically reconfigured • Consistency guaranteed • All routers will immediately redirect if needed (depending on target mode) 40
  40. Changing Primary - Change Primary Cluster on Healthy System Copyright

    © 2022, Oracle and/or its affiliates. All rights reserved. 41 Switchover • One command that does it all: setPrimaryCluster() • Asynchronous replication channels between clusters are automatically reconfigured • Consistency guaranteed • All routers will immediately redirect if needed (depending on target mode)
  41. Datacenter Crash/Partition Copyright © 2022, Oracle and/or its affiliates. All

    rights reserved. 42 Failover to another Cluster • One command to invalidate the PRIMARY cluster and promote a new PRIMARY cluster: forcePrimaryCluster() • Other REPLICA clusters replication will be recon Split Brain Warning • Local Routers that cannot connect to other clusters will not learn about new topology • If datacenter is network partitioned, it will continue to operate as PRIMARY
  42. Datacenter Crash/Partition - forcePrimaryCluster() Copyright © 2022, Oracle and/or its

    affiliates. All rights reserved. Failover to another Cluster • One command to invalidate the PRIMARY cluster and promote a new PRIMARY cluster: forcePrimaryCluster() • Other REPLICA clusters replication will be recon Split Brain Warning • Local Routers that cannot connect to other clusters will not learn about new topology • If datacenter is network partitioned, it will continue to operate as PRIMARY 43
  43. Datacenter Crash/Partition – Router Integration Copyright © 2022, Oracle and/or

    its affiliates. All rights reserved. Router Integration • Routers will learn about new topology and redirect traffic • Routers that come back, will learn about new topology and abandon the old PRIMARY Cluster (e.g failed DC comes back online) 44
  44. Group Replication Crash/Partition Copyright © 2022, Oracle and/or its affiliates.

    All rights reserved. 45 Router Integration When GR is offline: • Network partition • No quorum • Full cluster lost (e.g. power outage) Failover to another Cluster • One command to invalidate the PRIMARY cluster and promote a new PRIMARY cluster: forcePrimaryCluster() • Router instances will follow PRIMARY (depending on target mode)
  45. Group Replication Crash/Partition - forcePrimaryCluster() & Router Copyright © 2022,

    Oracle and/or its affiliates. All rights reserved. Router Integration When GR is offline: • Network partition • No quorum • Full cluster lost (e.g. power outage) Failover to another Cluster • One command to invalidate the PRIMARY cluster and promote a new PRIMARY cluster: forcePrimaryCluster() • Router instances will follow PRIMARY (depending on target mode) 46
  46. Copyright © 2022, Oracle and/or its affiliates. All rights reserved.

    47 Business Requirements What is the right solution for you?
  47. Concepts – RTO & RPO • RTO: Recovery Time Objective

    • How long does it take to recover from a single failure • RPO: Recovery Point Objective • How much data can be lost when a failure occurs Business Requirements Copyright © 2022, Oracle and/or its affiliates. All rights reserved. Types of Failure • High Availability: Single Server Failure, Network Partition • Disaster Recovery: Full Region/Network Failure • Human Error: Little Bobby Tables 48
  48. MySQL InnoDB Cluster • RPO = 0 • RTO =

    Seconds High Availability – Single Region Copyright © 2022, Oracle and/or its affiliates. All rights reserved. MySQL InnoDB ReplicaSet • RPO ! = 0 • RTO = Minutes + (manual failover) 🙂 Best write performance 😕 Manual Failover 49
  49. MySQL InnoDB Cluster • RPO = 0 • RTO =

    Seconds Disaster Recovery – Cross Data Center Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 🙂 Multi-Region Multi-Primary 😕 3 DC 😕 Requires very stable WAN 😕 Write performance affected by latency between DCs 50
  50. MySQL InnoDB Cluster ClusterSet • RPO != 0 • RTO

    = Minutes + (manual failover) Disaster Recovery - Multi Region Copyright © 2022, Oracle and/or its affiliates. All rights reserved. 😕 Write performance (no sync to other region required) 😕 Higher RTO: Manual failover 😕 RPO != 0 when region fails MySQL InnoDB ClusterSet 51 🙂 RPO = 0 & RTO = seconds within Region (HA)
  51. Copyright © 2022, Oracle and/or its affiliates. All rights reserved.

    52  MySQL InnoDB ClusterSet Documentation https://dev.mysql.com/doc/mysql-shell/8.0/en/innodb-clusterset.html  Tutoriel – Déployer MySQL 8.0 InnoDB Cluster http://dasini.net/blog/2019/09/03/tutoriel-deployer-mysql-8-0-innodb-cluster-09-2019/  MySQL InnoDB Cluster – Easy Recovering and provisioning http://dasini.net/blog/2019/09/10/mysql-innodb-cluster-easy-recovering-and-provisioning/  MySQL InnoDB Cluster – Recovering and provisioning with MySQL Enterprise Backup http://dasini.net/blog/2019/07/11/mysql-innodb-cluster-recovering-and-provisioning-with-mysql-enterprise-backup/  MySQL InnoDB Cluster Documentation https://dev.mysql.com/doc/refman/8.0/en/mysql-innodb-cluster-userguide.html  MySQL Group Replication Documentation https://dev.mysql.com/doc/refman/8.0/en/group-replication.html  MySQL InnoDB ReplicaSet https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-innodb-replicaset.html  MySQL Router HA: Pacemaker https://lefred.be/content/mysql-router-ha-with-pacemaker/  MySQL Router HA: Keepalived https://lefred.be/content/mysql-router-ha-with-keepalived/  MySQL Router HA: DNS SRV https://www.slideshare.net/Grypyrg/mysql-connectors-8019-dns-srv  MySQL Shell https://dev.mysql.com/doc/mysql-shell/8.0/en/ Resources – ClusterSet | Cluster | ReplicaSet | Router | Shell High Availability & Disaster Recovery solutions for MySQL
  52. Copyright © 2022, Oracle and/or its affiliates. All rights reserved.

    54 Merci! Q&R Olivier Dasini MySQL Cloud Principal Solutions Architect EMEA [email protected] Blogs : www.dasini.net/blog/en : www.dasini.net/blog/fr Linkedin: www.linkedin.com/in/olivier-dasini Twitter : @freshdaz